Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-28 Thread أحمد المحمودي
On Tue, Jun 29, 2010 at 01:24:12AM +0200, Marc Dequènes (Duck) wrote:
> This new strategy works well too. It can be quite resource-consuming,
> so I limited it to a pattern length of 3+, and the results come in a
> reasonable amount of time.
---end quoted text---

How do you limit it to a pattern length of 3+? Is there a setting in the config
file? By the way, I didn't find any documentation on the stratall or substr
strategies in the .info documentation.

-- 
 ‎أحمد المحمودي (Ahmed El-Mahmoudy)
  Digital design engineer
 GPG KeyID: 0xEDDDA1B7
 GPG Fingerprint: 8206 A196 2084 7E6D 0DF8  B176 BC19 6A94 EDDD A1B7



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-28 Thread Marc Dequènes (Duck)

Quack,

Quoting أحمد المحمودي:

> On Sun, Jun 27, 2010 at 10:27:08PM +0300, Sergey Poznyakoff wrote:
>
>> It took me a little longer than expected. I have moved the `all'
>> strategy into a loadable module, so that it is not enabled unless
>> the admin explicitly loads it in the configuration file.  Please,
>> try this tarball:


I tested using the preview package, and this new version works pretty well.

The conditional loading is working too.


>> Apart from these changes, this version also implements the
>> `substr' strategy, which matches a substring anywhere in the
>> headword.  This, too, is implemented as a module.


This new strategy works well too. It can be quite resource-consuming,
so I limited it to a pattern length of 3+, and the results come in a
reasonable amount of time.
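Marc does not say here how the 3+ limit is enforced. As a minimal sketch only, assuming the check is applied to the pattern before any headword is scanned, it could look like the C below; `MIN_SUBSTR_LEN` and `substr_pattern_allowed` are hypothetical names, not dico code. It counts UTF-8 characters rather than bytes, because a two-letter Arabic pattern is already four bytes long.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimum pattern length for a costly substring search;
   the value 3 mirrors the "3+" limit mentioned above. */
#define MIN_SUBSTR_LEN 3

/* Count UTF-8 characters (code points) rather than bytes: every byte
   that is not a continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Return true if the pattern is long enough to be worth a full scan. */
bool substr_pattern_allowed(const char *pattern)
{
    return pattern != NULL && utf8_char_count(pattern) >= MIN_SUBSTR_LEN;
}
```

With this check, `substr_pattern_allowed("ab")` is false while `"abc"` passes.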


You can do some testing with my server, which runs this version, if you need to.


On the splitting question, I support Sergey's view: split only if a
module is big and/or requires non-trivial dependencies (medium or large
libraries other than the ones installed by default for this type of
installation: desktop/server/...).



Thanks to both of you for your work :-).

--
Marc Dequènes (Duck)




Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-28 Thread Sergey Poznyakoff
أحمد المحمودي wrote:

> The packages of the other modules do depend on the dicod package. But I
> separated them, since a user may not want to install them
> (especially since some of them pull in some dependencies).

I certainly agree with this sort of policy for modules requiring extra
dependencies. But (1) neither of the modules in question requires
anything, and (2) the overall size of the two modules together is 6K,
which is ridiculous for a separate module.

Regards,
Sergey






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-28 Thread أحمد المحمودي
On Mon, Jun 28, 2010 at 11:22:12AM +0300, Sergey Poznyakoff wrote:
> In my opinion they definitely *do not* qualify for separate packages,
> just like the rest of the modules for dicod.  They are part of the server and
> should be distributed with it.
---end quoted text---

The packages of the other modules do depend on the dicod package. But I
separated them, since a user may not want to install them
(especially since some of them pull in some dependencies).

-- 
 ‎أحمد المحمودي (Ahmed El-Mahmoudy)
  Digital design engineer
 GPG KeyID: 0xEDDDA1B7
 GPG Fingerprint: 8206 A196 2084 7E6D 0DF8  B176 BC19 6A94 EDDD A1B7






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-28 Thread Sergey Poznyakoff
أحمد المحمودي wrote:

> In particular, I need your opinion on installing the substr &
> stratall modules in the dicod package, instead of creating separate packages
> for them.

In my opinion they definitely *do not* qualify for separate packages,
just like the rest of the modules for dicod.  They are part of the server and
should be distributed with it.

Regards,
Sergey






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-28 Thread أحمد المحمودي
On Sun, Jun 27, 2010 at 10:27:08PM +0300, Sergey Poznyakoff wrote:
> It took me a little longer than expected. I have moved the `all'
> strategy into a loadable module, so that it is not enabled unless
> the admin explicitly loads it in the configuration file.  Please,
> try this tarball:
> 
>   ftp://download.gnu.org.ua/pub/alpha/dico/dico-2.0.91.tar.gz
> 
> Apart from these changes, this version also implements the
> `substr' strategy, which matches a substring anywhere in the
> headword.  This, too, is implemented as a module.
---end quoted text---

Thanks, I prepared the Debian package in git [1]. Marc, can you please
review it? In particular, I need your opinion on installing the substr &
stratall modules in the dicod package, instead of creating separate packages
for them.

[1] git://git.debian.org/git/collab-maint/dico.git

-- 
 ‎أحمد المحمودي (Ahmed El-Mahmoudy)
  Digital design engineer
 GPG KeyID: 0xEDDDA1B7
 GPG Fingerprint: 8206 A196 2084 7E6D 0DF8  B176 BC19 6A94 EDDD A1B7






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-06-27 Thread Sergey Poznyakoff
Hi Marc,

Some time ago I wrote:

> > On the other hand, I agree that a mechanism for disabling arbitrary
> > strategies is needed (both on database level and globally). I will
> > provide a solution for the latter.

It took me a little longer than expected. I have moved the `all'
strategy into a loadable module, so that it is not enabled unless
the admin explicitly loads it in the configuration file.  Please,
try this tarball:

  ftp://download.gnu.org.ua/pub/alpha/dico/dico-2.0.91.tar.gz
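The effect of moving `all' into a module can be pictured as a registry in which a strategy only exists once some module has registered it. The sketch below is a simplified, hypothetical C model, not dico's module API: the real server loads the module from its configuration file, whereas here "loading" is just a call to `load_stratall_module()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strategy registry; not dico's actual module interface. */
typedef int (*match_fn)(const char *headword, const char *pattern);

struct strategy {
    const char *name;
    match_fn match;
};

#define MAX_STRATEGIES 16
static struct strategy registry[MAX_STRATEGIES];
static size_t nstrategies;

/* Called by a module's initialization code when it is loaded. */
int register_strategy(const char *name, match_fn match)
{
    if (nstrategies == MAX_STRATEGIES)
        return -1;
    registry[nstrategies].name = name;
    registry[nstrategies].match = match;
    nstrategies++;
    return 0;
}

/* Look a strategy up by name; NULL means "unknown strategy". */
const struct strategy *find_strategy(const char *name)
{
    for (size_t i = 0; i < nstrategies; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}

static int match_exact(const char *hw, const char *pat)
{
    return strcmp(hw, pat) == 0;
}

/* Matches every headword, which is what makes it dangerous. */
static int match_all(const char *hw, const char *pat)
{
    (void) hw;
    (void) pat;
    return 1;
}

/* Built-in strategies are always available. */
void load_builtin_strategies(void)
{
    register_strategy("exact", match_exact);
}

/* Stand-in for the admin explicitly loading the optional module. */
void load_stratall_module(void)
{
    register_strategy("all", match_all);
}
```

Until `load_stratall_module()` is called, a lookup of "all" fails just like a lookup of any unknown strategy.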

Apart from these changes, this version also implements the
`substr' strategy, which matches a substring anywhere in the
headword.  This, too, is implemented as a module.
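A `substr'-style match can be illustrated as below. This is a hypothetical C sketch assuming plain byte-wise, case-sensitive `strstr` comparison (a real dictionary server would also need to fold case in UTF-8); `substr_search` is an illustrative name, not a dico function. Since the substring can occur anywhere, the sketch has to examine every headword, which is why such a strategy gets expensive on large databases.

```c
#include <stddef.h>
#include <string.h>

/* A headword matches if the pattern occurs anywhere inside it. */
static int substr_match(const char *headword, const char *pattern)
{
    return strstr(headword, pattern) != NULL;
}

/* Scan all n headwords once, copying up to max_out matches into out[].
   Returns the number of matches stored. */
size_t substr_search(const char *const *headwords, size_t n,
                     const char *pattern,
                     const char **out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++)
        if (substr_match(headwords[i], pattern))
            out[found++] = headwords[i];
    return found;
}
```

For the headwords "dictionary", "addict", "server" and "predict", the pattern "dict" matches three of them.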

As usual, your feedback is welcome.

Regards,
Sergey






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-24 Thread أحمد المحمودي
On Mon, May 24, 2010 at 02:53:27PM +0200, Marc Dequènes (Duck) wrote:
> Quoting Sergey Poznyakoff :
> >Please apply the attached patch. It will fix the response procedure
> >for all types of the queries (both "match" and "define") in dictorg
> >databases. This will make single-database "all" matches feasible in
> >terms of time usage.
> 
> أحمد المحمودي will test it soon, and I should have a look tonight
> when I'm back from the office.
---end quoted text---

I tested with:

$ time dico --host=localhost --noauth -d gcide -s all "sproutchploufpiou" 

Result before applying the patch:
real    14m31.965s
user    0m1.912s
sys     0m6.128s


Result after applying the patch:
real    0m3.486s
user    0m1.340s
sys     0m1.968s


Quite impressive! Thanks!

-- 
 ‎أحمد المحمودي (Ahmed El-Mahmoudy)
  Digital design engineer
 GPG KeyID: 0xEDDDA1B7
 GPG Fingerprint: 8206 A196 2084 7E6D 0DF8  B176 BC19 6A94 EDDD A1B7






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-24 Thread Marc Dequènes (Duck)

Quoting Sergey Poznyakoff:


> I see your point, but what you report is another bug, not related to
> that particular search strategy. It becomes prominent when "all" is
> used (because of a huge number of elements involved), but it affects
> other searches as well.


Yes, sure. But besides protecting my server from abuse, I thought
people would be tempted to experiment with different strategies and
use this one by mistake (it is of no use for real searches). That
would cause a real mess on this machine, so I cannot advertise this
service as is.



> Please apply the attached patch. It will fix the response procedure
> for all types of the queries (both "match" and "define") in dictorg
> databases. This will make single-database "all" matches feasible in
> terms of time usage.


أحمد المحمودي will test it soon, and I should have a look tonight
when I'm back from the office.



> On the other hand, I agree that a mechanism for disabling arbitrary
> strategies is needed (both on database level and globally). I will
> provide a solution for the latter.


Thanks a lot :-).

--
Marc Dequènes (Duck)




Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-24 Thread Sergey Poznyakoff
Marc Dequènes (Duck) wrote:

> I've already used this setting. It works well if you search across
> *all* databases, but if you specify one, the search goes on and
> displays the whole database content after hanging the machine for
> several minutes (please don't experiment with my server).

I see your point, but what you report is another bug, not related to
that particular search strategy. It becomes prominent when "all" is
used (because of a huge number of elements involved), but it affects
other searches as well.

Please apply the attached patch. It will fix the response procedure
for all types of the queries (both "match" and "define") in dictorg
databases. This will make single-database "all" matches feasible in
terms of time usage. 
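The commit message below sums up the fix as addressing iterators "by item number in O(|n-pos|) time": instead of rescanning the result list from the head for every item, the output procedure keeps an iterator that remembers its position. The following simplified C sketch contrasts the two approaches. `nth_from_head` models the old rescanning behaviour and `cursor_item` is a hypothetical stand-in for the patch's `dico_iterator_item`; the `steps` counter exists only to make the cost visible.

```c
#include <stddef.h>

/* Hypothetical doubly linked list node (the patch adds a `prev' link). */
struct node {
    struct node *next, *prev;
    int value;
};

/* A cursor that remembers its index (cf. the `pos' member the patch
   adds to the iterator). */
struct cursor {
    struct node *cur;   /* node at index pos */
    size_t pos;
    size_t steps;       /* instrumentation: links followed so far */
};

/* Link an array of nodes into a list holding the values 0..n-1. */
void link_nodes(struct node *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = (int) i;
        nodes[i].prev = i > 0 ? &nodes[i - 1] : NULL;
        nodes[i].next = i + 1 < n ? &nodes[i + 1] : NULL;
    }
}

/* Rescanning approach: walk from the head on every call, O(n). */
struct node *nth_from_head(struct node *head, size_t n, size_t *steps)
{
    struct node *p = head;
    while (p && n > 0) {
        p = p->next;
        n--;
        (*steps)++;
    }
    return p;
}

/* Cursor approach: move from the remembered position, O(|n - pos|).
   Returns NULL (leaving the cursor valid) if n is past the end. */
struct node *cursor_item(struct cursor *c, size_t n)
{
    while (c->cur && c->pos < n && c->cur->next) {
        c->cur = c->cur->next;
        c->pos++;
        c->steps++;
    }
    while (c->cur && c->pos > n) {
        c->cur = c->cur->prev;
        c->pos--;
        c->steps++;
    }
    return (c->cur && c->pos == n) ? c->cur : NULL;
}
```

Reading 1000 results in order costs n(n-1)/2 = 499500 link traversals with `nth_from_head` but only 999 with `cursor_item`. A quadratic-to-linear change of this kind is consistent with the speedup measured after applying the patch.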

On the other hand, I agree that a mechanism for disabling arbitrary
strategies is needed (both on database level and globally). I will
provide a solution for the latter.

Regards,
Sergey

From 9b10f09671e8498ee9862bd9190fc1fb324e35e7 Mon Sep 17 00:00:00 2001
From: Sergey Poznyakoff 
Date: Mon, 24 May 2010 14:26:00 +0300
Subject: [PATCH] Speed up output procedure in dictorg.

Provide a general-purpose mechanism to address iterators by item
number in O(|n-pos|) time.

* include/dico/list.h (dico_iterator_prev)
(dico_iterator_item, dico_iterator_position): New prototypes.
* lib/list.c (list_entry) : New member.
(iterator) : New member.
(dico_iterator_position): New function.
(_iterator_increase_pos): New static.
(dico_iterator_first): Initialize pos to 0.
(dico_iterator_next): Increase pos.
(dico_iterator_prev,dico_iterator_item): New function.
(_dico_list_append): Initialize ep->prev.
(_dico_list_prepend): Initialize ep->prev. Call
_iterator_increase_pos to tell iterators to update their recorded positions.
(_dico_list_remove): Rewrite removal code using next & prev pointers.
(_dico_list_insert_sorted): Update next & prev pointers.
Call _iterator_increase_pos.
* modules/dict.org/dictorg.h (result) : New member.
* modules/dict.org/dictorg.c (common_match)
(suffix_match, _match_all): Initialize itr.
(mod_output_result): Use iterator to avoid rescanning the list
on each call.
(mod_free_result): Destroy the iterator.

* lib/utf8.c (utf8_strcasecmp, utf8_strncasecmp): Break the loop if
alen or blen is zero. This means that one of the operands is not
utf8, but try to return meaningful value anyway.
---
 include/dico/list.h        |    3 +
 lib/list.c                 |  126 +++-
 lib/utf8.c                 |    8 +++
 modules/dict.org/dictorg.c |   14 +-
 modules/dict.org/dictorg.h |    1 +
 5 files changed, 126 insertions(+), 26 deletions(-)

diff --git a/include/dico/list.h b/include/dico/list.h
index 8f5419d..4ca5d40 100644
--- a/include/dico/list.h
+++ b/include/dico/list.h
@@ -64,6 +64,9 @@ dico_iterator_t dico_list_iterator(dico_list_t list);
 void dico_iterator_destroy(dico_iterator_t *ip);
 void *dico_iterator_first(dico_iterator_t ip);
 void *dico_iterator_next(dico_iterator_t ip);
+void *dico_iterator_prev(dico_iterator_t ip);
+void *dico_iterator_item(dico_iterator_t ip, size_t n);
+size_t dico_iterator_position(dico_iterator_t ip);
 
 int dico_iterator_remove_current(dico_iterator_t ip, void **pptr);
 void dico_iterator_set_data(dico_iterator_t ip, void *data);
diff --git a/lib/list.c b/lib/list.c
index 309369c..9d1da6b 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -23,7 +23,7 @@
 #include 
 
 struct list_entry {
-struct list_entry *next;
+struct list_entry *next, *prev;
 void *data;
 };
 
@@ -42,6 +42,7 @@ struct iterator {
 dico_list_t list;
 struct list_entry *cur;
 int advanced;
+size_t pos;
 };
 
 static int
@@ -120,13 +121,22 @@ dico_iterator_current(dico_iterator_t ip)
 return ip->cur ? ip->cur->data : NULL;
 }
 
+size_t
+dico_iterator_position(dico_iterator_t ip)
+{
+if (!ip)
+   return 0;
+return ip->pos;
+}
+
 static void
 dico_iterator_attach(dico_iterator_t itr, dico_list_t list)
 {
 itr->list = list;
-itr->cur = NULL;
+itr->cur = list->head;
 itr->next = list->itr;
 itr->advanced = 0;
+itr->pos = 0;
 list->itr = itr;   
 }
 
@@ -178,6 +188,26 @@ dico_iterator_destroy(dico_iterator_t *ip)
 *ip = NULL;
 }

+static void
+_iterator_increase_pos(dico_iterator_t ip, size_t after)
+{
+for (; ip; ip = ip->next) {
+   if (ip->pos > after)
+   ip->pos++;
+}
+}
+
+static void
+_iterator_advance(dico_iterator_t ip, struct list_entry *e)
+{
+for (; ip; ip = ip->next) {
+   if (ip->cur == e) {
+   ip->cur = e->next;
+   ip->advanced++;
+   }
+}
+}
+
 void *
 dico_iterator_first(dico_iterator_t ip)
 {
@@ -185,6 +215,7 @@ dico_iterator_first(dico_iterator_t ip)
return NULL;
 ip->cur = ip->list->head;
 ip->advanced = 0;
+ip->pos = 0;
 return dico_iterator_current(ip);
 }
 
@@ -193,12 +224,53 @@ dico_iterator_next(dico_iterator_t ip)
 {
 if (!ip || !ip->cur)

Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-23 Thread Marc Dequènes (Duck)

Quoting Sergey Poznyakoff:


> I described this in my previous post. Add the following to your config
> file:
>
> strategy all {
>    deny-all yes;
> }
>
> This disables it for all searches.


I've already used this setting. It works well if you search across
*all* databases, but if you specify one, the search goes on and
displays the whole database content after hanging the machine for
several minutes (please don't experiment with my server).


--
Marc Dequènes (Duck)




Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-23 Thread أحمد المحمودي
On Sun, May 23, 2010 at 10:51:45PM +0300, Sergey Poznyakoff wrote:
> Marc Dequènes (Duck) wrote:
> 
> > How can we limit it in non-default searches?
> 
> I described this in my previous post. Add the following to your config
> file:
> 
> strategy all {
>    deny-all yes;
> }
> 
> This disables it for all searches.
---end quoted text---

According to the info page, the above would disable the "all" strategy
when the database argument is '*' or '!'. But it will not disable the
"all" strategy when the database argument is something like "gcide". As
far as I understand, the problem is that even in the latter case it
would cause 100% CPU load.
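The scope described above can be modelled in a few lines. This hypothetical C sketch encodes only the rule as read from the info page in this message (`deny-all yes` applies when the database argument is "*" or "!"); it is not dico's actual code, and `strategy_permitted` is an illustrative name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-strategy policy; only the `deny-all' flag is modelled. */
struct strategy_policy {
    bool deny_all;
};

/* A "default search" is one whose database argument is "*" or "!". */
static bool is_default_search(const char *db)
{
    return strcmp(db, "*") == 0 || strcmp(db, "!") == 0;
}

/* Return true if the strategy may be used against database `db'. */
bool strategy_permitted(const struct strategy_policy *p, const char *db)
{
    if (p->deny_all && is_default_search(db))
        return false;
    return true;  /* a named database such as "gcide" is not affected */
}
```

Under this rule, a query naming a single database such as "gcide" is still allowed, which is exactly the gap being reported.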

-- 
 ‎أحمد المحمودي (Ahmed El-Mahmoudy)
  Digital design engineer
 GPG KeyID: 0xEDDDA1B7
 GPG Fingerprint: 8206 A196 2084 7E6D 0DF8  B176 BC19 6A94 EDDD A1B7






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-23 Thread Sergey Poznyakoff
Marc Dequènes (Duck) wrote:

> How can we limit it in non-default searches?

I described this in my previous post. Add the following to your config
file:

strategy all {
   deny-all yes;
}

This disables it for all searches.

Regards,
Sergey
   






Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-23 Thread Marc Dequènes (Duck)

Quoting Sergey Poznyakoff:


> No, there is:
>
> strategy all {
>   deny-all yes;
> }


But it only affects default searches, and certain databases are
quite big (gcide is 13MB). I tested with a 1MB database to see:
# time dico --host=dico.duckcorp.org --noauth -d fd-cro-eng -s all "sproutchploufpiou"

...
real    3m46.654s
user    0m0.576s
sys     0m2.296s

I don't think this is acceptable.

How can we limit it in non-default searches? Or even make it disappear
from the list of strategies completely, so that the listing in the CLI
or web interface only presents the ones authorized at least in certain
conditions? I can't see that in the manual.


--
Marc Dequènes (Duck)




Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !

2010-05-23 Thread Sergey Poznyakoff
أحمد المحمودي wrote:

> > The "Match everything (experimental)" strategy is not suited for
> > production servers, as its name says, and consumes all CPU, leading
> > to an easy DOS attack method.

Any implementation of a "match everything" strategy is potentially harmful
and certainly not suited for production servers. I thought it was obvious.

> There is no way to deactivate it,

No, there is:

strategy all {
  deny-all yes;
}

See the manual, section 3.3.12 "Strategies and Default Searches" [1]

Regards,
Sergey

[1] http://dico.prog.gnu.org.ua/manual/html_section/Configuration.html#SEC29


