Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
On Tue, Jun 29, 2010 at 01:24:12AM +0200, Marc Dequènes (Duck) wrote:
> This new strategy works well too. It can be quite resource
> consuming, so I limited it to a 3+ pattern length, and the result comes
> in a reasonable amount of time.
---end quoted text---

How do you limit it to a 3+ pattern length? Is there a setting in the conf file?

Btw, I didn't find any documentation regarding the stratall or substr strategies in the .info documentation.

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Digital design engineer
GPG KeyID: 0xEDDDA1B7
GPG Fingerprint: 8206 A196 2084 7E6D 0DF8 B176 BC19 6A94 EDDD A1B7

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Quack,

Quoting أحمد المحمودي:

> On Sun, Jun 27, 2010 at 10:27:08PM +0300, Sergey Poznyakoff wrote:
>> It took me a little longer than expected. I have moved the `all'
>> strategy into a loadable module, so that it is not enabled unless
>> the admin explicitly loads it in the configuration file. Please
>> try this tarball:

I tested using the preview package, and this new version works pretty well. The conditional loading works too.

>> Apart from these changes, this version also implements the
>> `substr' strategy, which matches a substring anywhere in the
>> headword. This, too, is implemented as a module.

This new strategy works well too. It can be quite resource consuming, so I limited it to a 3+ pattern length, and the result comes in a reasonable amount of time.

You can do some testing with my server running this version if you need to.

On the splitting question, I support Sergey's view: split only if a module is big and/or requires non-trivial dependencies (medium or large libraries other than the ones installed by default for this type of installation: desktop, server, ...).

Thanks to both of you for your work :-).

-- 
Marc Dequènes (Duck)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
أحمد المحمودي wrote:
> The packages of the other modules do depend on the dicod package. But I
> separated them, as it may occur that a user may not want to install them
> (especially since some of them pull in extra dependencies).

I certainly agree with this sort of policy for modules requiring extra dependencies. But (1) neither of the modules in question requires anything, and (2) the overall size of the two modules together is 6K, which is ridiculous for a separate package.

Regards,
Sergey
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
On Mon, Jun 28, 2010 at 11:22:12AM +0300, Sergey Poznyakoff wrote:
> In my opinion they definitely *do not* qualify for separate packages,
> just like the rest of the dicod modules. They are part of the server and
> should be distributed with it.
---end quoted text---

The packages of the other modules do depend on the dicod package. But I separated them, as it may occur that a user may not want to install them (especially since some of them pull in extra dependencies).

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
أحمد المحمودي wrote:
> Especially, I need your opinion regarding installing the substr &
> stratall modules in the dicod package, instead of creating separate packages
> for them.

In my opinion they definitely *do not* qualify for separate packages, just like the rest of the dicod modules. They are part of the server and should be distributed with it.

Regards,
Sergey
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
On Sun, Jun 27, 2010 at 10:27:08PM +0300, Sergey Poznyakoff wrote:
> It took me a little longer than expected. I have moved the `all'
> strategy into a loadable module, so that it is not enabled unless
> the admin explicitly loads it in the configuration file. Please
> try this tarball:
>
> ftp://download.gnu.org.ua/pub/alpha/dico/dico-2.0.91.tar.gz
>
> Apart from these changes, this version also implements the
> `substr' strategy, which matches a substring anywhere in the
> headword. This, too, is implemented as a module.
---end quoted text---

Thanks, I prepared the Debian package in git [1]. Marc, can you please review it? Especially, I need your opinion regarding installing the substr & stratall modules in the dicod package, instead of creating separate packages for them.

[1] git://git.debian.org/git/collab-maint/dico.git

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Hi Marc,

Some time ago I wrote:
> > On the other hand, I agree that a mechanism for disabling arbitrary
> > strategies is needed (both on database level and globally). I will
> > provide a solution for this later.

It took me a little longer than expected. I have moved the `all' strategy into a loadable module, so that it is not enabled unless the admin explicitly loads it in the configuration file. Please try this tarball:

ftp://download.gnu.org.ua/pub/alpha/dico/dico-2.0.91.tar.gz

Apart from these changes, this version also implements the `substr' strategy, which matches a substring anywhere in the headword. This, too, is implemented as a module.

As usual, your feedback is welcome.

Regards,
Sergey
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
On Mon, May 24, 2010 at 02:53:27PM +0200, Marc Dequènes (Duck) wrote:
> Quoting Sergey Poznyakoff:
> >Please apply the attached patch. It will fix the response procedure
> >for all types of the queries (both "match" and "define") in dictorg
> >databases. This will make single-database "all" matches feasible in
> >terms of time usage.
>
> أحمد المحمودي will be testing it soon, and I should have a look tonight
> when back from the office.
---end quoted text---

I tested with:

  $ time dico --host=localhost --noauth -d gcide -s all "sproutchploufpiou"

Result before applying the patch:

  real    14m31.965s
  user    0m1.912s
  sys     0m6.128s

Result after applying the patch:

  real    0m3.486s
  user    0m1.340s
  sys     0m1.968s

Quite impressive! Thanks!

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
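To put the wall-clock ("real") figures above on one scale, converting 14m31.965s to seconds gives roughly a 250-fold speedup:

```latex
\frac{14 \times 60\,\mathrm{s} + 31.965\,\mathrm{s}}{3.486\,\mathrm{s}}
  = \frac{871.965}{3.486} \approx 250
```

The user and sys values are those of the `dico` client process measured by `time`, not of the server. This is why the saving shows up almost entirely in the real time, which includes the wait for the server's reply.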
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Quoting Sergey Poznyakoff:

> I see your point, but what you report is another bug, not related to
> that particular search strategy. It becomes prominent when "all" is
> used (because of a huge number of elements involved), but it affects
> other searches as well.

Yes, sure. But besides protecting my server from evil, I thought people would be tempted to experiment with different strategies and use this one by mistake (it is of no use for real searches). That would cause a real mess on this machine, so I cannot advertise this service as is.

> Please apply the attached patch. It will fix the response procedure
> for all types of the queries (both "match" and "define") in dictorg
> databases. This will make single-database "all" matches feasible in
> terms of time usage.

أحمد المحمودي will be testing it soon, and I should have a look tonight when back from the office.

> On the other hand, I agree that a mechanism for disabling arbitrary
> strategies is needed (both on database level and globally). I will
> provide a solution for this later.

Thanks a lot :-).

-- 
Marc Dequènes (Duck)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Marc Dequènes (Duck) wrote:
> I've already used this setting. It works well if you search across
> *all* databases, but if you specify one, the search goes on and
> displays the whole database content after hanging the machine for
> several minutes (please don't experiment with my server).

I see your point, but what you report is another bug, not related to that particular search strategy. It becomes prominent when "all" is used (because of a huge number of elements involved), but it affects other searches as well.

Please apply the attached patch. It will fix the response procedure for all types of the queries (both "match" and "define") in dictorg databases. This will make single-database "all" matches feasible in terms of time usage.

On the other hand, I agree that a mechanism for disabling arbitrary strategies is needed (both on database level and globally). I will provide a solution for this later.

Regards,
Sergey

From 9b10f09671e8498ee9862bd9190fc1fb324e35e7 Mon Sep 17 00:00:00 2001
From: Sergey Poznyakoff
Date: Mon, 24 May 2010 14:26:00 +0300
Subject: [PATCH] Speed up output procedure in dictorg.

Provide a general-purpose mechanism to address iterators by item
number in O(|n-pos|) time.

* include/dico/list.h (dico_iterator_prev)
(dico_iterator_item, dico_iterator_position): New prototypes.
* lib/list.c (list_entry) <prev>: New member.
(iterator) <pos>: New member.
(dico_iterator_position): New function.
(_iterator_increase_pos): New static.
(dico_iterator_first): Initialize pos to 0.
(dico_iterator_next): Increase pos.
(dico_iterator_prev,dico_iterator_item): New function.
(_dico_list_append): Initialize ep->prev.
(_dico_list_prepend): Initialize ep->prev. Call _iterator_increase_pos
to tell iterators to update their recorded positions.
(_dico_list_remove): Rewrite removal code using next & prev pointers.
(_dico_list_insert_sorted): Update next & prev pointers. Call
_iterator_increase_pos.
* modules/dict.org/dictorg.h (result) <itr>: New member.
* modules/dict.org/dictorg.c (common_match)
(suffix_match, _match_all): Initialize itr.
(mod_output_result): Use iterator to avoid rescanning the list on each
call.
(mod_free_result): Destroy the iterator.
* lib/utf8.c (utf8_strcasecmp, utf8_strncasecmp): Break the loop if
alen or blen is zero. This means that one of the operands is not utf8,
but try to return meaningful value anyway.
---
 include/dico/list.h        |    3 +
 lib/list.c                 |  126 +++-
 lib/utf8.c                 |    8 +++
 modules/dict.org/dictorg.c |   14 +-
 modules/dict.org/dictorg.h |    1 +
 5 files changed, 126 insertions(+), 26 deletions(-)

diff --git a/include/dico/list.h b/include/dico/list.h
index 8f5419d..4ca5d40 100644
--- a/include/dico/list.h
+++ b/include/dico/list.h
@@ -64,6 +64,9 @@ dico_iterator_t dico_list_iterator(dico_list_t list);
 void dico_iterator_destroy(dico_iterator_t *ip);
 void *dico_iterator_first(dico_iterator_t ip);
 void *dico_iterator_next(dico_iterator_t ip);
+void *dico_iterator_prev(dico_iterator_t ip);
+void *dico_iterator_item(dico_iterator_t ip, size_t n);
+size_t dico_iterator_position(dico_iterator_t ip);
 int dico_iterator_remove_current(dico_iterator_t ip, void **pptr);
 void dico_iterator_set_data(dico_iterator_t ip, void *data);
 
diff --git a/lib/list.c b/lib/list.c
index 309369c..9d1da6b 100644
--- a/lib/list.c
+++ b/lib/list.c
@@ -23,7 +23,7 @@
 #include
 
 struct list_entry {
-    struct list_entry *next;
+    struct list_entry *next, *prev;
     void *data;
 };
 
@@ -42,6 +42,7 @@ struct iterator {
     dico_list_t list;
     struct list_entry *cur;
     int advanced;
+    size_t pos;
 };
 
 static int
@@ -120,13 +121,22 @@ dico_iterator_current(dico_iterator_t ip)
     return ip->cur ? ip->cur->data : NULL;
 }
 
+size_t
+dico_iterator_position(dico_iterator_t ip)
+{
+    if (!ip)
+        return 0;
+    return ip->pos;
+}
+
 static void
 dico_iterator_attach(dico_iterator_t itr, dico_list_t list)
 {
     itr->list = list;
-    itr->cur = NULL;
+    itr->cur = list->head;
     itr->next = list->itr;
     itr->advanced = 0;
+    itr->pos = 0;
     list->itr = itr;
 }
 
@@ -178,6 +188,26 @@ dico_iterator_destroy(dico_iterator_t *ip)
     *ip = NULL;
 }
 
+static void
+_iterator_increase_pos(dico_iterator_t ip, size_t after)
+{
+    for (; ip; ip = ip->next) {
+        if (ip->pos > after)
+            ip->pos++;
+    }
+}
+
+static void
+_iterator_advance(dico_iterator_t ip, struct list_entry *e)
+{
+    for (; ip; ip = ip->next) {
+        if (ip->cur == e) {
+            ip->cur = e->next;
+            ip->advanced++;
+        }
+    }
+}
+
 void *
 dico_iterator_first(dico_iterator_t ip)
 {
@@ -185,6 +215,7 @@ dico_iterator_first(dico_iterator_t ip)
         return NULL;
     ip->cur = ip->list->head;
     ip->advanced = 0;
+    ip->pos = 0;
     return dico_iterator_current(ip);
 }
 
@@ -193,12 +224,53 @@ dico_iterator_next(dico_iterator_t ip)
 {
     if (!ip || !ip->cur)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Quoting Sergey Poznyakoff:

> I described this in my previous post. Add the following to your config
> file:
>
> strategy all {
>     deny-all yes;
> }
>
> This disables it for all searches.

I've already used this setting. It works well if you search across *all* databases, but if you specify one, the search goes on and displays the whole database content after hanging the machine for several minutes (please don't experiment with my server).

-- 
Marc Dequènes (Duck)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
On Sun, May 23, 2010 at 10:51:45PM +0300, Sergey Poznyakoff wrote:
> Marc Dequènes (Duck) wrote:
>
> > How can we limit it in non-default searches?
>
> I described this in my previous post. Add the following to your config
> file:
>
> strategy all {
>     deny-all yes;
> }
>
> This disables it for all searches.
---end quoted text---

According to the info page, the above would disable the "all" strategy when the database argument is '*' or '!'. But it will not disable the "all" strategy when the database argument is something like "gcide". As far as I understand, the problem is that even in the latter case, it would cause a 100% CPU load.

-- 
أحمد المحمودي (Ahmed El-Mahmoudy)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Marc Dequènes (Duck) wrote:
> How can we limit it in non-default searches?

I described this in my previous post. Add the following to your config file:

strategy all {
    deny-all yes;
}

This disables it for all searches.

Regards,
Sergey
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
Quoting Sergey Poznyakoff:

> No, there is:
>
> strategy all {
>     deny-all yes;
> }

But it only affects default searches, while certain databases are quite big (gcide is 13MB). I tested with a 1MB database to see:

# time dico --host=dico.duckcorp.org --noauth -d fd-cro-eng -s all "sproutchploufpiou"
...
real    3m46.654s
user    0m0.576s
sys     0m2.296s

I don't think this is acceptable.

How can we limit it in non-default searches? Or even make it disappear from the list of strategies completely, so that the listing in the CLI or web interface only presents the ones authorized, at least under certain conditions? I can't see that in the manual.

-- 
Marc Dequènes (Duck)
Bug#582799: [Bug-dico] Bug#582799: dicod: the 'all' strategy is dangerous for a production server !
أحمد المحمودي wrote:
> > The "Match everything (experimental)" strategy is not suited for
> > production servers, as its name says, and consumes all CPU, leading
> > to an easy DoS attack method.

Any implementation of a "match everything" strategy is potentially harmful and certainly not suited for production servers. I thought it was obvious.

> There is no way to deactivate it,

No, there is:

strategy all {
    deny-all yes;
}

See the manual, section 3.3.12 "Strategies and Default Searches" [1].

Regards,
Sergey

[1] http://dico.prog.gnu.org.ua/manual/html_section/Configuration.html#SEC29