Re: [Zope-dev] Re: Spitter.c Hack
Tres:

Okay, I uploaded it to my member folder.

http://www.zope.org/Members/jspisak/Splitter/

I wasn't using a sandbox for this; I just downloaded the source for
2.2.4. Here's the diff -u though:

--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c  Thu Jan 4 10:41:15 2001
+++ Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c  Fri Jan 5 17:29:43 2001
@@ -169,24 +169,8 @@
 len = PyString_Size(word) - 1;
 len = PyString_Size(word);
-/*if(len < 2) Single-letter words are stop words!
-{
- Py_INCREF(Py_None);
- return Py_None;
-} */
-
-/*
- Test whether a word has any letters. */
 for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
-/*if (len < 0)
-{
-Py_INCREF(Py_None);
-return Py_None;
-}
-
- * If no letters, treat it as a stop word.
- */
 Py_INCREF(word);

Let me know what else I can do. Did you see my other mails regarding
stats?

> > From: "Jason Spisak" <[EMAIL PROTECTED]> wrote:
> >
> > Zopists,
> >
> > I finally got Splitter.c to let me index numbers and 'C++' in a TextIndex.
> > I have about 50,000 objects in that index, and search performance is nearly
> > instantaneous still. I am running on a big machine though. If anyone
> > wants those changes they're really easy. Just mail me directly, since it's
> > a long file to post.
>
> Can you post a patch, or upload it to your Zope.org member folder
> and post the link?
>
> cvs -q diff -u lib/python/SearchIndex/Splitter.c
>
> would do it, if you were working in a CVS sandbox for Zope.
>
> Tres.
> --
> ===
> Tres Seaver [EMAIL PROTECTED]
> Digital Creations "Zope Dealers" http://www.zope.org

All my best,

Jason Spisak
CIO
6151 West Century Boulevard
Suite 900
Los Angeles, CA 90045
P. 310.665.3444
F. 310.665.3544

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )
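
For anyone who wants to see what the two checks in the diff do without
building Zope, here is a minimal standalone sketch. It is not the code
from Splitter.c: keep_word() and its strict flag are hypothetical names
made up for illustration, and the real function works on Python string
objects rather than C strings. With strict set it mimics the two tests
quoted in the diff; with strict clear it mimics the file with both
tests taken out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the stop-word tests quoted in the diff.
 * Returns 1 if the word would be kept for indexing, 0 if dropped.
 *   strict != 0: apply both tests (single letter, no letters at all)
 *   strict == 0: apply neither, as in the patched file            */
int keep_word(const char *word, int strict)
{
    int len;

    assert(word != NULL);
    len = (int)strlen(word);

    if (!strict)
        return 1;               /* patched: nothing is rejected here */

    if (len < 2)
        return 0;               /* single-letter words are stop words */

    /* Walk backwards until an alphabetic character is found. */
    for (; --len >= 0 && !isalpha((unsigned char)word[len]); )
        ;
    if (len < 0)
        return 0;               /* no letters: treat as a stop word */

    return 1;
}
```

Under those assumptions, keep_word("310", 1) and keep_word("C", 1) both
return 0, while the same calls with strict set to 0 return 1. How a
string like 'C++' is cut into words before it reaches these tests is
not shown in the diff, so the sketch does not try to model it.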
Re: [Zope-dev] Re: Spitter.c Hack
> From: "Jason Spisak" <[EMAIL PROTECTED]> wrote:
>
> Zopists,
>
> I finally got Splitter.c to let me index numbers and 'C++' in a TextIndex.
> I have about 50,000 objects in that index, and search performance is nearly
> instantaneous still. I am running on a big machine though. If anyone
> wants those changes they're really easy. Just mail me directly, since it's
> a long file to post.

Can you post a patch, or upload it to your Zope.org member folder
and post the link?

cvs -q diff -u lib/python/SearchIndex/Splitter.c

would do it, if you were working in a CVS sandbox for Zope.

Tres.
--
===
Tres Seaver [EMAIL PROTECTED]
Digital Creations "Zope Dealers" http://www.zope.org
Re: [Zope-dev] Re: Spitter.c Hack
Erik,

> [Jason Spisak]
>
> | I am running on a big machine though. If anyone wants those changes
> | they're really easy. Just mail me directly, since it's a long file
> | to post.
>
> Hi. I would be interested in the file :-).
>

Okay, here's the diff. It truly is nothing more than cutting out the
two parts that eliminate single-letter words and numbers:

*** Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c
--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c
***************
*** 169,192 ****
  len = PyString_Size(word) - 1;
  len = PyString_Size(word);
- /*if(len < 2) Single-letter words are stop words!
- {
- Py_INCREF(Py_None);
- return Py_None;
- } */
-
- /*
- Test whether a word has any letters. */
  for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
- /*if (len < 0)
- {
- Py_INCREF(Py_None);
- return Py_None;
- }
-
- * If no letters, treat it as a stop word.
- */
  Py_INCREF(word);
--- 169,176 ----

> Would you also be willing to share some statistics on how many objects
> you have in how many indexes, and how much time "complex" searches
> take? I do understand if this is not possible, but it'd be appreciated
> if it was possible. :-)
>
> Thanks.

Well, here's some output of the "Status" tab in the Catalog.

Subtransactions are Disabled
Subtransactions - Index Status

  * 48205 object are indexed in bobobase_modification_time
  * 48205 object are indexed in calendar_date
  * 48205 object are indexed in calendar_day
  * 48205 object are indexed in call_date
  * 48205 object are indexed in curators
  * 48205 object are indexed in data
  * 48205 object are indexed in id
  * 48205 object are indexed in meta_type
  * 48205 object are indexed in resume_in
  * 48205 object are indexed in status
  * 48205 object are indexed in users_calendar

The only TextIndex is the 'data' index though. It is the one that gets
hammered.

Let's see...time stats...hmmm

I put a REQUEST.set with the ZopeTime at the top of the search page and
at the bottom after the 'in' tag for the Catalog.

Search terms are:

los and angeles and C++ and MFC and 310

Subtracting the float of the two times I get

1.85400104523

I'm not sure what that comes out to; I think it's part of a day though,
because of DateTime.

The server stats:

Dual Intel 400 MHz Xeon w/ 1MB cache each
LVD RAID 5 7200 RPM disk array
1GB RAM
RedHat Linux 6.1 with some kernel updates...

And the best piece of open source software I know:

Zope 2.2.4 binary release

Hope that helps.

All my best,

Jason Spisak
CIO
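
On the unit question above, a quick sanity check may help. The sketch
below only does the arithmetic; fraction_of_day_to_seconds() is a
made-up helper name, and it assumes nothing about Zope itself. If the
1.85400104523 figure really were a fraction of a day, the search would
have taken roughly 44.5 hours, which does not fit a page that came back
while someone was waiting for it. Reading the figure as about 1.85
seconds seems far more plausible, and as far as I know float() of a
DateTime gives seconds since the epoch, though that part is an
assumption worth checking against the DateTime source.

```c
#include <assert.h>

#define SECONDS_PER_DAY 86400.0

/* Hypothetical helper: convert a duration expressed in days
 * (as DateTime subtraction would give) into seconds. */
double fraction_of_day_to_seconds(double days)
{
    assert(days >= 0.0);
    return days * SECONDS_PER_DAY;
}
```

Calling fraction_of_day_to_seconds(1.85400104523) gives a little over
160185 seconds, which is where the 44.5 hour figure comes from.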
Re: [Zope-dev] Re: Spitter.c Hack
Casey Duncan:

It truly is nothing more than cutting out the two parts that eliminate
single-letter words and numbers:

*** Zope-2.2.4-src/lib/python/SearchIndex/Splitter.c
--- Zope-2.2.4-src/lib/python/SearchIndex/Splitter_Old.c
***************
*** 169,192 ****
  len = PyString_Size(word) - 1;
  len = PyString_Size(word);
- /*if(len < 2) Single-letter words are stop words!
- {
- Py_INCREF(Py_None);
- return Py_None;
- } */
-
- /*
- Test whether a word has any letters. */
  for (; --len >= 0 && ! isalpha((unsigned char)cword[len]); );
- /*if (len < 0)
- {
- Py_INCREF(Py_None);
- return Py_None;
- }
-
- * If no letters, treat it as a stop word.
- */
  Py_INCREF(word);
--- 169,176 ----

All my best,

Jason Spisak
CIO
Re: [Zope-dev] Re: Spitter.c Hack
Jason Spisak wrote:
>
> Zopists,
>
> I finally got Splitter.c to let me index numbers and 'C++' in a TextIndex.
> I have about 50,000 objects in that index, and search performance is nearly
> instantaneous still. I am running on a big machine though. If anyone
> wants those changes they're really easy. Just mail me directly, since it's
> a long file to post.

Could you maybe post just the diff for posterity?

--
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`-->
[Zope-dev] Re: Spitter.c Hack
Zopists,

I finally got Splitter.c to let me index numbers and 'C++' in a
TextIndex. I have about 50,000 objects in that index, and search
performance is nearly instantaneous still. I am running on a big
machine though. If anyone wants those changes, they're really easy.
Just mail me directly, since it's a long file to post.

All my best,

Jason Spisak