[Zope] ZCatalog Proximity Search (was: RE: [Zope] ZODB or not ZODB?)
Eric L. Walstad writes:
> // I use ZCatalog to catalog objects in the file system included into
> // the Zope world via LocalFS.
> // Minor patch to LocalFS needed. Everything works with the exception
> // of proximity searches.
>
> Q1 - What is a "proximity search" and when would one be used?

Proximity searches are searches for terms that occur near each other in a document. ZCatalog uses "..." as its near operator. It implements a proximity search as an "and" search and assigns each document a relevance that is inversely proportional to the distance between the term occurrences.

> Q2 - Where can I find the minor patch you mentioned? I just searched
> Zope.org and didn't see it.

URL:http://www.handshake.de/~dieter/pyprojects/zope/near.pat

The patch fixes the "near" operator in Zope 2.1.6's ZCatalog and will work only for Zope 2.1.6. ZCatalog is a development hotspot: a lot has changed in the CVS and for Zope 2.2. Maybe a patch is no longer needed for Zope 2.2.

URL:http://www.handshake.de/~dieter/pyprojects/zope/localfs.pat

is the small patch for LocalFS. It gives LocalFS the infrastructure necessary for Zope's "find" machinery to work with it. This machinery is also used by ZCatalog.

Dieter

___
Zope maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope-dev )
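The scoring Dieter describes ("and" semantics plus relevance inversely proportional to term distance) can be sketched in plain Python. This is an illustration of the idea, not ZCatalog's actual implementation; the function name and the 1/distance formula are assumptions:

```python
def near_score(positions_a, positions_b):
    """Score a hypothetical 'near' match between two terms.

    positions_a / positions_b are the word positions at which each
    term occurs in one document.  Both terms must occur ("and"
    semantics); relevance is inversely proportional to the smallest
    distance between any pair of occurrences.
    """
    if not positions_a or not positions_b:
        return 0.0  # "and" search: a missing term means no match
    # Smallest gap between any occurrence of term A and term B.
    best = min(abs(a - b) for a in positions_a for b in positions_b)
    return 1.0 / max(best, 1)
```

So two terms three words apart score higher than the same terms thirty words apart, while a document missing either term scores zero.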
Re: [Zope] ZODB or not ZODB?
Jim Fulton writes:
> In theory, you could use ZCatalog to catalog objects in the
> file system or in a RDBMS, providing that you can provide
> paths for them. I don't think anyone's done this yet. There
> are bound to be bumps for whoever does it first. :)

I use ZCatalog to catalog objects in the file system, included into the Zope world via LocalFS. A minor patch to LocalFS is needed. Everything works with the exception of proximity searches.

Dieter
Re: [Zope] ZODB or not ZODB?
On Wed, Jun 28, 2000 at 10:07:25AM -0400, Jim Fulton wrote:
> Casey Duncan wrote:
> > Is ZODB up to the task of storing this quantity of objects? What problems
> > might I run into? Is it a wise idea, could a data.fs file of this size
> > (~3-400MB) become too easily corrupted?
>
> No. Zope.Org varies from 300MB to close to 2GB.

What about adding a box somewhere on zope.org telling us the current size of the ZODB and perhaps some other stats (dunno, RAM, number of processes)?

[]s,
|alo
+--
Hack and Roll ( http://www.hackandroll.org )
News for, uh, whatever it is that we are.
http://zope.gf.com.br/lalo
mailto:[EMAIL PROTECTED]
pgp key: http://zope.gf.com.br/lalo/pessoal/pgp
Brazil of Darkness (RPG) --- http://zope.gf.com.br/BroDar
Re: [Zope] ZODB or not ZODB?
> In theory, you could use ZCatalog to catalog objects in the
> file system or in a RDBMS, providing that you can provide
> paths for them. I don't think anyone's done this yet. There
> are bound to be bumps for whoever does it first. :)

There's a patch to the Local File System product to allow indexing files in the file system. This will be incorporated into the next version.

--jfarr
Re: [Zope] ZODB or not ZODB?
Andrew Kenneth Milton wrote:
>
> +[ Cary O'Brien ]-
> |
> | I'll let others speak to 3. I've never had a problem with ZODB, but I've
> | never put 750MB in it.
>
> It'll take a fair amount of abuse :-)
>
> I've loaded the entire dmoz data into it (once only) just to see what
> would happen. Booting was slow, but, once it got up, it ran OK.

In addition, if you closed it explicitly, an index would be written. The next "boot" would then be pretty fast.

Jim

--
Jim Fulton  mailto:[EMAIL PROTECTED]  Python Powered!
Technical Director  (888) 344-4332  http://www.python.org
Digital Creations  http://www.digicool.com  http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.
Re: [Zope] ZODB or not ZODB?
Casey Duncan wrote:
>
> I am implementing a document Library using Zope. It has an exhaustive index
> with several thousand topics in an outline residing on a PostgreSQL
> database. This works well and I like it.
>
> My question is where is the best place to store the documents themselves?
> They will be static HTML documents ranging from 1-50Kb in size roughly.
> There will probably be at least 10,000-15,000 of these documents in the
> library once all is said and done.
>
> In my mind I have three options:
>
> 1. Store them on the filesystem.
> 2. Store them in a PgSQL table as blobs.
> 3. Store them as DTML Docs in the ZODB.
>
> I would like to eventually have full text searching capabilities, so that
> makes #1 less attractive (I would likely need my own Python method to do
> it). #2 is somewhat of a pain to implement due to limitations in the PgSQL
> row size and text searching would be slow. With #3 I could in theory use a
> ZCatalog to implement the searching, so that is done for me.

In theory, you could use ZCatalog to catalog objects in the file system or in a RDBMS, providing that you can provide paths for them. I don't think anyone's done this yet. There are bound to be bumps for whoever does it first. :)

> Is ZODB up to the task of storing this quantity of objects? What problems
> might I run into? Is it a wise idea, could a data.fs file of this size
> (~3-400MB) become too easily corrupted?

No. Zope.Org varies from 300MB to close to 2GB.

Jim
Re: [Zope] ZODB or not ZODB?
(snip)
> The filesystem, imho. This lets you spread things out over
> multiple disks and even (perhaps) multiple systems. Worst case
> you've got 50k x 15k = 750M. Big for a ZODB (?),

No.

Jim
Re: [Zope] ZODB or not ZODB?
Casey Duncan wrote:
> Is ZODB up to the task of storing this quantity of objects? What problems
> might I run into? Is it a wise idea, could a data.fs file of this size
> (~3-400MB) become too easily corrupted? Should I use a separate data.fs file
> just to store the documents (ie using mountedFileStorage)? Or is it better
> to use method #1 or #2? Information from anyone with experience in this
> regard is greatly appreciated.

Casey,

Zope.org is 375 MB packed, and it grows by 100 MB a *day*. There are over 8500 member folders. When you get this many objects in a folder, accessing the folder (though not the objects themselves) gets *slow*.

More info here: http://www.zope.org/Wikis/zope-dev/ReallyBigFolders

ethan mindlace fremen
Zopatista Community Liaison
Re: [Zope] ZODB or not ZODB?
- Original Message -
From: "Casey Duncan" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, June 16, 2000 6:35 PM
Subject: [Zope] ZODB or not ZODB?

> My question is where is the best place to store the documents themselves?
> They will be static HTML documents ranging from 1-50Kb in size roughly.
> There will probably be at least 10,000-15,000 of these documents in the
> library once all is said and done.
>
> In my mind I have three options:
>
> 1. Store them on the filesystem.
> 2. Store them in a PgSQL table as blobs.
> 3. Store them as DTML Docs in the ZODB.
>
> Is ZODB up to the task of storing this quantity of objects? What problems
> might I run into? Is it a wise idea, could a data.fs file of this size
> (~3-400MB) become too easily corrupted? Should I use a separate data.fs file
> just to store the documents (ie using mountedFileStorage)? Or is it better
> to use method #1 or #2? Information from anyone with experience in this
> regard is greatly appreciated.

There are people who have experience with giant ZODBs... some people have run into the 2GB ext2fs file size limit. My Data.fs has been around ~100MB. FileStorage is really quite stable, and is not likely to get corrupted no matter what the size. If you need to store the docs on multiple drives, you can use mountable storages to set up another file on the other disk.

One thing to be aware of: 10-15K documents is too many for a single Folder. You either want to break the docs up into multiple folders, or hang on for the BTreeFolder product.

One other nice thing about storing in the ZODB: it's pretty easy to make your documents automatically add themselves to the ZCatalog. No need to manually update the indexes. (This would be true of PgSQL, but not the fs.)

Kevin
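The "documents add themselves to the catalog" pattern Kevin mentions can be sketched without Zope at all. The class names and the one-word index below are invented for illustration; in Zope itself this role is played by ZCatalog and its cataloging hooks:

```python
class MiniCatalog:
    """Toy stand-in for a catalog: maps words to document ids."""

    def __init__(self):
        self.index = {}

    def catalog_object(self, doc_id, text):
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(doc_id)

    def uncatalog_object(self, doc_id):
        for ids in self.index.values():
            ids.discard(doc_id)

    def search(self, word):
        return sorted(self.index.get(word.lower(), ()))


class Library:
    """Container whose documents index themselves on add/remove,
    so there is no separate 'update the indexes' step."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        self.catalog.catalog_object(doc_id, text)  # index on creation

    def remove(self, doc_id):
        del self.docs[doc_id]
        self.catalog.uncatalog_object(doc_id)      # unindex on deletion
```

The point of the design is that the container, not the administrator, owns the indexing step, which is exactly what makes option #3 (and, as Kevin notes, a trigger-equipped PgSQL) low-maintenance compared with loose files.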
Re: [Zope] ZODB or not ZODB?
+[ Cary O'Brien ]-
|
| I'll let others speak to 3. I've never had a problem with ZODB, but I've
| never put 750MB in it.

It'll take a fair amount of abuse :-)

I've loaded the entire dmoz data into it (once only) just to see what would happen. Booting was slow, but, once it got up, it ran OK.

--
Totally Holistic Enterprises Internet | P: +61 7 3870 0066 | Andrew Milton
The Internet (Aust) Pty Ltd           | F: +61 7 3870 4477 |
ACN: 082 081 472                      | M: +61 416 022 411 | Carpe Daemon
PO Box 837 Indooroopilly QLD 4068     | [EMAIL PROTECTED]  |
Re: [Zope] ZODB or not ZODB?
> I am implementing a document Library using Zope. It has an exhaustive index
> with several thousand topics in an outline residing on a PostgreSQL
> database. This works well and I like it.
>
> My question is where is the best place to store the documents themselves?
> They will be static HTML documents ranging from 1-50Kb in size roughly.
> There will probably be at least 10,000-15,000 of these documents in the
> library once all is said and done.
>
> In my mind I have three options:
>
> 1. Store them on the filesystem.
> 2. Store them in a PgSQL table as blobs.
> 3. Store them as DTML Docs in the ZODB.

The filesystem, imho. This lets you spread things out over multiple disks and even (perhaps) multiple systems. Worst case you've got 50k x 15k = 750M. Big for a ZODB (?), but no sweat for a file system.

PgSQL blobs are not yet ready for prime time. For one thing, I think they are all created in the same directory. I'm a big PgSQL fan, so this pains me to say, but it is true. They are working on it. See the TOAST project in the postgresql mailing lists.

You want to spread the documents out over a couple of directories. I've set up systems where everything had an ID and we'd split things up via digits in the id. I.e. document 252a8b7c is file 25/2a/8b/252a8b7c. You could even compress the files if you wanted to. And you could use the "LocalFileSystem" (is that it?) product to serve up the files through Zope. You could tweak it to decompress too.

> I would like to eventually have full text searching capabilities, so that
> makes #1 less attractive (I would likely need my own Python method to do
> it). #2 is somewhat of a pain to implement due to limitations in the PgSQL
> row size and text searching would be slow. With #3 I could in theory use a
> ZCatalog to implement the searching, so that is done for me.

I'd put the full text search into PostgreSQL. When the doc comes in, strip out the keywords and index it.
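The ID-sharding scheme above (document 252a8b7c stored as 25/2a/8b/252a8b7c) is a one-liner in Python. The depth of three levels and two-character width match Cary's example; the function name is made up:

```python
def shard_path(root, doc_id, depth=3, width=2):
    """Map a document id to a nested path built from its leading
    characters, e.g. '252a8b7c' -> root/25/2a/8b/252a8b7c, so no
    single directory accumulates tens of thousands of files."""
    parts = [doc_id[i * width:(i + 1) * width] for i in range(depth)]
    return "/".join([root] + parts + [doc_id])
```

With 10,000-15,000 documents and a two-hex-digit first level, each top-level directory holds only a few dozen entries, which keeps directory lookups fast on the filesystems of the day.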
> Is ZODB up to the task of storing this quantity of objects? What problems
> might I run into? Is it a wise idea, could a data.fs file of this size
> (~3-400MB) become too easily corrupted? Should I use a separate data.fs file
> just to store the documents (ie using mountedFileStorage)? Or is it better
> to use method #1 or #2? Information from anyone with experience in this
> regard is greatly appreciated.

We implemented a system using #1. Actually, we had lots of little documents, so we concatenated and gzipped them in batches of 200, keeping the filename, offset, and length. It turned out to be quick enough to unzip the file and pick out the document of interest, and batching them up kept the compression ratio up. The system worked great, but was cancelled about a week before it was going to go online. Ouch.

I'll let others speak to 3. I've never had a problem with ZODB, but I've never put 750MB in it.

-- cary
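Cary's batch scheme (concatenate, gzip, remember each document's offset and length in the uncompressed stream, then unzip the whole batch and slice on retrieval) can be sketched as follows. The function names and the in-memory index are assumptions; his system presumably persisted the index somewhere:

```python
import gzip

def write_batch(path, docs):
    """Concatenate documents, gzip the batch, and return an index of
    filename -> (offset, length) into the *uncompressed* stream.
    `docs` maps filename -> bytes."""
    index, offset, chunks = {}, 0, []
    for name, data in docs.items():
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    with gzip.open(path, "wb") as f:
        f.write(b"".join(chunks))
    return index

def read_doc(path, index, name):
    """Unzip the whole batch, then slice out one document."""
    with gzip.open(path, "rb") as f:
        blob = f.read()
    offset, length = index[name]
    return blob[offset:offset + length]
```

Decompressing a ~200-document batch to fetch one document sounds wasteful, but as Cary notes it was quick enough in practice, and compressing related documents together yields a much better ratio than gzipping each small file on its own.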
[Zope] ZODB or not ZODB?
I am implementing a document Library using Zope. It has an exhaustive index with several thousand topics in an outline residing on a PostgreSQL database. This works well and I like it.

My question is: where is the best place to store the documents themselves? They will be static HTML documents ranging from 1-50Kb in size, roughly. There will probably be at least 10,000-15,000 of these documents in the library once all is said and done.

In my mind I have three options:

1. Store them on the filesystem.
2. Store them in a PgSQL table as blobs.
3. Store them as DTML Docs in the ZODB.

I would like to eventually have full text searching capabilities, so that makes #1 less attractive (I would likely need my own Python method to do it). #2 is somewhat of a pain to implement due to limitations in the PgSQL row size, and text searching would be slow. With #3 I could in theory use a ZCatalog to implement the searching, so that is done for me.

Is ZODB up to the task of storing this quantity of objects? What problems might I run into? Is it a wise idea, or could a data.fs file of this size (~3-400MB) become too easily corrupted? Should I use a separate data.fs file just to store the documents (i.e. using mountedFileStorage)? Or is it better to use method #1 or #2? Information from anyone with experience in this regard is greatly appreciated.

-Casey Duncan
[EMAIL PROTECTED]