[Zope] ZCatalog Proximity Search (was: RE: [Zope] ZODB or not ZODB?)

2000-06-30 Thread Dieter Maurer

Eric L. Walstad writes:
  // I use ZCatalog to catalog objects in the file system that are included
  // in the Zope world via LocalFS.
  // A minor patch to LocalFS is needed. Everything works with the exception
  // of proximity searches.
  Q1 - What is a "proximity search" and when would one be used?
Proximity searches are searches for terms that occur near each other in a
document. ZCatalog uses "..." as its near operator.
It implements a proximity search as an "and" search and assigns
the document a relevance that is inversely proportional
to the distance between the term occurrences.
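
For illustration only (this is not ZCatalog's actual code), a score that
behaves that way, given the word positions of two terms in one document,
might look like this sketch:

    def proximity_score(positions_a, positions_b):
        """Rough sketch: relevance inversely proportional to the smallest
        distance between any occurrence of term A and any occurrence of B."""
        if not positions_a or not positions_b:
            return 0.0          # acts like an "and": both terms must occur
        closest = min([abs(a - b) for a in positions_a for b in positions_b])
        return 1.0 / max(closest, 1)

    # "zope" at word positions 3 and 40, "catalog" at position 5:
    # the closest pair is 2 words apart, so the score is 0.5.
    print(proximity_score([3, 40], [5]))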

  Q2 - Where can I find the minor patch you mentioned?  I just searched
  Zope.org and didn't see it.
URL:http://www.handshake.de/~dieter/pyprojects/zope/near.pat

The patch fixes Zope 2.1.6 ZCatalog's "near" operator.
It will work only for Zope 2.1.6.

ZCatalog is a development hotspot. A lot has changed
in CVS and for Zope 2.2, so a patch may no
longer be needed for Zope 2.2.


URL:http://www.handshake.de/~dieter/pyprojects/zope/localfs.pat

is the small patch for LocalFS. It gives LocalFS the
infrastructure needed for Zope's "find" machinery to work with it.
This machinery is also used by ZCatalog.



Dieter





Re: [Zope] ZODB or not ZODB?

2000-06-28 Thread Jim Fulton

(snip)
 The filesystem, imho.  This lets you spread things out over
 multiple disks and even (perhaps) multiple systems.  Worst case
 you've got 50k x 15k = 750M.  Big for a ZODB (?),

No.

Jim

--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
Technical Director   (888) 344-4332http://www.python.org  
Digital Creationshttp://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.





Re: [Zope] ZODB or not ZODB?

2000-06-28 Thread Jim Fulton

Casey Duncan wrote:
 
 I am implementing a document Library using Zope. It has an exhaustive index
 with several thousand topics in an outline residing on a PostgreSQL
 database. This works well and I like it.
 
 My question is where is the best place to store the documents themselves?
 They will be static HTML documents ranging from 1-50Kb in size roughly.
 There will probably be at least 10,000-15,000 of these documents in the
 library once all is said and done.
 
 In my mind I have three options:
 
 1. Store them on the filesystem.
 2. Store them in a PgSQL table as blobs.
 3. Store them as DTML Docs in the ZODB.
 
 I would like to eventually have full text searching capabilities, so that
 makes #1 less attractive (I would likely need my own Python method to do
 it). #2 is somewhat of a pain to implement due to limitations in the PgSQL
 row size and text searching would be slow. With #3 I could in theory use a
 ZCatalog to implement the searching, so that is done for me.

In theory, you could use ZCatalog to catalog objects in the
file system or in an RDBMS, providing that you can provide
paths for them. I don't think anyone's done this yet. There
are bound to be bumps for whoever does it first. :)
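
As a very rough sketch of the idea (the traversal side is pure assumption:
it treats the LocalFS folder like an ordinary Folder with objectValues()
and an "id" attribute on each file object; only the catalog_object(object,
uid) call is ZCatalog's real cataloguing API), an External Method could
look something like:

    def catalog_local_files(catalog, local_fs, prefix=''):
        """Register one level of LocalFS-exposed files, keyed by their path."""
        for ob in local_fs.objectValues():
            path = '%s/%s' % (prefix, ob.id)
            # The uid we pass here is the "path" the catalog will hand
            # back in search results.
            catalog.catalog_object(ob, path)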
 
 Is ZODB up to the task of storing this quantity of objects? What problems
 might I run into? Is it a wise idea, could a data.fs file of this size
 (~3-400MB) become too easily corrupted?

No. Zope.Org varies from 300MB to close to 2GB.

Jim

--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
Technical Director   (888) 344-4332http://www.python.org  
Digital Creationshttp://www.digicool.com   http://www.zope.org






Re: [Zope] ZODB or not ZODB?

2000-06-28 Thread Jim Fulton

Andrew Kenneth Milton wrote:
 
 +[ Cary O'Brien ]-
 |
 | I'll let others speak to 3.  I've never had a problem with ZODB, but I've
 | never put 750MB in it.
 
 It'll take a fair amount of abuse :-)
 
 I've loaded the entire dmoz data into it (once only) just to see what
 would happen. Booting was slow, but, once it got up, it ran OK.

In addition, if you closed it explicitly, an index would
be written. The next "boot" would then be pretty fast.
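
For illustration, a clean shutdown that gives FileStorage the chance to
write its index might look roughly like this (file names are examples and
details vary between ZODB versions):

    from ZODB.FileStorage import FileStorage
    from ZODB import DB

    storage = FileStorage('Data.fs')
    db = DB(storage)
    # ... load and use lots of objects ...
    db.close()   # closes the storage cleanly; the index (Data.fs.index)
                 # is saved, so the next startup avoids rescanning the file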

Jim

--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
Technical Director   (888) 344-4332http://www.python.org  
Digital Creationshttp://www.digicool.com   http://www.zope.org






Re: [Zope] ZODB or not ZODB?

2000-06-28 Thread Jonothan Farr

 In theory, you could use ZCatalog to catalog objects in the
 file system or in an RDBMS, providing that you can provide
 paths for them. I don't think anyone's done this yet. There
 are bound to be bumps for whoever does it first. :)

There's a patch to the Local File System product to allow indexing files in the
file system. This will be incorporated into the next version.

--jfarr








Re: [Zope] ZODB or not ZODB?

2000-06-28 Thread Lalo Martins

On Wed, Jun 28, 2000 at 10:07:25AM -0400, Jim Fulton wrote:
 Casey Duncan wrote:
  
  Is ZODB up to the task of storing this quantity of objects? What problems
  might I run into? Is it a wise idea, could a data.fs file of this size
  (~3-400MB) become too easily corrupted?
 
 No. Zope.Org varies from 300MB to close to 2GB.

What about adding a box somewhere in zope.org telling us the
current size of the ZODB and perhaps some other stats (dunno,
RAM, number of processes)?

[]s,
   |alo
   +
--
  Hack and Roll  ( http://www.hackandroll.org )
News for, uh, whatever it is that we are.


http://zope.gf.com.br/lalo   mailto:[EMAIL PROTECTED]
 pgp key: http://zope.gf.com.br/lalo/pessoal/pgp

Brazil of Darkness (RPG)--- http://zope.gf.com.br/BroDar





Re: [Zope] ZODB or not ZODB?

2000-06-28 Thread Dieter Maurer

Jim Fulton writes:
  In theory, you could use ZCatalog to catalog objects in the
  file system or in an RDBMS, providing that you can provide
  paths for them. I don't think anyone's done this yet. There
  are bound to be bumps for whoever does it first. :)
I use ZCatalog to catalog objects in the file system that are included
in the Zope world via LocalFS.
A minor patch to LocalFS is needed. Everything works with the exception
of proximity searches.



Dieter





Re: [Zope] ZODB or not ZODB?

2000-06-21 Thread ethan mindlace fremen

Casey Duncan wrote:

 Is ZODB up to the task of storing this quantity of objects? What problems
 might I run into? Is it a wise idea, could a data.fs file of this size
 (~3-400MB) become too easily corrupted? Should I use a separate data.fs file
 just to store the documents (ie using mountedFileStorage)? Or is it better
 to use method #1 or #2? Information from anyone with experience in this
 regard is greatly appreciated.

Casey,

Zope.org is 375 MB packed, and it grows by 100 MB a *day*.  There are
8500 member folders.  When you get this many objects in a folder, accessing the 
folder (though not the objects themselves) gets *slow*.

more info here:

http://www.zope.org/Wikis/zope-dev/ReallyBigFolders

ethan mindlace fremen
Zopatista Community Liaison





Re: [Zope] ZODB or not ZODB?

2000-06-18 Thread Cary O'Brien

 
 I am implementing a document Library using Zope. It has an exhaustive index
 with several thousand topics in an outline residing on a PostgreSQL
 database. This works well and I like it.
 
 My question is where is the best place to store the documents themselves?
 They will be static HTML documents ranging from 1-50Kb in size roughly.
 There will probably be at least 10,000-15,000 of these documents in the
 library once all is said and done.
 
 In my mind I have three options:
 
 1. Store them on the filesystem.
 2. Store them in a PgSQL table as blobs.
 3. Store them as DTML Docs in the ZODB.
 

The filesystem, imho.  This lets you spread things out over
multiple disks and even (perhaps) multiple systems.  Worst case
you've got 50k x 15k = 750M.  Big for a ZODB (?), but no sweat
for a file system.  PgSQL blobs are not yet ready for prime time.
For one thing, I think they are all created in the same directory.
I'm a big PgSQL fan, so it pains me to say this, but it's true.
They are working on it; see the TOAST project on the postgresql
mailing lists.

You want to spread the documents out over a number of directories.
I've set up systems where everything had an ID and we'd split things
up via digits in the id, e.g. document 252a8b7c becomes file 25/2a/8b/252a8b7c.
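
A small sketch of that naming scheme in Python (the root directory and the
depth of the split are just examples):

    import os

    def shard_path(doc_id, root='/var/library/docs', levels=3):
        """Map an id like '252a8b7c' to '<root>/25/2a/8b/252a8b7c'."""
        parts = [doc_id[i * 2:i * 2 + 2] for i in range(levels)]
        return os.path.join(root, *(parts + [doc_id]))

    # shard_path('252a8b7c') -> '/var/library/docs/25/2a/8b/252a8b7c'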

You could even compress the files if you wanted to.

And you could use the "LocalFileSystem" (is that it?) product to
serve up the files through Zope.  You could tweak it to decompress
too.

 I would like to eventually have full text searching capabilities, so that
 makes #1 less attractive (I would likely need my own Python method to do
 it). #2 is somewhat of a pain to implement due to limitations in the PgSQL
 row size and text searching would be slow. With #3 I could in theory use a
 ZCatalog to implement the searching, so that is done for me.
 

I'd put the full text search into PostgreSQL.  When the doc comes in,
strip out the keywords and index it.
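
A minimal sketch of that approach, assuming a Python DB-API connection
(here called conn, with a PostgreSQL adapter that uses the %s paramstyle)
and a simple doc_keywords(doc_id, keyword) table; the table, the stopword
list, and the tokenizer are all illustrative:

    import re

    STOPWORDS = {'the': 1, 'and': 1, 'of': 1, 'a': 1, 'to': 1, 'in': 1}

    def index_document(conn, doc_id, text):
        """Strip keywords out of an incoming document and store them."""
        words = re.findall(r'[a-z0-9]+', text.lower())
        keywords = {}
        for w in words:
            if len(w) > 2 and w not in STOPWORDS:
                keywords[w] = 1
        cursor = conn.cursor()
        for w in keywords.keys():
            cursor.execute(
                "INSERT INTO doc_keywords (doc_id, keyword) VALUES (%s, %s)",
                (doc_id, w))
        conn.commit()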

 Is ZODB up to the task of storing this quantity of objects? What problems
 might I run into? Is it a wise idea, could a data.fs file of this size
 (~3-400MB) become too easily corrupted? Should I use a separate data.fs file
 just to store the documents (ie using mountedFileStorage)? Or is it better
 to use method #1 or #2? Information from anyone with experience in this
 regard is greatly appreciated.
 

We implemented a system using #1.  Actually, we had lots of little documents,
so we concatenated and gzipped them in batches of 200, keeping the filename,
offset, and length.  It turned out to be quick enough to unzip the batch and
pick out the document of interest.  And batching them up kept the compression
ratio up.
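
For illustration, getting one document back out of such a batch could look
like this (assuming the stored offset and length refer to positions in the
uncompressed concatenation; the file name and numbers below are made up):

    import gzip

    def read_batched_doc(batch_path, offset, length):
        """Decompress a whole batch and slice out one document."""
        f = gzip.open(batch_path, 'rb')
        data = f.read()
        f.close()
        return data[offset:offset + length]

    # html = read_batched_doc('batches/0042.gz', 18231, 3977)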

System worked great, but was cancelled about a week before it was going
to go online.  ouch.

I'll let others speak to 3.  I've never had a problem with ZODB, but I've
never put 750MB in it.

-- cary



 -Casey Duncan
 [EMAIL PROTECTED]
 
 
 






Re: [Zope] ZODB or not ZODB?

2000-06-18 Thread Kevin Dangoor

- Original Message -
From: "Casey Duncan" [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, June 16, 2000 6:35 PM
Subject: [Zope] ZODB or not ZODB?


 My question is where is the best place to store the documents themselves?
 They will be static HTML documents ranging from 1-50Kb in size roughly.
 There will probably be at least 10,000-15,000 of these documents in the
 library once all is said and done.

 In my mind I have three options:

 1. Store them on the filesystem.
 2. Store them in a PgSQL table as blobs.
 3. Store them as DTML Docs in the ZODB.

 Is ZODB up to the task of storing this quantity of objects? What problems
 might I run into? Is it a wise idea, could a data.fs file of this size
 (~3-400MB) become too easily corrupted? Should I use a separate data.fs
file
 just to store the documents (ie using mountedFileStorage)? Or is it better
 to use method #1 or #2? Information from anyone with experience in this
 regard is greatly appreciated.

There are people who have experience with giant ZODBs... some people have
run into the 2GB ext2fs file size limit. My Data.fs has been around ~100MB.
FileStorage is really quite stable, and is not likely to get
corrupted no matter what the size. If you need to store the docs across multiple
drives, you can use the mountable storages to set up another file on the
other disk.

One thing to be aware of: 10-15K documents are too many for a single Folder.
You'll either want to break the docs up into multiple folders, or wait for
the BTreeFolder product.

One other nice thing about storing in the ZODB: it's pretty easy to make
your documents automatically add themselves to the ZCatalog. No need to
manually update the indexes. (This would be true of PgSQL, but not the fs.)
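
The ZCatalog product ships a CatalogAware mixin for exactly this; hand-rolled,
the idea looks roughly like the sketch below. The lifecycle hook names are
standard Zope 2 API, but the class, the catalog id, the lookup via acquisition,
and the uid are illustrative, and a real product would also call any base
class hooks:

    class SelfCatalogingDocument:
        """Mixin sketch: documents (un)catalog themselves on add/delete."""

        def _library_catalog(self):
            # Illustrative: acquire a ZCatalog with this id from a parent folder.
            return getattr(self, 'library_catalog')

        def manage_afterAdd(self, item, container):
            self._library_catalog().catalog_object(self, self.absolute_url(1))

        def manage_beforeDelete(self, item, container):
            self._library_catalog().uncatalog_object(self.absolute_url(1))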

Kevin

