Re: [Zope-dev] ZCatalog scalability

2001-02-18 Thread Michael R. Bernstein

Erik Enge wrote:
 
 [Michael Bernstein]
 
 | I need to know how far the ZCatalog will scale using this indexing
 | and search strategy. Does anyone have anecdotal or benchmark data to
 | suggest if (and when) I will hit a 'wall' regarding the number of
 | objects being indexed and searched?
 
 I'm going to try to stuff 27 million objects into ZODB sometime in the
 next week or the week after that (all post addresses in England).  I
 haven't got a clue as to whether this will work or just... well not
 work.  I haven't come up with a strategy for segmenting the data, but
 that shouldn't be a problem at all.  This isn't actually much data, so
 I don't expect the Data.fs file to be more than 500 MB.
 
 I'm quite confident that ZODB, ZCatalog and BTree will scale very
 nicely for this.  I have a plan ;).
 
 I'll let you know how it goes.  (And please, do poke at me if it takes
 too long.)

Ok, I'm poking :-).

How did it go?

Michael Bernstein.

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] ZCatalog scalability

2001-02-18 Thread Erik Enge

[Michael R. Bernstein]

| Erik Enge wrote:
|  
|  I'll let you know how it goes.  (And please, do poke at me if it takes
|  too long.)
| 
| Ok, I'm poking :-).

Thanks.  Keep doing it till you get what you need, I truly don't
mind.  :-)
 
| How did it go?

Thanks to the speed of delivery at Royal Mail in the UK, I haven't
received the data yet (!).  They promised to have it to me by the
coming Friday (the 23rd, I believe).  I'll try to process all 27
million records and give feedback to the community.

(I did a premature calculation of how long it would take to populate
the Zope instance, and I'm guessing somewhere between 80 and 100 hours,
meaning the feedback cannot come any sooner than Sunday, at best.)
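
For a sense of scale, the estimate above can be turned into an average
load rate.  This is only a back-of-the-envelope sketch: it assumes the
80 and 100 hour figures are wall-clock time for all 27 million records,
and the function name is made up for illustration.

```python
RECORDS = 27_000_000  # number of objects to load, from the message above


def records_per_second(records, hours):
    """Average rate needed to load `records` objects in `hours` hours."""
    return records / (hours * 3600.0)


for hours in (80, 100):
    rate = records_per_second(RECORDS, hours)
    print(f"{hours} hours -> about {rate:.1f} records per second")
```

That works out to roughly 75 to 94 objects created and indexed per
second, sustained over several days.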




Re: [Zope-dev] ZCatalog scalability

2001-02-18 Thread Michael R. Bernstein

Erik Enge wrote:
 
 [Michael R. Bernstein]
 
 | Erik Enge wrote:
 | 
 |  I'll let you know how it goes.  (And please, do poke at me if it takes
 |  too long.)
 |
 | Ok, I'm poking :-).
 
 Thanks.  Keep doing it till you get what you need, I truly don't
 mind.  :-)
 
 | How did it go?
 
 Thanks to the speed of delivery at Royal Mail in the UK, I haven't
 received the data yet (!).  They promised to have it to me by the
 coming Friday (the 23rd, I believe).  I'll try to process all 27
 million records and give feedback to the community.
 
 (I did a premature calculation of how long it would take to populate
 the Zope instance, and I'm guessing somewhere between 80 and 100 hours,
 meaning the feedback cannot come any sooner than Sunday, at best.)

What I'm looking for is any indication that object creation
time and/or indexing time goes up with the number of objects
already in the ZODB.

Will you be populating the ZODB in batches (say 100,000
objects or so)? If so, can you benchmark each batch, so we
can see if the batch processing time goes up as you
progress through the 270 batches?
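
A per-batch benchmark of this kind could be sketched roughly as
follows.  This is an illustration only, not Erik's actual loader:
`load_batch` is a hypothetical callable that creates, indexes and
commits the objects for one batch, and `slowdown_ratio` is just one
simple way to summarise the trend.

```python
import time


def benchmark_batches(records, load_batch, batch_size=100_000):
    """Load `records` in batches; return the seconds spent on each batch.

    `load_batch` is a hypothetical callable that creates and indexes the
    objects for one batch (and commits the transaction).
    """
    timings = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        t0 = time.perf_counter()
        load_batch(batch)
        timings.append(time.perf_counter() - t0)
    return timings


def slowdown_ratio(timings):
    """Mean time of the last third of batches divided by the first third.

    A value well above 1.0 suggests batches get slower as the database
    grows.  Expects a non-empty list of positive timings.
    """
    third = max(1, len(timings) // 3)
    first = sum(timings[:third]) / third
    last = sum(timings[-third:]) / third
    return last / first
```

With 27 million records and the default batch size this yields 270
timings; a roughly flat series would indicate that creation and
indexing cost does not grow with the number of objects already stored.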

Thanks,

Michael Bernstein.




Re: [Zope-dev] ZCatalog scalability

2001-02-18 Thread Erik Enge

[Michael R. Bernstein]

| What I'm looking for is any indication that object creation time
| and/or indexing time goes up with the number of objects already in
| the ZODB.

Well, one thing I've already learned - which you all probably know -
is that you do _not_ want to put index_object() in your class's
__init__() method.  That's because the CatalogAware class you
subclass does this for you in the manage_afterAdd() method.

If you put index_object() in __init__() you might as well go on
holiday before it's finished.  A long one.
 
| Will you be populating the ZODB in batches (say 100,000 objects or
| so)?

I'll do that as a secondary solution, if doing it in one batch is too
inefficient. 

| If so, can you benchmark each batch, so we can see if the batch
| processing time goes up as you progress through the 270 batches?

Yes.  No problemo, senor! :)




Re: [Zope-dev] ZCatalog scalability

2001-02-18 Thread Michael R. Bernstein

Erik Enge wrote:
 
 [Michael R. Bernstein]
 
 | What I'm looking for is any indication that object creation time
 | and/or indexing time goes up with the number of objects already in
 | the ZODB.
 
 Well, one thing I've already learned - which you all probably know -
 is that you do _not_ want to put index_object() in your class's
 __init__() method.  That's because the CatalogAware class you
 subclass does this for you in the manage_afterAdd() method.

For my 'archive' applications, I'm using a SkinScript to
index the objects as they're added instead of subclassing
from CatalogAware.

 | Will you be populating the ZODB in batches (say 100,000 objects or
 | so)?
 
 I'll do that as a secondary solution, if doing it in one batch is too
 inefficient.

I should mention that even splitting this up into three
batches of 9 million records would *probably* give me the
indication I'm looking for, as to whether there was any
progressive performance degradation with the number of
objects.

Thanks again, Erik, and good luck!

Michael Bernstein.




Re: [Zope-dev] ZCatalog scalability

2001-01-23 Thread Erik Enge

[Michael Bernstein]

| We seem to have disposed of the wildcard issue [snipped out
| below], and I'm looking forward to Erik's results, but does
| anyone else have any information about whether there is a
| practical upper limit on how many objects can be indexed and
| searched in a ZCatalog?

I don't know.  But there is one on BTree folders, right?  And as soon
as those get full, you need to start segmenting your data (which you
probably would have done in the first place anyway).  Then you can use
several ZCatalogs in different locations (so that they don't carry so
many objects each).  Then create a nice little method that finds out
which (or all) of the ZCatalogs to ask when users make queries.
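
The "nice little method" might look something like the sketch below.
It is plain Python with stand-in objects, not the real ZCatalog API:
`StubCatalog`, `search_segments` and the idea of keying segments by
postcode area are all assumptions made for illustration.

```python
class StubCatalog:
    """Stand-in for one ZCatalog holding a segment of the data."""

    def __init__(self, records):
        self._records = records  # list of dicts, one per indexed object

    def search(self, **criteria):
        """Return the records whose fields equal every given criterion."""
        return [r for r in self._records
                if all(r.get(k) == v for k, v in criteria.items())]


def search_segments(catalogs, segment_key=None, **criteria):
    """Ask one segment if `segment_key` is given, otherwise ask them all.

    `catalogs` maps a segment key (here: a postcode area) to its catalog.
    """
    if segment_key is not None:
        targets = [catalogs[segment_key]]
    else:
        targets = list(catalogs.values())
    results = []
    for catalog in targets:
        results.extend(catalog.search(**criteria))
    return results


catalogs = {
    "SW": StubCatalog([{"town": "London", "street": "Kings Road"}]),
    "M": StubCatalog([{"town": "Manchester", "street": "Kings Road"}]),
}
one_segment = search_segments(catalogs, segment_key="SW", street="Kings Road")
all_segments = search_segments(catalogs, street="Kings Road")
```

Queries that can name their segment touch a single, smaller catalog;
the rest pay for a search in every segment plus the cost of merging.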






Re: [Zope-dev] ZCatalog scalability

2001-01-23 Thread Erik Enge

[Chris Withers]

| ...and is that specifically for BTree folders, or Zope BTrees in general?

I don't believe that B-Tree folders have those kinds of limitations by
general design.  I'm more concerned that somewhere along the line,
doing operations on a huge BTree Folder (Yes, in Zope) will be slow.

However, this is more gut-feeling than anything else.

Hm, moreover, if you actually need to stuff that many objects into
one Folder, you are probably trying to use the wrong tool for the job.

I do expect that stuffing 27 million objects into one BTree Folder
will be slow, and I don't want to segment the data.  I do expect that
I'll have to resort to a relational database, and I have no problem
with that.  Object databases aren't always the right tool for the job,
and when they aren't, Zope lets me talk with the other ones nicely,
so no problemo señor ;).





Re: [Zope-dev] ZCatalog scalability

2001-01-23 Thread Michael Bernstein

Erik Enge wrote:
 
 [Chris Withers]
 
 | ...and is that specifically for BTree folders, or Zope BTrees in general?
 
 I don't believe that B-Tree folders have those kinds of limitations by
 general design.  I'm more concerned that somewhere along the line,
 doing operations on a huge BTree Folder (Yes, in Zope) will be slow.

What sort of 'operations' do you mean? copying and pasting
the whole thing?

 Hm, moreover, if you actually need to stuff that many objects into
 one Folder, you are probably trying to use the wrong tool for the job.
 
 I do expect that stuffing 27 million objects into one BTree Folder
 will be slow, and I don't want to segment the data.  I do expect that
 I'll have to resort to a relational database, and I have no problem
 with that.  Object databases aren't always the right tool for the job,
 and when they aren't, Zope lets me talk with the other ones nicely,
 so no problemo señor ;).

Erik,

I had separated the storage issue into a different thread
(Specialist/Rack Scalability), and received a reply from
Phillip Eby:

 Just to expand a little on the above...  Racks should scale at least as
 well, if not larger than a ZCatalog, given the same storage backing for
 the ZODB.  This is because ZCatalog has to manage a minimum of one
 forward and reverse BTree for *each* index, plus another few BTrees
 for overall storage and housekeeping.  Also, keyword and full text
 indexes store multiple BTree entries per object, so that's a factor as
 well.

So the question I was asking is: "if we ignore the issue of
storage and consider indexing and searching the ZCatalog
alone, and assuming that wildcard searches are disallowed,
how far will a single ZCatalog with a text index (on a
computed attribute that concatenates several properties) and
a keyword index (for creating ZTopic hierarchies) scale?"

While I'm perfectly willing to split up the storage of the
data as necessary, I am far less enamoured of the prospect
of divvying up the indexing and searching to multiple
ZCatalogs. In any case, according to Phillip, if I don't
have to split the ZCatalog, I shouldn't have to split the
storage (in Racks, anyway, but probably BTree Folders too),
either.

Anyway Erik, I hope that when you report your results,
you're able to separate indexing, searching, storage, and
retrieval results, so that the appropriate factor can be
identified as the bottleneck. Or at least into
indexing/searching and storage/retrieval.

Thanks,

Michael Bernstein.





Re: [Zope-dev] ZCatalog scalability

2001-01-23 Thread Erik Enge

[Michael Bernstein]

| Erik Enge wrote:
|  
|  I don't believe that B-Tree folders have those kinds of limitations by
|  general design.  I'm more concerned that somewhere along the line,
|  doing operations on a huge BTree Folder (Yes, in Zope) will be slow.
| 
| What sort of 'operations' do you mean? copying and pasting
| the whole thing?

My point was that anything will be slow at some stage.  This is valid
for BTree Folders as well, and doing operations on the objects
(copying, deleting, modifying) will be slower than adding them, I
guess.  I'm not quite sure here, Michael; someone else probably has
more experience and knowledge to answer more correctly.
 
| I had separated the storage issue into a different thread

Oops, I forgot that, sorry.

| Anyway Erik, I hope that when you report your results, you're able
| to separate indexing, searching, storage, and retrieval results, so
| that the appropriate factor can be identified as the bottleneck. Or
| at least into indexing/searching and storage/retrieval.

Yes, I hope I'll manage to do that.  And till then, I guess we just
have to wait to see how things work in practice, which might be
different to the theory.  ;)






Re: [Zope-dev] ZCatalog scalability

2001-01-22 Thread Dieter Maurer

Steve Alexander writes:
  Michael Bernstein wrote:
  
  
   Also, is there a way to disable wildcards in full text
   searches?
  
  Do not allow direct queries to search the catalog. Instead, make 
  searches go through an external method (or a PythonScript with Proxy 
  permissions) that uses string.replace to change '*' and '?' to ''.
Simply not using a globbing vocabulary is another alternative.


Dieter





Re: [Zope-dev] ZCatalog scalability

2001-01-22 Thread Michael Bernstein

We seem to have disposed of the wildcard issue [snipped out
below], and I'm looking forward to Erik's results, but does
anyone else have any information about whether there is a
practical upper limit on how many objects can be indexed and
searched in a ZCatalog?

Michael Bernstein wrote:
 
 After considering the feedback I got from the previous
 'Massive scalability' thread, I decided to split my queries
 into two areas: Rack scalability and ZCatalog scalability.
 This email deals with the latter.
 
 [snip]

 What I am interested in for my application are two things:
 
 - ZTopics populated using one or more keyword indexes
 
 - Full text search on a single computed attribute that
 concatenates several fields including the aforementioned
 keyword index fields and a few simple string attributes
 (title, caption, description, etc.)
 
 I need to know how far the ZCatalog will scale using this
 indexing and search strategy. Does anyone have anecdotal or
 benchmark data to suggest if (and when) I will hit a 'wall'
 regarding the number of objects being indexed and searched?
 Some anecdotal data suggests that single field indexing will
 scale easily to 60,000 objects, but what about hundreds of
 thousands or even millions of objects?

 [snip]





[Zope-dev] ZCatalog scalability

2001-01-21 Thread Michael Bernstein

After considering the feedback I got from the previous
'Massive scalability' thread, I decided to split my queries
into two areas: Rack scalability and ZCatalog scalability.
This email deals with the latter.

Partial match (wildcard) searches have already been
identified as a resource hog, depending on the size of the
result list. I am more than willing to give up wildcards in
my application for performance.

What I am interested in for my application are two things:

- ZTopics populated using one or more keyword indexes

- Full text search on a single computed attribute that
concatenates several fields including the aforementioned
keyword index fields and a few simple string attributes
(title, caption, description, etc.)
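
A computed attribute of that kind could be sketched as below.  This is
a plain Python illustration rather than Zope-specific code: the
function name `searchable_text`, the `keywords` attribute and the
sample object are assumptions, not names taken from the application.

```python
from types import SimpleNamespace


def searchable_text(obj, fields=("title", "caption", "description"),
                    keyword_fields=("keywords",)):
    """Concatenate string fields and keyword lists into one text value.

    The result is what a single full text index would be fed, so one
    query searches all of the listed fields at once.
    """
    parts = []
    for name in fields:
        value = getattr(obj, name, "")
        if value:  # skip missing or empty fields
            parts.append(str(value))
    for name in keyword_fields:
        parts.extend(str(k) for k in (getattr(obj, name, None) or ()))
    return " ".join(parts)


photo = SimpleNamespace(
    title="Harbour at dusk",
    caption="",
    description="Boats at anchor",
    keywords=["harbour", "boats"],
)
text = searchable_text(photo)
```

The trade-off is that every keyword and field ends up in the text
index as well, so the index grows with the combined length of all the
concatenated fields.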

I need to know how far the ZCatalog will scale using this
indexing and search strategy. Does anyone have anecdotal or
benchmark data to suggest if (and when) I will hit a 'wall'
regarding the number of objects being indexed and searched?
Some anecdotal data suggests that single field indexing will
scale easily to 60,000 objects, but what about hundreds of
thousands or even millions of objects?

Also, is there a way to disable wildcards in full text
searches?

Thanks,

Michael Bernstein.





Re: [Zope-dev] ZCatalog scalability

2001-01-21 Thread Steve Alexander

Michael Bernstein wrote:


 Also, is there a way to disable wildcards in full text
 searches?

Do not allow direct queries to search the catalog. Instead, make 
searches go through an external method (or a PythonScript with Proxy 
permissions) that uses string.replace to change '*' and '?' to ''.
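
A minimal sketch of such a wrapper, written as plain Python rather than
as an actual external method or PythonScript.  It uses the `replace`
method of strings instead of the `string.replace` function, and
`catalog_search` and the index name `searchable_text` are hypothetical
stand-ins for however the catalog is really queried.

```python
def strip_wildcards(query):
    """Remove the globbing characters '*' and '?' from a search string."""
    return query.replace("*", "").replace("?", "")


def safe_search(catalog_search, query, index_name="searchable_text"):
    """Run a full text search with wildcards removed.

    `catalog_search` is a hypothetical callable taking index=value
    keyword arguments.  Returns an empty list if nothing is left to
    search for once the wildcards are stripped.
    """
    cleaned = strip_wildcards(query).strip()
    if not cleaned:
        return []
    return catalog_search(**{index_name: cleaned})
```

Site users would only be given access to the wrapper, while direct
catalog queries (with wildcards) stay available to whoever has
permission to search the catalog itself.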

--
Steve Alexander
Software Engineer
Cat-Box limited
http://www.cat-box.net







Re: [Zope-dev] ZCatalog scalability

2001-01-21 Thread Michael Bernstein

Steve Alexander wrote:
 
 Michael Bernstein wrote:
 
  Also, is there a way to disable wildcards in full text
  searches?
 
 Do not allow direct queries to search the catalog. Instead, make
 searches go through an external method (or a PythonScript with Proxy
 permissions) that uses string.replace to change '*' and '?' to ''.

A *very* handy suggestion. You might want to add that as a
Tip to Zope.org.

Thanks,

Michael Bernstein.





Re: [Zope-dev] ZCatalog scalability

2001-01-21 Thread Michael Bernstein

Erik Enge wrote:
 
 [Michael Bernstein]
 
 | I need to know how far the ZCatalog will scale using this indexing
 | and search strategy. Does anyone have anecdotal or benchmark data to
 | suggest if (and when) I will hit a 'wall' regarding the number of
 | objects being indexed and searched?
 
 I'm going to try to stuff 27 million objects into ZODB sometime in the
 next week or the week after that (all post addresses in England).  I
 haven't got a clue as to whether this will work or just... well not
 work.  I haven't come up with a strategy for segmenting the data, but
 that shouldn't be a problem at all.  This isn't actually much data, so
 I don't expect the Data.fs file to be more than 500 MB.
 
 I'm quite confident that ZODB, ZCatalog and BTree will scale very
 nicely for this.  I have a plan ;).
 
 I'll let you know how it goes.  (And please, do poke at me if it takes
 too long.)

Will do, Thanks!

Michael Bernstein.





Re: [Zope-dev] ZCatalog scalability

2001-01-21 Thread Erik Enge

[Michael Bernstein]

| I need to know how far the ZCatalog will scale using this indexing
| and search strategy. Does anyone have anecdotal or benchmark data to
| suggest if (and when) I will hit a 'wall' regarding the number of
| objects being indexed and searched?

I'm going to try to stuff 27 million objects into ZODB sometime in the
next week or the week after that (all post addresses in England).  I
haven't got a clue as to whether this will work or just... well not
work.  I haven't come up with a strategy for segmenting the data, but
that shouldn't be a problem at all.  This isn't actually much data, so
I don't expect the Data.fs file to be more than 500 MB.

I'm quite confident that ZODB, ZCatalog and BTree will scale very
nicely for this.  I have a plan ;).

I'll let you know how it goes.  (And please, do poke at me if it takes
too long.)






Re: [Zope-dev] ZCatalog scalability

2001-01-21 Thread Chris Withers

 Michael Bernstein wrote:


  Also, is there a way to disable wildcards in full text
  searches?

 Do not allow direct queries to search the catalog. Instead, make
 searches go through an external method (or a PythonScript with Proxy
 permissions) that uses string.replace to change '*' and '?' to ''.

Wouldn't using a normal vocabulary as opposed to a globbing vocabulary
prevent this as well?

cheers,

Chris






Re: [Zope-dev] ZCatalog scalability

2001-01-21 Thread Steve Alexander

Chris Withers wrote:

 
 Wouldn't using a normal vocabulary as opposed to a globbing vocabulary
 prevent this as well?

That would stop globbing searches for everyone.

While I might want to stop users of a site making wildcard searches, I 
still want to keep that facility for myself :-)

--
Steve Alexander
Software Engineer
Cat-Box limited
http://www.cat-box.net






[Zope-dev] ZCatalog Scalability

2001-01-17 Thread Chris Withers

John Eikenberry wrote:
 
 and retrieval. But ZCatalog did not. It was basically useless for partial
 matching searches (taking many minutes for searches that retrieved more
 than 100 matches). I was also concerned about the indexing overhead. It
 doesn't scale well when changing/adding many things at a time (we might
 have bulk adds/changes).

I wonder if this will change with Zope 2.3?

cheers,

Chris





Re: [Zope-dev] ZCatalog scalability...

2000-12-06 Thread Chris Withers

John Eikenberry wrote:
 
 the potential of up
 to 50,000 entries. 

 Using a ZCatalog for
 listings

This may cause you real problems, especially if there's a 'bulk data
load' at any point.

Cheers,

Chris

PS: How's the catalog revamp coming along? Any published ZSearch
interface yet?
