Re: AW: [ZODB-Dev] diploma thesis: ZODB Indexing

2007-09-05 Thread Tino Wildenhain

Sebastian Wehrmann wrote:

On 4 September 2007, 16:17:27, Jim Fulton wrote:

I would very much like to see an open indexing+querying framework for
Python objects.  I'm thinking of something *like* an SQL engine that
allowed one to plug in relation and index implementations and that
took queries in some form, optimized them and executed them using
the given indexes and relations.



We plan to realize three things important to us:

- We don't want indexing at the application level (i.e. application-specific)
- We want ad-hoc queries
- We don't want to rely on transforming the data into a relational model


Very good!

Greets
Tino
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: AW: [ZODB-Dev] diploma thesis: ZODB Indexing

2007-09-05 Thread Jim Fulton


On Sep 5, 2007, at 3:42 AM, Sebastian Wehrmann wrote:


On 4 September 2007, 16:17:27, Jim Fulton wrote:


I would very much like to see an open indexing+querying framework for
Python objects.  I'm thinking of something *like* an SQL engine that
allowed one to plug in relation and index implementations and that
took queries in some form, optimized them and executed them using
the given indexes and relations.



We plan to realize three things important to us:

- We don't want indexing at the application level (i.e. application-specific)


What does that mean?  Relational applications certainly define  
application specific indexes.



- We want ad-hoc queries


OK.

- We don't want to rely on transforming the data into a relational  
model


Good.


My tutor is Christian Theune here at gocept.

My ZODB experience is quite limited; I'm just getting started. Please bear
with any questions I come up with.


I'd like to see a generic framework for defining collections and
indexes in Python and querying them efficiently.  No ZODB expertise
should be needed.


Jim

--
Jim Fulton          mailto:[EMAIL PROTECTED]    Python Powered!
CTO                 (540) 361-1714              http://www.python.org
Zope Corporation    http://www.zope.com         http://www.zope.org





Re: AW: [ZODB-Dev] diploma thesis: ZODB Indexing

2007-09-05 Thread Christian Theune

Hey,

I'm afraid I can't forget what you mentioned. ;)

Honestly: I'm sorry if I sounded as if I were dismissing your ideas. That wasn't
my intention.

I've got the feeling that the current mode of communication is annoying
for you. Unfortunately, I'm struggling a bit to coordinate a discussion
that flows both on the list and in our office. I'll try to find a better
mode. Hints and tips are welcome.

On Wednesday, 05.09.2007, at 09:55 -0400, Jim Fulton wrote:
 On Sep 5, 2007, at 9:39 AM, Christian Theune wrote:

  On Wednesday, 05.09.2007, at 09:24 -0400, Jim Fulton wrote:
  I'd like to see a generic framework for defining collections and
  indexes in Python and querying them efficiently.  No ZODB expertise
  should be needed.
 
  I have the feeling you already pondered this a bit and have some more
  specific ideas ... :)
 
 No more than I've said before.  There is no reason why this has to be
 ZODB-specific.

Sebastian and I talked about this face to face and we think we
understood your hint that this doesn't have to be ZODB specific. 

We imagine we need two kinds of components to make this work:

1. A query processor that could look like:

from zope.interface import Interface

class IQueryProcessor(Interface):

    def query(*args, **kwargs):
        """Return a list of matching objects.

        The parameters are specific to the query processor in use.
        """


Alternatively, as the signature of the only method isn't specified
anyway, we could make each query processor define its own interface
instead.

2. An object collection that serves two purposes:

a) maintain indexes

b) provide a low-level query API that is rich enough to let different
query processors e.g. for SQL, xpath, ... work against them.

This is the one that needs the most work to get the separation of concerns
right. One split we came up with separates the responsibilities for defining:

- which objects to index
- how to store the indexes
- how to derive the structural relations between objects

Those could be separated into individual components, with the object
collection being the component that joins them together.
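
A minimal sketch of how those three responsibilities could be joined by one collection object. It is only an illustration under stated assumptions, not a proposed design: the names `ObjectCollection`, `AttributeIndex`, `select` and `children_of` are hypothetical, and a plain in-memory dict stands in for whatever index storage would really be used.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical index store: maps (attribute, value) to object ids."""

    def __init__(self):
        self._data = defaultdict(set)

    def add(self, oid, obj):
        for name, value in vars(obj).items():
            try:
                self._data[(name, value)].add(oid)
            except TypeError:
                # Unhashable values (lists, dicts) are skipped in this sketch.
                pass

    def lookup(self, name, value):
        return set(self._data.get((name, value), ()))


class ObjectCollection:
    """Joins the three responsibilities: selection, storage, structure."""

    def __init__(self, select, index, children_of):
        self._select = select            # which objects to index
        self._index = index              # how to store the indexes
        self._children_of = children_of  # how to derive structural relations
        self._objects = {}

    def add(self, root):
        """Walk the object graph from root and index every selected object."""
        stack = [root]
        while stack:
            obj = stack.pop()
            oid = id(obj)
            if oid in self._objects:
                continue
            self._objects[oid] = obj
            if self._select(obj):
                self._index.add(oid, obj)
            stack.extend(self._children_of(obj))

    def find(self, name, value):
        """Low-level query API that query processors could build on."""
        return [self._objects[oid] for oid in self._index.lookup(name, value)]


class Node:
    def __init__(self, title, children=()):
        self.title = title
        self.children = list(children)


leaf = Node("leaf")
root = Node("root", [leaf])
collection = ObjectCollection(
    select=lambda obj: isinstance(obj, Node),
    index=AttributeIndex(),
    children_of=lambda obj: obj.children,
)
collection.add(root)
matches = collection.find("title", "leaf")
```

Each of the three constructor arguments can be swapped independently, which is the point of the split: a different `children_of` changes how structure is derived without touching index storage.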

On the definition of indexes: we're not sure whether a generic set of
indexes will be sufficient (e.g. the three indexes from XISS: class
index, attribute index, structural index) or whether those need to be
exchangeable.

For our ad-hoc querying we certainly don't want to have to set up
specialised indexes to make things work, but maybe optional indexes
could be used when possible -- just like RDBMS.


Christian



Re: AW: [ZODB-Dev] diploma thesis: ZODB Indexing

2007-09-05 Thread Stephan Richter
On Wednesday 05 September 2007 11:00, Christian Theune wrote:
 b) provide a low-level query API that is rich enough to let different
 query processors e.g. for SQL, xpath, ... work against them.

Having thought about this problem domain too (see my old work on ZOQLMethod),
I see a big difference between relational and tree-based querying. I
wonder whether you can cover both. It would be awesome though. :-)

I would definitely be interested in seeing public discussion about the 
approach here. Some random thoughts that you probably know already:

* The storage of indices should be pluggable, as ZODB storages are. This would
allow backends such as PyLucene, BTree-based ones or even relational databases.

* One of the big problems right now is that there is no efficient way to do 
inverses of searches. I think some time should be spent doing this.

* I really like the API of hurry.query. I would love to see something like it 
as the backend for querying languages.
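
To make the first two points concrete, here is a rough sketch, assuming a hypothetical `IndexStorage` protocol (none of these names come from ZODB, PyLucene or hurry.query). It shows an index that delegates to a pluggable storage backend, and a naive inverse search whose cost depends on the size of the whole collection rather than on the size of the result.

```python
from typing import Dict, Hashable, Protocol, Set


class IndexStorage(Protocol):
    """Hypothetical storage interface an index backend would implement."""

    def add(self, key: Hashable, doc_id: int) -> None: ...
    def get(self, key: Hashable) -> Set[int]: ...
    def all_ids(self) -> Set[int]: ...


class DictStorage:
    """In-memory backend; a BTree- or Lucene-based one could replace it."""

    def __init__(self) -> None:
        self._postings: Dict[Hashable, Set[int]] = {}
        self._ids: Set[int] = set()

    def add(self, key, doc_id):
        self._postings.setdefault(key, set()).add(doc_id)
        self._ids.add(doc_id)

    def get(self, key):
        return set(self._postings.get(key, ()))

    def all_ids(self):
        return set(self._ids)


class FieldIndex:
    """Index over one field; knows nothing about how postings are stored."""

    def __init__(self, storage: IndexStorage) -> None:
        self._storage = storage

    def index(self, doc_id, value):
        self._storage.add(value, doc_id)

    def equals(self, value):
        return self._storage.get(value)

    def not_equals(self, value):
        # Naive inverse: materialises every known id, then subtracts the
        # matches, so the work grows with the collection, not the result.
        return self._storage.all_ids() - self._storage.get(value)


colours = FieldIndex(DictStorage())
colours.index(1, "red")
colours.index(2, "blue")
colours.index(3, "red")
```

Here `colours.not_equals("red")` has to touch all three ids to return one, which is the inefficiency the second point refers to.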

I am really looking forward to what you will come up with!

Regards,
Stephan
-- 
Stephan Richter
CBU Physics & Chemistry (B.S.) / Tufts Physics (Ph.D. student)
Web2k - Web Software Design, Development and Training


Re: AW: [ZODB-Dev] diploma thesis: ZODB Indexing

2007-09-05 Thread Tino Wildenhain

Christian Theune wrote:

On Wednesday, 05.09.2007, at 09:24 -0400, Jim Fulton wrote:
I'd like to see a generic framework for defining collections and
indexes in Python and querying them efficiently.  No ZODB expertise
should be needed.


I have the feeling you already pondered this a bit and have some more
specific ideas ... :)

I also have the feeling that our goal for ad-hoc querying would be
incompatible with your envisioned framework for defining collections and
indexes. 


My impression is that many people have thought about this problem.
Now someone has stood up and started actually working on it, which is good :-)

I think ad-hoc queries are not incompatible per se; they would
just act like a sequential scan in a relational database: they
work, but not very efficiently. Maybe the API can generate a warning,
if desired, so the application developer can add indexing.
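
A small sketch of that behaviour, assuming a hypothetical `Collection` class with an opt-in `warn_on_scan` flag (both names are made up for illustration). A query uses an index when one exists and otherwise falls back to a sequential scan, optionally warning about it.

```python
import warnings
from types import SimpleNamespace

_MISSING = object()  # marks "object has no such attribute"


class Collection:
    """Hypothetical collection: uses an index if one exists, else scans."""

    def __init__(self, objects):
        self._objects = list(objects)
        self._indexes = {}  # attribute name -> {value: [objects]}

    def add_index(self, attr):
        """Build an optional index; indexed values must be hashable."""
        index = {}
        for obj in self._objects:
            value = getattr(obj, attr, _MISSING)
            if value is not _MISSING:
                index.setdefault(value, []).append(obj)
        self._indexes[attr] = index

    def query(self, attr, value, warn_on_scan=False):
        index = self._indexes.get(attr)
        if index is not None:
            return list(index.get(value, []))
        if warn_on_scan:
            warnings.warn(
                "no index on %r, falling back to a sequential scan" % attr,
                stacklevel=2,
            )
        # Ad-hoc path: works without any setup, but touches every object.
        return [o for o in self._objects
                if getattr(o, attr, _MISSING) == value]


people = [SimpleNamespace(name="ada", age=36),
          SimpleNamespace(name="bob", age=41)]
collection = Collection(people)

# Without an index the query still works, but a warning is recorded.
with warnings.catch_warnings(record=True) as scan_warnings:
    warnings.simplefilter("always")
    scanned = collection.query("name", "bob", warn_on_scan=True)

# After adding the index the same query is answered without a warning.
collection.add_index("name")
with warnings.catch_warnings(record=True) as index_warnings:
    warnings.simplefilter("always")
    indexed = collection.query("name", "bob", warn_on_scan=True)
```

Both calls return the same objects; only the cost and the warning differ, so the developer can add indexes later without changing any queries.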

Thinking the whole scope through, at some point we will
also need further abstraction regarding authorization
and access to objects and attributes. This would dramatically
change the way Zope works with ZODB, but it would open up many more
uses of ZODB independent of the Zope world.

Just my 1e-21 cents :-)

Tino


[ZODB-Dev] Re: AW: diploma thesis: ZODB Indexing

2007-09-05 Thread Laurence Rowe

Christian Theune wrote:
<snip />

We imagine we need two kinds of components to make this work:

1. A query processor that could look like:

from zope.interface import Interface

class IQueryProcessor(Interface):

    def query(*args, **kwargs):
        """Return a list of matching objects.

        The parameters are specific to the query processor in use.
        """


Alternatively, as the signature of the only method isn't specified
anyway, we could make each query processor define its own interface
instead.

2. An object collection that serves two purposes:

a) maintain indexes

b) provide a low-level query API that is rich enough to let different
query processors e.g. for SQL, xpath, ... work against them.

This is the one that needs the most work to get the separation of concerns
right. One split we came up with separates the responsibilities for defining:

- which objects to index
- how to store the indexes
- how to derive the structural relations between objects

Those could be separated into individual components, with the object
collection being the component that joins them together.

On the definition of indexes: we're not sure whether a generic set of
indexes will be sufficient (e.g. the three indexes from XISS: class
index, attribute index, structural index) or whether those need to be
exchangeable.


For our ad-hoc querying we certainly don't want to have to set up
specialised indexes to make things work, but maybe optional indexes
could be used when possible -- just like RDBMS.



Make sure you take a look at SQLAlchemy's implementation of this, 
sqlalchemy.orm.query.


RDBMSs do not get fast querying for free... They just revert to a
complete record scan when they do not have an index, analogous to the
Find tab in the ZMI. As anyone who has ever queried such a database can
attest, it ain't quick. (RDBMSs also tend to create implicit indexes on
primary and foreign keys.)
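
This can be observed with the `sqlite3` module from the Python standard library. The sketch below is specific to SQLite and makes no claim about other databases; the table and index names are made up. It asks the planner how it would run the same query before and after an index exists, and how it handles a lookup by primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("alice",), ("bob",)])


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


by_name = "SELECT * FROM people WHERE name = 'bob'"
by_id = "SELECT * FROM people WHERE id = 2"

# No index on name yet: the planner reports a scan of the whole table.
plan_without_index = plan(by_name)

# Lookup by primary key needs no explicitly created index.
plan_primary_key = plan(by_id)

# With an index on name, the same query becomes an index search.
conn.execute("CREATE INDEX people_name ON people (name)")
plan_with_index = plan(by_name)

for label, text in [("no index", plan_without_index),
                    ("primary key", plan_primary_key),
                    ("with index", plan_with_index)]:
    print(label + ": " + text)
```

The exact wording of the plan differs between SQLite versions, so the sketch prints it rather than assuming a particular string.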


Laurence



Re: [ZODB-Dev] Re: AW: diploma thesis: ZODB Indexing

2007-09-05 Thread Christian Theune

Hi,

On Wednesday, 05.09.2007, at 21:47 +0100, Laurence Rowe wrote:
 Make sure you take a look at SQLAlchemy's implementation of this, 
 sqlalchemy.orm.query.

Thanks for the tip.

 RDBMSs do not get fast querying for free... They just revert to a
 complete record scan when they do not have an index, analogous to the
 Find tab in the ZMI. As anyone who has ever queried such a database can
 attest, it ain't quick. (RDBMSs also tend to create implicit indexes on
 primary and foreign keys.)

Well, they do get some support on the storage side because of the
strongly typed, rectangular shape of the data. E.g. we discovered that
Postgres seems never to do index lookups on tables with fewer than about
1000 rows -- for us it always did table scans even when indexes existed.
