Re: [Zope3-dev] Florent's O-R blog entry

2005-08-29 Thread Gary Poster


On Aug 27, 2005, at 6:08 AM, Wichert Akkerman wrote:


Previously Gary Poster wrote:


We have at least three maintained and capable ZODB backends, with
different strengths and weaknesses, appropriate for different use
cases.  Let's not jump to discard any of them.



With current filesystem developments it might be interesting to try
to use more filesystem capabilities in backends. Storing metadata
in extended attributes would be interesting, for example. There is
always room for some experiments :)


Yes, and thank you for letting me agree with you. :-)

I'm not asserting that ZODB BTrees are the answer to world hunger; or  
that O/R mapping is evil and must be destroyed; or that we shouldn't  
experiment to find better solutions to our real problems.  My points  
are intended to be most pertinent to core design decisions for Zope  
community projects, given the current state of the Zope world.


Regarding your example idea, I agree that it might be an
interesting approach to explore in ZODB backends.  I bet Ape would be
the easiest way to begin experimenting with it.
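A minimal sketch of Wichert's extended-attribute idea, shown outside any ZODB backend. It assumes Linux and Python 3, where os.setxattr, os.getxattr and os.listxattr are available; the "user.zope." key prefix and the JSON encoding of values are illustrative choices, not part of Ape or any ZODB storage.

```python
import json
import os

# Hypothetical key prefix. On Linux, unprivileged processes may only write
# extended attributes in the "user." namespace.
PREFIX = "user.zope."


def write_metadata(path, metadata):
    """Store each metadata entry as one JSON-encoded extended attribute."""
    for key, value in metadata.items():
        os.setxattr(path, PREFIX + key, json.dumps(value).encode("utf-8"))


def read_metadata(path):
    """Collect every attribute under PREFIX back into a dict."""
    result = {}
    for name in os.listxattr(path):
        if name.startswith(PREFIX):
            raw = os.getxattr(path, name)
            result[name[len(PREFIX):]] = json.loads(raw.decode("utf-8"))
    return result
```

Both functions raise OSError on filesystems mounted without user extended-attribute support, so a real backend would need a fallback.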


Gary
___
Zope3-dev mailing list
Zope3-dev@zope.org
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-24 Thread Lennart Regebro
I'm not convinced that Florent's blog entry says what Gary thinks it
does, but I agree with Gary on the other stuff: It should be possible,
by configuration, to switch out at least some parts to a relational
database.

The catalog indexes and metadata are a prime example of this. No, there
is nothing wrong with ZCatalog, but it is data like that that
relational databases are specialized to handle. For HUGE catalogs,
installing a dedicated relational database makes sense.

As Florent pointed out, you may want to do typical aggregation on
metadata. I personally think the right way to do that is by indexing
it in a catalog that supports it, for example a relational
database. :-)
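To make the aggregation point concrete, here is a hedged sketch using Python's built-in sqlite3 module as a stand-in for a dedicated relational catalog. The table and column names (metadata, portal_type, creator, size) are hypothetical; the point is only that GROUP BY does the aggregation inside the database.

```python
import sqlite3

# Hypothetical metadata table: one row per catalogued object.
SCHEMA = """
CREATE TABLE metadata (
    uid          TEXT PRIMARY KEY,
    portal_type  TEXT NOT NULL,
    creator      TEXT NOT NULL,
    size         INTEGER NOT NULL
)
"""


def build_catalog(rows):
    """Create an in-memory catalog and index the given metadata rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO metadata (uid, portal_type, creator, size) VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def totals_by_type(conn):
    """Aggregate in the database: object count and total size per type."""
    cur = conn.execute(
        "SELECT portal_type, COUNT(*), SUM(size) FROM metadata "
        "GROUP BY portal_type ORDER BY portal_type"
    )
    return cur.fetchall()
```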

It would be nice to have some sort of transparent support for choosing
where blobs are stored: in the ZODB, on disk, or in a relational database.
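One way to read "transparent support" is a small storage interface that application code depends on, with the backend chosen from configuration. The sketch below is an illustration under assumptions: the BlobStore classes, the config keys (blob_storage, path, dsn) and SQLite as a stand-in for a real RDBMS are all hypothetical, and the ZODB option is left out because it would require the ZODB package.

```python
import os
import sqlite3
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """The only interface application code depends on."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class FilesystemBlobStore(BlobStore):
    """Keeps each blob as a file in one directory."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name: str) -> str:
        # This sketch assumes names are already safe file names.
        return os.path.join(self.root, name)

    def put(self, name, data):
        with open(self._path(name), "wb") as f:
            f.write(data)

    def get(self, name):
        with open(self._path(name), "rb") as f:
            return f.read()


class SQLBlobStore(BlobStore):
    """Keeps blobs in a table; SQLite stands in for a real RDBMS."""

    def __init__(self, dsn: str):
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS blobs (name TEXT PRIMARY KEY, data BLOB)"
        )

    def put(self, name, data):
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO blobs (name, data) VALUES (?, ?)", (name, data)
            )

    def get(self, name):
        row = self.conn.execute(
            "SELECT data FROM blobs WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return row[0]


def make_blob_store(config: dict) -> BlobStore:
    """Choose a backend from (hypothetical) configuration keys."""
    kind = config["blob_storage"]
    if kind == "filesystem":
        return FilesystemBlobStore(config["path"])
    if kind == "sql":
        return SQLBlobStore(config["dsn"])
    raise ValueError(f"unknown blob storage: {kind!r}")
```

Application code only ever calls put() and get(), so switching backends becomes a configuration change rather than a code change.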

It would be nice to have an easy way to store usage statistics
in a relational database. Storing them in the ZODB is generally not a
good idea (they tend to turn every click into a write transaction).
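A sketch of the usage-statistics idea, assuming SQLite (through Python's sqlite3 module) stands in for whatever relational database a site would really use; the HitLog class and the hits table are hypothetical. Each click becomes one small SQL INSERT instead of a ZODB write transaction.

```python
import sqlite3
from datetime import datetime, timezone


class HitLog:
    """Records page hits in a relational table, so a page view never
    writes to the object database."""

    def __init__(self, dsn=":memory:"):
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hits (path TEXT NOT NULL, ts TEXT NOT NULL)"
        )

    def record(self, path):
        # One cheap INSERT per click, committed in its own SQL transaction.
        with self.conn:
            self.conn.execute(
                "INSERT INTO hits (path, ts) VALUES (?, ?)",
                (path, datetime.now(timezone.utc).isoformat()),
            )

    def top_pages(self, limit=10):
        """Most visited paths, most hits first."""
        return self.conn.execute(
            "SELECT path, COUNT(*) AS n FROM hits GROUP BY path "
            "ORDER BY n DESC, path LIMIT ?",
            (limit,),
        ).fetchall()
```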

Therefore, I agree with what I think Florent tried to say: An
enterprise CMS needs to have relational integration built in, straight
in the core, so that you can, through configuration, store certain data in a
relational database.

I also agree with what I think Gary is trying to say: We should NOT
try to store as much stuff as possible in a relational database.

Just my 2 centimes.



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-24 Thread Gary Poster


On Aug 24, 2005, at 8:08 AM, Lennart Regebro wrote:


I'm not convinced that Florent's blog entry says what Gary thinks it
does,


Hopefully you can see where I would get my interpretation, though.   
I'm happy to have Florent clarify on his return.  This is a good  
discussion in any case.



but I agree with Gary on the other stuff: It should be possible,
by configuration, to switch out at least some parts to a relational
database.

The catalog indexes and metadata are a prime example of this. No, there
is nothing wrong with ZCatalog,


Well, I didn't say that, actually.  I said that BTrees are excellent
tools; that neither they nor the Zope 3 catalog is a hack; and that the
pure ZODB catalog story has compelling advantages over an RDBMS catalog,
in addition to disadvantages.



but it is data like that that
relational databases are specialized to handle. For HUGE catalogs,
installing a dedicated relational database makes sense.


I can imagine situations in which this might be true, yes.  That  
said, BTrees are excellent for huge data sets.  Fixing the ZODB cache  
story would be a big step to making them even better.



As Florent pointed out, you may want to do typical aggregation on
metadata. I personally think the right way to do that is by indexing
it in a catalog that supports it, for example a relational
database. :-)


I suppose.  This doesn't strike me as hugely compelling--until  
someone explains to me why it wouldn't be, it seems like it would be  
easy to implement these sorts of features efficiently in a ZODB-based  
catalog if someone cared enough.
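As a rough illustration of that claim, aggregation can fall out of the structure a catalog field index already keeps: a mapping from each value to the set of document ids that have it. This toy FieldIndex is hypothetical and uses plain dict and set objects; a ZODB-based index would keep the same shape in BTrees so it can be persisted and scale.

```python
from collections import defaultdict

_MISSING = object()


class FieldIndex:
    """Toy stand-in for a catalog field index.

    Forward map: value -> set of docids; reverse map: docid -> value.
    A ZODB-based index would keep the same shape in BTrees instead of dict/set.
    """

    def __init__(self):
        self._fwd = defaultdict(set)
        self._rev = {}

    def index_doc(self, docid, value):
        self.unindex_doc(docid)  # reindexing replaces the old value
        self._fwd[value].add(docid)
        self._rev[docid] = value

    def unindex_doc(self, docid):
        old = self._rev.pop(docid, _MISSING)
        if old is not _MISSING:
            self._fwd[old].discard(docid)
            if not self._fwd[old]:
                del self._fwd[old]

    def apply(self, value):
        """Docids whose indexed value equals `value`."""
        return set(self._fwd.get(value, ()))

    def counts(self, within=None):
        """Aggregate: documents per value, optionally limited to a result set."""
        result = {}
        for value, docids in self._fwd.items():
            n = len(docids if within is None else docids & within)
            if n:
                result[value] = n
        return result
```

Passing another index's result set as `within` gives "counts per type among documents by one creator" without touching any document.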


Again, my argument is not against O/R mappings; it is against relying on
them exclusively for a shared platform.  I'm *very* happy to have us
leverage the component architecture to have different component
implementations for different use cases.  That, of course, makes
plenty of sense, and is at the heart of Zope 3.



It would be nice to have some sort of transparent support for choosing
where blobs are stored: in the ZODB, on disk, or in a relational database.


Sounds good.


It would be nice to have an easy way to store usage statistics
in a relational database. Storing them in the ZODB is generally not a
good idea (they tend to turn every click into a write transaction).


Absolutely.  RDBMS should be one option for that sort of use case, as  
should rotated flat files, probably.  We've used both, for different  
use cases.
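For the rotated flat-file option, Python's standard logging.handlers.TimedRotatingFileHandler already handles the rotation. A minimal sketch, assuming one tab-separated line per hit is enough; the "usage" logger name and the helper functions are hypothetical.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def make_usage_logger(path, days_to_keep=30):
    """Return a logger that appends one line per hit to `path`, rotating
    the file at midnight and keeping `days_to_keep` old files."""
    logger = logging.getLogger("usage")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep hits out of the application log
    handler = TimedRotatingFileHandler(path, when="midnight", backupCount=days_to_keep)
    handler.setFormatter(logging.Formatter("%(asctime)s\t%(message)s"))
    logger.addHandler(handler)
    return logger


def record_hit(logger, path, user):
    # A plain file append: no database transaction at all.
    logger.info("%s\t%s", path, user)
```

Call make_usage_logger() once at startup; each call adds another handler to the same named logger.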



Therefore, I agree with what I think Florent tried to say: An
enterprise CMS needs to have relational integration built in, straight
in the core, so that you can, through configuration, store certain data in a
relational database.

I also agree with what I think Gary is trying to say: We should NOT
try to store as much stuff as possible in a relational database.


Cool.  I'll expand your summary of my email to say that a relational  
database should not be required for a shared enterprise CMS  
project, but RDBMS-based components, and transparent O/R backends  
like Ape, should be configurable options.  From other parts of your  
mail, it seems you might agree with that too.  I hope so.


I would be happy to agree with your interpretation of Florent's blog,  
given the caveats from my message, as you and I summarize them here.   
Some of Florent's blog is worded in such a way as to make me wonder  
if your interpretation is accurate, though. :-)


Gary



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-24 Thread Lennart Regebro
On 8/24/05, Gary Poster [EMAIL PROTECTED] wrote:
 Hopefully you can see where I would get my interpretation, though.

Yup.

 ZODB catalog story has compelling advantages over an RDBMS catalog,
 in addition to disadvantages.

I'm sure they do, although I'm not immediately aware of them (I would
be interested to hear them though).

  As Florent pointed out, you may want to do typical aggregation on
  metadata. I personally think the right way to do that is by indexing
  it in a catalog that supports it, for example a relational
  database. :-)

 I suppose.  This doesn't strike me as hugely compelling--until
 someone explains to me why it wouldn't be, it seems like it would be
 easy to implement these sorts of features efficiently in a ZODB-based
 catalog if someone cared enough.

I agree it should in any case be implemented by extending the
ZODB-based catalog to have these features, so that you can do this
transparently.

  I also agree with what I think Gary is trying to say: We should NOT
  try to store as much stuff as possible in a relational database.

 Cool.  I'll expand your summary of my email to say that a relational
 database should not be required for a shared enterprise CMS
 project, but RDBMS-based components, and transparent O/R backends
 like Ape, should be configurable options.  From other parts of your
 mail, it seems you might agree with that too.  I hope so.

Absolutely. In fact, I think one of the compelling things about Zope
has been its ease of installation: it is reasonably
self-contained, as it includes its own dedicated database that does
not require any separate complicated setup.

The idea that you can install an enterprise CMS on your Windows
machine with a normal installer .exe file, just to check it out, is a
*huge* plus, and a serious selling point.

The enterprise CMS should not require an RDBMS. It should, however, be a
question of configuration if you want to use one for the parts where
it makes sense.

--
Lennart Regebro, Nuxeo http://www.nuxeo.com/
CPS Content Management http://www.cps-project.org/



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-24 Thread Martijn Faassen

Paul Winkler wrote:

Martijn Faassen wrote:


Missing powerful query concepts
---

Certain powerful query concepts like joins, available in a relational
setting, are missing. I've already run into a scenario where I wanted to
do something like this: given a bunch of version objects with field 'id',
where multiple objects can have the same 'id' to indicate they're
versions of the same object, I want all objects where field
'workflow_state' is 'PUBLISHED', unless there is another object with the
same id that has workflow_state 'NEW', in which case I want that one.

I think joins would be a way to solve it, though I haven't figured out
the details, nor how to implement them efficiently on top of the
catalog. This kind of thing is where a relational database makes life a
lot simpler.



I used to have the same complaints in Zope 2, but so far I've been happy
with Dieter's AdvancedQuery product.  See
http://www.dieter.handshake.de/pyprojects/zope/AdvancedQuery.html
It might be worth a look while thinking about what to implement for Zope 3.

Here's Dieter's example from that page:

from Products.AdvancedQuery import Eq, Between, Le

# search for objects below 'a/b/c' with ids between 'a' and 'z~'
query = Eq('path','a/b/c') & Between('id', 'a', 'z~')


Something very similar to this I can also do with the layer I built on 
top of Zope 3's catalog. It wasn't hard to write at all, which speaks 
for the clean design of the Zope 3 catalog.



# evaluate and sort descending by 'modified' and ascending by 'Creator'
context.Catalog.evalAdvancedQuery(query, (('modified','desc'), 'Creator',))


This is interesting and my layer cannot do this yet.
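Sorting on several keys with mixed directions is not hard to add to such a layer. A minimal sketch, assuming results are available as dict-like records and accepting the same spec shape as the quoted evalAdvancedQuery call; this is not how AdvancedQuery itself implements it.

```python
from operator import itemgetter


def sort_results(records, spec):
    """Sort dict-like records by several keys, each ascending or descending.

    `spec` is a sequence of either a field name (ascending) or a
    (field, 'desc') / (field, 'asc') pair, most significant key first.
    """
    result = list(records)
    # Python's sort is stable (also with reverse=True), so sorting by the
    # least significant key first and the most significant key last
    # yields the combined ordering.
    for item in reversed(spec):
        field, direction = (item, "asc") if isinstance(item, str) else item
        result.sort(key=itemgetter(field), reverse=(direction == "desc"))
    return result
```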


# search 'News' not yet archived and 'File's not yet expired.
now = context.ZopeTime()
# '~' negates a subquery; '&' binds tighter than '|'
query = (Eq('portal_type', 'News') & ~ Le('ArchivalDate', now)
         | Eq('portal_type', 'File') & ~ Le('expires', now))
context.Catalog.evalAdvancedQuery(query)


In your example you haven't done a join as I describe above, unless I'm
missing something. The essential part is that I want an object with state
'PUBLISHED' unless there is another object with the same 'ID' field
whose state is 'NEW'. The join is in the 'ID' matching part.
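Done by hand in Python, the query described above amounts to grouping by id and then choosing per group. The sketch reads the requirement literally: only ids that have a PUBLISHED version are returned, and for those the NEW version wins if one exists. The Version class is hypothetical; a real implementation would take its candidates from the catalog rather than from a list.

```python
from dataclasses import dataclass


@dataclass
class Version:
    id: str              # shared by all versions of one logical object
    workflow_state: str  # e.g. 'PUBLISHED' or 'NEW'
    title: str = ""


def current_versions(versions):
    """For every id that has a PUBLISHED version, return its NEW version(s)
    if any exist, otherwise the PUBLISHED one(s)."""
    # Group by id, then by state: this grouping is the hand-written "join".
    by_id = {}
    for v in versions:
        by_id.setdefault(v.id, {}).setdefault(v.workflow_state, []).append(v)
    result = []
    for states in by_id.values():
        if "PUBLISHED" not in states:
            continue  # ids without a published version are not wanted
        result.extend(states.get("NEW", states["PUBLISHED"]))
    return result
```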


Regards,

Martijn



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-24 Thread Santi Camps

[EMAIL PROTECTED] wrote:


Martijn Faassen wrote:
 


Missing powerful query concepts
---

Certain powerful query concepts like joins, available in a relational
setting, are missing. I've already run into a scenario where I wanted to
do something like this: given a bunch of version objects with field 'id',
where multiple objects can have the same 'id' to indicate they're
versions of the same object, I want all objects where field
'workflow_state' is 'PUBLISHED', unless there is another object with the
same id that has workflow_state 'NEW', in which case I want that one.

I think joins would be a way to solve it, though I haven't figured out
the details, nor how to implement them efficiently on top of the
catalog. This kind of thing is where a relational database makes life a
lot simpler.
   



I used to have the same complaints in Zope 2, but so far I've been happy
with Dieter's AdvancedQuery product.  See
http://www.dieter.handshake.de/pyprojects/zope/AdvancedQuery.html
It might be worth a look while thinking about what to implement for Zope 3.
 

A very interesting discussion.  I'm just an advanced Zope 2 user and a
Zope 3 beginner, but these questions have been on my mind since I started
developing with the ZODB, so I'll share my opinions here.


The use of the ZODB and ZCatalogs has a lot of advantages, that's obvious,
but it also has some limitations.  We also use AdvancedQuery to solve some
of them, but others remain unsolved:


1) Joins.  There isn't a standard way to make joins, or to sort and
filter over joined data.  I know that with enough Python knowledge this
can be done manually, but it's hard (at least much harder than using
SQL).  Moreover, programmers writing business application reports are
the ones with the least Python experience and, believe me, some joins
can be written in a very inefficient way :-)
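For comparison, the "PUBLISHED unless there is a NEW version" query from earlier in the thread is a single self-join in SQL. This sketch uses an in-memory SQLite database, and the versions table layout is an assumption; it also assumes at most one PUBLISHED and one NEW row per id, since extra rows would multiply the output.

```python
import sqlite3


def current_versions_sql(rows):
    """rows: iterable of (id, workflow_state, title) tuples.

    Returns (id, workflow_state, title) for each id that has a PUBLISHED
    row, substituting its NEW row when one exists.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE versions (id TEXT, workflow_state TEXT, title TEXT)")
    conn.executemany("INSERT INTO versions VALUES (?, ?, ?)", rows)
    query = """
        SELECT p.id,
               COALESCE(n.workflow_state, p.workflow_state),
               COALESCE(n.title, p.title)
        FROM versions AS p
        LEFT JOIN versions AS n              -- self-join on the shared id
               ON n.id = p.id AND n.workflow_state = 'NEW'
        WHERE p.workflow_state = 'PUBLISHED'
        ORDER BY p.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```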


2) Relational integrity.  I like to make a data definition and rely on
it, so if I define that an invoice has a restricted foreign key to a
customer, the customer cannot be deleted while the invoice exists.
Although this behaviour could (and should) be implemented in the
application logic, sometimes there are bugs in the application and data
inconsistencies are generated.  Working with an RDBMS you can rely on
your data definition, so such inconsistencies can never be created.
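The kind of guarantee described above can be shown with SQLite through Python's sqlite3 module. The customer and invoice tables are hypothetical; note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE invoice (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customer(id) ON DELETE RESTRICT,
            total       INTEGER NOT NULL
        );
    """)
    return conn


def try_delete_customer(conn, customer_id):
    """Return True if the delete succeeded, False if the database refused it."""
    try:
        with conn:  # rolls back if the statement fails
            conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

The refusal comes from the schema itself, so it holds even when application code has a bug.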


3) Huge amounts of data.  When working with applications that write a
lot of data or run queries (especially joins) over huge amounts of data,
the performance of a ZODB database is poor compared to some RDBMSs (I'm
talking about Zope 2.7; I don't have this experience with Z3 yet).


Despite this, I think the advantages of using the ZODB outweigh the
limitations, so we continue using it.  I think a good approach could be
a transparent O/R mapping plus a way to run SQL against the RDBMS and get
objects back.  That way, a pure ZODB application can work with
FileStorage or with an RDBMS storage, and applications requiring it will
be able to take advantage of SQL.
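The "run SQL and get objects back" half of that suggestion can be sketched in a few lines: run an arbitrary query and build one object per row by matching column names to constructor arguments. This is a hypothetical helper, not Ape's API, and it covers only reading, not the transparent mapping of writes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:  # hypothetical content class
    id: int
    name: str
    city: str


def query_objects(conn, cls, sql, params=()):
    """Run arbitrary SQL and turn each row into an instance of `cls`,
    matching result column names to constructor arguments."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return [cls(**dict(zip(columns, row))) for row in cur.fetchall()]
```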


I just want to express the opinion of a user who simply uses the ZODB and
ZCatalogs, and who used RDBMSs for a long time before that.


Regards

Santi Camps





Re: [Zope3-dev] Florent's O-R blog entry

2005-08-24 Thread Michel Pelletier

This is a general reply, Martijn just summed up many of the points so
nicely (as usual) that I'm using his email as a starting point...

 From: Martijn Faassen [EMAIL PROTECTED]
 Subject: Re: [Zope3-dev] Florent's O-R blog entry

 I have had some opportunity to work with the Zope 3 catalog recently, 
 and I have a few comments. First of all, I agree with the main idea that 
 the Zope 3 catalog is not a hack, and is clean and flexible. I believe 
 the catalog should be invested in, as I think it's cool.

agreed.

 Now as to where I see areas where features are lacking in the Zope 3 
 catalog:
 
 Underfeatured query API
 ---
 
 I do think that currently the API to query it is woefully underfeatured.
 
 I've tried to work on this problem and am sitting on some code that just 
 needs a bit of time to polish and release that allows a simple query 
 language on top of the catalog. It's just building up a tree of python 
 objects for queries, nothing special, but it is a lot higher level than 
 what's already there.

Query logic is an important feature that the catalog needs.  The
argument that Python (without at least a query API) is its query
language no longer holds water for me.  Good query languages should
allow you to define what you are looking for, not how you look for it,
and your code, inspired by something like Dieter's AdvancedQuery, is
absolutely necessary.
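The tree-of-query-objects approach can be sketched in Python with operator overloading, so that &, | and ~ build And, Or and Not nodes much like the AdvancedQuery syntax quoted earlier in the thread. Everything here is hypothetical: the class names only mirror AdvancedQuery's spelling, and the toy Catalog stands in for the Zope 3 catalog's indexes.

```python
class Catalog:
    """Toy catalog: for each field, a mapping value -> set of docids."""

    def __init__(self):
        self.docids = set()
        self.indexes = {}

    def index(self, docid, record):
        self.docids.add(docid)
        for field, value in record.items():
            self.indexes.setdefault(field, {}).setdefault(value, set()).add(docid)

    def search(self, query):
        return query.evaluate(self)


class Query:
    """A query is a tree of nodes; each node evaluates to a set of docids."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

    def __invert__(self):
        return Not(self)


class Eq(Query):
    def __init__(self, field, value):
        self.field, self.value = field, value

    def evaluate(self, catalog):
        return set(catalog.indexes.get(self.field, {}).get(self.value, ()))


class Le(Query):
    def __init__(self, field, bound):
        self.field, self.bound = field, bound

    def evaluate(self, catalog):
        index = catalog.indexes.get(self.field, {})
        return set().union(*(ids for value, ids in index.items() if value <= self.bound))


class And(Query):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, catalog):
        return self.left.evaluate(catalog) & self.right.evaluate(catalog)


class Or(Query):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, catalog):
        return self.left.evaluate(catalog) | self.right.evaluate(catalog)


class Not(Query):
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self, catalog):
        # Complement relative to every indexed document.
        return catalog.docids - self.operand.evaluate(catalog)
```

Because Python's & binds tighter than |, the "News not yet archived or File not yet expired" query reads the same way here as in Dieter's example.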

But... I think we are missing another layer here.  I pointed this out in
my first reply to Gary's post about Zemantic.  I think a three-tiered
approach needs to be taken to searching, just like it has been taken
with many other aspects of Zope 3.  I think this removes the whole
argument of catalog vs. RDBMS for enterprise systems and makes the
answer even better.

1) The model tier contains the searchable sources.  These sources
provide a simple search source interface.  I do not think it's possible,
and I do not propose, to have a consistent query interface across various
source implementations, but it's certainly possible to have a consistent
source management interface, and it's possible for sources to describe
themselves and their searchable content using a common schema language.
Sources can be local components (catalogs, RDBMSs) or remote (Google,
Wikipedia, etc.).

2) The controller tier provides query logic (agents is a common term
here).  Agents are components that implement a particular query
interface, and know how to query registered searchable sources or other
agents based on that source or agent's description.  Agents take care of
the dirty work (like result merging and joins) and provide a clean
interface to submit a query and retrieve results.  What the query
looks like and which search source is used depends on the agent and its
configuration.  If they know how to, agents can query agents to delegate
their work.

3) The view tier is a simple, high-level interface that the user (in
this case the typical third-party developer) primarily interacts with.
It is used to discover, query and manage sources and agents.  It doesn't
hide so much the underlying complexity as it provides a way to manage
the complexity.
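A minimal sketch of the three tiers, under heavy assumptions: the names SearchSource, DictSource, MergingAgent and SearchFacade are hypothetical, and for brevity every source here happens to accept the same keyword query, which Michel explicitly does not require (a real agent would translate the query for each kind of source it knows). The agent uses each source's self-description to decide whether to query it, then merges the results.

```python
from typing import Protocol


class SearchSource(Protocol):
    """Model tier: something searchable that can describe itself."""
    name: str

    def describe(self) -> set: ...              # fields this source can search

    def search(self, **criteria) -> list: ...   # source-specific query


class DictSource:
    """Toy local source over a list of dicts (stands in for a catalog or RDBMS)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def describe(self):
        return set().union(*(r.keys() for r in self.records))

    def search(self, **criteria):
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]


class MergingAgent:
    """Controller tier: queries every registered source whose description
    covers all requested fields, and merges the results."""

    def __init__(self):
        self.sources = []

    def register(self, source: SearchSource):
        self.sources.append(source)

    def query(self, **criteria):
        results = []
        for source in self.sources:
            if set(criteria) <= source.describe():
                results.extend((source.name, hit) for hit in source.search(**criteria))
        return results


class SearchFacade:
    """View tier: the one object a third-party developer talks to."""

    def __init__(self, agent):
        self.agent = agent

    def add_source(self, source):
        self.agent.register(source)

    def find(self, **criteria):
        return self.agent.query(**criteria)
```

A third-party product would then call add_source() with its own source instead of patching a shared catalog.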

Consider your typical CMF/Plone-based application today.  It has a
portal_catalog that is the sole searchable source for the application
framework.  To add another source, one must, in general, hack the
application or the framework (note Archetypes gets around this, to a
degree, but only for catalogs, and only on values that map to an AT
field).  

If a third-party product wanted to plug into the search framework, it
would need to either hack the existing portal_catalog or, if that is not
possible (which it very often is not, due to name and usage conflicts at
the index level), create its own catalog.  But now the two
searchable sources are completely disconnected and unrelated, and various
interfaces and things have to be hacked to include both sources in the
application's logic.

-Michel





Re: [Zope3-dev] Florent's O-R blog entry

2005-08-23 Thread Gary Poster


On Aug 23, 2005, at 1:11 PM, Gary Poster wrote:
FWIW, my concluding sentence would have been better written as
"Meanwhile, deciding that a community project require an O/R back
end over FileStorage or DirectoryStorage, as Florent argues, feels
like a significant case of throwing the baby out with the bath
water."


Argh, communication.  That still could be too-easily misinterpreted,  
and I didn't stare at it long enough before I sent it.  One more try.


Meanwhile, deciding that a community project require any specific  
backend--Ape, FileStorage, DirectoryStorage, or another--feels like a  
mistake.  Discarding FileStorage or DirectoryStorage, as Florent  
argues, is a significant case of throwing the baby out with the bath  
water.  We have at least three maintained and capable ZODB backends,  
with different strengths and weaknesses, appropriate for different  
use cases.  Let's not jump to discard any of them.


Gary



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-23 Thread Gary Poster


On Aug 23, 2005, at 12:56 PM, Shane Hathaway wrote:


Gary Poster wrote:

In conclusion, the nebulous concept of enterprise applications  
on  Zope does not have a clear cut decision for or against an O/R  
mapper  such as Ape.  The cost of O/R mappings is not  
inconsequential, and  the advantages are not conclusive.  I hope  
that large projects that  the Zope community works on together can  
support both, and do not  depend on or exclude their use.  Florent  
makes some excellent  observations, and solutions to the problems  
he identifies could be  done at a number of layers in the code  
base.  Meanwhile, switching  entirely to an O/R back end over  
FileStorage or DirectoryStorage  feels like a significant case of  
throwing the baby out with the bath  water.




I would use this argument to support the idea of transparent ZODB- 
based O/R mapping, which is what Ape does.  With a transparent  
mapper, users can choose their own storage backend.  The baby is  
the application code and the bath water is FileStorage/ 
DirectoryStorage.  Ape keeps the baby 100% intact. ;-)


I strongly disagree that FileStorage/DirectoryStorage is bath  
water--something that has served its purpose, and is discardable.  I  
agree that O/R mapping like Ape provides is a great solution for some  
cases (such as the one you listed, and there are others) and allows  
you to transparently replace back ends if it is (or becomes)  
necessary.  It is an exciting idea and technology, and appropriate  
for some use cases.


FWIW, my concluding sentence would have been better written as
"Meanwhile, deciding that a community project require an O/R back end
over FileStorage or DirectoryStorage, as Florent argues, feels like a
significant case of throwing the baby out with the bath water."


Gary



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-23 Thread Shane Hathaway

Gary Poster wrote:


On Aug 23, 2005, at 1:11 PM, Gary Poster wrote:

FWIW, my concluding sentence would have been better written as
"Meanwhile, deciding that a community project require an O/R back end
over FileStorage or DirectoryStorage, as Florent argues, feels like a
significant case of throwing the baby out with the bath water."



Argh, communication.  That still could be too-easily misinterpreted,  
and I didn't stare at it long enough before I sent it.  One more try.


Meanwhile, deciding that a community project require any specific  
backend--Ape, FileStorage, DirectoryStorage, or another--feels like a  
mistake.  Discarding FileStorage or DirectoryStorage, as Florent  
argues, is a significant case of throwing the baby out with the bath  
water.  We have at least three maintained and capable ZODB backends,  
with different strengths and weaknesses, appropriate for different use
cases.  Let's not jump to discard any of them.


I agree 100%.  However, your concern is that projects will require a 
specific ZODB backend, while my concern is that projects will dump ZODB 
altogether.  I think the latter is the greater risk, and people need a 
middle ground so they don't isolate themselves from the rest of the 
community.  Ape could be a part of that middle ground.


Also, I did not intend to disparage the excellent FileStorage and 
DirectoryStorage packages.  I always tell people to use FileStorage or 
DirectoryStorage unless they have a good reason not to, and the biggest 
reason not to use FileStorage (through-the-web code is hard to put under 
version control) is already disappearing with Zope 3.


Shane



Re: [Zope3-dev] Florent's O-R blog entry

2005-08-23 Thread Janko Hauser


On 23.08.2005, at 20:36, Shane Hathaway wrote:


Gary Poster wrote:


On Aug 23, 2005, at 1:11 PM, Gary Poster wrote:

Argh, communication.  That still could be too-easily  
misinterpreted,  and I didn't stare at it long enough before I  
sent it.  One more try.
Meanwhile, deciding that a community project require any specific   
backend--Ape, FileStorage, DirectoryStorage, or another--feels  
like a  mistake.  Discarding FileStorage or DirectoryStorage, as  
Florent  argues, is a significant case of throwing the baby out  
with the bath  water.  We have at least three maintained and  
capable ZODB backends,  with different strengths and weaknesses,  
appropriate for different use cases.  Let's not jump to discard
any of them.




I agree 100%.  However, your concern is that projects will require  
a specific ZODB backend, while my concern is that projects will  
dump ZODB altogether.  I think the latter is the greater risk, and  
people need a middle ground so they don't isolate themselves from  
the rest of the community.  Ape could be a part of that middle ground.


Also, I did not intend to disparage the excellent FileStorage and  
DirectoryStorage packages.  I always tell people to use FileStorage  
or DirectoryStorage unless they have a good reason not to, and the  
biggest reason not to use FileStorage (through-the-web code is hard  
to put under version control) is already disappearing with Zope 3.


This is a good discussion, and I think it will provide good
ground for a technical pro/contra view of the storage situation. But
I think the post from Florent looks at this from a slightly different
angle. Perhaps I misinterpret it, but his thoughts address the needs
of a content repository storage. I do not think he wanted to totally
replace ZODB for all the other stuff. And assuming he looks at the
storage question from this point (Florent is actually on holiday at
the moment), his views are built on some general concerns as
background.


If we assume enterprise means big and sellable to corporations,
then potential customers have a valid concern: their valuable
content is stored in a piece of software known only to a small group
of developers. Selling a content repository built on this software
takes more convincing than saying "We have this great piece of
software, and your content ends up in your favorite traditional RDBMS."


OK, I will stop interpreting what Florent may have thought; I'd better
present my own line of thinking. In the end, I'm against an RDBMS as
the only core part of a Zope CMS repository.


I started with the general idea of a content repository for
simple content objects, all described by schemas. This
leads to a rather flat, more structured, nearly homogeneous mass of
objects, compared to the usual objects in a Zope CMS.
The repository is a layer over potentially many storages, which leads
fairly naturally to the idea of a backend storage that puts this
data into an RDBMS. This is probably the level Florent was looking at too.
But I have concerns about many of the other points. At this level the
RDBMS is really just a store of attribute mappings. The whole logic,
for example the relations between different content objects, is part of
the stored data or is held in the repository application or in some
registries. I expect that the moment one starts to use the relational
aspects of the RDBMS, the application logic becomes part of the
storage. This would need to be addressed in the O-R mapper, which
means the O-R mapper also becomes part of the application
logic. There are further proposed benefits of an RDBMS storage, like
indexing, direct searching and report generation, which all
reflect back into the application domain. In the end this would lead
to a situation where one circumvents the O-R mapper for
complex or special tasks and starts to work directly on the data.
From my point of view that is bad, and it greatly raises the
complexity. It would also mean a big development effort to recreate,
overshadow and map functionality that Zope currently gives us nearly
for free.
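A sketch of what "just a store of attribute mappings" could look like, assuming SQLite as the RDBMS and JSON-encoded values; the attrs table and the AttributeMappingStorage class are hypothetical. A relation, such as a list of related object ids, is stored as an opaque value, so the relational logic stays in the application, which is the situation described above.

```python
import json
import sqlite3


class AttributeMappingStorage:
    """Stores each content object as a flat mapping of attribute -> value.

    The RDBMS sees only (oid, name, value) rows and knows nothing about
    relations between objects; those stay in the application.
    """

    def __init__(self, dsn=":memory:"):
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS attrs ("
            " oid TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL,"
            " PRIMARY KEY (oid, name))"
        )

    def store(self, oid, mapping):
        # Replace the whole mapping atomically.
        with self.conn:
            self.conn.execute("DELETE FROM attrs WHERE oid = ?", (oid,))
            self.conn.executemany(
                "INSERT INTO attrs (oid, name, value) VALUES (?, ?, ?)",
                [(oid, k, json.dumps(v)) for k, v in mapping.items()],
            )

    def load(self, oid):
        rows = self.conn.execute(
            "SELECT name, value FROM attrs WHERE oid = ?", (oid,)
        ).fetchall()
        if not rows:
            raise KeyError(oid)
        return {name: json.loads(value) for name, value in rows}
```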


There are many valid points where the ZODB has shortcomings.
Blob support, for example, will get much better, although the problem
will not be totally solved just by storing blobs on the filesystem. Which
leads to my last point. From a solution point of view, many hacks or
individual adaptations are involved in running a big, scalable site. I
think we should look for some of these to be made better, meaning more
standardly incorporated into the z3ecms toolbox. Just as an example,
the answer to time-consuming cataloging in cases with many writes is
to use the queued catalog product. But integrating it into a system
is handwork: it needs a developer who knows how to do it and where to
fiddle to integrate it right. Such technically already present