-----BEGIN PGP SIGNED MESSAGE-----
Michel Pelletier wrote:
> On Tue, 2005-08-23 at 12:49 -0400, Gary Poster wrote:
>>Michel (and anyone else with experience with RDFLib on the list), I
>>recently looked at RDFLib (http://rdflib.net/) and came away (after
>>an hour or so) with a good first impression.
> Great. I've cc:ed Dan Krech, the lead rdflib developer on this mail.
> For his benefit I might explain things that you obviously know.
>>My biggest disappointment was that, from the perspective of a Zope 3
>>developer, using it alongside other Zope 3 indexes (and other intid-
>>based data structures) meant that I would have to externally convert
>>to and from RDF in order to merge results and convert the RDF URIs to
> Correct. A specific and important optimization in Zope-style cataloging
> is that objects have a cheap unique integer to reduce catalog footprint
> and significantly improve result merging and joining. These integers
> are exposed as a utility component in Zope.
>> It would be much more efficient if I could have an RDF
>>resource class that represented an intid, and even more efficient if
>>I could get IFBTrees back directly from searches that somehow
>>included the intids.
> Yes, this is a problem that needs to be solved, and your suggestion is
> one way to solve it. I've discussed this a few times with Florent at the
> Paris and EUpy sprints and he had a similar suggestion.
> I'm uncomfortable with it for a few reasons, 1) because intids are such
> a Zope-catalog-optimization specific thing. I know why they are
> exposed, so that catalog results can be efficiently merged, but they
> don't have anything to do with RDF, so 2) rdflib can't really change its
> interface to accommodate them. Also, 3) they are backend-specific: for
> example, rdflib has a URI -> integer mapping for its in-memory and ZODB
> backends to reduce footprint, but a SQL backend would need no such
> integer, you would in fact have to *add* a column to hold that value
> just so the data would merge efficiently with a catalog. This seems
> antithetical to Zope 3's philosophy in general as it violates the
> concept of not requiring third party libs and data to change themselves
> significantly just to work with Zope. Of course, this isn't a problem
> of the catalog, it's a problem in general merging search results from
Note that RDBMS-based applications will *already* impose such a
requirement, from the moment that you want to join results from the RDF
query to those from any other tables: every non-toy RDBMS in existence
has a "preferred primary key" type, which is an integer, for precisely
the same reasons (to allow efficient joins).
RDBMS best practices insist that "normal" tables have a primary key of
that type, whose value is supposed to remain invisible (or at least
opaque) to humans.
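To make the RDBMS point concrete, here is a minimal sketch using Python's
bundled sqlite3 module. The table and column names (resource, triple,
document) and the example URIs are my own assumptions for illustration,
not anything rdflib ships. The URI appears exactly once, in the resource
table; everything else joins on the opaque integer key.

```python
import sqlite3

# Hypothetical schema: every name below is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
    CREATE TABLE triple (
        subject   INTEGER NOT NULL REFERENCES resource(id),
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    );
    CREATE TABLE document (
        id    INTEGER PRIMARY KEY REFERENCES resource(id),
        title TEXT NOT NULL
    );
""")

# The URI is stored once; the integer id is what other tables refer to.
conn.executemany(
    "INSERT INTO resource (id, uri) VALUES (?, ?)",
    [(1, "http://example.org/doc/a"), (2, "http://example.org/doc/b")],
)
conn.executemany(
    "INSERT INTO triple (subject, predicate, object) VALUES (?, ?, ?)",
    [(1, "dc:creator", "Gary"), (2, "dc:creator", "Michel")],
)
conn.executemany(
    "INSERT INTO document (id, title) VALUES (?, ?)",
    [(1, "Intids"), (2, "Graphs")],
)

# Join the "RDF" result to an ordinary table on the integer key alone.
rows = conn.execute(
    """
    SELECT d.title
    FROM triple AS t
    JOIN document AS d ON d.id = t.subject
    WHERE t.predicate = ? AND t.object = ?
    """,
    ("dc:creator", "Gary"),
).fetchall()
```

The join never touches the uri column, which is the whole point: the
string is looked up once at the edge, and the integer does the work.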
If we want to allow for scalable use of rdflib, I would guess we need to
"promote" the integer ID from "implementation detail" to a first-class
> I'd like to make the optimization available so that searches on a graph
> can be efficiently merged with searches on a catalog, but I don't think
> it can be done by pushing intids down into rdflib, or for that matter
> any other third party component you want to play with the catalog
> efficiently. Perhaps instead of pushing the integers down we could push
> URIs up: Zope's cataloging could grow another layer of indirection on
> top of intids and provide a URI utility that maps to intids. Of course
> you might object to that for the same reasons I'm objecting to this. ;)
> But at least URIs are a well known standard.
They are known, but they are an *infeasible* join key (not only are they
strings, but as arbitrary-length strings with common prefixes, their
sorting semantics are almost worst-case for many join algorithms.)
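A rough illustration of the common-prefix problem, as a sketch rather
than a benchmark. The helper chars_examined and the example.org namespace
are hypothetical; the helper only models a naive character-by-character
comparison, which is not necessarily how any particular join
implementation compares keys.

```python
def chars_examined(a: str, b: str) -> int:
    """Count the characters a naive lexicographic comparison reads
    before it can decide how ``a`` and ``b`` are ordered."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            break
    return examined


# Hypothetical namespace: URIs minted by one application share a prefix.
base = "http://example.org/resource/"
uri_a = base + "1001"
uri_b = base + "1002"

# The comparison has to walk the whole shared prefix before the two
# strings finally differ, here at the very last character.
string_cost = chars_examined(uri_a, uri_b)

# Mapping each URI to an integer once turns every later comparison
# into a single fixed-size integer comparison.
intids = {uri: n for n, uri in enumerate(sorted([uri_a, uri_b]), start=1)}
```

Every comparison inside a sort or merge pays that prefix walk again,
whereas the integer mapping is paid for once per URI.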
>>Have you thought about that use case? If one used a variation of
>>your back end that assigned intids to non-intid-based resources like
>>URIs and Literals and stored the relationships via intids,
> One doesn't need a variation, this is exactly the way the in-memory and
> ZODB backends work now as an optimization. But they are internal
> details of the implementation of those backends.
As I argue above, I believe this to be a false encapsulation.
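Here is a sketch of what "promoting" the integer might look like. This
is not rdflib's actual backend or API: InternedGraph and all of its
methods are hypothetical, plain Python sets stand in for the BTrees
structures, and the sketch assumes the catalog and the graph draw their
integers from the same id space, which is exactly the part that would
need real design work.

```python
from typing import Dict, List, Set, Tuple


class InternedGraph:
    """Toy triple store (hypothetical) that interns every term to an
    integer and exposes those integers alongside the term-level API."""

    def __init__(self) -> None:
        self._id_of: Dict[str, int] = {}
        self._term_of: List[str] = []
        # (predicate id, object id) -> set of subject ids
        self._by_po: Dict[Tuple[int, int], Set[int]] = {}

    def intern(self, term: str) -> int:
        """Return the integer for ``term``, assigning one if needed."""
        if term not in self._id_of:
            self._id_of[term] = len(self._term_of)
            self._term_of.append(term)
        return self._id_of[term]

    def add(self, s: str, p: str, o: str) -> None:
        key = (self.intern(p), self.intern(o))
        self._by_po.setdefault(key, set()).add(self.intern(s))

    def subject_ids(self, p: str, o: str) -> Set[int]:
        """'Raw' API: matching subjects as integers, ready to merge."""
        p_id = self._id_of.get(p)
        o_id = self._id_of.get(o)
        if p_id is None or o_id is None:
            return set()
        return set(self._by_po.get((p_id, o_id), ()))

    def subjects(self, p: str, o: str) -> List[str]:
        """Term-level API, built on top of the integer one."""
        return sorted(self._term_of[i] for i in self.subject_ids(p, o))


graph = InternedGraph()
graph.add("urn:doc:a", "dc:creator", "Gary")
graph.add("urn:doc:b", "dc:creator", "Michel")
graph.add("urn:doc:c", "dc:creator", "Gary")

# Pretend a catalog index returned these ids for some other query.
catalog_hits = {graph.intern("urn:doc:a"), graph.intern("urn:doc:b")}

# Merging is a plain integer-set intersection; no URI conversion needed.
merged = graph.subject_ids("dc:creator", "Gary") & catalog_hits
```

Callers who only care about RDF keep using subjects(); callers who need
to merge with catalog results ask for subject_ids() and never see a URI.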
>>store the data as IFBTrees and offer up an API to get "raw" IFBTree
>>results. Any obvious ways that would be a problem? Does it feel
>>reasonable to you? Any suggestions?
> Well not any good ones yet, although I know it's an important problem.
> I'll have to think about it a bit more. Do you understand my
> objections? Does anyone else have any suggestions out there? This is
> probably worth solving in the general case, since it's going to come up
> anytime you're going to want to merge catalog results with anything.
>>I'm generally interested in RDFLib, your use of it, and your hopes
>>for it, if you feel like holding forth. :-)
> Great! And I didn't even have to feed you any kool aid or buy you a
> bottle of aquavit. ;)
Now if I only *liked* caraway-in-a-bottle. ;)
Tres Seaver +1 202-558-7113 [EMAIL PROTECTED]
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
-----END PGP SIGNATURE-----
Zope3-dev mailing list