-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michel Pelletier wrote:
> On Tue, 2005-08-23 at 12:49 -0400, Gary Poster wrote:
>
>>Michel (and anyone else with experience with RDFLib on the list), I
>>recently looked at RDFLib (http://rdflib.net/) and came away (after
>>an hour or so) with a good first impression.
>
>
> Great.  I've cc:ed Dan Krech, the lead rdflib developer, on this mail.
> For his benefit I might explain things that you obviously know.
>
>
>>My biggest disappointment was that, from the perspective of a Zope 3
>>developer, using it alongside other Zope 3 indexes (and other
>>intid-based data structures) meant that I would have to externally
>>convert to and from RDF in order to merge results and convert the RDF
>>URIs to objects.
>
>
> Correct.  A specific and important optimization in Zope-style cataloging
> is that objects have a cheap unique integer to reduce catalog footprint
> and significantly improve result merging and joining.  These integers
> are exposed as a utility component in Zope.
>
>
>> It would be much more efficient if I could have an RDF
>>resource class that represented an intid, and even more efficient if
>>I could get IFBTrees back directly from searches that somehow
>>included the intids.
>
>
> Yes, this is a problem that needs to be solved, and your suggestion is
> one way to solve it.  I've discussed this a few times with Florent at the
> Paris and EUpy sprints and he had a similar suggestion.
>
> I'm uncomfortable with it for a few reasons, 1) because intids are such
> a Zope-catalog-optimization specific thing.  I know why they are
> exposed, so that catalog results can be efficiently merged, but they
> don't have anything to do with RDF, so 2) rdflib can't really change its
> interface to accommodate them.
> Also, 3) they are backend specific, for
> example rdflib has a URI -> integer mapping for its in-memory and ZODB
> backends to reduce footprint, but a SQL backend would need no such
> integer; you would in fact have to *add* a column to hold that value
> just so the data would merge efficiently with a catalog.  This seems
> antithetical to Zope 3's philosophy in general as it violates the
> concept of not requiring third party libs and data to change themselves
> significantly just to work with Zope.  Of course, this isn't a problem
> of the catalog, it's a general problem of merging search results from
> anywhere.
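
For readers who have not looked inside those backends, here is a rough sketch of what a URI -> integer mapping of that kind might look like. It is only an illustration under my own assumptions: the class name UriInterner and its methods are hypothetical and are not rdflib's actual API, and the example URIs are made up.

```python
class UriInterner:
    """Hypothetical bidirectional URI <-> integer map (not rdflib's API)."""

    def __init__(self):
        self._id_by_uri = {}   # URI string -> small integer
        self._uri_by_id = []   # small integer -> URI string

    def intern(self, uri):
        """Return the integer for `uri`, assigning the next free one if new."""
        ident = self._id_by_uri.get(uri)
        if ident is None:
            ident = len(self._uri_by_id)
            self._uri_by_id.append(uri)
            self._id_by_uri[uri] = ident
        return ident

    def lookup(self, ident):
        """Return the URI that was assigned the integer `ident`."""
        return self._uri_by_id[ident]


interner = UriInterner()
creator = "http://purl.org/dc/elements/1.1/creator"

# Triples are stored as tuples of small integers instead of repeated strings.
triples = set()
for s, p, o in [
    ("http://example.org/doc1", creator, "http://example.org/gary"),
    ("http://example.org/doc2", creator, "http://example.org/gary"),
]:
    triples.add((interner.intern(s), interner.intern(p), interner.intern(o)))
```

Each distinct URI is stored once and every triple refers to it by integer, which is where the footprint saving comes from.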

Note that RDBMS-based applications will *already* impose such a
requirement, from the moment that you want to join results from the RDF
query to those from any other tables:  every non-toy RDBMS in existence
has a "preferred primary key" type, which is an integer, for precisely
the same reason (to allow efficient joins).  RDBMS best practices insist
that "normal" tables have a primary key of that type, whose value is
supposed to remain invisible (or at least opaque) to humans.

If we want to allow for scalable use of rdflib, I would guess we need to
"promote" the integer ID from "implementation detail" to a first-class
API citizen.

> I'd like to make the optimization available so that searches on a graph
> can be efficiently merged with searches on a catalog, but I don't think
> it can be done by pushing intids down into rdflib, or for that matter
> any other third party component you want to play with the catalog
> efficiently.  Perhaps instead of pushing the integers down we could push
> URIs up, Zope's cataloging could grow another layer of indirection on
> top of intids and provide a URI utility that maps to intids.  Of course
> you might object to that for the same reasons I'm objecting to this. ;)
> But at least URIs are a well known standard.

They are known, but they are an *infeasible* join key (not only are they
strings, but as arbitrary-length strings with common prefixes, their
sorting semantics are almost worst-case for many join algorithms).

<snip>

>>Have you thought about that use case?  If one used a variation of
>>your back end that assigned intids to non-intid-based resources like
>>URIs and Literals and stored the relationships via intids,
>
>
> One doesn't need a variation, this is exactly the way the in-memory and
> ZODB backends work now as an optimization.  But they are internal
> details of the implementation of those backends.

As I argue above, I believe this to be a false encapsulation.
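
To make the difference concrete, here is a minimal sketch of the two merge paths. Everything in it is made up for illustration: the function names, the URIs, and the integer values are hypothetical and belong to neither Zope nor rdflib, and plain Python sets stand in for the IFBTree structures discussed in this thread.

```python
# Hypothetical URI -> intid table, as a "URI utility" might maintain it.
intid_by_uri = {
    "http://example.org/doc1": 101,
    "http://example.org/doc2": 102,
    "http://example.org/doc3": 103,
}


def graph_search_uris():
    """Stand-in for an RDF query whose backend only hands back URIs."""
    return ["http://example.org/doc1", "http://example.org/doc3"]


def graph_search_ids():
    """Stand-in for the same query if the backend exposed its integer ids."""
    return {101, 103}


# Stand-in for the intids returned by a catalog index search.
catalog_hits = {102, 103}

# Path 1: ids are hidden, so every URI is converted before merging.
converted = {intid_by_uri[uri] for uri in graph_search_uris()}
merged_via_conversion = converted & catalog_hits

# Path 2: ids are part of the API, so the result sets merge directly.
merged_directly = graph_search_ids() & catalog_hits
```

Both paths give the same answer; the first pays for a per-result lookup on a string key before any merging can start, which is the cost I am arguing against.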

>>you could
>>store the data as IFBTrees and offer up an API to get "raw" IFBTree
>>results.  Any obvious ways that would be a problem?  Does it feel
>>reasonable to you?  Any suggestions?
>
>
> Well not any good ones yet, although I know it's an important problem.
> I'll have to think about it a bit more.  Do you understand my
> objections?  Does anyone else have any suggestions out there?  This is
> probably worth solving in the general case, since it's going to come up
> anytime you're going to want to merge catalog results with anything.
>
>
>>I'm generally interested in RDFLib, your use of it, and your hopes
>>for it, if you feel like holding forth. :-)
>
>
> Great!  And I didn't even have to feed you any kool aid or buy you a
> bottle of aquavit. ;)

Now if I only *liked* caraway-in-a-bottle. ;)


Tres.
- --
===================================================================
Tres Seaver          +1 202-558-7113          [EMAIL PROTECTED]
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDC4aB+gerLs4ltQ4RAmiaAJ9OLuM1D73UZF8pMiKMffO64mtKhwCghOFK
swFsBJESA0h7CCTCFOi9AXw=
=2SZA
-----END PGP SIGNATURE-----
_______________________________________________
Zope3-dev mailing list
Zope3email@example.com
Unsub: http://mail.zope.org/mailman/options/zope3-dev/archive%40mail-archive.com