RE: [Zope-dev] Hey Chris, question for you
> I think it has changed for FieldIndexes.

Yes, from UnKeywordIndex.py:

    newKeywords = getattr(obj, self.id, ())

> You can now make the distinction between "doesn't have that attribute"
> and "attribute is one of [None, '', [], ()]" within a Field Index.

Reviewing UnKeywordIndex.py, I don't see what an object should do to mean "doesn't have that attribute" or "don't include me in this FieldIndex". Any suggestions?

Thanks for your time,

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope )
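One way an index could tell "attribute is missing" apart from "attribute is empty" is to use a private sentinel as the getattr default instead of (). This is only a sketch in plain Python, not the code in UnKeywordIndex.py; the names _MISSING and value_for_index are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel; not a Zope name


def value_for_index(obj, attr_name):
    """Return (should_index, value) for one attribute of obj.

    A missing attribute yields (False, None); an empty value such as ''
    is still reported as indexable, so the two cases stay distinct.
    """
    value = getattr(obj, attr_name, _MISSING)
    if value is _MISSING:
        return False, None
    return True, value


class Titled:
    title = ""  # attribute exists but is empty


class Untitled:
    pass  # no title attribute at all


print(value_for_index(Titled(), "title"))    # (True, '')
print(value_for_index(Untitled(), "title"))  # (False, None)
```

With a default of (), both cases above would look the same to the caller, which is the ambiguity the question is about.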
Re: [Zope-dev] Hey Chris, question for you
I think it has changed for FieldIndexes. You can now make the distinction between "doesn't have that attribute" and "attribute is one of [None, '', [], ()]" within a Field Index. You do this in an almost natural way, the major exception being that you need to wrap a blank string ('') in a sequence in the query (e.g. title=['']) due to hysterical behavior.

I'm not sure about Text Indexes.

- Original Message -
From: "Toby Dickenson" <[EMAIL PROTECTED]>
To: "Michel Pelletier" <[EMAIL PROTECTED]>
Cc: "Casey Duncan" <[EMAIL PROTECTED]>; "Chris McDonough" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, June 27, 2001 11:47 AM
Subject: Re: [Zope-dev] Hey Chris, question for you

On Tue, 26 Jun 2001 15:42:40 -0700 (PDT), Michel Pelletier <[EMAIL PROTECTED]> wrote:

>Hmm the reason for the current behavior was optimization by saving space
>not indexing empty values.

I was always very pleased with that characteristic, but I had not realised it was a design goal. I thought I observed that characteristic had changed in a recent Zope release... hmmm, I'll take a look.

Toby Dickenson
[EMAIL PROTECTED]
Re: [Zope-dev] Hey Chris, question for you
On Tue, 26 Jun 2001 15:42:40 -0700 (PDT), Michel Pelletier <[EMAIL PROTECTED]> wrote:

>Hmm the reason for the current behavior was optimization by saving space
>not indexing empty values.

I was always very pleased with that characteristic, but I had not realised it was a design goal. I thought I observed that characteristic had changed in a recent Zope release... hmmm, I'll take a look.

Toby Dickenson
[EMAIL PROTECTED]
Re: [Zope-dev] Hey Chris, question for you
Chris McDonough wrote:
>
> Hi casey,
>
> Changes were recently made to Field/Keyword Indexes so that they will
> store empty items. An equivalent change could be made to TextIndexes...
> we'd need to think about that a bit.
>
> But for your purposes, you might want to start out attempting to write
> your operator implementation using Field and Keyword indexes...
>
> - C
>
> Michel Pelletier wrote:
> >
> > Hmm the reason for the current behavior was optimization by saving space
> > not indexing empty values. The problem with your latter approach is that
> > "all objects in the catalog" may include objects that don't have a title
> > attribute at all.
> >
> > I'm not against indexing empty values though.
> >
> > -Michel

My implementation does not modify the behavior of the indexes in any way, and I would like to keep it that way if possible. I have been able to (thus far) pull this off without compromises, which was my hope in the beginning.

I guess the question here is, given the query:

    spam != 'eggs'

should objects be returned that do not have an attribute "spam" at all? For the behavior to be intuitive, I would say yes, but that is just my opinion.

I also thought of an optimization that could eventually be included if this behavior is adopted. For example, take the following query expression:

    title == 'foo' and spam != 'eggs'

As implemented, my query engine does the following:

1. Find items where title matches 'foo' (exact behavior depends on index type)
2. Find items where spam matches 'eggs'
3. Take the difference of all items in the index spam and the result of #2
4. Return the intersection of #3 and #1

To be "intuitive" (I use that term loosely) I think it should be:

1. Find items where title matches 'foo'
2. Find items where spam matches 'eggs'
3. Take the difference of all items in the catalog and the result of #2
4. Return the intersection of #3 and #1

Which can be optimized as:

1. Find items where title matches 'foo'
2. Find items where spam matches 'eggs'
3. Return the difference of #1 and #2

If an "or" is used in place of the "and", then the optimization doesn't apply though.

One other thing: I noticed (with a colleague) that passing a list of values to a FieldIndex and a TextIndex results in nearly opposite behavior. The FieldIndex does a union on the results of querying against each item in the list, whereas the TextIndex does an intersection. This seemed highly inconsistent to me, another thread perhaps...

--
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`-->
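The three evaluation strategies above can be compared with ordinary Python sets. This is only an illustration: built-in sets stand in for the catalog's result sets, and the document ids and variable names are made up for the example.

```python
# Hypothetical document ids; plain sets stand in for catalog result sets.
catalog_ids = {1, 2, 3, 4, 5}   # every object in the catalog
title_foo = {1, 2, 3}           # step 1: title == 'foo'
spam_indexed = {1, 2, 4}        # objects that appear in the spam index at all
spam_eggs = {1, 4}              # step 2: spam == 'eggs'

# As implemented: negate against the items in the spam index only.
as_implemented = title_foo & (spam_indexed - spam_eggs)

# "Intuitive": negate against everything in the catalog.
intuitive = title_foo & (catalog_ids - spam_eggs)

# Optimized form of the intuitive version: one difference, no universe set.
optimized = title_foo - spam_eggs

print(sorted(as_implemented))  # [2]
print(sorted(intuitive))       # [2, 3]
print(sorted(optimized))       # [2, 3]

# With "or" the shortcut no longer gives the same answer.
with_or = title_foo | (catalog_ids - spam_eggs)
print(sorted(with_or))         # [1, 2, 3, 5]
```

Document 3 has title 'foo' but no spam attribute, so it is dropped by the index-universe version and kept by the other two. The shortcut works for "and" because intersecting with title_foo makes the rest of the catalog irrelevant.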
Re: [Zope-dev] Hey Chris, question for you
Hi casey,

Changes were recently made to Field/Keyword Indexes so that they will store empty items. An equivalent change could be made to TextIndexes... we'd need to think about that a bit.

But for your purposes, you might want to start out attempting to write your operator implementation using Field and Keyword indexes...

- C

Michel Pelletier wrote:
>
> On Tue, 26 Jun 2001, Casey Duncan wrote:
>
> > Ok, I was able to get it to work by instantiating an IISet around
> > _unindex.keys() and passing that to difference (Thanks!), however, I
> > notice an interesting side effect. Let's say you have a TextIndex on
> > title and you do the following query:
> >
> > title != 'foo*'
> >
> > Which to me means: "all cataloged objects whose title does not match the
> > substring 'foo*'"
> >
> > However, this is not what you get exactly, instead you get:
> >
> > "all cataloged objects that have a non-empty title that does not match
> > the substring 'foo*'"
> >
> > Because from what I am seeing, objects with empty (or no) titles are not
> > included in the index *at all*. So the set of "all objects" does not
> > include ones without titles. I could fix this by making all objects be
> > instead "All objects in the catalog" (via catalog.data.keys()) instead
> > of "all objects in the index", but I wanted to see if anyone had
> > additional thoughts about this.
>
> Hmm the reason for the current behavior was optimization by saving space
> not indexing empty values. The problem with your latter approach is that
> "all objects in the catalog" may include objects that don't have a title
> attribute at all.
>
> I'm not against indexing empty values though.
>
> -Michel
Re: [Zope-dev] Hey Chris, question for you
On Tue, 26 Jun 2001, Casey Duncan wrote:

> Ok, I was able to get it to work by instantiating an IISet around
> _unindex.keys() and passing that to difference (Thanks!), however, I
> notice an interesting side effect. Let's say you have a TextIndex on
> title and you do the following query:
>
> title != 'foo*'
>
> Which to me means: "all cataloged objects whose title does not match the
> substring 'foo*'"
>
> However, this is not what you get exactly, instead you get:
>
> "all cataloged objects that have a non-empty title that does not match
> the substring 'foo*'"
>
> Because from what I am seeing, objects with empty (or no) titles are not
> included in the index *at all*. So the set of "all objects" does not
> include ones without titles. I could fix this by making all objects be
> instead "All objects in the catalog" (via catalog.data.keys()) instead
> of "all objects in the index", but I wanted to see if anyone had
> additional thoughts about this.

Hmm the reason for the current behavior was optimization by saving space not indexing empty values. The problem with your latter approach is that "all objects in the catalog" may include objects that don't have a title attribute at all.

I'm not against indexing empty values though.

-Michel
Re: [Zope-dev] Hey Chris, question for you
Chris McDonough wrote:
>
> > Chris:
> >
> > I am working on getting a decent query language for ZCatalog/Catalog and
>
> Very cool...
>
> > I have been able to make good progress, however I am running into a bit
> > of an issue that I thought you might know something about:
> >
> > In order to implement a "!=" query operator, I am trying to do the
> > following:
>
> Tricky.
>

Ok, I was able to get it to work by instantiating an IISet around _unindex.keys() and passing that to difference (Thanks!), however, I notice an interesting side effect. Let's say you have a TextIndex on title and you do the following query:

    title != 'foo*'

Which to me means: "all cataloged objects whose title does not match the substring 'foo*'"

However, this is not what you get exactly, instead you get: "all cataloged objects that have a non-empty title that does not match the substring 'foo*'"

Because from what I am seeing, objects with empty (or no) titles are not included in the index *at all*. So the set of "all objects" does not include ones without titles. I could fix this by making all objects be instead "All objects in the catalog" (via catalog.data.keys()) instead of "all objects in the index", but I wanted to see if anyone had additional thoughts about this.

--
| Casey Duncan
| Kaivo, Inc.
| [EMAIL PROTECTED]
`-->
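The side effect described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the TextIndex code: a plain dict plays the catalog, the "index" simply skips empty or missing titles, and fnmatch stands in for the index's own wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog: document id -> title (None means no title attribute).
catalog = {1: "foobar", 2: "spam", 3: "", 4: None}

# Toy index that, like the behavior described above, skips empty values.
index = {doc_id: title for doc_id, title in catalog.items() if title}

# Documents whose title matches the pattern 'foo*'.
matches = {doc_id for doc_id, title in index.items() if fnmatchcase(title, "foo*")}

# title != 'foo*' with "all objects in the index" as the universe.
from_index = set(index) - matches

# title != 'foo*' with "all objects in the catalog" as the universe.
from_catalog = set(catalog) - matches

print(sorted(from_index))    # [2]
print(sorted(from_catalog))  # [2, 3, 4]
```

Documents 3 and 4 only show up when the catalog is used as the universe, and document 4 is the case where the object has no title at all.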
Re: [Zope-dev] Hey Chris, question for you
> Chris:
>
> I am working on getting a decent query language for ZCatalog/Catalog and

Very cool...

> I have been able to make good progress, however I am running into a bit
> of an issue that I thought you might know something about:
>
> In order to implement a "!=" query operator, I am trying to do the
> following:

Tricky.

> From the index, return the result set that matches the value (easy)
> Subtract that from the set of all items in the index (not so easy)
>
> I see that there is the difference method available from IIBTree,
> however I seem to be unable to use it on the entire index (Which is an
> OOBTree and not really a set I guess). Here is a snippet of my code
> which doesn't work:
>
>     if op == '!=' or op[:3] == 'not':
>         w, rs = difference(index._index, rs) # XXX Not a warm fuzzy...
>
> (where rs is the index result set that matches the value and index is
> the Catalog index OOBTree)
>
> What can I supply for the first argument to get a set of all items in
> the index, or is there any easier and better approach to this whole
> issue?

Well.. I assume that _index is the forward data structure of a FieldIndex. In this case, you could get the info you want (a list of all document ids in the index) from _unindex.keys(), as _index and _unindex are mirror images of each other that need to be kept in sync... I think what comes back is a BTreeItems object. I think this is usable in conjunction with the resultset IISet (also a list of document ids) via the difference function... I haven't tried it, though...

> BTW: I realize I could step through _index.items() and create an IISet
> but that seems awful inefficient...

Yeah, that'd be terrible.

This is a tricky operator. I can't really wrap my head around using it in conjunction with parens. Then again, maybe you wouldn't...

HTH,

- C
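The suggestion above, taking the set of all document ids from the reverse mapping and subtracting the matches, can be sketched without Zope. Plain dicts and sets stand in for the OOBTree, the reverse mapping, and the IISet here; build_field_index and not_equal are hypothetical helper names, not Catalog APIs.

```python
def build_field_index(docs):
    """Build mirror-image mappings from a dict of doc id -> value.

    forward plays the role of _index (value -> set of doc ids) and
    reverse plays the role of _unindex (doc id -> value).
    """
    forward = {}
    reverse = {}
    for doc_id, value in docs.items():
        forward.setdefault(value, set()).add(doc_id)
        reverse[doc_id] = value
    return forward, reverse


def not_equal(forward, reverse, value):
    """Return the ids of indexed documents whose value differs from value."""
    matching = forward.get(value, set())
    # The keys of the reverse mapping are exactly the indexed document ids,
    # so there is no need to walk every entry of the forward mapping.
    return set(reverse.keys()) - matching


forward, reverse = build_field_index({10: "eggs", 11: "ham", 12: "eggs", 13: "toast"})
result = not_equal(forward, reverse, "eggs")
print(sorted(result))  # [11, 13]
```

In the real index the two structures are BTrees and the subtraction would go through the IIBTree difference function, as discussed above; the shape of the computation is the same.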