[Repoze-dev] catalog oddness

2011-01-14 Thread Wichert Akkerman
This may already be different in the trunk of repoze.catalog, but I just 
stumbled over this: when you do a catalog search and ask it to order by 
an empty index you get an empty result set. I was expecting the result 
to be an unordered result for that situation. Is this expected behaviour?


Wichert.
___
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev


Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Chris Rossi
On Fri, Jan 14, 2011 at 5:20 AM, Wichert Akkerman wich...@wiggy.net wrote:

> This may already be different in the trunk of repoze.catalog, but I just
> stumbled over this: when you do a catalog search and ask it to order by an
> empty index you get an empty result set. I was expecting the result to be an
> unordered result for that situation. Is this expected behaviour?


Hi Wichert,

I haven't observed this behavior, but it seems like an undefined case, to
me.  I'm not sure what I would expect it to do in such a situation.
Supposing you have a set of documents you want sorted by an index and the
index contains only a subset of those documents?  It seems to me the case is
undefined--I would have a tendency to raise an exception, personally.

I think in the interest of well-defined determinism I would suggest that
if you are using an index to sort, you should make sure that the
discriminator for that index can return some value for any document.
This way even if logically, to you, the document doesn't really have a
value for that index, you can at least be deterministic about how it will be
sorted.
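A minimal sketch of that advice, assuming a hypothetical `publication_date` attribute and the `(obj, default)` discriminator signature that appears later in this thread: documents without a date are mapped to `datetime.date.max`, so they always sort last instead of dropping out of the index. The sentinel and the helper names are illustrative assumptions, not part of repoze.catalog.

```python
import datetime

# Hypothetical sentinel: undated documents sort after every real date.
UNDATED = datetime.date.max


def publication_date_discriminator(obj, default):
    """Return a sortable value for every document.

    ``publication_date`` is a hypothetical attribute used for illustration.
    ``default`` is accepted to match the (obj, default) signature but is
    deliberately never returned, so every document gets a value.
    """
    value = getattr(obj, "publication_date", None)
    if value is None:
        return UNDATED
    return value


class Doc:
    # Hypothetical stand-in for a cataloged content object.
    def __init__(self, publication_date=None):
        self.publication_date = publication_date


docs = [Doc(datetime.date(2011, 1, 14)), Doc(), Doc(datetime.date(2010, 6, 1))]
keys = [publication_date_discriminator(d, object()) for d in docs]
print(sorted(keys))  # the undated document's key (date.max) comes last
```

Whether "unknown" belongs first or last is a per-application decision; `datetime.date.min` would put undated documents first instead.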

Chris


Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Wichert Akkerman

On 1/14/11 15:04, Chris Rossi wrote:

> On Fri, Jan 14, 2011 at 5:20 AM, Wichert Akkerman wich...@wiggy.net wrote:
>
>> This may already be different in the trunk of repoze.catalog, but I
>> just stumbled over this: when you do a catalog search and ask it to
>> order by an empty index you get an empty result set. I was expecting
>> the result to be an unordered result for that situation. Is this
>> expected behaviour?
>
> Hi Wichert,
>
> I haven't observed this behavior, but it seems like an undefined case,
> to me.  I'm not sure what I would expect it to do in such a situation.
> Supposing you have a set of documents you want sorted by an index and
> the index contains only a subset of those documents?  It seems to me the
> case is undefined--I would have a tendency to raise an exception,
> personally.


I was expecting a missing index value to be treated as None (or NULL in 
SQL terms) and the related items to appear either first or last. Raising 
an exception is undesirable: there are valid situations where an object 
might have a None value for an indexed attribute and that should not 
lead to exceptions when doing catalog queries.
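In plain Python 3 terms, treating None like SQL NULL means choosing an explicit placement, because `None` cannot be ordered against a `datetime.date`. This is a generic sketch, not repoze.catalog's sorting code; `nulls_last` and `nulls_first` are hypothetical helper names.

```python
import datetime

dates = [datetime.date(2011, 1, 14), None, datetime.date(2010, 6, 1), None]

# In Python 3, sorted(dates) raises TypeError: None and datetime.date
# cannot be compared with "<".
try:
    sorted(dates)
except TypeError as exc:
    print("plain sort fails:", exc)


def nulls_last(value):
    # Tuples compare element by element and False < True, so real values
    # (flag False) sort before None (flag True). Two None entries produce
    # equal tuples, so None is never compared with "<".
    return (value is None, value)


def nulls_first(value):
    # Same trick with the flag inverted: None entries sort first.
    return (value is not None, value)


print(sorted(dates, key=nulls_last))
print(sorted(dates, key=nulls_first))
```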



> I think in the interest of well-defined determinism I would suggest
> that if you are using an index to sort, you should make sure that the
> discriminator for that index can return some value for any
> document.  This way even if logically, to you, the document doesn't
> really have a value for that index, you can at least be deterministic
> about how it will be sorted.


The object did have a value, but it was None, which the index
apparently ignores. The fact that it was always None was a bug in my
code that has been fixed now: it should be either None or a date (it
was a publication-date field).


Wichert.



Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Chris Rossi
On Fri, Jan 14, 2011 at 10:21 AM, Wichert Akkerman wich...@wiggy.net wrote:


> The object did have a value, but it was None, which the index apparently
> ignores. The fact that it was always None was a bug in my code that has been
> fixed now: it should be either None or a date (it was a publication-date
> field).


Ah, having a value of None is different from not having a value.  I.e.,

def discriminator(obj, default):
    """Does not have a value."""
    return default

def discriminator(obj, default):
    """Has a value of None."""
    return None

You're sure it's the second case you're hitting?  It wouldn't be an empty
index, but it would be one where every document has a value of None.  If
you're seeing an empty result set in that case, then that sounds like a bug.
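To make the distinction concrete, here is a toy index (hypothetical, and NOT repoze.catalog's implementation) that leaves a document out when the discriminator returns `default` but stores it when the value is `None`. In this model, sorting by the first kind of index returns nothing, which resembles Wichert's symptom, while sorting by the second returns every document. Whether the real repoze.catalog indexes store `None` values is exactly what this thread leaves open.

```python
_MISSING = object()  # sentinel passed to discriminators as ``default``


class ToyFieldIndex:
    """Hypothetical, simplified field index used only for illustration."""

    def __init__(self, discriminator):
        self.discriminator = discriminator
        self.values = {}  # docid -> indexed value

    def index_doc(self, docid, obj):
        value = self.discriminator(obj, _MISSING)
        if value is _MISSING:
            # "Does not have a value": leave the document out entirely.
            self.values.pop(docid, None)
        else:
            # "Has a value", even if that value is None.
            self.values[docid] = value

    def sort(self, docids):
        # Only documents the index knows about can be ordered; anything
        # else silently drops out. None sorts last (an arbitrary choice).
        known = [d for d in docids if d in self.values]
        return sorted(known, key=lambda d: (self.values[d] is None, self.values[d]))


def no_value(obj, default):
    return default


def none_value(obj, default):
    return None


empty = ToyFieldIndex(no_value)
all_none = ToyFieldIndex(none_value)
for docid in (1, 2, 3):
    empty.index_doc(docid, object())
    all_none.index_doc(docid, object())

print(empty.sort([1, 2, 3]))     # the "empty index" case
print(all_none.sort([1, 2, 3]))  # every document has a value of None
```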

Chris


Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Steve Schmechel
Hi Wichert,

I can't really comment on whether this particular instance is a bug, but
I would definitely agree with Chris on the benefits of coming up with
your own deterministic rules and enforcing them.

Most of my work has been with various SQL databases, and null handling
always ends up being problematic, even when the database lets you
include such values in an operation.

It also often results in almost comical statements like: "I *want*
this date column to be sorted, and put all the ones where we don't
know the date at the top *OR* the bottom; I don't care which."

The fact that most SQL databases will handle ordering nullable columns
at all is based on an important assumption made in the standard: that
although two null values are not equal to one another, they are also
not distinct.

http://en.wikipedia.org/wiki/Null_(SQL)#Grouping_and_sorting
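Python's built-in `sqlite3` module is a convenient way to see both halves of that rule. SQLite is used here only as an example; other databases differ, notably in whether NULLs sort first or last by default. The table and column names are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, pub TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("a", "2011-01-14"), ("b", None), ("c", "2010-06-01"), ("d", None)],
)

# NULLs are not equal: NULL = NULL yields NULL (None in Python) ...
print(conn.execute("SELECT NULL = NULL").fetchone())
# ... but they are not distinct: IS treats two NULLs as the same.
print(conn.execute("SELECT NULL IS NULL").fetchone())

# GROUP BY collects every unknown date into a single group.
groups = conn.execute(
    "SELECT pub, COUNT(*) FROM docs GROUP BY pub ORDER BY pub"
).fetchall()
print(groups)

# SQLite treats NULL as smaller than any other value, so with ascending
# order the undated rows come first and tie with each other.
titles = [row[0] for row in conn.execute("SELECT title FROM docs ORDER BY pub, title")]
print(titles)
```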

This implies that all the entries whose publication date is unknown
occurred at the same time.  While useful for grouping items, that is
almost always a clear, factual error about the data itself.  (In your
case, it probably means the item wasn't published yet.)

When this assumption is not clear, or later gets lost in the output,
people tend to make bad decisions using the implied sort order.
Example: Delete all the entries before a particular publication date.
What does that really imply?  What will the result be?  Will all the
unpublished ones get deleted also?
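The ambiguity is easy to reproduce. In this sketch (a made-up table, with ISO-format date strings so that text comparison matches date order), the same instruction read two ways deletes different rows: a `WHERE pub < ?` predicate keeps the undated rows, because comparing NULL yields unknown, while "delete everything before the cutoff in the sorted list" removes them, because SQLite sorts NULLs first.

```python
import sqlite3

ROWS = [("a", "2011-01-14"), ("b", None), ("c", "2010-06-01"), ("d", None)]
CUTOFF = "2011-01-01"


def fresh_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (title TEXT, pub TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?, ?)", ROWS)
    return conn


def remaining(conn):
    return sorted(row[0] for row in conn.execute("SELECT title FROM docs"))


# Reading 1: a comparison predicate. "NULL < cutoff" is unknown, so the
# undated rows are NOT deleted.
by_predicate = fresh_db()
by_predicate.execute("DELETE FROM docs WHERE pub < ?", (CUTOFF,))

# Reading 2: "everything above this point in the sorted list". SQLite
# sorts NULLs first, so the undated rows ARE deleted.
by_position = fresh_db()
ordered = by_position.execute("SELECT rowid, pub FROM docs ORDER BY pub").fetchall()
doomed = []
for rowid, pub in ordered:
    if pub is not None and pub >= CUTOFF:
        break  # reached the first row on or after the cutoff
    doomed.append((rowid,))
by_position.executemany("DELETE FROM docs WHERE rowid = ?", doomed)

print(remaining(by_predicate))
print(remaining(by_position))
```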

There is not one particular right way to handle all the situations, so
you have to think through each one.  Maybe you could combine the use of
other data (creation date, etc.) into the sorting to get a better
chronology.  Or, maybe it would be better to always keep the two groups
of records distinct: a list of published ones sorted by publication
date, and a list of unpublished ones sorted by something else.
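The second option might look like this in Python. The `Doc` fields (`created`, `publication_date`) are hypothetical; the sketch keeps published and unpublished items in separate lists, orders the first by publication date with creation time as a tie-breaker, and orders the second by creation time alone.

```python
import datetime
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Doc:
    # Hypothetical fields for illustration.
    title: str
    created: datetime.datetime
    publication_date: Optional[datetime.date] = None


def split_by_publication(docs: List[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """Return (published, unpublished) as two separately ordered lists."""
    published = [d for d in docs if d.publication_date is not None]
    unpublished = [d for d in docs if d.publication_date is None]
    # Published: by publication date, creation time breaks ties.
    published.sort(key=lambda d: (d.publication_date, d.created))
    # Unpublished: no publication date to sort on, so use creation time.
    unpublished.sort(key=lambda d: d.created)
    return published, unpublished


docs = [
    Doc("draft", datetime.datetime(2011, 1, 10, 9, 0)),
    Doc("news", datetime.datetime(2011, 1, 2, 12, 0), datetime.date(2011, 1, 14)),
    Doc("old", datetime.datetime(2010, 5, 30, 8, 0), datetime.date(2010, 6, 1)),
    Doc("idea", datetime.datetime(2011, 1, 3, 16, 0)),
]
published, unpublished = split_by_publication(docs)
print([d.title for d in published])
print([d.title for d in unpublished])
```

Because the two groups never share a sort key, there is no need to decide where "unknown" falls relative to real dates.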

Sorry if this response is a bit off-topic, but I just wanted to offer
some advice on a topic that has bitten me more than enough times.


Steve

