Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Steve Schmechel
Hi Wichert,

I can't really comment on whether this particular instance is a bug, but
I would definitely agree with Chris on the benefits of coming up with
your own deterministic rules and enforcing them.

Most of my work has been with various SQL databases, and "null handling"
always ends up being problematic, even when the database lets you perform
operations on data that includes such values.

It also often results in almost comical statements like, "I *want*
this date column to be sorted, with all the rows where we don't know
the date at the top *OR* the bottom - I don't care which."

The fact that most SQL databases will handle ordering nullable columns
at all is based on an important assumption made in the standard: that
although two null values are "not equal" to one another, they are also
"not distinct".

http://en.wikipedia.org/wiki/Null_(SQL)#Grouping_and_sorting

This implies that all the entries where the publication date is unknown
occurred at the same time.  While useful for grouping items, it is
almost always a clear factual error about the data itself.  (In your
case, a missing date probably means the entry hasn't been published
yet.)
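
As a rough illustration of the "not equal, but not distinct" behaviour,
here is a small sketch using Python's built-in sqlite3 module.  The table
and column names are made up for the example, and the position of the
NULLs in the ordering is what SQLite does; other databases differ.

```python
import sqlite3

# Hypothetical table of entries; two of them have no publication date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (title TEXT, pub_date TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("a", "2011-01-10"),
        ("b", None),
        ("c", "2010-12-01"),
        ("d", None),
    ],
)

# NULL = NULL is neither true nor false: it evaluates to NULL (None in Python).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# GROUP BY nevertheless collects every NULL into a single group.
groups = conn.execute(
    "SELECT pub_date, COUNT(*) FROM entries GROUP BY pub_date ORDER BY pub_date"
).fetchall()

# ORDER BY accepts the NULLs as well; SQLite places them first when ascending.
ordered = [
    row[0]
    for row in conn.execute("SELECT title FROM entries ORDER BY pub_date, title")
]

print(null_equals_null)
print(groups)
print(ordered)
```

The two undated entries end up in one group and next to each other in the
listing, which is exactly the "all occurred at the same time" reading.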

When this assumption is not clear, or later gets lost in the output,
people tend to make bad decisions using the "implied sort order".
Example: "Delete all the entries before a particular publication date."
What does that really imply?  What will the result be?  Will all the
unpublished ones get deleted also?
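
To make the ambiguity concrete, here is a minimal sketch in plain Python.
The entries, field names and cutoff date are invented for the example, and
the listing is assumed to sort unknown dates first.

```python
from datetime import date

# Hypothetical entries; None means the publication date is unknown.
entries = [
    {"title": "a", "published": date(2011, 1, 10)},
    {"title": "b", "published": None},
    {"title": "c", "published": date(2010, 12, 1)},
    {"title": "d", "published": None},
]
cutoff = date(2011, 1, 1)

# A listing that puts unknown dates first, as some databases do.
listing = sorted(
    entries,
    key=lambda e: (e["published"] is not None, e["published"] or date.min),
)

# Reading 1: "everything above the cutoff line in the listing".
# The undated entries sit at the top, so they are swept up as well.
first_kept = next(
    (
        i
        for i, e in enumerate(listing)
        if e["published"] is not None and e["published"] >= cutoff
    ),
    len(listing),
)
deleted_by_position = [e["title"] for e in listing[:first_kept]]

# Reading 2: "entries whose date is known and earlier than the cutoff".
deleted_by_value = [
    e["title"]
    for e in listing
    if e["published"] is not None and e["published"] < cutoff
]

print(deleted_by_position)
print(deleted_by_value)
```

The first reading removes the unpublished entries along with the old one;
the second removes only the entry that really was published before the
cutoff.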

There is not one particular "right" way to handle all the situations, so
you have to think through each one.  Maybe you could combine the use of
other data (creation date, etc.) into the sorting to get a better
chronology.  Or, maybe it would be better to always keep the two groups
of records distinct: a list of published ones sorted by publication
date, and a list of unpublished ones sorted by something else.
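
Both options can be sketched in a few lines of Python.  The "created"
field and the sample data are assumptions made for the example, not
something taken from the application being discussed.

```python
from datetime import date

# Hypothetical entries with a creation date and an optional publication date.
entries = [
    {"title": "a", "created": date(2011, 1, 5), "published": date(2011, 1, 10)},
    {"title": "b", "created": date(2011, 1, 12), "published": None},
    {"title": "c", "created": date(2010, 11, 20), "published": date(2010, 12, 1)},
    {"title": "d", "created": date(2010, 12, 15), "published": None},
]


def effective_date(entry):
    """Publication date if known, otherwise fall back to the creation date."""
    if entry["published"] is not None:
        return entry["published"]
    return entry["created"]


# Option 1: a single chronology built from the best date available.
chronology = [e["title"] for e in sorted(entries, key=effective_date)]

# Option 2: two separate lists, each sorted by a date that is always present.
published = sorted(
    (e for e in entries if e["published"] is not None),
    key=lambda e: e["published"],
)
unpublished = sorted(
    (e for e in entries if e["published"] is None),
    key=lambda e: e["created"],
)

print(chronology)
print([e["title"] for e in published])
print([e["title"] for e in unpublished])
```

Option 2 never has to compare a missing date with a real one, at the cost
of presenting two lists instead of one.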

Sorry if this response is a bit off-topic, but I just wanted to offer
some advice on a topic that has bitten me more than enough times.


Steve


--- On Fri, 1/14/11, Wichert Akkerman  wrote:

> From: Wichert Akkerman 
> Subject: Re: [Repoze-dev] catalog oddness
> To: repoze-dev@lists.repoze.org
> Date: Friday, January 14, 2011, 9:21 AM
> On 1/14/11 15:04 , Chris Rossi wrote:
> > On Fri, Jan 14, 2011 at 5:20 AM, Wichert Akkerman
> > <wich...@wiggy.net> wrote:
> >
> >     This may already be different in the trunk of repoze.catalog, but I
> >     just stumbled over this: when you do a catalog search and ask it to
> >     order by an empty index you get an empty result set. I was expecting
> >     the result to be an unordered result for that situation. Is this
> >     expected behaviour?
> >
> > Hi Wichert,
> >
> > I haven't observed this behavior, but it seems like an undefined case,
> > to me.  I'm not sure what I would expect it to do in such a situation.
> > Supposing you have a set of documents you want sorted by an index and
> > the index contains only a subset of those documents?  It seems to me the
> > case is undefined--I would have a tendency to raise an exception,
> > personally.
>
> I was expecting a missing index value to be treated as None (or NULL in
> SQL terms) and the related items to appear either first or last. Raising
> an exception is undesirable: there are valid situations where an object
> might have a None value for an indexed attribute and that should not
> lead to exceptions when doing catalog queries.
>
> > I think in the interest of a well defined determinism I would suggest
> > that if you are using an index to sort, you should make sure that the
> > discriminator for that index be able to return some value for any
> > document.  This way even if logically, to you, the document doesn't
> > really have a value for that index, you can at least be deterministic
> > about how it will be sorted.
>
> The object did have a value, but it was None which the index
> apparently ignores. The fact that it was always None was a bug in my
> code that has been fixed now - it should be either None or a date (it
> was a publication-date field).
>
> Wichert.
> 
> ___
> Repoze-dev mailing list
> Repoze-dev@lists.repoze.org
> http://lists.repoze.org/listinfo/repoze-dev
> 


  


Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Chris Rossi
On Fri, Jan 14, 2011 at 10:21 AM, Wichert Akkerman wrote:

> The object did have a value, but it was None which the index apparently
> ignores. The fact that it was always None was a bug in my code that has been
> fixed now - it should be either None or a date (it was a publication-date
> field).

Ah, having a value of None is different than not having a value.  I.e.,

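# Case 1: the document has no value for this index.
def discriminator(obj, default):
    """Does not have a value."""
    return default

# Case 2: the document has a value, and that value is None.
def discriminator(obj, default):
    """Has a value of None."""
    return None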

You're sure it's the second case you're hitting?  It wouldn't be an "empty"
index, but it would be one where every document has a value of None.  If
you're seeing an empty result set in that case, then that sounds like a bug.
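
For illustration only, here is one way calling code can tell the two cases
apart: pass a private sentinel object as the default and check what comes
back.  The Doc class, the "published" attribute and the classify helper are
hypothetical, and this is not a description of how repoze.catalog treats
either case internally.

```python
# Private sentinel used as the "default" argument (hypothetical name).
_MISSING = object()


class Doc:
    """Tiny stand-in for a catalogued document."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def discriminator(obj, default):
    """Return the publication date, or ``default`` if the attribute is absent."""
    return getattr(obj, "published", default)


def classify(obj):
    """Report which of the two cases a document falls into."""
    value = discriminator(obj, _MISSING)
    if value is _MISSING:
        return "no value"
    if value is None:
        return "value of None"
    return "value"


print(classify(Doc()))
print(classify(Doc(published=None)))
print(classify(Doc(published="2011-01-14")))
```

Because the sentinel is a fresh object, it can never be confused with a
legitimate value such as None.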

Chris


Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Wichert Akkerman

On 1/14/11 15:04 , Chris Rossi wrote:
> On Fri, Jan 14, 2011 at 5:20 AM, Wichert Akkerman <wich...@wiggy.net> wrote:
>
>     This may already be different in the trunk of repoze.catalog, but I
>     just stumbled over this: when you do a catalog search and ask it to
>     order by an empty index you get an empty result set. I was expecting
>     the result to be an unordered result for that situation. Is this
>     expected behaviour?
>
> Hi Wichert,
>
> I haven't observed this behavior, but it seems like an undefined case,
> to me.  I'm not sure what I would expect it to do in such a situation.
> Supposing you have a set of documents you want sorted by an index and
> the index contains only a subset of those documents?  It seems to me the
> case is undefined--I would have a tendency to raise an exception,
> personally.

I was expecting a missing index value to be treated as None (or NULL in
SQL terms) and the related items to appear either first or last. Raising
an exception is undesirable: there are valid situations where an object
might have a None value for an indexed attribute and that should not
lead to exceptions when doing catalog queries.

> I think in the interest of a well defined determinism I would suggest
> that if you are using an index to sort, you should make sure that the
> discriminator for that index be able to return some value for any
> document.  This way even if logically, to you, the document doesn't
> really have a value for that index, you can at least be deterministic
> about how it will be sorted.

The object did have a value, but it was None which the index
apparently ignores. The fact that it was always None was a bug in my
code that has been fixed now - it should be either None or a date (it
was a publication-date field).


Wichert.



Re: [Repoze-dev] catalog oddness

2011-01-14 Thread Chris Rossi
On Fri, Jan 14, 2011 at 5:20 AM, Wichert Akkerman wrote:

> This may already be different in the trunk of repoze.catalog, but I just
> stumbled over this: when you do a catalog search and ask it to order by an
> empty index you get an empty result set. I was expecting the result to be an
> unordered result for that situation. Is this expected behaviour?

Hi Wichert,

I haven't observed this behavior, but it seems like an undefined case to
me.  I'm not sure what I would expect it to do in such a situation.
Suppose you have a set of documents you want sorted by an index, and the
index contains only a subset of those documents.  It seems to me the case is
undefined--I would have a tendency to raise an exception, personally.

I think in the interest of well-defined, deterministic behavior I would
suggest that if you are using an index to sort, you should make sure that
the discriminator for that index is able to return some value for any
document.  This way, even if logically, to you, the document doesn't really
have a value for that index, you can at least be deterministic about how it
will be sorted.
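
A minimal sketch of that suggestion, assuming a date-valued attribute named
"published" (a hypothetical name) and using date.max as a made-up
placeholder so that documents without a date sort last.  The sorting is
done with plain sorted() here rather than through a catalog.

```python
from datetime import date


class Doc:
    """Tiny stand-in for a catalogued document."""

    def __init__(self, title, published=None):
        self.title = title
        self.published = published


# Hypothetical placeholder for "no date"; date.max sorts after any real date.
NO_DATE = date.max


def publication_date(obj, default):
    """Discriminator that always yields a sortable date.

    ``default`` is deliberately never returned, so every document gets
    some value and therefore a defined position in the sort order.
    """
    value = getattr(obj, "published", None)
    return NO_DATE if value is None else value


docs = [
    Doc("a", date(2011, 1, 10)),
    Doc("b"),
    Doc("c", date(2010, 12, 1)),
]
ordered = [d.title for d in sorted(docs, key=lambda d: publication_date(d, None))]
print(ordered)
```

The placeholder is not a real date, so anything that displays the value or
filters on a date range has to treat it specially.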

Chris


[Repoze-dev] catalog oddness

2011-01-14 Thread Wichert Akkerman
This may already be different in the trunk of repoze.catalog, but I just 
stumbled over this: when you do a catalog search and ask it to order by 
an empty index you get an empty result set. I was expecting the result 
to be an unordered result for that situation. Is this expected behaviour?


Wichert.