RE: Yet another __ne not equal discussion

2011-10-27 Thread Kääriäinen Anssi
"""
The exclude() option in its current form is unworkable on multi-valued
relations. I'd like to repeat that for emphasis: exclude() can *never*
obsolete direct negative lookups for multi-value relations.
"""

I do see a problem here: the equality ~Q(a=1) <-> Q(a__lt=1)|Q(a__gt=1) is
not correct in m2m situations: the first is asking for rows where a must not
be 1, most of all, if there is no a, it is a match. The other is asking for rows
where there must be an A value, and it must be < 1 or > 1, most of all, if
there is no value at all, it is NOT a match. So: 
filter(~Q(a=1), a__isnull=False) <-> Q(a__lt=1)|Q(a__gt=1).
The ORM is not able to handle the first version correctly. The interpretation
would be that there is at least one 'a' row, and its value is not 1.

I am strongly against the idea that Q(a__neq=1) would have different
interpretation of ~Q(a__eq=1). If they would have different interpretation, then
there would be basis for negative lookups. Although AFAICS you could still get
the same results using ~Q(a__eq=1, a__isnull=False) so the API would still work
without negative lookups.

I am basing the following discussion on the assumption that a__neq and ~a__eq
should be the same thing.

>From ORM API standpoint, the claim that .exclude() can never obsolete
direct negative lookups is wrong as far as I understand the problem.
Reason:

.filter(Q(__neq=val)) <-> .filter(~Q(__exact=val)) <-> 
.exclude(Q(__exact=val))

Another way to see this is that Django should return same results for
the queries:
filter(~Q(employment__school__site_name='RAE'), employment__end_date=None)
and
filter(employment__school__site_name__neq='RAE', employment__end_date=None)

However, I do not think your issue is due to the above equality between
the two different ways of writing ~__eq problem, it is due to a bug in ORM
implementation. The second filter condition is not pushed down to the
subquery generated by the negated Q condition, and thus it generates
another join and potentially targets different rows. I think this is the main
problem in your situation. This is reinforce by this snippet from your query:
WHERE (
NOT 
`data_staff`.`id`
IN ( subquery data_employment U1)
-- different data_employment reference from the subquery
AND `data_employment`.`end_date` IS NULL
)
That is, you have the data_employment table two times in the query, and
thus the filters are targeting potentially different rows. Note that this is a
bug. The conditions are in the same .filter() call, and thus they should target
the same row!

IMHO There are two underlying problems in the ORM related to this matter,
one is detecting when to use a subquery for the filter condition. The logic
for that is easily fooled. Another problem is that if you do a subquery,
other conditions that should go into the subquery WHERE are sometimes
not correctly pushed down to the subquery clause. This is similar to HAVING
clause pushdown problem.

I must say the m2m handling is very complicated, it took some time to see
the ~__eq=1 <-> __lt=1|__gt=1 difference for example... Thus, it is likely
that I am missing something else, too.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Yet another __ne not equal discussion

2011-10-27 Thread Adam Moore
Opps! ...
>     ).extra(where=['`data_school`.`site_name` != RAE']

should be
).extra(where=['`data_school`.`site_name` != %s'], params=['RAE']

which is case-in-point for why helping me avoid extra() is a good
thing!

~Adam sM

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Yet another __ne not equal discussion

2011-10-27 Thread Adam Moore


On Oct 27, 6:26 am, Kääriäinen Anssi  wrote:
>
> Adam Moore wrote:
> > It's also worth noting that Q() objects permit the unary negation
> > operator, but this also yields the undesired results of the exclude()
> > call:
> > Blog.objects.filter(~Q(entry__author_count=2),
> > entry__tag__name='django')
>
> As far as I understand, this is exactly the query you want. The filters
> are treated as single call, that is, they should target the same row,
> not possibly different rows of the multijoin.
>

I'm back at work now, so I can show the real world examples:

I've got a school system database with schools, students, staff,
classes, etc. Students have a many-to-many relationship with schools
through attendance relations. Attendance relations themselves have
begin_date and end_date attributes, so the database can track
everywhere the students have been over time. As a rule of thumb, any
given student should only have one attendance relation at any given
time whose end_date is None. Staff have a very similar relationship
with schools but through an employment relation. Even if they
physically move between multiple schools, they can only be in payroll
at one.

All of this is with django fresh from svn this morning:

Here's the total number of Staff in the database:
>>> Staff.objects.all().count()
3019

^This is all staff that have ever been in our district. Here is
current, active staff only:
>>> Staff.objects.filter(employment__end_date=None).count()
2671

Make sure I'm not violating my own rule of thumb, staff should only
have exactly 1 active employment relation:
>>> Staff.objects.filter(employment__end_date=None).distinct().count()
2671

Now how many staff are at 1 specific school whose call letters are
'RAE':
>>> Staff.objects.filter(employment__end_date=None, 
>>> employment__school__site_name='RAE').distinct().count()
94

How many active staff are not at that school? We have enough info to
know the correct answer:
>>> 2671 - 94
2577

But how do we pull this from the database. It can easily be done on
the employment table itself:
>>> Employment.objects.filter(end_date=None).exclude(school__site_name='RAE').count()
2577

It can't easily be done directly on staff:
>>> Staff.objects.filter(employment__end_date=None).exclude(employment__school__site_name='RAE').distinct().count()
2564

^This is excluding all staff who have ever been at that school.
Negating a Q() object has the same problem:
>>> Staff.objects.filter(~Q(employment__school__site_name='RAE'), 
>>> employment__end_date=None).distinct().count()
2564

On Oct 27, 6:26 am, Kääriäinen Anssi wrote:
> But they _should_ produce the same result, and if they do not,
> introducing negated lookups isn't the way to fix this - the correct
> thing to do is fixing the ORM.

The exclude() option in its current form is unworkable on multi-valued
relations. I'd like to repeat that for emphasis: exclude() can *never*
obsolete direct negative lookups for multi-value relations.

Fixing up negated Q() objects while refusing direct negated lookups is
a little ridiculous. It's akin to saying you shouldn't support __lte
and __gte when ~Q(__gt) and ~Q(__lt) _should_ do the same thing. Even
better, who needs __gt when we've got (~Q(__lt) & ~Q(__exact)) :P.
Back on a serious note, it's worth repeating here that you can
approximate __ne with (Q(__lt) | Q(__gt)) - why make users take their
shoes off in order to put on their hats? Just give them __ne and that
cascades into the natural fix for ~Q(__exact).

Deciding which one of these has a more promising future is left up to
the experts:
>>> print Staff.objects.filter(
~Q(employment__school__site_name='RAE'),
employment__end_date=None,
).distinct().order_by().values('id').query
SELECT DISTINCT `data_staff`.`id`
FROM `data_staff`
LEFT OUTER JOIN `data_employment` ON (`data_staff`.`id` =
`data_employment`.`staff_id`)
WHERE (
NOT (
(`data_staff`.`id`
IN (
SELECT U1.`staff_id`
FROM `data_employment` U1
INNER JOIN `data_school` U2 ON (U1.`school_id` =
U2.`id`)
WHERE (
U2.`site_name` = RAE
AND U1.`staff_id` IS NOT NULL
)
)
AND `data_staff`.`id` IS NOT NULL
)
)
AND `data_employment`.`end_date` IS NULL
)

>>> print Staff.objects.filter(
employment__end_date=None,
employment__school__isnull=False, #force the join
).extra(where=['`data_school`.`site_name` != RAE']
).distinct().order_by().values('id').query
SELECT DISTINCT `data_staff`.`id`
FROM `data_staff`
LEFT OUTER JOIN `data_employment` ON (`data_staff`.`id` =
`data_employment`.`staff_id`)
INNER JOIN `data_school` ON (`data_employment`.`school_id` =
`data_school`.`id`)
WHERE (
`data_employment`.`end_date` IS NULL
AND 

Re: Message Formatting in AdminMailHandler

2011-10-27 Thread Julien Phalip
Hi,

On 27 October 2011 22:47, momo2k  wrote:

> Hello,
>
> I posted this question some time ago on django-users, but since nobody
> answered and I think about this as a serious problem, I'll post here:
>
> Is there a reason why AdminMailHandler does not use
> record.getMessage() as all other formatters do? (e.g. logging/
> __init__.py:436 Python 2.6). Correct me if I'm wrong, but most
> developers use the args of the logging call for string formatting,
> arent they?
>
> LOGGER.error('This is a %s message', some_var, exc_info=exc_info)
>
> In django this leads to unsubstituted admin mail messages.


This should already be fixed in trunk:
https://code.djangoproject.com/changeset/16715

Regards,

Julien

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Message Formatting in AdminMailHandler

2011-10-27 Thread momo2k
Hello,

I posted this question some time ago on django-users, but since nobody
answered and I think about this as a serious problem, I'll post here:

Is there a reason why AdminMailHandler does not use
record.getMessage() as all other formatters do? (e.g. logging/
__init__.py:436 Python 2.6). Correct me if I'm wrong, but most
developers use the args of the logging call for string formatting,
arent they?

LOGGER.error('This is a %s message', some_var, exc_info=exc_info)

In django this leads to unsubstituted admin mail messages.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



RE: Yet another __ne not equal discussion

2011-10-27 Thread Kääriäinen Anssi
Quote:
"""
It's also worth noting that Q() objects permit the unary negation
operator, but this also yields the undesired results of the exclude()
call:
Blog.objects.filter(~Q(entry__author_count=2),
entry__tag__name='django')
"""

As far as I understand, this is exactly the query you want. The filters
are treated as single call, that is, they should target the same row,
not possibly different rows of the multijoin. It is another matter if it
actually works in current ORM implementation. IIRC something like
filter(~Q(pk=1)) and .exclude(Q(pk=1)) can produce different results.
But they _should_ produce the same result, and if they do not,
introducing negated lookups isn't the way to fix this - the correct
thing to do is fixing the ORM.

 - Anssi

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Yet another __ne not equal discussion

2011-10-27 Thread Adam Moore
Greetings! Thanks to everyone for Django!

Long story short, I believe certain queries that are straightforward
and easy in plain SQL are not directly achievable in Django at all.
Obviously, Django is flexible enough that there is a workaround - but
a workaround shouldn't be necessary in a framework that is otherwise
so well rounded.

Let's start in the shallow end - It's been said that it doesn't make
sense to have a __ne "not equal" lookup when the other lookup types
don't have a negative variant. In light of the evidence below, I would
say that every lookup type should have a negative variant. But even
this would only go so far. For example, you wouldn't need a __ngt "not
greater than" lookup because it's the same thing as __lte "less than
or equal to."

But on to something of more substance - It's been said that exclude()
makes any negative lookup unnecessary. I'm relatively new to
databasing, but I believe that this is not true because of the way
multi-valued relationships are handled. To quote from the
documentation:

[quote]
Django has a consistent way of processing filter() and exclude()
calls. Everything inside a single filter() call is applied
simultaneously to filter out items matching all those requirements.
Successive filter() calls further restrict the set of objects, but for
multi-valued relations, they apply to any object linked to the primary
model, not necessarily those objects that were selected by an earlier
filter() call.
[/quote]

This means that by omitting any negative filter() lookups in favor of
successive exclude() calls, you are losing the ability to guarantee
that you are sticking to the correct instance of relationships.
Another possible solution to this is a new type of filter() and
exclude() variant that stick to old relationships or, in other words,
refrain from introducing "joins" if at all possible. This "other"
solution is a little insane, if I do say so myself; so the use of
negative lookups is simply more logical and truer to the underlying
SQL.

Examples: a little simplified and contrived, say you have blogs with
entries with tags and author_counts.

If you want to retrieve entries tagged 'django' with exactly 2 co-
authors:
Entry.objects.filter(tag__name='django', author_count=2)

If you want entries tagged 'django' that do not have exactly 2 co-
authors:
Entry.objects.filter(tag__name='django').exclude(author_count=2)

The exclude() solution works great for the above, but what if you want
blogs that contain those entries?

Blog.objects.filter(entry__tag__name='django', entry__author_count=2)
^This works as expected for the regular lookup, but not for the
negative lookup:
Blog.objects.filter(entry__tag__name='django').exclude(entry__author_count=2)
^This does not guarantee that the exact entries selected by the
filter() are the same ones being used for the exclude(). In other
words, I believe it joins in entries twice when it is needed only
once.

What you need to get this perfect is this imaginary lookup:
Blog.objects.filter(entry__tag__name='django',
entry__author_count__ne=2)

The workaround that gives the desired results is:
Blog.objects.filter(Q(entry__author_count__lt=2) |
Q(entry__author_count__gt=2), entry__tag__name='django')
^It's a blast from the past: the old <> "less than or greater than"
operator instead of != "not equal."

It's also worth noting that Q() objects permit the unary negation
operator, but this also yields the undesired results of the exclude()
call:
Blog.objects.filter(~Q(entry__author_count=2),
entry__tag__name='django')
^I believe this touches on a whole separate issue that needs a ticket.
The solution would be for negated Q() objects to flip-flop __gt, __lt,
__gte, __lte lookups to their logical opposites; but for the rest of
the lookup types, it would require some negative lookup primitives to
be implemented.

Of course, I must ask if I'm completely overlooking something here.

Adam sM

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.