Re: Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Paulo Scardine
+1: I have my fair share of Django/SQLAlchemy frankensteins. It kind of 
works, but the resulting creature is ugly.

I've used Flask for some projects and I'm really impressed by the power of 
SQLAlchemy.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/django-developers/-/SweL7f8fGt0J.
To post to this group, send email to django-developers@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Luke Plant
On 30/06/12 20:23, Anssi Kääriäinen wrote:

> TL;DR ;) But I intend to read it in full shortly.
> 
> For now I have to comment on the "clear up sql/query.py" part. I am
> doubtful whether moving to SQLAlchemy will solve the complexities of
> sql/query.py and sql/compiler.py.
> 
> I have little knowledge of SQLAlchemy, but my understanding is that it
> does not do things like "when filtering on a negated multijoin
> condition, push the condition into a subquery", group automatically by
> the right set of columns, or work out what kind of joins are needed by
> select_related. It probably doesn't do join promotion either ("should
> this join be INNER or LEFT OUTER? If dealing with an ORed where
> condition, then LEFT OUTER", and so on). sql/query.py deals mostly with
> these kinds of problems, and sql/compiler.py also deals in part with
> problems like this.

Mike Bayer pointed me to this code, which does something like Django's
join syntax on top of SQLAlchemy:

https://github.com/mitsuhiko/sqlalchemy-django-query/blob/master/sqlalchemy_django_query.py

I really don't know enough to know how well it is approximating what
Django does. It would be surprising if it was really doing the same thing!

I think it's one of those things where we really won't know the impact
until we've got most of the way there, and even then differences of
approach could make all the difference.

Luke


-- 
Parenthetical remarks (however relevant) are unnecessary

Luke Plant || http://lukeplant.me.uk/




Re: Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Luke Plant
On 30/06/12 20:25, Jacob Kaplan-Moss wrote:

> Before we do get too deep into this, however, I want to talk about
> this "Django 2.0" thing:
> 
> Clearly there will be something called "Django 2.0" at some point --
> after 1.9, if we get there, comes 2.0. However, I think it would be a
> mistake to make "Django 2.0" backwards-incompatible. We've seen
> countless examples -- Perl 6, Python 3, Rails 3, ... -- that these
> sorts of "breaks from the past" really alienate and frustrate the
> community. Over the years we've actually gotten really good at
> balancing forward motion with stability. Our reputation as a stable,
> reliable platform is something I'm intensely proud of.
> 
> It's going to take a lot of work to convince me of the value of a
> "break from the past" sort of approach. If this can't be done in a way
> that promises a smooth upgrade path... I'm not sure it's worth doing.

That's exactly the approach I had in making this proposal. The only
publicly documented API that I'm expecting to break or be removed is
QuerySet.extra(), and none of us like that anyway.

The internals I expect will break:

 - anything that relies on manipulating QuerySet.query (I've got one
project where I do that a little bit, I've not seen anyone else do it).

 - DB backend implementations that provide their own SQLCompiler
classes. Actually most external DB backends would probably break.

I'm *not* expecting the vast majority of Model._meta to change in big
ways, which is the biggest 'internal' that people regularly use. Since
it isn't to do with query generation, it doesn't need to change.

Of course, reality can have other ideas with big changes like this, but
that's what I would be aiming for.

For the things that will break, I hope they would be replaced by much
more appealing options - QuerySet.query would be replaced by the
QuerySet.as_sqlalchemy() method that Anssi mentioned, and you won't need
most 3rd party DB backends, or can plug them in via SQLAlchemy's
extension points, which are cleaner than ours from what I can see.

Luke

-- 
Parenthetical remarks (however relevant) are unnecessary

Luke Plant || http://lukeplant.me.uk/




Re: Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Anssi Kääriäinen
On 30 June, 22:25, Jacob Kaplan-Moss wrote:
> Before we do get too deep into this, however, I want to talk about
> this "Django 2.0" thing:
>
> Clearly there will be something called "Django 2.0" at some point --
> after 1.9, if we get there, comes 2.0. However, I think it would be a
> mistake to make "Django 2.0" backwards-incompatible. We've seen
> countless examples -- Perl 6, Python 3, Rails 3, ... -- that these
> sorts of "breaks from the past" really alienate and frustrate the
> community. Over the years we've actually gotten really good at
> balancing forward motion with stability. Our reputation as a stable,
> reliable platform is something I'm intensely proud of.
>
> It's going to take a lot of work to convince me of the value of a
> "break from the past" sort of approach. If this can't be done in a way
> that promises a smooth upgrade path... I'm not sure it's worth doing.

I am sure doing Django 2.0 would result in a _long_ development cycle.
Off the top of my head, some candidates for 2.0:
  - Template layer
  - ORM
  - Admin

Probably also a lot of things in contrib and some things in the model
layer, and I am sure there are a couple of issues in forms, and so on.

How long until we get every part upgraded to 2.0 and get the code
polished to release quality? -1 to finding that out :)

I wonder if it would be possible to support both a new version
(new_models?) and the old version side by side. After a long enough
deprecation period, kick the old code out of core, maybe into a
separate repository where interested people could pick it up if they
still need it.

 - Anssi




Re: Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Jacob Kaplan-Moss
Wow. There's really a lot to think about here, and I'm only just
starting. Thanks for putting this together, Luke: I know it's been
something that's been discussed a ton, but until now nobody's really
done the due diligence to figure out exactly what the process and
ramifications would be.

Before we do get too deep into this, however, I want to talk about
this "Django 2.0" thing:

Clearly there will be something called "Django 2.0" at some point --
after 1.9, if we get there, comes 2.0. However, I think it would be a
mistake to make "Django 2.0" backwards-incompatible. We've seen
countless examples -- Perl 6, Python 3, Rails 3, ... -- that these
sorts of "breaks from the past" really alienate and frustrate the
community. Over the years we've actually gotten really good at
balancing forward motion with stability. Our reputation as a stable,
reliable platform is something I'm intensely proud of.

It's going to take a lot of work to convince me of the value of a
"break from the past" sort of approach. If this can't be done in a way
that promises a smooth upgrade path... I'm not sure it's worth doing.

Now, that's not a vote against (at least not yet); I think we can find
balance here. I'm certainly not arguing that any backwards
incompatibilities sink the proposal. There's a certain level of
incompatibility that'll be OK, especially when the upside's so great.
External dependencies? If the ecosystem's ready (and it's getting
there), then we can adopt them without affecting most users. Changed
internals? We've already been pretty clear that the internals of the
model system are off-limits, and I think we can tolerate some changes
there.

So: if we're going to go down this path -- and your reasons for why we
should are spot-on -- I say we have to figure out if we can keep the
upgrade path smooth.

Jacob


Re: Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Anssi Kääriäinen
On 30 June, 17:22, Luke Plant wrote:
> Hi all,
>
> A good while back I put forward the idea of using SQLAlchemy Core in
> Django [1]. Having had more experience working with SQLAlchemy, I'm
> putting that idea forward as a formal proposal, as I mentioned in a more
> recent thread here.
>
> Apologies in advance for the length! I've included a few 'TL;DR'
> summaries and headings in the different sections which you might want to
> scan first.

> == Motivation 3: Code cleanup
>
> As mentioned, our own query generation has been pushed to the limit and
> beyond. (I'm talking about classes like Query, SQLCompiler). It has
> grown and grown, so that the Query class is now a 2000 line behemoth,
> containing a constructor with over 40 assignments to 'self'.
>
> Most of the core developers are scared to touch this stuff, AFAICS,
> myself included. It has virtually no unit tests. While it has very
> high test coverage, it is tested only by tests that check QuerySet and
> other high-level APIs.
>
> As such, it's very difficult to change, and it may well be beyond our
> ability to successfully refactor.
>
> Switching to SQLAlchemy would force us to rewrite this code, which is
> for our own good. In addition, large chunks of it can be dropped
> entirely (i.e. most of database specific stuff). This will reduce our
> maintenance load going forward (eventually).
>
> (BTW, I'm not saying that we should let the existing code continue to
> rot, we should of course try to clean it up as best we can, and that
> effort is not wasted - I'm talking about a longer term strategy here.
> If we can refactor this code, great - this motivation can be dropped
> from the list, but I think the others still stand).

TL;DR ;) But I intend to read it in full shortly.

For now I have to comment on the "clear up sql/query.py" part. I am
doubtful whether moving to SQLAlchemy will solve the complexities of
sql/query.py and sql/compiler.py.

I have little knowledge of SQLAlchemy, but my understanding is that it
does not do things like "when filtering on a negated multijoin
condition, push the condition into a subquery", group automatically by
the right set of columns, or work out what kind of joins are needed by
select_related. It probably doesn't do join promotion either ("should
this join be INNER or LEFT OUTER? If dealing with an ORed where
condition, then LEFT OUTER", and so on). sql/query.py deals mostly with
these kinds of problems, and sql/compiler.py also deals in part with
problems like this.
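
The join promotion case can be shown with plain SQL. The sketch below
uses Python's built-in sqlite3 module and invented author/book tables
(they are assumptions for illustration, not anything from Django). The
ORed filter is roughly what an ORM filter like "name is Ann OR has a
book titled SQL Basics" would have to produce:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    -- Only Bob has a book; Ann has none.
    INSERT INTO book VALUES (1, 2, 'SQL Basics');
""")

# Same ORed WHERE clause, with the join type left as a placeholder.
QUERY = """
    SELECT DISTINCT author.name
    FROM author {join} book ON book.author_id = author.id
    WHERE author.name = 'Ann' OR book.title = 'SQL Basics'
    ORDER BY author.name
"""


def names(join):
    """Run the query with the given join type and return the author names."""
    return [row[0] for row in conn.execute(QUERY.format(join=join))]


# INNER JOIN discards Ann before the WHERE clause is evaluated,
# because she has no matching book row.
inner = names("INNER JOIN")

# LEFT OUTER JOIN keeps Ann, so the first half of the OR can match her.
outer = names("LEFT OUTER JOIN")

print(inner, outer)
```

With an ORed condition the INNER JOIN silently loses authors that only
match the non-joined half of the filter, which is why the join has to be
promoted to LEFT OUTER. Deciding this is the kind of work that would
stay in Django whichever library renders the final SQL.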

What SQLAlchemy could do for us is generate the actual SQL. We would
probably never need to worry about things like how to generate proper
LIMIT queries for different backends, nor would we need to worry about
proper escaping etc. Hopefully we could get rid of most of the dirty
details of supporting different dialects of SQL. This and support for
more backends is reason enough to support this idea. DDL would be a
big bonus, too.
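
As a rough sketch of the "generate the actual SQL" part, the example
below compiles one SQLAlchemy Core statement for several dialects. It
assumes a reasonably recent SQLAlchemy release is installed; the users
table is hypothetical, and no database connection is needed just to
render the SQL:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import mssql, postgresql, sqlite
from sqlalchemy.schema import CreateTable

metadata = MetaData()

# Hypothetical table, used only for illustration.
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

# One backend-neutral statement: the first ten users ordered by id.
stmt = users.select().order_by(users.c.id).limit(10)

dialects = {
    "postgresql": postgresql.dialect(),
    "sqlite": sqlite.dialect(),
    "mssql": mssql.dialect(),
}

# Compile the same statement once per dialect and collect the SQL text.
rendered = {
    name: str(stmt.compile(dialect=dialect))
    for name, dialect in dialects.items()
}
for name, sql in rendered.items():
    print(f"-- {name}\n{sql}\n")

# DDL is rendered through the same compiler machinery.
ddl = str(CreateTable(users).compile(dialect=postgresql.dialect()))
print(ddl)
```

The row-limiting syntax differs between the rendered strings even
though the statement object is the same, which is the dialect handling
Django currently has to implement for itself.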

As said, I don't know SQLAlchemy well, so it may well do more that is
useful for our ORM. If so, even more reason to use it.

In addition, I would love to be able to do some operations first using
Django's query API, then say "qs.as_sqlalchemy()" and use SQLAlchemy
for those operations Django's ORM doesn't support.

So, count me in as +1 for this idea. I would not be surprised if it
turns out this will add complexity instead of reducing it. Still, to
me this seems promising.

 - Anssi




Proposal: use SQLAlchemy Core for query generation

2012-06-30 Thread Luke Plant
Hi all,

A good while back I put forward the idea of using SQLAlchemy Core in
Django [1]. Having had more experience working with SQLAlchemy, I'm
putting that idea forward as a formal proposal, as I mentioned in a more
recent thread here.

Apologies in advance for the length! I've included a few 'TL;DR'
summaries and headings in the different sections which you might want to
scan first.

=== Proposal ===

We should re-write our query generation code to use SQLAlchemy Core.
This includes DDL queries as well as DML (e.g. CREATE TABLE as well as
SELECT, INSERT etc).

This would also involve replacing our database connection objects with
SQLAlchemy's. In this proposal, our high-level ORM, with model
definition and query API, would remain the same - we wouldn't be using
the SQLAlchemy ORM.

This is a "Django 2.0" proposal i.e. not immediate future, and not fully
backwards compatible.

=== Background - Django 2.0 philosophy ===

TL;DR: "evolution not revolution, some backwards incompatibilities"

Although we haven't really discussed the timing of Django 2.0, or its
philosophy, I think we should be doing. My own assumption is that it
should be evolution, not revolution, similar to Python 3, where we break
stuff that has dogged us in the past, and will hamper us going forward,
but don't make radical changes where not necessary. This proposal fits
that basic assumption.

Also, in Django to date we've eschewed external dependencies. That has
been partly due to the poor and confusing state of Python packaging,
which is hopefully improving. I think it will make us a very poor Python
citizen if we don't reverse this policy at some point, and Django 2.0 is
an obvious point to do it.

This proposal does not represent a move away from being a 'full stack'
framework. "Full stack" does not mean "no dependencies".

Our current recommended installation method is via pip [2], and we would
be doing our users a disservice to recommend otherwise. Installation via
pip actually makes the instructions shorter - manual methods require
things like removing old versions manually - and for anything beyond
trivial use of Django you have to know pip anyway.

So, with our recommended installation method, adding a dependency
doesn't make things any more difficult at all.

=== Background - ORM philosophy ===

TL;DR: "Let's make Django's DB layer the best it can be for relational
databases."

Whether we call it "the ORM" or "the model layer" as some people prefer,
I think it's fairly certain that the overwhelming majority of our users
are using relational databases.

Many of the things that make Django a compelling choice,
including the admin and re-usable apps, either don't work or are of
little use if you are not using a relational database.

So my philosophy is that we should aim to provide a really excellent
ORM that will take users as far as possible.

This doesn't preclude having non-relational support in Django. But
it seems very strange to make that the focus, when we've had
little-to-no support for it so far, or to allow that support to limit
how well we can cater for the 99%.

=== Motivation ===

== Motivation 1: Django's ORM leaves you high and dry when you reach its
limits.

While the ORM can do a surprising number of queries, there are plenty it
can't, and in all the medium-to-large projects I've worked on I've gone
beyond what the ORM can do.

At this point, you've got a few options, from easiest to hardest:

A) Do the aggregation/filtering etc in Python.

B) Write raw SQL.

C) Use SQLAlchemy or some other SQL generation tool.

D) Write a patch to extend the ORM.


None of these is great:

A) Data manipulation in Python, when it could be done in SQL, is
obviously a bad idea since it is usually very inefficient. But I've seen
a lot of code that does this, because it was hard/impossible to get
Django's ORM to do the query needed.

This anti-pattern will also give Django applications the reputation for
being slow. Obviously, we can point the finger at the developer, but if
we've made it hard for the developer to do the right thing, that is unfair.
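
As a small illustration of option A, the sketch below computes the same
per-region totals twice with Python's built-in sqlite3 module: once by
pulling every row into Python, and once with GROUP BY in the database.
The sale table and its data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("north", 10), ("south", 5), ("north", 7), ("south", 1)],
)

# Option A: fetch every row and aggregate in Python.
totals_python = {}
for region, amount in conn.execute("SELECT region, amount FROM sale"):
    totals_python[region] = totals_python.get(region, 0) + amount

# The same aggregation done by the database: one result row per region.
totals_sql = dict(
    conn.execute("SELECT region, SUM(amount) FROM sale GROUP BY region")
)

print(totals_python, totals_sql)
```

Both give the same totals, but the first version transfers every row to
the application while the second transfers one row per group.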

B) Raw SQL fails if you have dynamic queries i.e. where the shape of the
query can vary.

Example 1: you are writing library code e.g. a re-usable app that knows
nothing about the tables it is actually querying, and may have been
given any arbitrary QuerySet as an input to manipulate in some way.

Example 2: even if you have full knowledge of the tables, you might have
additional WHERE clauses/JOINs/sub queries in some cases, that you want
to programmatically add to the query.

I've come across both these types in projects I've been involved in, and
I know I'm far from the only one.

Raw SQL can also fail if you are manually writing 'static' queries but
need compatibility with multiple DB backends.
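
To illustrate Example 2, here is a rough sketch of what hand-assembling
a dynamic query in raw SQL tends to look like, again using sqlite3. The
author/book schema and the build_author_query() helper are hypothetical
and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO author VALUES (1, 'Ann', 'FI'), (2, 'Bob', 'UK'), (3, 'Cy', 'FI');
    INSERT INTO book VALUES (1, 1), (2, 1), (3, 2);
""")


def build_author_query(country=None, min_books=None):
    """Assemble raw SQL whose shape depends on which filters are given."""
    sql = ["SELECT a.name FROM author a"]
    where, params = [], []
    if min_books is not None:
        # This filter needs an extra JOIN, changing the query's shape.
        sql.append("JOIN book b ON b.author_id = a.id")
    if country is not None:
        where.append("a.country = ?")
        params.append(country)
    if where:
        sql.append("WHERE " + " AND ".join(where))
    if min_books is not None:
        # HAVING comes after WHERE, so its parameter must be appended last.
        sql.append("GROUP BY a.id, a.name HAVING COUNT(b.id) >= ?")
        params.append(min_books)
    sql.append("ORDER BY a.name")
    return " ".join(sql), params


def author_names(**filters):
    """Build the query for the given filters and return the author names."""
    sql, params = build_author_query(**filters)
    return [row[0] for row in conn.execute(sql, params)]


print(author_names())
print(author_names(country="FI"))
print(author_names(min_books=2))
print(author_names(country="FI", min_books=1))
```

Every optional clause has to be spliced into the right position in the
string, with its parameters kept in matching order. In effect this is a
small, ad hoc query builder, and it gets harder still when the base
query is an arbitrary QuerySet rather than a known table.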

C) For SQL generation, SQLAlchemy is the best, but for good reason it
comes with its own database connection objects. Having two sources of
connection objects causes problems, such as