Re: [GSOC] NoSQL Support for the ORM

2010-04-09 Thread Russell Keith-Magee
On Thu, Apr 8, 2010 at 5:55 AM, Waldemar Kornewald  wrote:
> On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor  wrote:
>>> Other issues that spring to mind:
>>>
>>>  * What about nonSQL datatypes? List/Set types are a common feature of
>>> Non-SQL backends, and are The Right Way to solve a whole bunch of
>>> problems. How do you propose to approach these datatypes? What (if
>>> any) overlap exists between the use of set data types and m2m? Is
>>> there any potential overlap between supporting List/Set types and
>>> supporting Arrays in SQL?
>>>
>>
>> Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
>> opinion there's no reason, once we have a good clean separation of
>> concerns in the architecture, that implementing a ListField would be
>> particularly hard.  If we happened to include one in Django, all the
>> better (from the perspective of interoperability).
>
> Do all SQL DBs provide an array type? PostgreSQL has it and I think it
> can exactly mimic NoSQL lists, but I couldn't find an equivalent in
> sqlite and MySQL. Does this possibly stand in the way of integrating
> an official ListField into Django or is it OK to have a field that
> isn't supported on all DBs? Or can we fall back to storing the list
> items in separate entities in that case?

No, array types aren't available everywhere. However, it would be
nice to be able to support them (even if not in core); if this GSoC
lays the groundwork to make this possible, then it's worth looking at.

I was more interested in the m2m issue - the 'natural' way to handle
m2m on some NoSQL isn't to have a separate relation, it's to maintain
a list/set of related references.
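
To make the contrast concrete, here is a minimal sketch in plain Python. It assumes dicts stand in for both storage models, and the names (`join_rows`, `embed_m2m`, `books_by_author`) are hypothetical, not part of any Django or backend API. It collapses relational join-table rows into a per-document list of related keys:

```python
# Hypothetical data: (book_id, author_id) rows from a relational join table.
join_rows = [(1, 10), (1, 11), (2, 10)]


def embed_m2m(rows):
    """Collapse join-table rows into one list of related keys per parent,
    i.e. the list/set field a document store would typically keep."""
    embedded = {}
    for book_id, author_id in rows:
        embedded.setdefault(book_id, []).append(author_id)
    return embedded


def books_by_author(embedded, author_id):
    """Reverse lookup: without an index on the list field this is a scan."""
    return sorted(book_id for book_id, authors in embedded.items()
                  if author_id in authors)


documents = embed_m2m(join_rows)
print(documents)                       # {1: [10, 11], 2: [10]}
print(books_by_author(documents, 10))  # [1, 2]
```

The trade-off shows up in the reverse lookup. With embedded lists it becomes a scan unless the store can index list fields (MongoDB, for instance, supports multikey indexes on array fields).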

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-develop...@googlegroups.com.
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en.



Re: [GSOC] NoSQL Support for the ORM

2010-04-08 Thread Alex Gaynor
On Wed, Apr 7, 2010 at 5:55 PM, Waldemar Kornewald  wrote:
> On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor  wrote:
>>> Other issues that spring to mind:
>>>
>>>  * What about nonSQL datatypes? List/Set types are a common feature of
>>> Non-SQL backends, and are The Right Way to solve a whole bunch of
>>> problems. How do you propose to approach these datatypes? What (if
>>> any) overlap exists between the use of set data types and m2m? Is
>>> there any potential overlap between supporting List/Set types and
>>> supporting Arrays in SQL?
>>>
>>
>> Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
>> opinion there's no reason, once we have a good clean separation of
>> concerns in the architecture, that implementing a ListField would be
>> particularly hard.  If we happened to include one in Django, all the
>> better (from the perspective of interoperability).
>
> Do all SQL DBs provide an array type? PostgreSQL has it and I think it
> can exactly mimic NoSQL lists, but I couldn't find an equivalent in
> sqlite and MySQL. Does this possibly stand in the way of integrating
> an official ListField into Django or is it OK to have a field that
> isn't supported on all DBs? Or can we fall back to storing the list
> items in separate entities in that case?
>

I'd be -1 on using a separate entity: if a backend supports it, it does;
if not, it doesn't.  There's no reason it has to be included in Django in
any event (certainly none of the non-relational backends will be, at least
to start with).

>>>  * How does a non-SQL backend integrate with syncdb and other setup
>>> tools? What about inspectdb?
>>>
>>
>> Most, but not all, non-relational databases don't require table setup
>> the way relational DBs do.  MongoDB doesn't require anything at all;
>> by contrast, Cassandra requires an XML configuration file.  How to
>> handle these is a little touchy, but basically I think syncdb should
>> stay conceptually pure, generating "tables"; if extra config is needed,
>> backends should ship custom management commands.
>
> Essentially, I agree, but I would add things like auto-generated
> CouchDB views to the syncdb process (since syncdb on SQL already takes
> care of creating indexes, too).
>
>>>  * Why the choice of MongoDB specifically? Do you have particular
>>> experience with MongoDB? Does MongoDB have features that make it a
>>> good choice?
>>>
>>
>> MongoDB offers a wide range of filtering options, which from my
>> perspective means it presents a greater test of the flexibility of the
>> developed APIs.  For this reason GAE would also be a good choice.
>> Something like Riak or Cassandra, which basically only have native
>> support for get(pk=3), would be a poor test of the flexibility of the
>> API.
>
> MongoDB really is a good choice. Out-of-the-box (without manual index
> definitions) it provides more features than GAE and most other NoSQL
> DBs. MongoDB and GAE should also have the simplest backends.
>
> Why should the Cassandra/CouchDB/Riak/Redis/etc. backend only support
> pk=... queries? There's no reason why the backend couldn't maintain
> indexes for the other fields and transparently support filters on any
> field. I mean, you don't really want developers to manually create and
> query separate indexing models for mapping one field value to its
> respective primary key in the primary model table. We can do much
> better than that.
>

Because that's all they support out of the box.  You call it
maintaining an index, but it really means setting up a separate
"table" (in RDBMS parlance) and I think that's a level of emulation
that's far beyond what should be supported out of the box.  In any
event I can't stop someone from writing a backend that does do that
level of abstraction.

>>>  * Given that you're only proposing a single proof-of-concept backend,
>>> have you given any thought to the needs of other backends? It's not
>>> hard to envisage that Couch, Cassandra, GAE etc will all have slightly
>>> different requirements and problems. Is there a common ground that
>>> exists between all data store backends? If there isn't, how do you
>>> know that what you are proposing will be sufficient to support them?
>>>
>>
>> To a certain extent this is a matter of knowing the feature sets of the
>> databases and, hopefully, having a mentor who is knowledgeable about
>> them.  The reality is that under the GSoC time constraints, attempting to
>> write complete backends for multiple databases would probably be
>> impossible.
>
> Well, you might be able to quickly adapt the MongoDB backend to GAE
> (within GSoC time constraints) due to their similarity. Anyway, there
> is common ground between the NoSQL DBs, but this highly depends on
> what problem we agree to solve. If we only provide exactly the
> features that each DB supports natively, they'll appear dissimilar
> because they take very different approaches to indexing, and if this
> isn't abstracted and automated NoSQL 

Re: [GSOC] NoSQL Support for the ORM

2010-04-08 Thread burc...@gmail.com
Hi all,

On Thu, Apr 8, 2010 at 12:55 AM, Waldemar Kornewald
 wrote:
> On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor  wrote:
>>> Other issues that spring to mind:
[...]
> Well, you might be able to quickly adapt the MongoDB backend to GAE
> (within GSoC time constraints) due to their similarity. Anyway, there
> is common ground between the NoSQL DBs, but this highly depends on
> what problem we agree to solve. If we only provide exactly the
> features that each DB supports natively, they'll appear dissimilar
> because they take very different approaches to indexing, and if this
> isn't abstracted and automated, NoSQL support doesn't really make sense
> with Django. OTOH, if the goal is to make an abstraction around their
> indexes, they can all look very similar from the perspective of
> Django's ORM (of course they have different "features" like sharding
> or eventual consistency or being in-memory DBs or supporting fast
> writes or reads or having transactions or ..., but in the end only a few
> of these features have any influence on Django's ORM, at all).
>
> Bye,
> Waldemar Kornewald

Could we switch to one issue/feature per thread, please?

I think the overall approach has already been chosen, and everyone agreed
with it. Each detail now has to be discussed separately, with the overall
discussion continuing here. For example, I have a few words about the design
of counters and indexes (and about my favorite NoSQL database, Berkeley DB),
but not about arrays/lists.

-- 
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com




Re: [GSOC] NoSQL Support for the ORM

2010-04-07 Thread Waldemar Kornewald
On Wed, Apr 7, 2010 at 5:22 PM, Alex Gaynor  wrote:
>> Other issues that spring to mind:
>>
>>  * What about nonSQL datatypes? List/Set types are a common feature of
>> Non-SQL backends, and are The Right Way to solve a whole bunch of
>> problems. How do you propose to approach these datatypes? What (if
>> any) overlap exists between the use of set data types and m2m? Is
>> there any potential overlap between supporting List/Set types and
>> supporting Arrays in SQL?
>>
>
> Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
> opinion there's no reason, once we have a good clean separation of
> concerns in the architecture, that implementing a ListField would be
> particularly hard.  If we happened to include one in Django, all the
> better (from the perspective of interoperability).

Do all SQL DBs provide an array type? PostgreSQL has it and I think it
can exactly mimic NoSQL lists, but I couldn't find an equivalent in
sqlite and MySQL. Does this possibly stand in the way of integrating
an official ListField into Django or is it OK to have a field that
isn't supported on all DBs? Or can we fall back to storing the list
items in separate entities in that case?

>>  * How does a non-SQL backend integrate with syncdb and other setup
>> tools? What about inspectdb?
>>
>
> Most, but not all, non-relational databases don't require table setup
> the way relational DBs do.  MongoDB doesn't require anything at all;
> by contrast, Cassandra requires an XML configuration file.  How to
> handle these is a little touchy, but basically I think syncdb should
> stay conceptually pure, generating "tables"; if extra config is needed,
> backends should ship custom management commands.

Essentially, I agree, but I would add things like auto-generated
CouchDB views to the syncdb process (since syncdb on SQL already takes
care of creating indexes, too).
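
As a hedged sketch of what auto-generating CouchDB views during syncdb could produce, the helper below builds a CouchDB design document with one map view per indexed field. The function name `build_design_doc`, the `_design/<app>_<model>` naming, and the convention that every stored document carries a `type` attribute holding the model label are all assumptions made for illustration. The sketch only builds the JSON and does not talk to a server:

```python
import json


def build_design_doc(model_label, indexed_fields):
    """Build a CouchDB design document with one map view per indexed field.

    Hypothetical convention: every stored document has a "type" attribute
    containing the model label (e.g. "blog.post").
    """
    views = {}
    for field_name in indexed_fields:
        # json.dumps quotes the label and field name as valid JavaScript strings.
        map_source = "function(doc) { if (doc.type === %s) { emit(doc[%s], null); } }" % (
            json.dumps(model_label),
            json.dumps(field_name),
        )
        views["by_" + field_name] = {"map": map_source}
    return {
        "_id": "_design/" + model_label.replace(".", "_"),
        "language": "javascript",
        "views": views,
    }


if __name__ == "__main__":
    design_doc = build_design_doc("blog.post", ["title", "published"])
    print(json.dumps(design_doc, indent=2))
```

A syncdb hook would then upload this document. CouchDB stores design documents like any other document, for example via an HTTP PUT to `/<db>/_design/<name>`. After that, querying a view by key plays roughly the role an SQL index plays.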

>>  * Why the choice of MongoDB specifically? Do you have particular
>> experience with MongoDB? Does MongoDB have features that make it a
>> good choice?
>>
>
> MongoDB offers a wide range of filtering options, which from my
> perspective means it presents a greater test of the flexibility of the
> developed APIs.  For this reason GAE would also be a good choice.
> Something like Riak or Cassandra, which basically only have native
> support for get(pk=3), would be a poor test of the flexibility of the
> API.

MongoDB really is a good choice. Out-of-the-box (without manual index
definitions) it provides more features than GAE and most other NoSQL
DBs. MongoDB and GAE should also have the simplest backends.

Why should the Cassandra/CouchDB/Riak/Redis/etc. backend only support
pk=... queries? There's no reason why the backend couldn't maintain
indexes for the other fields and transparently support filters on any
field. I mean, you don't really want developers to manually create and
query separate indexing models for mapping one field value to its
respective primary key in the primary model table. We can do much
better than that.
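
Here is a minimal sketch of what "maintaining indexes for the other fields" could mean for a store that natively supports only get-by-primary-key. `IndexedKVStore` is a hypothetical in-memory stand-in, not a real Cassandra/CouchDB/Riak/Redis client. It keeps a secondary index from `(field, value)` to a set of primary keys and answers exact-match filters by intersecting those sets. Field values are assumed to be hashable:

```python
from collections import defaultdict


class IndexedKVStore:
    """Toy key-value store: native get-by-pk plus backend-maintained secondary
    indexes mapping (field, value) -> set of primary keys. Hypothetical sketch."""

    def __init__(self, indexed_fields):
        self.rows = {}                      # pk -> dict of field values
        self.indexed_fields = set(indexed_fields)
        self.index = defaultdict(set)       # (field, value) -> set of pks

    def put(self, pk, data):
        old = self.rows.get(pk)
        if old is not None:
            # Remove stale index entries before overwriting the row.
            for field_name in self.indexed_fields.intersection(old):
                self.index[(field_name, old[field_name])].discard(pk)
        self.rows[pk] = dict(data)
        for field_name in self.indexed_fields.intersection(data):
            self.index[(field_name, data[field_name])].add(pk)

    def get(self, pk):
        # The only lookup the underlying store supports natively.
        return self.rows[pk]

    def filter(self, **lookups):
        """Exact-match filter on indexed fields only."""
        pk_sets = []
        for field_name, value in lookups.items():
            if field_name not in self.indexed_fields:
                raise ValueError("no index on field %r" % field_name)
            # .get() avoids creating empty entries in the defaultdict.
            pk_sets.append(self.index.get((field_name, value), set()))
        pks = set.intersection(*pk_sets) if pk_sets else set(self.rows)
        return [self.get(pk) for pk in sorted(pks)]


if __name__ == "__main__":
    store = IndexedKVStore(["author", "status"])
    store.put(1, {"author": "alex", "status": "draft"})
    store.put(2, {"author": "alex", "status": "published"})
    store.put(3, {"author": "russ", "status": "published"})
    print(store.filter(author="alex", status="published"))
```

The sketch also makes the cost visible. Every write updates the index as well, which is effectively the separate "table" that Alex's reply objects to.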

>>  * Given that you're only proposing a single proof-of-concept backend,
>> have you given any thought to the needs of other backends? It's not
>> hard to envisage that Couch, Cassandra, GAE etc will all have slightly
>> different requirements and problems. Is there a common ground that
>> exists between all data store backends? If there isn't, how do you
>> know that what you are proposing will be sufficient to support them?
>>
>
> To a certain extent this is a matter of knowing the feature sets of the
> databases and, hopefully, having a mentor who is knowledgeable about
> them.  The reality is that under the GSoC time constraints, attempting to
> write complete backends for multiple databases would probably be
> impossible.

Well, you might be able to quickly adapt the MongoDB backend to GAE
(within GSoC time constraints) due to their similarity. Anyway, there
is common ground between the NoSQL DBs, but this highly depends on
what problem we agree to solve. If we only provide exactly the
features that each DB supports natively, they'll appear dissimilar
because they take very different approaches to indexing, and if this
isn't abstracted and automated, NoSQL support doesn't really make sense
with Django. OTOH, if the goal is to make an abstraction around their
indexes, they can all look very similar from the perspective of
Django's ORM (of course they have different "features" like sharding
or eventual consistency or being in-memory DBs or supporting fast
writes or reads or having transactions or ..., but in the end only a few
of these features have any influence on Django's ORM, at all).

Bye,
Waldemar Kornewald


Re: [GSOC] NoSQL Support for the ORM

2010-04-07 Thread Alex Gaynor
On Wed, Apr 7, 2010 at 2:19 PM, lasizoillo  wrote:
> 2010/4/7 Alex Gaynor :
>
>>  * 2 weeks - begin working on a backend for a non-relational database (probably
>>    MongoDB)
>
> Pymodels [1] has backends for MongoDB and Tokyo Tyrant/Cabinet. Maybe
> some things can be reused in the backend.
>
> http://bitbucket.org/neithere/pymodels/
>
> Regards,
>
> Javi
>

I don't really see how; they use a completely different API.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me




Re: [GSOC] NoSQL Support for the ORM

2010-04-07 Thread lasizoillo
2010/4/7 Alex Gaynor :

>  * 2 weeks - begin working on a backend for a non-relational database (probably
>    MongoDB)

Pymodels [1] has backends for MongoDB and Tokyo Tyrant/Cabinet. Maybe
some things can be reused in the backend.

http://bitbucket.org/neithere/pymodels/

Regards,

Javi




Re: [GSOC] NoSQL Support for the ORM

2010-04-07 Thread Alex Gaynor
On Wed, Apr 7, 2010 at 6:47 AM, Russell Keith-Magee
 wrote:
> On Wed, Apr 7, 2010 at 8:11 AM, Alex Gaynor  wrote:
>> Non-relational database support for the Django ORM
>> ==================================================
>>
>> Note:  I am withdrawing my proposal on template compilation.  Another student
>> has expressed some interest in working on it, and in any event I am now more
>> interested in working on this project.
>>
>> About Me
>> 
>>
>> I'm a sophomore computer science student at Rensselaer Polytechnic Institute.
>> I'm a frequent contributor to Django (including last year's successful multiple
>> database GSoC project) and other related projects; I'm also a committer on both
>> `Unladen Swallow `_ and
>> `PyPy `_.
>>
>> Background
>> ~~~~~~~~~~
>>
>> As the person responsible for large swaths of multiple database support, I am
>> intimately familiar with the architecture of the ORM, the code itself, and the
>> various concerns that need to be accounted for (pickleability, etc.).
>>
>> Rationale
>> ~~~~~~~~~
>>
>> Non-relational databases tend to support some subset of the operations that are
>> supported on relational databases; therefore, it should be possible to perform
>> these operations on all databases.  Some people are of the opinion that we
>> shouldn't bother to support these databases because they can't perform all
>> operations.  I'm of the opinion that the abstraction is already a little leaky,
>> so we may as well exploit this for a common API where possible, as well as giving
>> users of these databases the admin and model forms for free.
>>
>> Method
>> ~~~~~~
>>
>> The ORM architecture currently has a ``QuerySet`` which is backend agnostic, a
>> ``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
>> specific (e.g. Oracle vs. MySQL vs. generic).  The plan is to change ``Query``
>> to be backend agnostic by delaying the creation of structures that are SQL
>> specific, specifically join/alias data.  Instead of structures like
>> ``self.where``, ``self.join_aliases``, or ``self.select`` all working in terms
>> of joins and table aliases, the composition of a query would be stored in terms
>> of a tree containing the "raw" filters, as passed to the filter calls, with
>> things like ``Field.get_prep_value`` called appropriately.  The ``SQLCompiler``
>> will be responsible for computing the joins for all of these data structures.
>
> I can see the intention here, and I can see how this approach could be
> used to solve the problem. However, my initial concern is that normal
> SQL users will end up carrying around a lot of extra overhead so that
> they can support backends that they will never use.
>
> Have you given any thought to how complex the datastructures inside
> Query will need to be, and how complex and/or expensive the conversion
> process will be?
>

I see no reason they need to be any more complex than the current
ones.  You have a tree that represents filters, combining the current
where and having.  This means the SQLCompiler is responsible for
splitting them up, which I think will make fixing some other bugs easier
(e.g. disjunction with a filter on aggregates currently doesn't work).
There's already quite a lot of stuff that's computed later, such as
select_related's transformation into JOINs.
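
The following sketch shows how a compiler could split one combined filter tree into WHERE and HAVING parts. It assumes a simplified tree: `Leaf`, `Node` and `split_where_having` are hypothetical names, not Django's actual `WhereNode` API. A conjunction can be distributed child by child. An OR that touches an aggregate cannot be split, so the whole disjunction moves to HAVING. That is the disjunction case mentioned above:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Leaf:
    lookup: str          # raw lookup as passed to filter(), e.g. "num_books__gt"
    value: object
    is_aggregate: bool   # True if the lookup targets an annotate()d aggregate


@dataclass
class Node:
    connector: str                                    # "AND" or "OR"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


Tree = Union[Node, Leaf]


def contains_aggregate(tree: Tree) -> bool:
    if isinstance(tree, Leaf):
        return tree.is_aggregate
    return any(contains_aggregate(child) for child in tree.children)


def split_where_having(tree: Tree) -> Tuple[Optional[Tree], Optional[Tree]]:
    """Return (where_part, having_part); either may be None."""
    if not contains_aggregate(tree):
        return tree, None
    if isinstance(tree, Leaf) or tree.connector == "OR":
        # An aggregate leaf, or a disjunction touching one, must be
        # evaluated after grouping, so it goes to HAVING as a whole.
        return None, tree
    where_parts, having_parts = [], []
    for child in tree.children:
        where_child, having_child = split_where_having(child)
        if where_child is not None:
            where_parts.append(where_child)
        if having_child is not None:
            having_parts.append(having_child)
    where = Node("AND", where_parts) if where_parts else None
    having = Node("AND", having_parts) if having_parts else None
    return where, having


if __name__ == "__main__":
    tree = Node("AND", [
        Leaf("pub_date__year", 2010, False),
        Node("OR", [Leaf("num_books__gt", 5, True),
                    Leaf("name__startswith", "A", False)]),
    ])
    print(split_where_having(tree))
```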

> Other issues that spring to mind:
>
>  * What about nonSQL datatypes? List/Set types are a common feature of
> Non-SQL backends, and are The Right Way to solve a whole bunch of
> problems. How do you propose to approach these datatypes? What (if
> any) overlap exists between the use of set data types and m2m? Is
> there any potential overlap between supporting List/Set types and
> supporting Arrays in SQL?
>

Is there overlap between List/Set and Arrays in SQL?  Probably.  In my
opinion there's no reason, once we have a good clean separation of
concerns in the architecture, that implementing a ListField would be
particularly hard.  If we happened to include one in Django, all the
better (from the perspective of interoperability).

>  * How does a non-SQL backend integrate with syncdb and other setup
> tools? What about inspectdb?
>

Most, but not all, non-relational databases don't require table setup
the way relational DBs do.  MongoDB doesn't require anything at all;
by contrast, Cassandra requires an XML configuration file.  How to
handle these is a little touchy, but basically I think syncdb should
stay conceptually pure, generating "tables"; if extra config is needed,
backends should ship custom management commands.

As for inspectdb, it only really makes sense on backends that have
structured "tables"; those backends could implement it, and other
backends could punt.

>  * What about basic connection management? Is the existing Connection
> API likely to be compatible, or will modifications be 

[GSOC] NoSQL Support for the ORM

2010-04-06 Thread Alex Gaynor
Non-relational database support for the Django ORM
==================================================

Note:  I am withdrawing my proposal on template compilation.  Another student
has expressed some interest in working on it, and in any event I am now more
interested in working on this project.

About Me


I'm a sophomore computer science student at Rensselaer Polytechnic Institute.
I'm a frequent contributor to Django (including last year's successful multiple
database GSoC project) and other related projects; I'm also a committer on both
`Unladen Swallow `_ and
`PyPy `_.

Background
~~~~~~~~~~

As the person responsible for large swaths of multiple database support, I am
intimately familiar with the architecture of the ORM, the code itself, and the
various concerns that need to be accounted for (pickleability, etc.).

Rationale
~~~~~~~~~

Non-relational databases tend to support some subset of the operations that are
supported on relational databases; therefore, it should be possible to perform
these operations on all databases.  Some people are of the opinion that we
shouldn't bother to support these databases because they can't perform all
operations.  I'm of the opinion that the abstraction is already a little leaky,
so we may as well exploit this for a common API where possible, as well as giving
users of these databases the admin and model forms for free.

Method
~~~~~~

The ORM architecture currently has a ``QuerySet`` which is backend agnostic, a
``Query`` which is SQL specific, and a ``SQLCompiler`` which is backend
specific (e.g. Oracle vs. MySQL vs. generic).  The plan is to change ``Query``
to be backend agnostic by delaying the creation of structures that are SQL
specific, specifically join/alias data.  Instead of structures like
``self.where``, ``self.join_aliases``, or ``self.select`` all working in terms
of joins and table aliases, the composition of a query would be stored in terms
of a tree containing the "raw" filters, as passed to the filter calls, with
things like ``Field.get_prep_value`` called appropriately.  The ``SQLCompiler``
will be responsible for computing the joins for all of these data structures.
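
A minimal sketch of that split of responsibilities follows, under heavy assumptions. `AgnosticQuery`, `compile_sql` and `compile_mongo` are hypothetical names, and the lookup set is tiny. `prep_value` merely stands in for `Field.get_prep_value`. The SQL compiler also assumes a naming convention in which the relation `author` is table `author`, referenced by column `author_id`. The query records only raw `(path, lookup_type, value)` triples. Joins appear only when the SQL compiler runs, and a document-store compiler can translate the same triples without ever dealing with aliases:

```python
LOOKUP_TYPES = {"exact", "gt", "gte", "lt", "lte", "in"}
SQL_OPS = {"exact": "=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
MONGO_OPS = {"gt": "$gt", "gte": "$gte", "lt": "$lt", "lte": "$lte", "in": "$in"}


def parse_lookup(lookup):
    """Split 'author__name__gt' into (('author', 'name'), 'gt')."""
    parts = lookup.split("__")
    if len(parts) > 1 and parts[-1] in LOOKUP_TYPES:
        return tuple(parts[:-1]), parts[-1]
    return tuple(parts), "exact"


class AgnosticQuery:
    """Stores raw filters only: no joins, no table aliases."""

    def __init__(self, table):
        self.table = table
        self.filters = []  # list of (path, lookup_type, prepared value)

    def filter(self, prep_value=lambda v: v, **lookups):
        for lookup, value in sorted(lookups.items()):
            path, lookup_type = parse_lookup(lookup)
            self.filters.append((path, lookup_type, prep_value(value)))
        return self


def compile_sql(query):
    """Joins are computed here, at compile time, from the raw paths."""
    joins, where, params = [], [], []
    for path, lookup_type, value in query.filters:
        alias = query.table
        for relation in path[:-1]:
            join = "JOIN %s ON %s.%s_id = %s.id" % (relation, alias, relation, relation)
            if join not in joins:  # the compiler is free to reuse joins
                joins.append(join)
            alias = relation
        column = "%s.%s" % (alias, path[-1])
        if lookup_type == "in":
            where.append("%s IN (%s)" % (column, ", ".join(["%s"] * len(value))))
            params.extend(value)
        else:
            where.append("%s %s %%s" % (column, SQL_OPS[lookup_type]))
            params.append(value)
    sql = "SELECT %s.* FROM %s" % (query.table, query.table)
    if joins:
        sql += " " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


def compile_mongo(query):
    """Translate the same raw filters into a MongoDB-style query document."""
    spec = {}
    for path, lookup_type, value in query.filters:
        if len(path) > 1:
            raise NotImplementedError("this backend cannot follow relations")
        if lookup_type == "exact":
            spec[path[0]] = value
        else:
            spec.setdefault(path[0], {})[MONGO_OPS[lookup_type]] = value
    return spec


if __name__ == "__main__":
    q = AgnosticQuery("book").filter(author__name="Alex", year__gte=2009)
    print(compile_sql(q))
    print(compile_mongo(AgnosticQuery("book").filter(title="GSoC", year__gte=2009)))
```

Only `compile_sql` knows about joins. The document compiler simply raises on relation paths, which fits the proposal's point that unsupported operations are the backend's responsibility.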

The major complications are operations where ordering matters, for example
``filter()`` and ``annotate()``.  Because the order of these operations matters
it is imperative that the structures continue to maintain the ordered semantics
of these methods.  Another example is that filters across a many-valued
relationship have different semantics when they're in the same call to
``filter()`` as opposed to separate calls.  In the current ``Query`` this is
represented by using different table aliases; however, because the new structure
doesn't deal in aliases yet, all values should be annotated with a table
"counter" indicating that, once joins are computed, two different values need to
be on the same join.  This is a bit of a leaky abstraction, but that's life.
It should be noted that joins don't have to be explicitly marked as being
different, only the same (i.e. the ``SQLCompiler`` can choose to reuse,
reorder, or do anything else it likes to efficiently generate SQL).
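
Here is a minimal sketch of the table "counter" idea. It assumes a single reverse foreign key relation: hypothetical tables `blog` and `entry`, with `entry.blog_id` pointing at `blog.id`. `CountedQuery` and `compile_joins` are hypothetical names. Each `filter()` call stamps its values with a fresh counter. The compiler then creates one join per (relation, counter) pair, so conditions from the same call share a join while separate calls get separate joins:

```python
import itertools


class CountedQuery:
    """Records (relation, column, value, counter) with no aliases at all."""

    def __init__(self, table):
        self.table = table
        self.filters = []
        self._counters = itertools.count(1)

    def filter(self, **lookups):
        counter = next(self._counters)  # one counter per filter() call
        for lookup, value in sorted(lookups.items()):
            relation, column = lookup.split("__")  # only "relation__column" here
            self.filters.append((relation, column, value, counter))
        return self


def compile_joins(query):
    """Allocate one alias per (relation, counter); reuse it within a call."""
    aliases = {}
    joins, where, params = [], [], []
    for relation, column, value, counter in query.filters:
        key = (relation, counter)
        if key not in aliases:
            alias = "T%d" % (len(aliases) + 1)
            aliases[key] = alias
            joins.append("JOIN %s %s ON %s.%s_id = %s.id"
                         % (relation, alias, alias, query.table, query.table))
        where.append("%s.%s = %%s" % (aliases[key], column))
        params.append(value)
    sql = "SELECT DISTINCT %s.* FROM %s" % (query.table, query.table)
    if joins:
        sql += " " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


if __name__ == "__main__":
    same = CountedQuery("blog").filter(entry__headline="Lennon", entry__year=2008)
    separate = CountedQuery("blog").filter(entry__headline="Lennon").filter(entry__year=2008)
    print(compile_joins(same))
    print(compile_joins(separate))
```

The sketch treats the relation as multi-valued throughout. For a single-valued relation, a real compiler could collapse joins across counters, since the counter only says which values must share a join.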

For operations that aren't supported by a backend (e.g. a JOIN on a
non-relational backend, or ``extra`` SQL on non-SQL backends), it is the
backend's responsibility to raise the appropriate exception, or to attempt to
emulate the operation in some way (e.g. some JOINs can be emulated with nested
IN queries).
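
This sketch shows the nested-IN emulation of a join over a foreign key. It uses in-memory lists of dicts in place of real collections. `find` is a tiny hypothetical stand-in for a document store's query call, supporting only equality and `{"$in": [...]}`, and `filter_across_relation` is likewise hypothetical:

```python
def find(collection, spec):
    """Tiny stand-in for a document store query: equality and {"$in": [...]} only."""
    def matches(doc):
        for field_name, condition in spec.items():
            if isinstance(condition, dict) and "$in" in condition:
                if doc.get(field_name) not in condition["$in"]:
                    return False
            elif doc.get(field_name) != condition:
                return False
        return True
    return [doc for doc in collection if matches(doc)]


def filter_across_relation(primary, related, fk_field, related_spec):
    """Emulate e.g. Book.objects.filter(author__name="Alex") without a JOIN."""
    # Step 1: run the inner query against the related collection.
    related_ids = [doc["_id"] for doc in find(related, related_spec)]
    # Step 2: filter the primary collection on "foreign key IN (ids)".
    return find(primary, {fk_field: {"$in": related_ids}})


if __name__ == "__main__":
    authors = [{"_id": 1, "name": "Alex"}, {"_id": 2, "name": "Russ"}]
    books = [
        {"_id": 10, "title": "ORM", "author_id": 1},
        {"_id": 11, "title": "Admin", "author_id": 2},
        {"_id": 12, "title": "Forms", "author_id": 1},
    ]
    print(filter_across_relation(books, authors, "author_id", {"name": "Alex"}))
```

The emulation costs two round trips, is not atomic, and the intermediate id list can grow large. That is why it fits as something a backend may attempt, rather than a guarantee.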

Timeline


This timeline is way coarser than I'd like; consider it a work in progress.

 * 2 weeks - update all ``Query`` methods to store data in a backend agnostic
   manner.
 * 4 weeks - update ``SQLCompiler`` to correctly generate SQL from the
   structures, specifically migrate the JOIN generation logic.
 * 2 weeks - begin working on a backend for a non-relational database (probably
   MongoDB)
 * 3 weeks - deal with bugs as they come up; at a guess, these will mostly be
   related to the semantics of inserts and updates.

Deliverables


 * Refactored ORM ``Query`` and ``SQLCompiler`` classes.
 * A working MongoDB backend (to live outside of the core) supporting:
   * Native lookups (MongoDB supports most "basic" lookup types)
   * Creation/update
   * Deletion
   * Working forms (should fall out naturally)
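
As a hedged illustration of "native lookups", here is one plausible mapping from Django lookup types to MongoDB query operators (`$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$ne`, `$regex`, `$options`). The function name `mongo_lookup` is hypothetical, and the list of lookups is deliberately partial. The sketch also assumes that Python's `re.escape` output is acceptable to MongoDB's regex engine for ordinary strings:

```python
import re

SIMPLE_OPERATORS = {"gt": "$gt", "gte": "$gte", "lt": "$lt", "lte": "$lte", "in": "$in"}
REGEX_PATTERNS = {
    "iexact": "^%s$",
    "contains": "%s", "icontains": "%s",
    "startswith": "^%s", "istartswith": "^%s",
    "endswith": "%s$", "iendswith": "%s$",
}


def mongo_lookup(field_name, lookup_type, value):
    """Return a MongoDB filter fragment for one Django-style lookup."""
    if lookup_type == "exact":
        return {field_name: value}
    if lookup_type in SIMPLE_OPERATORS:
        return {field_name: {SIMPLE_OPERATORS[lookup_type]: value}}
    if lookup_type == "range":
        low, high = value
        return {field_name: {"$gte": low, "$lte": high}}
    if lookup_type == "isnull":
        # {field: None} matches both null and missing fields in MongoDB.
        return {field_name: None} if value else {field_name: {"$ne": None}}
    if lookup_type in REGEX_PATTERNS:
        condition = {"$regex": REGEX_PATTERNS[lookup_type] % re.escape(value)}
        if lookup_type.startswith("i"):  # iexact, icontains, istartswith, iendswith
            condition["$options"] = "i"
        return {field_name: condition}
    # Anything else is the backend's job to reject explicitly.
    raise NotImplementedError("lookup %r is not supported by this backend" % lookup_type)
```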

Reality
~~~~~~~

Applications aren't magically going to start working on databases they
weren't designed to work with.  Using a non-relational database requires a
fundamental change of mindset; the point of this project is to be able to use
the same API where possible, and to get access to things like the admin and forms.

A note on the admin
~~~~~~~~~~~~~~~~~~~

The admin's fundamental operations are list, create, and update.  These should
fall out naturally for all backends that work.  However, there
are some operations that can subtly require more advanced backend
operations.  Specifically, ``list_filter``