Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Alex Gaynor
On Sat, Mar 21, 2009 at 1:25 AM, Malcolm Tredinnick <
malc...@pointy-stick.com> wrote:

>
> On Sat, 2009-03-21 at 00:41 -0400, Alex Gaynor wrote:
> >
> >
> > > One suggestion Eric Florenzano had was that we go above and
> > beyond
> > > just storing the methods and parameters, we don't even
> > excecute them
> > > at all until absolutely necessary.
> >
> >
> > Excuse me for a moment whilst I add Eric to a special list
> > I've been
> > keeping. He's trying to make trouble.
> >
> > Ok, back now... There are at least two problems with this.
> >
> > (a) Backwards incompatible in that some querysets would return
> > noticeably different results before and after that change. It
> > would be
> > subtle, quiet and very difficult to detect without auditing
> > every line
> > of code that contributes to a queryset. The worst kind of
> > change for us
> > to make from the perspective of the users.
> >
> > What scenario does it return different results, the one place I can
> > think of is:
> >
> > query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
> > render_to_response('template.html', {'q': query})
> >
> > which would raise an exception in the template instead of in the view.
>
> It's related to eager/deferred argument evaluation (which is done for
> the same reasons): any "smart" object like Q objects would require
> changing to handle deferring things correctly. They can currently be
> designed to evaluate only once and will work correctly.
>

I don't see this as an issue, simply because whatever happens in the
instantiation of these objects would be the same for whatever connection was
in use.


>
> >
> >
> > (b) Intentionally not done right now and not because I'm
> > whimsical and
> > arbitrary (although I am). The problem is it requires storing
> > all sorts
> > of arbitrarily complex Python objects. Which breaks pickling,
> > which
> > breaks caching. People tend to complain, a lot, about that
> > last bit.
> >
> > That's why the Where.add() converts things to more basic types
> > when they
> > are added (via a filter() command).  If somebody really needs
> > lazily
> > evaluated parameters, it's easy enough via a custom Q-like
> > object, but
> > so far nobody has asked for that if they've gotten stuck doing
> > it. It's
> > even something we could consider adding to Django, although
> > it's not a
> > no-brainer given the potential to break caching.
> >
> > I vaguely recall there being a ticket about this that you wontfixed,
> > although that may have been about defering calling callables :).  In
> > any event the caching issue was one I hadn't considered, although one
> > solution would be not to pickle it with the ability to switch to a
> > different query type, it's a bit of a strange restriction, but I don't
> > think it's one that would practically affect people, and it's less
> > restricitive.
>
> You wrote a really long sentence there that didn't make a lot of sense
> (too many prepositions and commas, not enough nouns and full stops).
> Unclear which restriction you're arguing against, but the picklability
> of querysets is pretty much a requirement. It's something people really
> use.
>
> However, before we go too far down this path: this is a very minor
> thing. It's unlikely to be required. Adding it "because we can" is an
> argument Eric can propose at some much later date if it's not absolutely
> *required* for multi-db stuff. I think we won't need to worry about this
> at all.
>

Just to clear that up, what I was saying was:

When you pickle a QuerySet we build up the entire Query as we would right
before SQL execution and then just pickle that.  Then the restriction is
that you can't change the database type to be used on an unpickled query.
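
A minimal sketch of that restriction in plain Python, for illustration
only. This is not Django's QuerySet or Query; RecordedQuery and all of its
methods are hypothetical names. Calls are recorded while the query is
live, only the built plain-data form is pickled, and a restored query
refuses to switch database:

```python
import pickle


class RecordedQuery:
    """Hypothetical stand-in for a queryset that records its API calls."""

    def __init__(self, backend, calls=(), frozen=False):
        self.backend = backend
        self.calls = list(calls)
        self.frozen = frozen  # True once restored from a pickle

    def filter(self, **conditions):
        # Record the call instead of building anything backend-specific yet.
        self.calls.append(("filter", sorted(conditions.items())))
        return self

    def using(self, backend):
        # Switching backends is only allowed before pickling.
        if self.frozen:
            raise ValueError("cannot change the database of an unpickled query")
        self.backend = backend
        return self

    def build(self):
        # Stand-in for "build the entire Query as right before SQL execution".
        return {"backend": self.backend, "calls": self.calls}

    def dumps(self):
        # Only the built, plain-data form is pickled.
        return pickle.dumps(self.build())

    @classmethod
    def loads(cls, data):
        state = pickle.loads(data)
        return cls(state["backend"], state["calls"], frozen=True)


query = RecordedQuery("postgresql").filter(name="alex").using("sqlite3")
restored = RecordedQuery.loads(query.dumps())
```

Calling restored.using(...) raises ValueError here, which is the "strange
restriction" in question.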


>
> >
> >
> > [...]
> > >
> > > Thanks for all the review Malcolm.
> >
> >
> > No problems.
> >
> > > One question that I didn't really ask in the initial post is
> > what
> > > parameters should a "DatabaseManager" receieve on it's
> > methods, one
> > > suggestion is the Query object, since that gives the use the
> > maximal
> > > amount of information,, however my concerns there are that
> > it's not a
> > > public API, and having a private API as a part of the public
> > API feels
> > > klunky.
> >
> >
> > At first glance, I believe the word you're looking for is
> > "wrong". :-)
> >
> > Yes, that's the one.
> >
> >
> > Definitely a valid concern.
> >
> > >   OTOH there isn't really another data structure that
> > carries around
> > > the information someone writing their sharding logic(or
> > 

Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Malcolm Tredinnick

On Sat, 2009-03-21 at 00:41 -0400, Alex Gaynor wrote:
> 
> 
> > One suggestion Eric Florenzano had was that we go above and
> beyond
> > just storing the methods and parameters, we don't even
> excecute them
> > at all until absolutely necessary.
> 
> 
> Excuse me for a moment whilst I add Eric to a special list
> I've been
> keeping. He's trying to make trouble.
> 
> Ok, back now... There are at least two problems with this.
> 
> (a) Backwards incompatible in that some querysets would return
> noticeably different results before and after that change. It
> would be
> subtle, quiet and very difficult to detect without auditing
> every line
> of code that contributes to a queryset. The worst kind of
> change for us
> to make from the perspective of the users.
> 
> What scenario does it return different results, the one place I can
> think of is:
> 
> query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
> render_to_response('template.html', {'q': query})
> 
> which would raise an exception in the template instead of in the view.

It's related to eager/deferred argument evaluation (which is done for
the same reasons): any "smart" object like Q objects would require
changing to handle deferring things correctly. They can currently be
designed to evaluate only once and will work correctly.

>  
> 
> (b) Intentionally not done right now and not because I'm
> whimsical and
> arbitrary (although I am). The problem is it requires storing
> all sorts
> of arbitrarily complex Python objects. Which breaks pickling,
> which
> breaks caching. People tend to complain, a lot, about that
> last bit.
> 
> That's why the Where.add() converts things to more basic types
> when they
> are added (via a filter() command).  If somebody really needs
> lazily
> evaluated parameters, it's easy enough via a custom Q-like
> object, but
> so far nobody has asked for that if they've gotten stuck doing
> it. It's
> even something we could consider adding to Django, although
> it's not a
> no-brainer given the potential to break caching.
> 
> I vaguely recall there being a ticket about this that you wontfixed,
> although that may have been about defering calling callables :).  In
> any event the caching issue was one I hadn't considered, although one
> solution would be not to pickle it with the ability to switch to a
> different query type, it's a bit of a strange restriction, but I don't
> think it's one that would practically affect people, and it's less
> restricitive.

You wrote a really long sentence there that didn't make a lot of sense
(too many prepositions and commas, not enough nouns and full stops).
Unclear which restriction you're arguing against, but the picklability
of querysets is pretty much a requirement. It's something people really
use.

However, before we go too far down this path: this is a very minor
thing. It's unlikely to be required. Adding it "because we can" is an
argument Eric can propose at some much later date if it's not absolutely
*required* for multi-db stuff. I think we won't need to worry about this
at all.

>  
> 
> [...]
> >
> > Thanks for all the review Malcolm.
> 
> 
> No problems.
> 
> > One question that I didn't really ask in the initial post is
> what
> > parameters should a "DatabaseManager" receieve on it's
> methods, one
> > suggestion is the Query object, since that gives the use the
> maximal
> > amount of information,, however my concerns there are that
> it's not a
> > public API, and having a private API as a part of the public
> API feels
> > klunky.
> 
> 
> At first glance, I believe the word you're looking for is
> "wrong". :-)
> 
> Yes, that's the one.
>  
> 
> Definitely a valid concern.
> 
> >   OTOH there isn't really another data structure that
> carries around
> > the information someone writing their sharding logic(or
> whatever other
> > scheme they want to implement) who inevitably want to have.
> 
> 
> Two solutions spring to mind, although I haven't thought this
> through a
> lot: it's not particularly germane to the proposal since it's
> something
> we can work out a bit later on. I've got limited time
> today(something
> about a beta release coming up), so I wanted to just get out
> responses
> to the two people who posted items for discussion. I suspect
> there's a
> lot of thinking 

Re: [GSoC] Serialization Refactor

2009-03-20 Thread Madhusudan C.S
Hi Malcolm and all,

On Sat, Mar 21, 2009 at 8:16 AM, Malcolm Tredinnick <
malc...@pointy-stick.com> wrote:

>
> > I want to work on Serialization Refactor for GSoC. Since what
> > the Django community requires exactly from that idea is still
> > not clear to me, I am requesting any of you to explain a bit
> > on what is expected of that project?
>
> Yes, that seems to be the problem here (in fact, it was what I was
> thinking to myself when reading your second mail).
>
> I thought this problem was going to arise. The one-line suggestions on
> the SoC wiki page aren't particularly specific, unfortunately. They also
> aren't QA'd in any real way for practicality or difficulty, so it's a
> bit of a combination of wishlist and brainstorming. A starting point for
> further research, really. The confusion there is our fault, but if you
> view it as a starting point for thinking, that will be a good point.
>

Ah OK. I did not see it that way, sorry. I thought someone who
wrote there on the wiki had a specific set of things in mind. Will
definitely work on it. Thanks.

> That particular item appears to be very poorly named in the wiki page.
> It's not about refactoring at all (which is changing code around to make
> new functionality easier, or remove redundancy). It's about adding new
> features to the serializers. Enhancing, extending and changing in
> various places, not refactoring.


Getting it now :)

> Now, there are a bunch of things that could be worked on in the
> serialization space. Have a look at the currently open tickets in that
> area (I mean, read them *all*):
>
> http://code.djangoproject.com/query?status=new=assigned=reopened=Serialization=priority


I had already seen most of the tickets in Serialization and ORM before
making this post. Since it is a huge list combined, I had only glanced
through the tickets there. Will get into each of them and will study
them in detail.

> You'll see a few consistent patterns for feature requests and
> awkwardness there (beyond the things that are just basic bugs we have to
> fix at some point). It's also worth having a look at mailing list posts
> (and tickets) that refer to "fixtures", since that's where those things
> are used in Django. You'll start to see problems there with items like
> content type values changing or references to pk value in other models
> that change upon loading.


Oh OK. I think most of the problems of this kind are already filed as
tickets in the tracker. I remember seeing them, like #6233, #7052,
#9422, IIRC. Those are the tickets I have bookmarked here :)
I will also look into Mailing list posts for them.


> We'd like to change the serialisation format
> to be a lot more robust when it's referring to other models in any way.
> One possibility is to use a label instead of a value for those fields.
> It can even be designed to be backwards compatible (by adding a version
> field to any new format).
>

Oh OK. I think I need to study a bit more about this and jump
into discussion. I will read the related code.

> Adding support for non-model fields to be serialised is another option.


I am confused here a bit. Can you please tell me what you meant
by saying non-model field? Does it mean Foreign Keys or something like
that?

> Most of the things I see on the DjangoFullSerializers project appear to
> be covered in the tickets in Trac. So the question then becomes whether
> the goal would be to merge in DjangoFullSerializers, but keeping things
> backwards compatible for existing users. Or to take the good ideas and
> merge them in in a more piecemeal fashion. Or work through the general
> problems raised in the serializer and fixture tickets and posts on the
> mailing list.
>
> Hopefully that gives you a bunch of ideas for a bit more research.


Yeah thanks. It definitely gave me a lot of ideas. Working on them right
away.


> [...]
> >
> > So the solution that appears to me now is to add a
> > Serialization Field support to Django Models. Say something
> > like JSONField and provide Meta Data for the JSON Field
> > Structure in some way, say defining a class for its structure
> > (as we do for ModelAdmin) or providing this in Class Meta
> > inside the Model Definition. This can be done since we will
> > already, at least, know what will be the format of serialized
> > data we receive (quite obvious, we need to know this, since
> > we cannot process any random serialized data). Hope this is
> > somewhat similar in idea to what is pointed out as ModelAdmin
> > in the ideas list on the wiki page.
>
> Hmm .. fields that provide serialized data aren't really anything to do
> with the serializer. You can already write them now. In fact, people
> have. I'm not sure where Meta factors into this, either. Writing a new
> custom field type isn't really Summer of Code project (it's Weekend of
> Code difficulty, really).


Oh is it possible to deserialize the stream data obtained from external
source into various number of fields in a Database table 

Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Alex Gaynor
>
> > One suggestion Eric Florenzano had was that we go above and beyond
> > just storing the methods and parameters, we don't even execute them
> > at all until absolutely necessary.
>
> Excuse me for a moment whilst I add Eric to a special list I've been
> keeping. He's trying to make trouble.
>
> Ok, back now... There are at least two problems with this.
>
> (a) Backwards incompatible in that some querysets would return
> noticeably different results before and after that change. It would be
> subtle, quiet and very difficult to detect without auditing every line
> of code that contributes to a queryset. The worst kind of change for us
> to make from the perspective of the users.
>

In what scenario does it return different results? The one place I can
think of is:

query = queryset.order_by('I AM NOT A REAL FIELD, HAHA')
render_to_response('template.html', {'q': query})

which would raise an exception in the template instead of in the view.
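
To illustrate that scenario in isolation, here is a toy sketch in plain
Python. It makes no claim about how Django's order_by() behaves;
EagerQuery, DeferredQuery and VALID_FIELDS are hypothetical names. The
only point is that deferring the work moves the place where the error
surfaces:

```python
VALID_FIELDS = {"title", "pub_date"}  # hypothetical model fields


class EagerQuery:
    """Checks arguments as soon as the method is called (in the view)."""

    def __init__(self):
        self.ordering = []

    def order_by(self, field):
        if field not in VALID_FIELDS:
            raise ValueError("unknown field: %r" % field)
        self.ordering.append(field)
        return self


class DeferredQuery:
    """Only records the call; checking waits until iteration (in the template)."""

    def __init__(self):
        self.pending = []

    def order_by(self, field):
        self.pending.append(field)
        return self

    def __iter__(self):
        for field in self.pending:
            if field not in VALID_FIELDS:
                raise ValueError("unknown field: %r" % field)
        return iter([])  # no real rows in this sketch


# No error yet: nothing has been checked.
deferred = DeferredQuery().order_by("I AM NOT A REAL FIELD, HAHA")
```

Iterating over deferred (as a template would) is what finally raises.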


>
> (b) Intentionally not done right now and not because I'm whimsical and
> arbitrary (although I am). The problem is it requires storing all sorts
> of arbitrarily complex Python objects. Which breaks pickling, which
> breaks caching. People tend to complain, a lot, about that last bit.
>
> That's why the Where.add() converts things to more basic types when they
> are added (via a filter() command).  If somebody really needs lazily
> evaluated parameters, it's easy enough via a custom Q-like object, but
> so far nobody has asked for that if they've gotten stuck doing it. It's
> even something we could consider adding to Django, although it's not a
> no-brainer given the potential to break caching.
>

I vaguely recall there being a ticket about this that you wontfixed,
although that may have been about deferring calling callables :).  In any
event the caching issue was one I hadn't considered, although one solution
would be not to pickle it with the ability to switch to a different query
type, it's a bit of a strange restriction, but I don't think it's one that
would practically affect people, and it's less restrictive.


>
> [...]
> >
> > Thanks for all the review Malcolm.
>
> No problems.
>
> > One question that I didn't really ask in the initial post is what
> > parameters a "DatabaseManager" should receive on its methods. One
> > suggestion is the Query object, since that gives the user the maximal
> > amount of information; however, my concerns there are that it's not a
> > public API, and having a private API as a part of the public API feels
> > klunky.
>
> At first glance, I believe the word you're looking for is "wrong". :-)
>

Yes, that's the one.


>
> Definitely a valid concern.
>
> >   OTOH there isn't really another data structure that carries around
> > the information that someone writing their sharding logic (or whatever
> > other scheme they want to implement) will inevitably want to have.
>
> Two solutions spring to mind, although I haven't thought this through a
> lot: it's not particularly germane to the proposal since it's something
> we can work out a bit later on. I've got limited time today(something
> about a beta release coming up), so I wanted to just get out responses
> to the two people who posted items for discussion. I suspect there's a
> lot of thinking needed here about the concept as a whole and I want to
> do that. Anyway...
>
> One option is to use the piece of public API that is available which
> will always be carrying around a Query object: the QuerySet. Query
> objects don't exist in isolation. However, this sounds problematic
> because the implementation is going to be working at a very low-level --
> database managers are only really interesting to Query.as_sql() and its
> dependencies. But that leads to the next idea, ...
>
> The other is to work out a better place for this database manager in the
> hierarchy. It might be something that lives as an attribute on a
> QuerySet. Something like the user provides a function that picks the
> database based on "some information" that is available to it and the base
> method selects the right database to use. Since it lives in the QuerySet
> namespace, it can happily access the "query" attribute there without any
> encapsulation violations. The database manager then becomes two pieces,
> an algorithm on QuerySet (that might just dispatch to the real algorithm
> on Query), plus some user-supplied code to make the right selections.
> That latter thing could be a callable object if you need the full class
> structure. But the stuff QuerySet/Query needs to know about is probably
> a much smaller interface than *requiring* a full class. (Did any of that
> make sense?)
>
> I think this -- the database manager concept -- is the part of your
> proposal that is most up in the air with respect to what the API looks
> like. Which is fine. The fact that it's something to consider is good
> enough to know. Certainly put some thought into the problem, but don't
> sweat the details too much just yet 
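
The second option quoted above (a small algorithm on the QuerySet plus a
user-supplied callable that makes the selection) could be sketched roughly
as follows. This is plain Python under stated assumptions, not Django
code: QuerySetSketch, chooser, pick_database and shard_by_user are all
hypothetical names, and the query is modelled as a simple dict.

```python
class QuerySetSketch:
    """Hypothetical queryset that owns a query and an optional chooser."""

    def __init__(self, model_name, chooser=None):
        self.query = {"model": model_name, "filters": {}}
        self.chooser = chooser  # user-supplied callable, or None

    def filter(self, **conditions):
        self.query["filters"].update(conditions)
        return self

    def pick_database(self):
        # The base method lives on the queryset, so it may read self.query
        # without exposing the query object as public API.
        if self.chooser is None:
            return "default"
        return self.chooser(self.query)


def shard_by_user(query):
    """Example user code: pick a shard from the user_id filter, if any."""
    user_id = query["filters"].get("user_id")
    if user_id is None:
        return "default"
    return "shard_%d" % (user_id % 2)


alias = QuerySetSketch("Post", shard_by_user).filter(user_id=7).pick_database()
```

A plain function is enough here; a callable object would work the same way
if the full class structure were needed.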

Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Malcolm Tredinnick

Trimming unused portions of the response to make it readable (which I
should have done the first time around, too)...

On Fri, 2009-03-20 at 23:41 -0400, Alex Gaynor wrote:
> 
> 
> On Fri, Mar 20, 2009 at 11:21 PM, Malcolm Tredinnick
>  wrote:
> 
> 
> On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
> > Hello all,

[...]

> > The greatest hurdle is changing the connection after we already have
> > our ``Query`` partly created.  The issues here are that: we might have
> > done tests against ``connection.features`` already, we might need to
> > switch either to or from a custom ``Query`` object, amongst other
> > issues.

[...]

> >  One possible solution that is very powerful (though quite inelegant)
> > is to have the ``QuerySet`` keep track of all public API method calls
> > against it and what parameters they took, then when the ``connection``
> > is changed it will recreate the ``Query`` object by creating a "blank"
> > one with the new connection and reapplying all the methods it has
> > stored.  This is basically a simple implementation of the command
> > pattern.
> 
> 
> 
> 
> It's pretty yukky. There's a lot of Python level junk that we
> intentionally avoid storing in querysets so that they behave properly as
> persistent data structures (clones are independent copies) and can be
> pickled without trouble, etc. It would be really bad for performance to
> reintroduce those (I did a lot of profiling when developing that stuff
> and tried to throw out as much as possible). I think this fortunately
> isn't going to be a real issue. I was pretty careful originally to keep
> the leakage from django.db.connection into the Query class to as few
> places as possible and mostly when we're creating the SQL.
>
> Some cases that might be unavoidable could be replaced with delayed
> evaluation objects (essentially encapsulating the command pattern just
> for that fragment), which is a bit cleaner.
> 
> 
> One suggestion Eric Florenzano had was that we go above and beyond
> just storing the methods and parameters, we don't even execute them
> at all until absolutely necessary.

Excuse me for a moment whilst I add Eric to a special list I've been
keeping. He's trying to make trouble.

Ok, back now... There are at least two problems with this.

(a) Backwards incompatible in that some querysets would return
noticeably different results before and after that change. It would be
subtle, quiet and very difficult to detect without auditing every line
of code that contributes to a queryset. The worst kind of change for us
to make from the perspective of the users.

(b) Intentionally not done right now and not because I'm whimsical and
arbitrary (although I am). The problem is it requires storing all sorts
of arbitrarily complex Python objects. Which breaks pickling, which
breaks caching. People tend to complain, a lot, about that last bit.

That's why the Where.add() converts things to more basic types when they
are added (via a filter() command).  If somebody really needs lazily
evaluated parameters, it's easy enough via a custom Q-like object, but
so far nobody has asked for that if they've gotten stuck doing it. It's
even something we could consider adding to Django, although it's not a
no-brainer given the potential to break caching.
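
A small standard-library illustration of point (b), assuming nothing about
Django internals: parameters reduced to basic types pickle fine, whereas a
lazily evaluated parameter stored as a lambda does not. The names
can_pickle, basic_params and lazy_params are made up for this sketch.

```python
import pickle


def can_pickle(value):
    """Return True if the value survives pickle.dumps()."""
    try:
        pickle.dumps(value)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Parameters already converted to basic types.
basic_params = {"pub_date__lte": "2009-03-21", "author_id": 7}

# The same filter with a lazily evaluated value kept as a callable.
lazy_params = {"pub_date__lte": lambda: "2009-03-21", "author_id": 7}

basic_ok = can_pickle(basic_params)
lazy_ok = can_pickle(lazy_params)
```

Anything that cannot be pickled cannot go into a pickle-based cache, which
is where the complaints come from.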

[...]
> 
> Thanks for all the review Malcolm.

No problems.

> One question that I didn't really ask in the initial post is what
> parameters a "DatabaseManager" should receive on its methods. One
> suggestion is the Query object, since that gives the user the maximal
> amount of information; however, my concerns there are that it's not a
> public API, and having a private API as a part of the public API feels
> klunky.

At first glance, I believe the word you're looking for is "wrong". :-)

Definitely a valid concern.

>   OTOH there isn't really another data structure that carries around
> the information that someone writing their sharding logic (or whatever
> other scheme they want to implement) will inevitably want to have.

Two solutions spring to mind, although I haven't thought this through a
lot: it's not particularly germane to the proposal since it's something
we can work out a bit later on. I've got limited time today(something
about a beta release coming up), so I wanted to just get out responses
to the two people who posted items for discussion. I suspect there's a
lot of thinking needed here about the concept as a whole and I 

Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Alex Gaynor
On Fri, Mar 20, 2009 at 11:21 PM, Malcolm Tredinnick <
malc...@pointy-stick.com> wrote:

>
> On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
> > Hello all,
> >
> > To those who don't know me, I'm a freshman computer science student at
> > Rensselaer Polytechnic Institute in Troy, New York.  I'm on the mailing
> > lists quite a bit so you may have seen me around.
> >
> > A Multiple Database API For Django
> > ==================================
> >
> > Django currently has the low level hooks necessary for multiple database
> > support, but it doesn't have the high level API for using them, nor any
> > support infrastructure, documentation, or tests.  The purpose of this
> > project would be to implement the high level API necessary for the use
> > of multiple databases in Django, along with requisite documentation and
> > tests.
> >
> > There have been several previous proposals and implementations of
> > multiple-database support in Django, none of which has been complete, or
> > gained sufficient traction in the community in order to be included in
> > Django itself.  As such this proposal will specifically address some of
> > the reasons for past failures, and their remedies.
> >
> > The API
> > -------
> >
> > First there is the API for defining multiple connections.  A new setting
> > will be created, ``DATABASES`` (or something similar), which is a
> > dictionary mapping database alias (internal name) to a dictionary
> > containing the current ``DATABASE_*`` settings:
> >
> > .. sourcecode:: python
> >
> >     DATABASES = {
> >         'default': {
> >             'DATABASE_ENGINE': 'postgresql_psycopg2',
> >             'DATABASE_NAME': 'my_data_base',
> >             'DATABASE_USER': 'django',
> >             'DATABASE_PASSWORD': 'super_secret',
> >         },
> >         'user': {
> >             'DATABASE_ENGINE': 'sqlite3',
> >             'DATABASE_NAME': '/home/django_projects/universal/users.db',
> >         },
> >     }
> >
> > A database with the alias ``default`` will be the default
> > connection(it will be
> > used if no other one is specified for a query) and will be the direct
> > replacement for the ``DATABASE_*`` settings.  In compliance with
> > Django's
> > deprecation policy the ``DATABASE_*`` will automatically be handled as
> > if they
> > were defined in the ``DATABASES`` dict for at least 2 releases.
> >
> > Next a ``connections`` object will be implemented in ``django.db``,
> > analogous to the ``django.db.connection`` object.  The ``connections``
> > one will be a dictionary-like object that is subscripted by database
> > alias and lazily returns a connection to the database.
> > ``django.db.connection`` will remain (at least for the present; its
> > ultimate state will be by community consensus) and merely proxy to
> > ``django.db.connections['default']``.  Using the previously defined
> > database setting this might be used as:
> >
> > .. sourcecode:: python
> >
> >     from django.db import connections
> >
> >     conn = connections['user']
> >     c = conn.cursor()
> >     # DB-API: fetch rows from the cursor, not from execute()'s return value
> >     c.execute("""SELECT 1""")
> >     results = c.fetchall()
> >
> > Now that there is the necessary infrastructure to accompany the very low
> > level plumbing we need our actual API.  The high level API will have 2
> > components.  First there will be a ``using()`` method on ``QuerySet``
> > and ``Manager`` objects.  This method simply takes an alias to a
> > connection (and possibly a connection object itself to allow for dynamic
> > database usage) and makes that the connection that will be used for that
> > query.  Secondly, a new option will be created in the inner Meta class
> > of models.  This option will be named ``using`` and specify the default
> > connection to use for all queries against this model, overriding the
> > default specified in the settings:
> >
> > .. sourcecode:: python
> >
> >     from django.db import models
> >
> >     class MyUser(models.Model):
> >         ...
> >         class Meta:
> >             using = 'user'
> >
> >     # this queries the 'user' database
> >     MyUser.objects.all()
> >     # this queries the 'default' database
> >     MyUser.objects.using('default')
> >
> > Lastly, various plumbing will need to be updated to reflect the new
> > multidb
> > API, such as transactions, breakpoints, management commands, etc.
> >
> > More Advanced Usage
> > -------------------
> >
> > While the above two methods are strictly speaking sufficient they
> > require the
> > user to write lots of boilerplate code in order to implement advanced
> > multi
> > database strategies such as replication and sharding.  Therefore we
> > also
> > introduce the concept of ``DatabaseManagers``, not to be confused with
> > Django's
> > current managers.  DatabaseManagers are classes that define what
> > connection
> > should be used for a given query.  There are 2 levels at which to
> > specify what
> > ``DatabaseManager`` to use, as a setting, and at 
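
The "dictionary like object, subscripted by alias, that lazily returns a
connection" from the quoted proposal can be sketched with the standard
library alone. This is an illustration under assumptions, not the proposed
Django implementation: LazyConnections is a hypothetical name, only sqlite3
is handled, and in-memory databases stand in for the real settings.

```python
import sqlite3


class LazyConnections:
    """Dictionary-like object that opens a connection on first access."""

    def __init__(self, databases):
        self._databases = databases
        self._connections = {}  # alias -> already opened connection

    def __getitem__(self, alias):
        if alias not in self._connections:
            settings = self._databases[alias]  # KeyError for unknown aliases
            self._connections[alias] = sqlite3.connect(settings["DATABASE_NAME"])
        return self._connections[alias]


# In-memory databases so the sketch runs anywhere (an assumption).
DATABASES = {
    "default": {"DATABASE_ENGINE": "sqlite3", "DATABASE_NAME": ":memory:"},
    "user": {"DATABASE_ENGINE": "sqlite3", "DATABASE_NAME": ":memory:"},
}

connections = LazyConnections(DATABASES)

cursor = connections["user"].cursor()
cursor.execute("SELECT 1")
rows = cursor.fetchall()
```

Only the "user" connection has been opened at this point; "default" is not
touched until something subscripts it.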

[Fwd: Re: [GSoC] Serialization Refactor]

2009-03-20 Thread Malcolm Tredinnick

Whoops... hadn't noticed this was sent to multiple lists, so only
replied to the first one.

Sending my technical discussion reply to django-dev, since that's where
the main audience participation is likely to be.

-------- Forwarded Message --------
From: Malcolm Tredinnick 
Reply-to: django-g...@googlegroups.com
To: django-g...@googlegroups.com
Subject: Re: [GSoC] Serialization Refactor
Date: Sat, 21 Mar 2009 13:46:21 +1100

On Sat, 2009-03-21 at 01:38 +0530, Madhusudan C.S wrote:
> Hi all,
>I just wrote 2 mails about myself and my wish to 
> participate in GSoC as a Django student. Sorry if I am
> spamming your inboxes. I just want to keep my mails short
> so people who don't want to read everything in there can
> skip the mails that are irrelevant to them. Please correct
> me where ever I am wrong and if I am not doing it the
> way it must be done here in Django.
> 
> I hope I understand what Malcolm and Jacob meant when
> they said this.
> 
> We make changes because there are use-cases
> for them, not because we can. So any proposal should
> be driven by trying to fix some existing problem, not
> creating a "wouldn't it be nice if...?" situation.
> 
> I want to work on Serialization Refactor for GSoC. Since what
> the Django community requires exactly from that idea is still
> not clear to me, I am requesting any of you to explain a bit
> on what is expected of that project? 

Yes, that seems to be the problem here (in fact, it was what I was
thinking to myself when reading your second mail).

I thought this problem was going to arise. The one-line suggestions on
the SoC wiki page aren't particularly specific, unfortunately. They also
aren't QA'd in any real way for practicality or difficulty, so it's a
bit of a combination of wishlist and brainstorming. A starting point for
further research, really. The confusion there is our fault, but if you
view it as a starting point for thinking, that will be a good point.

That particular item appears to be very poorly named in the wiki page.
It's not about refactoring at all (which is changing code around to make
new functionality easier, or remove redundancy). It's about adding new
features to the serializers. Enhancing, extending and changing in
various places, not refactoring.

Now, there are a bunch of things that could be worked on in the
serialization space. Have a look at the currently open tickets in that
area (I mean, read them *all*):
http://code.djangoproject.com/query?status=new=assigned=reopened=Serialization=priority

You'll see a few consistent patterns for feature requests and
awkwardness there (beyond the things that are just basic bugs we have to
fix at some point). It's also worth having a look at mailing list posts
(and tickets) that refer to "fixtures", since that's where those things
are used in Django. You'll start to see problems there with items like
content type values changing or references to pk value in other models
that change upon loading. We'd like to change the serialisation format
to be a lot more robust when it's referring to other models in any way.
One possibility is to use a label instead of a value for those fields.
It can even be designed to be backwards compatible (by adding a version
field to any new format).
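
As a rough illustration of the label-plus-version idea (a sketch only; the
record layout, the version numbers and every name below are hypothetical
and not an existing Django fixture format), a loader could accept both the
old pk-based form and a new label-based form:

```python
# Hypothetical lookup for the database being loaded into: label -> local pk.
CONTENT_TYPE_PKS = {"auth.user": 3, "blog.entry": 12}


def resolve_content_type(record):
    """Return the local content type pk for a serialised record."""
    version = record.get("version", 1)  # old records carry no version field
    value = record["fields"]["content_type"]
    if version == 1:
        return value  # old format: a raw pk, fragile across databases
    app_label, model = value  # new format: a label that survives reloading
    return CONTENT_TYPE_PKS["%s.%s" % (app_label, model)]


old_record = {"fields": {"content_type": 12}}
new_record = {"version": 2, "fields": {"content_type": ["blog", "entry"]}}

old_pk = resolve_content_type(old_record)
new_pk = resolve_content_type(new_record)
```

Because records without a version field are read as the old format,
existing fixtures would keep loading unchanged.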

Adding support for non-model fields to be serialised is another option.
Most of the things I see on the DjangoFullSerializers project appear to
be covered in the tickets in Trac. So the question then becomes whether
the goal would be to merge in DjangoFullSerializers, but keeping things
backwards compatible for existing users. Or to take the good ideas and
merge them in in a more piecemeal fashion. Or work through the general
problems raised in the serializer and fixture tickets and posts on the
mailing list.

Hopefully that gives you a bunch of ideas for a bit more research.

[...]
> 
> So the solution that appears to me now is to add a 
> Serialization Field support to Django Models. Say something
> like JSONField and provide Meta Data for the JSON Field
> Structure in some way, say defining a class for its structure
> (as we do for ModelAdmin) or providing this in Class Meta
> inside the Model Definition. This can be done since we will
> already, at least, know what will be the format of serialized
> data we receive (quite obvious, we need to know this, since
> we cannot process any random serialized data). Hope this is
> somewhat similar in idea to what is pointed out as ModelAdmin
> in the ideas list on the wiki page.

Hmm .. fields that provide serialized data aren't really anything to do
with the serializer. You can already write them now. In fact, people
have. I'm not sure where Meta factors into this, either. Writing a new
custom field type isn't really a Summer of Code project (it's Weekend of
Code difficulty, really).

> 
> I would like to add support for JSON and Python serialization
> through this project during Summer Of Code period and 

Re: IPAddressField

2009-03-20 Thread Malcolm Tredinnick

On Mon, 2009-03-16 at 13:18 +0100, Gregor Kling wrote:
[...]
> Generally I do agree with the *usefulness* of not breaking compatibility.
> But on the other hand, I think that correcting this weird handling of IP 
> addresses would justify the cut.
> Because the handling of IP addresses is not as intrinsic as, for 
> example, the ORM, it should be possible to cope with the change,
> and to get rid of this wart.

Right now we have an IPAddress field. People are using it. You *cannot*
break their code and requiring the database field to be changed does
just that. So, no. This would have to be a differently named field. We
might well deprecate the existing version, but breaking existing code
just because it's "neat" is not an option.

Regards,
Malcolm


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Malcolm Tredinnick

On Fri, 2009-03-20 at 09:45 -0400, Alex Gaynor wrote:
> Hello all,
> 
> To those who don't know me, I'm a freshman computer science student at
> Rensselaer Polytechnic Institute in Troy, New York.  I'm on the mailing
> lists quite a bit, so you may have seen me around.
> 
> A Multiple Database API For Django
> ==================================
> 
> Django currently has the low-level hooks necessary for multiple database
> support, but it doesn't have the high-level API for using them, nor any
> support infrastructure, documentation, or tests.  The purpose of this
> project would be to implement the high-level API necessary for the use of
> multiple databases in Django, along with the requisite documentation and
> tests.
> 
> There have been several previous proposals and implementations of
> multiple-database support in Django, none of which has been complete or
> gained sufficient traction in the community to be included in Django
> itself.  As such, this proposal will specifically address some of the
> reasons for past failures, and their remedies.
> 
> The API
> -------
> 
> First there is the API for defining multiple connections.  A new
> setting, ``DATABASES`` (or something similar), will be created.  It is a
> dictionary mapping a database alias (internal name) to a dictionary
> containing the current ``DATABASE_*`` settings:
> 
> .. sourcecode:: python
> 
>     DATABASES = {
>         'default': {
>             'DATABASE_ENGINE': 'postgresql_psycopg2',
>             'DATABASE_NAME': 'my_data_base',
>             'DATABASE_USER': 'django',
>             'DATABASE_PASSWORD': 'super_secret',
>         },
>         'user': {
>             'DATABASE_ENGINE': 'sqlite3',
>             'DATABASE_NAME': '/home/django_projects/universal/users.db',
>         },
>     }
> 
> A database with the alias ``default`` will be the default connection (it
> will be used if no other one is specified for a query) and will be the
> direct replacement for the ``DATABASE_*`` settings.  In compliance with
> Django's deprecation policy, the ``DATABASE_*`` settings will
> automatically be handled as if they were defined in the ``DATABASES``
> dict for at least 2 releases.
> 
> Next, a ``connections`` object will be implemented in ``django.db``,
> analogous to the ``django.db.connection`` object.  The ``connections``
> object will be a dictionary-like object that is subscripted by database
> alias and lazily returns a connection to the database.
> ``django.db.connection`` will remain (at least for the present; its
> ultimate state will be decided by community consensus) and merely proxy
> to ``django.db.connections['default']``.  Using the previously defined
> database setting, this might be used as:
> 
> .. sourcecode:: python
> 
>     from django.db import connections
> 
>     conn = connections['user']
>     c = conn.cursor()
>     # DB-API cursors do not portably return rows from execute(),
>     # so fetch from the cursor itself.
>     c.execute("""SELECT 1""")
>     results = c.fetchall()
> 
> Now that there is the necessary infrastructure to accompany the very
> low-level plumbing, we need our actual API.  The high-level API will have
> 2 components.  First, there will be a ``using()`` method on ``QuerySet``
> and ``Manager`` objects.  This method simply takes an alias to a
> connection (and possibly a connection object itself, to allow for dynamic
> database usage) and makes that the connection that will be used for that
> query.  Secondly, a new option will be created in the inner Meta class of
> models.  This option will be named ``using`` and will specify the default
> connection to use for all queries against this model, overriding the
> default specified in the settings:
> 
> .. sourcecode:: python
> 
>     class MyUser(models.Model):
>         ...
>         class Meta:
>             using = 'user'
> 
>     # this queries the 'user' database
>     MyUser.objects.all()
>     # this queries the 'default' database
>     MyUser.objects.using('default')
> 
> Lastly, various plumbing will need to be updated to reflect the new
> multidb API, such as transactions, breakpoints, management commands, etc.
> 
> More Advanced Usage
> -------------------
> 
> While the above two methods are, strictly speaking, sufficient, they
> require the user to write lots of boilerplate code in order to implement
> advanced multi-database strategies such as replication and sharding.
> Therefore we also introduce the concept of ``DatabaseManagers``, not to
> be confused with Django's current managers.  DatabaseManagers are classes
> that define what connection should be used for a given query.  There are
> 2 levels at which to specify what ``DatabaseManager`` to use: as a
> setting, and at the class level.  For example, in one's settings.py one
> might have:
> 
> .. sourcecode:: python
> 
>     DEFAULT_DB_MANAGER = 'django.db.multidb.round_robin.Random'
> 
> This tells Django that for each query it should use the
> ``DatabaseManager`` specified at that location, unless it is 

Re: QuerySet.values() Shallow Copy

2009-03-20 Thread Malcolm Tredinnick

On Fri, 2009-03-20 at 05:08 -0700, Vitaly Peressada wrote:
> @Malcolm:
> 
> I agree with you that there are some holes in the code - it was a quick
> hack to solve the issue at hand. I did suspect that there should be some
> effort to implement this feature and the tickets quoted confirm that. It
> is too bad that as of now it has not been done yet, even though the
> tickets appear to be 2 years old. If there is anything I could do to
> help, please let me know.

They've been open for two years because nobody has fixed them yet and we
think it's worthwhile doing (plus they're not entirely trivial to fix,
so that cuts down the number of people willing to put in the effort). In
the interim we've closed, you know, a *few thousand* other tickets, so
progress has definitely been made.

I'll also note that parts of #5768 have been fixed, e.g., in r7230. Lots
of those bigger items are multi-part projects that get fixed in a few
stages.

You could work on those tickets if you want to help. I've pointed out
the difficulties in comment 4 on #5768. The choice is between restricting
values() to allowing only one multi-valued relation or, preferably,
constructing the correct SQL for querying many multi-valued relations
(making sure we only return 1 + n1 + n2 rows, not n1 * n2 rows, in the
notation in that comment). The latter is best, but hard to implement.
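
To make the row counts concrete, here is a small sketch. It assumes n1 and n2 stand for the number of related rows in each of two multi-valued relations on a single object; the "tags" and "comments" names are made up for illustration and are not taken from the ticket.

```python
from itertools import product

# One parent object with two multi-valued relations (names are made up).
tags = ["t1", "t2", "t3", "t4"]            # n1 = 4
comments = ["c1", "c2", "c3", "c4", "c5"]  # n2 = 5

# Joining both relations in a single query pairs every tag with every
# comment, so the result grows multiplicatively.
cross_join_rows = list(product(tags, comments))

# Fetching the parent once and each relation separately grows additively.
separate_row_count = 1 + len(tags) + len(comments)
```

With these numbers the combined join yields 20 rows, while the additive approach needs only 10.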

So start working on that if you want this solved. We aren't going to
commit a hack to work around something when the real problem is known.
If you get stuck, ask as many questions as you like on this list.

Regards,
Malcolm





Re: Proposal: enable CSRF middleware by default

2009-03-20 Thread Adrian Holovaty

On Thu, Mar 19, 2009 at 9:03 PM, James Bennett  wrote:
> Too late now since it's already committed, but I've got some serious
> reservations about this one. More development effort should have gone
> into improving and refactoring the middleware before it got
> automatically enabled.

Hmm, yeah... :-/

I've been traveling since Tuesday, and, shall we say, I'm not that
excited about this being in the default middleware. In fact, I'm +1
for reverting this change and might even want to exercise the
benevolent dictator veto on it, frankly.

My reasoning: it's more overhead for every request, and it's a clunky
implementation. I mean, parsing the HTML of every page with a regex?
Come on.
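
For readers who have not looked at it, the approach being criticised is roughly the following. This is a simplified sketch written for illustration, not the middleware's actual code; the regular expression, the add_csrf_field helper and the hidden field name are assumptions.

```python
import re

# Matches the opening tag of any form whose method is POST.
_POST_FORM_RE = re.compile(
    r'(<form\b[^>]*\bmethod\s*=\s*["\']?post\b[^>]*>)', re.IGNORECASE)


def add_csrf_field(html, token):
    """Insert a hidden token field after every POST <form> opening tag."""
    hidden = '<input type="hidden" name="csrfmiddlewaretoken" value="%s" />' % token
    # A function is used as the replacement so the token is never
    # interpreted as a regex replacement template.
    return _POST_FORM_RE.sub(lambda m: m.group(1) + hidden, html)


page = '<form action="/a" method="post"></form><form method="get"></form>'
rewritten = add_csrf_field(page, "abc123")
```

Every response body has to pass through a substitution like this, which is where the per-request overhead comes from.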

We ought to be making Django *faster*, not adding little pieces to it,
bit by bit, until it gets bloated.

And to raise a bit of bureaucracy in the process: there's something
particularly Big And Important about changing anything in the global
settings file -- whether it's adding a new setting, or changing a
setting as fundamental as MIDDLEWARE_CLASSES -- so in the future I
would ask that any such changes be given more discussion (and signoffs
by committers) before a quick commit. In fact, it should be entirely
opt-in, not opt-out. "Please let me know by Thursday evening (GMT) if
there are objections" is not acceptable, IMO.

Adrian




Re: IPAddressField

2009-03-20 Thread Ian Kelly

On Fri, Mar 20, 2009 at 6:36 PM, pavel.schon  wrote:
>
> Hi, I've written an IPAddressField that stores IPy.IP instances. Look at
> http://www.djangosnippets.org/snippets/1381/ and try it. Thanks for
> bug reports.

I'm afraid Oracle only supports 38 digits of precision for numeric
columns.  Since this requires 39, it won't work.
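
The digit count is easy to check, assuming the field stores each address as a single integer (which is what the 39-digit requirement suggests). The largest IPv6 address is 2**128 - 1:

```python
# Largest value an IPv6 address can take when stored as one integer.
max_ipv6 = 2 ** 128 - 1
ipv6_digits = len(str(max_ipv6))

# Largest IPv4 address, for comparison.
max_ipv4 = 2 ** 32 - 1
ipv4_digits = len(str(max_ipv4))
```

IPv4 values need only 10 digits and fit comfortably; the IPv6 maximum needs 39, one more than the 38 digits mentioned above.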

Thanks,
Ian




Re: IPAddressField

2009-03-20 Thread pavel.schon

Hi, I've written an IPAddressField that stores IPy.IP instances. Look at
http://www.djangosnippets.org/snippets/1381/ and try it. Thanks for
bug reports.




Re: Summer of Code: mentors wanted

2009-03-20 Thread Gary Wilson Jr.

On Thu, Mar 19, 2009 at 5:41 PM, Jacob Kaplan-Moss  wrote:
> If you'd like to mentor a Summer of Code project, you can apply through
> Google's web app right now. Please also add your name here:
> http://code.djangoproject.com/wiki/SummerOfCode2009

FYI, the Django mentor signup is here:

http://socghop.appspot.com/mentor/request/google/gsoc2009/django

Gary




1.1 beta: Monday

2009-03-20 Thread Jacob Kaplan-Moss

Hi folks --

I met with James earlier and reviewed the outstanding list of stuff
for 1.1 beta. We agreed it'd be best to give everyone -- me included
:) -- a couple extra days, so we're going to push the 1.1 beta to
Monday, probably around noon US Central time.

Have a good, productive weekend,

Jacob




Re: Multiple admin forms

2009-03-20 Thread Stuart Jansen

On Fri, 2009-03-20 at 10:55 -0700, Collin Grady wrote:
> Usage questions belong on the django-users mailing list. This list is
> for the development of django itself.

So you're saying that what the original author wants to accomplish _is_
possible with the current Django admin?






[GSoC] Serialization Refactor

2009-03-20 Thread Madhusudan C.S
Hi all,
   I just wrote 2 mails about myself and my wish to
participate in GSoC as a Django student. Sorry if I am
spamming your inboxes. I just want to keep my mails short
so that people who don't want to read everything can
skip the mails that are irrelevant to them. Please correct
me wherever I am wrong, and tell me if I am not doing things
the way they must be done here in Django.

I hope I understand what Malcolm and Jacob meant when
they said this.

> We make changes because there are use-cases
> for them, not because we can. So any proposal should
> be driven by trying to fix some existing problem, not
> creating a "wouldn't it be nice if...?" situation.


I want to work on Serialization Refactor for GSoC. Since what
the Django community requires exactly from that idea is still
not clear to me, I am requesting any of you to explain a bit
on what is expected of that project? In the meantime,
before I get a response, let me add a few more ideas to it.

I am proposing this idea initially as a Django user, opening
it up for discussion by the rest of the community. I
personally feel this is a missing feature in Django and, as a
"Django user", definitely want to see it happen. (Please also
tell me whether it is worth opening a ticket for this and
sending the idea to the Django users list as well for
additional feedback.)

Let me begin with an interesting use case I have as a Django
user (I hope many other users have felt the same need).
I am not sure if this already exists in Django. I assume it
doesn't, from what I have learnt. Please correct me if I am
wrong.

I have a web app written in Django which gets its data in
a serialized format. The data source is actually a third-party
script which fetches an HTML page from a website, parses
the data and supplies it to us in JSON format. (The page
parsed is actually a university result sheet; the script has
no access to the results database behind it.) I now want
to store this data in my database. But along with the data
provided as JSON, I need to add some additional administrative
data to the database table for each serialized record I get.

One could easily ask why I can't use deserialization. The
problem, as I understand it (maybe I am wrong; please correct
me if so), is that whenever I deserialize the stream I get,
I can only obtain a DeserializedObject that contains a Django
object, which should contain the full model data, including
any PK fields that exist in the model, and not just a subset
of the fields. That is not the case here.
I just want to make the serialized data I get a part of the
database table, say a subset of the fields in the table,
alongside other fields such as the time at which this data
was recorded in the database, some indexing data and so on.

One could also ask me to write a custom field which stores the
serialized data as a string (i.e. as varchar) or something
like that. But from what I understand (from the docs), I can
use custom fields for single fields but not for data that
must be split over several fields. That's exactly what is
required here: since I get the marks in JSON, I must be able
to compute, say, the class average for a particular subject,
which becomes difficult if I store the JSON data as a string,
since I would need to deserialize the entire string each time
I need access to a single field in it.
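
The use case can be sketched in plain Python, without any Django API. The payload shape, the field names and the to_rows and subject_average helpers below are all hypothetical; the point is only that the JSON supplies a subset of the stored fields and the rest is added when the data is recorded.

```python
import json
from datetime import datetime, timezone

# Hypothetical payloads as the third-party script might supply them.
payloads = [
    '{"student": "A123", "marks": {"maths": 80, "physics": 60}}',
    '{"student": "B456", "marks": {"maths": 70, "physics": 90}}',
]


def to_rows(raw, recorded_at):
    """Split one JSON result sheet into one row per subject."""
    data = json.loads(raw)
    return [
        # "recorded_at" is the extra administrative data, not in the JSON.
        {"student": data["student"], "subject": subject,
         "marks": marks, "recorded_at": recorded_at}
        for subject, marks in data["marks"].items()
    ]


def subject_average(rows, subject):
    """Average marks for one subject, with no re-parsing of JSON."""
    marks = [row["marks"] for row in rows if row["subject"] == subject]
    return sum(marks) / len(marks)


now = datetime.now(timezone.utc)
rows = [row for raw in payloads for row in to_rows(raw, now)]
maths_average = subject_average(rows, "maths")
```

Once the marks are split into separate fields, a per-subject average is a simple query over one column rather than a scan that parses every stored string.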

So the solution that appears to me now is to add a
Serialization Field support to Django Models. Say something
like JSONField and provide Meta Data for the JSON Field
Structure in some way, say defining a class for its structure
(as we do for ModelAdmin) or providing this in Class Meta
inside the Model Definition. This can be done since we will
already, at least, know what will be the format of serialized
data we receive (quite obvious, we need to know this, since
we cannot process any random serialized data). Hope this is
somewhat similar in idea to what is pointed out as ModelAdmin
in the ideas list on the wiki page.

I would like to add support for JSON and Python serialization
through this project during Summer Of Code period and
take up XML and YAML post-GSoC, since I feel that including
those as well would be too much for a 12-week project. Just
my estimate :(

Python serialization support has another interesting use case,
I feel. If we allow Python builtin types, at least types like
lists, tuples and dictionaries, as fields in Django models, we
will be providing the highest level of object-oriented
abstraction for relational databases. We will make the lives of
Django users easier by allowing them to use those Python types
easily without having to worry too much about normalization.
But how we implement them will also be interesting and
tricky. It obviously requires many design decisions from the
Django community. One idea that occurs to me now is to apply
the same kind of normalization we would apply to a list of
values in order to put it into a relational database, like
creating a new table for list items and creating a foreign key 

Re: Multiple admin forms

2009-03-20 Thread Collin Grady

Usage questions belong on the django-users mailing list. This list is
for the development of django itself.

-- 
Collin Grady




[GSoC] An Introduction about me

2009-03-20 Thread Madhusudan C.S
Hi all,
   This is an introduction about myself. Since Jacob said,
"And if we don't know them at all, it's hard to trust they'll
get things done." I am describing my involvement in Django and
other FOSS communities in general here to let you all know
something about me. I hope this helps you to tell me
what I need to learn and how to go about the idea I am
interested in.

I have been interested in contributing to Django since before
GSoC ever occurred to me. In January or so, I badly felt the
need for Multiple Primary Key support (Ticket #373) and
pinged David Crammer about it on #django-dev, since he had
done some work on it. Back then I started to read the Django
ORM code, but could not write any code after that, thanks to
university coursework :(

Since early March I have been trying to contribute
something to Django. (I have been free since then and will be
mostly free henceforth, and totally free in summer, without
any other commitments.) I had some discussions about
fixing ticket #8161 on the django-devel list (http://is.gd/obr2)
but unfortunately it has already been fixed. So I was looking
for a few other things, and I thought I would apply for GSoC
as a Django student, since I felt it lowers the barrier to
getting started. I am mostly interested in ORM-related ideas,
since I have read most of the django.db.* code. (I am sorry, I
am not claiming I am very well versed in the Django ORM, but I
have a fair idea of how the code is written and structured.)
I am searching for other ORM-related ideas in the ticket list.
I will get back to you all whenever I find something
interesting.

I have been involved in FOSS communities for 3 years now and
have around 1.25 years of Python experience. I have contributed
a few patches to projects like Melange
(http://code.google.com/p/soc/source/browse/trunk/AUTHORS,
the app on which this year's GSoC is run, built on Django),
KDE Step (http://is.gd/oci7), GNUSim8085 (worked on the Windows
port), RTEMS and quite a few other FOSS projects.


-- 
Thanks and regards,
 Madhusudan.C.S

Blogs at: www.madhusudancs.info
Official Email ID: madhusu...@madhusudancs.info




Re: Patch status for ticket #9122

2009-03-20 Thread Preston Timmons

Thanks, Brian. I appreciate you putting time into that.

The existing documentation explains the generic inline classes as
behaving the same as the normal inlines.
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#using-generic-relations-as-an-inline

Right above that is documented the normal inline options.
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#inlinemodeladmin-options

I thought I would mention that because it seems like a feature is
already documented as working.

Preston




On Mar 19, 8:54 pm, Brian Rosner  wrote:
> On Mar 19, 2009, at 5:47 PM, Preston Timmons wrote:
>
> > Might somebody be able to review the patch and tests for this ticket
> > to see if they are acceptable? I am hoping it can get in as a bug fix
> > for 1.1. If something is lacking here I would like to try to fix it.
>
> The patch looks generally acceptable. I'd like to see some  
> documentation on it. I will definitely review this in time for 1.1.  
> Thanks for the heads up.
>
> Brian Rosner
> http://oebfare.com



Re: [GSOC] Multiple Database API proposal

2009-03-20 Thread Tim Chase

> I'm here soliciting feedback on both the API, and any potential hurdles I
> may have missed.

While my vote may mean little, Alex has certainly been active and 
had quality code on the mailing list.  MultiDB has also been a 
frequent issue on the mailing-list, so Alex gets my +1

I'd hope to see "multiple databases" defined a little more 
clearly, as discussed in this thread[1].  Whether the SoC project 
addresses *all* of the facets (wow, lots of work!) or just selects 
certain issues, I'd like to see them addressed in the proposal 
("addressing federation and load-balancing, but not sharding") to 
show that they're being considered during the implementation. 
From what I gather in the description, Alex is only proposing 
load-balancing.

Depending on which definitions of multidb you plan to address, it 
also impacts areas such as aggregation (performing 
count/summation over shards requires extra consideration) and 
cross-database joining.  In the above thread, Malcolm also raises 
the issue of read/write consistency when doing load-balancing.
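
As a small illustration of why aggregation over shards needs extra care, here is a sketch with made-up shard names and values: an average has to be rebuilt from per-shard sums and counts, because averaging the per-shard averages gives a different answer when shards differ in size.

```python
# Made-up per-shard column values; the shards deliberately differ in size.
shards = {
    "shard_a": [10, 20, 30],
    "shard_b": [40],
}

# Each shard reports a (sum, count) pair for the column being averaged.
partials = [(sum(values), len(values)) for values in shards.values()]

total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
correct_average = total / count

# Averaging the per-shard averages weights every shard equally,
# regardless of how many rows it holds.
naive_average = sum(s / c for s, c in partials) / len(partials)
```

Counts and sums combine by simple addition across shards, but averages do not.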

-tim

[1]
http://groups.google.com/group/django-users/browse_thread/thread/663046559fd0f9c1/







[GSOC] Multiple Database API proposal

2009-03-20 Thread Alex Gaynor
Hello all,

To those who don't know me, I'm a freshman computer science student at
Rensselaer Polytechnic Institute in Troy, New York.  I'm on the mailing
lists quite a bit, so you may have seen me around.

A Multiple Database API For Django
==================================

Django currently has the low-level hooks necessary for multiple database
support, but it doesn't have the high-level API for using them, nor any
support infrastructure, documentation, or tests.  The purpose of this
project would be to implement the high-level API necessary for the use of
multiple databases in Django, along with the requisite documentation and
tests.

There have been several previous proposals and implementations of
multiple-database support in Django, none of which has been complete or
gained sufficient traction in the community to be included in Django
itself.  As such, this proposal will specifically address some of the
reasons for past failures, and their remedies.

The API
-------

First there is the API for defining multiple connections.  A new setting,
``DATABASES`` (or something similar), will be created.  It is a dictionary
mapping a database alias (internal name) to a dictionary containing the
current ``DATABASE_*`` settings:

.. sourcecode:: python

    DATABASES = {
        'default': {
            'DATABASE_ENGINE': 'postgresql_psycopg2',
            'DATABASE_NAME': 'my_data_base',
            'DATABASE_USER': 'django',
            'DATABASE_PASSWORD': 'super_secret',
        },
        'user': {
            'DATABASE_ENGINE': 'sqlite3',
            'DATABASE_NAME': '/home/django_projects/universal/users.db',
        },
    }

A database with the alias ``default`` will be the default connection (it
will be used if no other one is specified for a query) and will be the
direct replacement for the ``DATABASE_*`` settings.  In compliance with
Django's deprecation policy, the ``DATABASE_*`` settings will automatically
be handled as if they were defined in the ``DATABASES`` dict for at least
2 releases.
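
One way the compatibility handling could work is sketched below. It assumes settings are available as a plain dictionary; the build_databases helper is hypothetical and only illustrates folding the old settings into the new structure.

```python
LEGACY_PREFIX = "DATABASE_"


def build_databases(settings):
    """Return a DATABASES-style dict, folding in legacy settings."""
    databases = dict(settings.get("DATABASES", {}))
    if "default" not in databases:
        # Collect the old flat settings; "DATABASES" itself does not
        # start with the "DATABASE_" prefix, so it is not picked up.
        legacy = {key: value for key, value in settings.items()
                  if key.startswith(LEGACY_PREFIX)}
        if legacy:
            databases["default"] = legacy
    return databases


old_style = {"DATABASE_ENGINE": "sqlite3", "DATABASE_NAME": "app.db"}
new_style = {"DATABASES": {"default": {"DATABASE_ENGINE": "sqlite3",
                                       "DATABASE_NAME": "app.db"}}}
from_old = build_databases(old_style)
from_new = build_databases(new_style)
```

Both styles end up with the same ``default`` entry, so the rest of the code only ever needs to look at the new structure.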

Next, a ``connections`` object will be implemented in ``django.db``,
analogous to the ``django.db.connection`` object.  The ``connections``
object will be a dictionary-like object that is subscripted by database
alias and lazily returns a connection to the database.
``django.db.connection`` will remain (at least for the present; its
ultimate state will be decided by community consensus) and merely proxy
to ``django.db.connections['default']``.  Using the previously defined
database setting, this might be used as:

.. sourcecode:: python

    from django.db import connections

    conn = connections['user']
    c = conn.cursor()
    # DB-API cursors do not portably return rows from execute(),
    # so fetch from the cursor itself.
    c.execute("""SELECT 1""")
    results = c.fetchall()
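
The lazy, dictionary-like behaviour could look something like the following sketch. It uses the standard library's sqlite3 module in place of Django's backends, and the ConnectionHandler name and its internals are assumptions made for illustration.

```python
import sqlite3


class ConnectionHandler:
    """Dictionary-like object that opens each connection on first use."""

    def __init__(self, databases):
        self.databases = databases
        self._connections = {}

    def __getitem__(self, alias):
        if alias not in self._connections:
            config = self.databases[alias]
            # Only sqlite3 is handled in this sketch; a real version
            # would dispatch on the configured engine.
            self._connections[alias] = sqlite3.connect(config["DATABASE_NAME"])
        return self._connections[alias]


connections = ConnectionHandler({
    "default": {"DATABASE_NAME": ":memory:"},
    "user": {"DATABASE_NAME": ":memory:"},
})

cursor = connections["user"].cursor()
cursor.execute("SELECT 1")
rows = cursor.fetchall()
```

Nothing is opened for ``default`` here, because only the ``user`` alias was ever subscripted.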

Now that there is the necessary infrastructure to accompany the very
low-level plumbing, we need our actual API.  The high-level API will have
2 components.  First, there will be a ``using()`` method on ``QuerySet``
and ``Manager`` objects.  This method simply takes an alias to a
connection (and possibly a connection object itself, to allow for dynamic
database usage) and makes that the connection that will be used for that
query.  Secondly, a new option will be created in the inner Meta class of
models.  This option will be named ``using`` and will specify the default
connection to use for all queries against this model, overriding the
default specified in the settings:

.. sourcecode:: python

    class MyUser(models.Model):
        ...
        class Meta:
            using = 'user'

    # this queries the 'user' database
    MyUser.objects.all()
    # this queries the 'default' database
    MyUser.objects.using('default')

Lastly, various plumbing will need to be updated to reflect the new multidb
API, such as transactions, breakpoints, management commands, etc.

More Advanced Usage
-------------------

While the above two methods are, strictly speaking, sufficient, they
require the user to write lots of boilerplate code in order to implement
advanced multi-database strategies such as replication and sharding.
Therefore we also introduce the concept of ``DatabaseManagers``, not to be
confused with Django's current managers.  DatabaseManagers are classes
that define what connection should be used for a given query.  There are
2 levels at which to specify what ``DatabaseManager`` to use: as a
setting, and at the class level.  For example, in one's settings.py one
might have:

.. sourcecode:: python

    DEFAULT_DB_MANAGER = 'django.db.multidb.round_robin.Random'

This tells Django that for each query it should use the ``DatabaseManager``
specified at that location, unless it is overridden by the ``using`` Meta
option or the ``using()`` method.

The more granular way to use ``DatabaseManagers`` is to provide them, in
place of a string, as the ``using`` Meta option.  Here we pass an instance
of the class we want to use:

.. sourcecode:: python

    class MyModel(models.Model):
        class Meta:
            using = Random(['my_db1', 'my_db2', 'my_db2'])
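
A rough sketch of what such a ``DatabaseManager`` might look like follows. The proposal does not spell out the interface, so the get_alias method name, the optional query argument and the resolve_using helper are all assumptions.

```python
import random


class Random:
    """Pick one of the configured aliases at random for each query."""

    def __init__(self, aliases, rng=None):
        self.aliases = list(aliases)
        # An injectable random source keeps the choice testable.
        self.rng = rng or random.Random()

    def get_alias(self, query=None):
        # Hypothetical hook: a smarter manager could inspect the query,
        # for example to send writes to a master database.
        return self.rng.choice(self.aliases)


def resolve_using(using, query=None):
    """Accept either a plain alias string or a DatabaseManager instance."""
    if isinstance(using, str):
        return using
    return using.get_alias(query)


fixed = resolve_using("user")
picked = resolve_using(Random(["my_db1", "my_db2"], rng=random.Random(0)))
```

Treating a string as a manager that always returns the same alias keeps the two forms of the ``using`` option interchangeable.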

At this level it 

Serving static files with handler-specific sendfile()

2009-03-20 Thread mrts

http://code.djangoproject.com/ticket/2131 tracks adding
support for efficiently serving files from within Django via
handler-specific wrapper for sendfile().

A new response class, HttpResponseSendFile is added for that
purpose.

In my humble opinion it should visibly and loudly break if
the handler does not support sendfile() -- I want to know if
my files are served efficiently or not. In other words, it
should not degrade to ordinary HttpResponse behaviour of
opening the file in Python and returning its content (as an
iterable).

Under these conditions, HttpResponseSendFile implementation
is simple and clean. It's always handled specially in
handlers. If some third-party handler is unaware of it, it
should break as per the rationale given above.
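
To make the "break loudly" idea concrete, here is a minimal sketch in
plain Python. It is not the patch from the ticket: the class body,
the two handler functions, and the example path are stand-ins I made
up for illustration, and no real handler API is used.

```python
class HttpResponseSendFile:
    """Stand-in for the proposed response class (not Django's real code)."""

    def __init__(self, path, content_type="application/octet-stream"):
        self.path = path
        self.content_type = content_type

    def __iter__(self):
        # A handler that is unaware of this class will try to iterate the
        # response body; fail loudly instead of quietly reading the file.
        raise NotImplementedError(
            "this handler does not support sendfile(); cannot serve %r"
            % self.path
        )


def sendfile_aware_handler(response):
    """Hypothetical handler that recognises the class and delegates."""
    if isinstance(response, HttpResponseSendFile):
        # A real handler would pass the path to its own sendfile() wrapper.
        return ("sendfile", response.path)
    return ("body", b"".join(response))


def legacy_handler(response):
    """Hypothetical third-party handler that only knows iterable bodies."""
    return ("body", b"".join(response))


response = HttpResponseSendFile("/tmp/report.pdf")  # example path only
print(sendfile_aware_handler(response))
try:
    legacy_handler(response)
except NotImplementedError as exc:
    print("legacy handler failed loudly:", exc)
```

With this shape there is no second open() code path to maintain: the
response either goes through sendfile() or it raises.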

However, that's exactly what Jacob was concerned about (i.e.
he raised backwards-compatibility concerns with existing
third-party handlers and requested that the degraded
compatible behaviour should be supported).

HttpResponseSendFile is a new feature that does not exist in
1.0.X. Nothing breaks by adding it per se. People who
attempt to use the new 1.1 feature with old third-party
handlers should expect it to break -- neither will aggregate
code work with 3rd-party db backends that haven't been updated
for 1.1.

If compatibility is required, the implementation will not be
as clean and straightforward: unnecessary clutter
is required to duplicate the behaviour that's already
available in the ordinary HttpResponse (e.g. duplicated open()
calls for the same file in different code paths -- smells
bad to me).

Thoughts?
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to 
django-developers+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~--~~~~--~~--~--~---



Re: QuerySet.values() Shallow Copy

2009-03-20 Thread Vitaly Peressada

@Malcolm:

I agree with you that there are some holes in the code - it was a quick
hack to solve the issue at hand. I did suspect that implementing this
feature would take some effort, and the tickets quoted confirm that. It
is too bad that it has not been done yet, even though the tickets
appear to be 2 years old. If there is anything I could do to help, please
let me know.

On Mar 19, 7:04 pm, Malcolm Tredinnick 
wrote:
> On Thu, 2009-03-19 at 05:17 -0700, Vitaly wrote:
> > I wanted to JSON-serialize a tree of django model objects: Schedule ->
> > Player -> django.models.User.
> > django.core.serializers.serialize does shallow serialization of
> > QuerySet but I want a deep one. Next, I looked at QuerySet.values()
> > plus simplejson but alas the shallow copy again.
>
> So it's not about "copying" -- taking one Python object and creating a
> similar, but independent one -- at all. You're talking about how far
> down the relation chain we descend when retrieving data.
>
> Bob Thomas has already pointed out one ticket and there are some others
> opened regarding pulling related models in via a values() call
> (searching for tickets about values() will reveal a bunch of different
> directions and proposals).
>
> Your patch isn't particularly neat (examining the string representation
> of the output of type() to determine a class when isinstance() exists,
> for example). It also looks like it will fail for infinitely recursive
> structures (which exist in practical situations).
>
> Ultimately, though, I think this situation is solved by allowing
> select_related() to work with values() -- ticket #5768 is one reference
> to that. It's not a trivial problem to solve, but we'll fix it one day.
> Multi-valued relations, in particular, require care to make them work
> efficiently.
>
> Regards,
> Malcolm