Re: queryset caching note in docs

2011-11-02 Thread Javier Guerra Giraldez
On Wed, Nov 2, 2011 at 11:33 AM, Tom Evans  wrote:
>> other connections in other transactions are locked too?
>
> Yes. The exact wording from the C API:
>
> """
> On the other hand, you shouldn't use mysql_use_result() if you are
> doing a lot of processing for each row on the client side, or if the
> output is sent to a screen on which the user may type a ^S (stop
> scroll). This ties up the server and prevents other threads from
> updating any tables from which the data is being fetched.
> """

this seems to be the case with MyISAM tables; the InnoDB docs say that
SELECT statements don't set any locks, since they read from a snapshot of
the table.

on MyISAM, there are (clumsy) workarounds: forcing the use of scratch
tables, explicitly copying to temporary tables, or buffering the output.

"""
SELECT ... FROM is a consistent read, reading a snapshot of the
database and setting no locks unless the transaction isolation level
is set to SERIALIZABLE. For SERIALIZABLE level, the search sets shared
next-key locks on the index records it encounters.
"""
(http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html)


-- 
Javier




RE: queryset caching note in docs

2011-11-02 Thread Kääriäinen Anssi
"""
so, summarizing again:
  - mysql supports chunked fetch but will lock the table while fetching is in 
progress (likely causing deadlocks)
  - postgresql does not seem to suffer this issue and chunked fetch seems 
doable (not trivial) using named cursor
  - oracle does chunked fetch already (someone confirm this, please)
  - sqlite3 COULD do chunked fetch by using one connection per cursor 
(otherwise cursors will not be isolated)
"""

I did a little testing. It seems you can get the behavior you want if you just
do this in PostgreSQL:

for obj in Model.objects.all().iterator():  # note the extra .iterator()
    pass  # handle object here

What is happening? Django correctly uses cursor.fetchmany(chunk_size) in
models/sql/compiler.py; the chunk_size is hardcoded to 100. The problem is in
db/models/query.py, in its __iter__ method: __iter__ keeps
self._result_cache, and that is where the memory is consumed. Changing that is
not wise, as in many cases you do want to keep the results around. The
.iterator() call skips __iter__ and directly accesses the underlying
generator.
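
A much simplified sketch of that mechanism (my own illustration, NOT Django's
actual source): __iter__ fills a result cache, while iterator() bypasses it and
yields objects straight from the underlying generator.

class QuerySetSketch(object):
    def __init__(self, make_rows):
        self._make_rows = make_rows      # stand-in for the DB fetch
        self._result_cache = None

    def iterator(self):
        # No caching: each object can be garbage collected as soon as the
        # caller is done with it.
        for row in self._make_rows():
            yield row

    def __iter__(self):
        # Default iteration: every fetched object is kept in _result_cache.
        if self._result_cache is None:
            self._result_cache = list(self.iterator())
        return iter(self._result_cache)

qs = QuerySetSketch(lambda: xrange(10000))
for obj in qs.iterator():   # memory stays flat
    pass
for obj in qs:              # builds and keeps the full list
    pass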

You can also do objects.all()[0:10000].iterator(), and objects are correctly
fetched without caching.

Here is a printout from my tests. The memory report is the total process memory 
use:

Code:

i = 0
for obj in User.objects.all()[0:10000]:
    i += 1
    if i % 1000 == 0:
        print memory()

25780.0kB
26304.0kB
26836.0kB
27380.0kB
27932.0kB
28468.0kB
29036.0kB
29580.0kB
29836.0kB
30388.0kB

And then:

i = 0
for obj in User.objects.all()[0:10000].iterator():
    i += 1
    if i % 1000 == 0:
        print memory()

25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
25216.0kB
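
(The memory() helper isn't shown above; a hypothetical stand-in along these
lines - my own sketch, Linux-only, reading the resident set size from
/proc/self/status - would produce figures in the same format:)

def memory():
    # Hypothetical stand-in for the memory() helper used in the tests above
    # (the original is not shown in this mail). Linux only.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return line.split()[1] + '.0kB'   # e.g. "25216.0kB"
    return 'unknown'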


This would be worth documenting, maybe with a better-named method wrapping
.iterator(). I have no idea for a better name, though.

I would sure like a verification of this test; I am tired and this seems like
too easy a fix. Or am I missing the problem?

 - Anssi




Re: queryset caching note in docs

2011-11-02 Thread Ryan McIntosh
I think the discussion actually went a bit sideways.  Is there value in a model
method that returns an iterator which pulls results from a temporary table
filled from a model query?  This puts the onus on the django user to use the
correct method.

Model.foo().bar().buffered() or .from_tmp()

peace,

Ryan

- Original Message -
From: "Marco Paolini" 
To: django-developers@googlegroups.com
Sent: Wednesday, November 2, 2011 12:11:41 PM GMT -06:00 US/Canada Central
Subject: Re: queryset caching note in docs

On 02/11/2011 17:33, Tom Evans wrote:
> On Wed, Nov 2, 2011 at 4:22 PM, Marco Paolini  wrote:
>> On 02/11/2011 17:12, Tom Evans wrote:
>>> If you do a database query that quickly returns a lot of rows from the
>>> database, and each row returned from the database requires long
>>> processing in django, and you use mysql_use_result, then other mysql
>>> threads are unable to update any table being used, whereas if you do
>>> the same thing with mysql_store_result, the tables are unlocked as
>>> soon as the client has retrieved all the data from the server.
>>>
>> other connections in other transactions are locked too?
>
> Yes. The exact wording from the C API:
>
> """
> On the other hand, you shouldn't use mysql_use_result() if you are
> doing a lot of processing for each row on the client side, or if the
> output is sent to a screen on which the user may type a ^S (stop
> scroll). This ties up the server and prevents other threads from
> updating any tables from which the data is being fetched.
> """
>
> mysql treats the table as in use until the result is freed.
>
> If this behaviour was in place, then you wouldn't have even raised the
> original query - the approach you were using was to iterate through a
> result set and modify the table you are fetching from. With
> mysql_use_result, you would have deadlocked that table in the mysql
> server as soon as you tried to update it without first completing the
> first query.
Yeah, the discussion has drifted a bit from its starting point

so, summarizing again:
  - mysql supports chunked fetch but will lock the table while fetching is in 
progress (likely causing deadlocks)
  - postgresql does not seem to suffer this issue and chunked fetch seems 
doable (not trivial) using named cursor
  - oracle does chunked fetch already (someone confirm this, please)
  - sqlite3 COULD do chunked fetch by using one connection per cursor 
(otherwise cursors will not be isolated)

Marco




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 17:33, Tom Evans wrote:

On Wed, Nov 2, 2011 at 4:22 PM, Marco Paolini  wrote:

On 02/11/2011 17:12, Tom Evans wrote:

If you do a database query that quickly returns a lot of rows from the
database, and each row returned from the database requires long
processing in django, and you use mysql_use_result, then other mysql
threads are unable to update any table being used, whereas if you do
the same thing with mysql_store_result, the tables are unlocked as
soon as the client has retrieved all the data from the server.


other connections in other transactions are locked too?


Yes. The exact wording from the C API:

"""
On the other hand, you shouldn't use mysql_use_result() if you are
doing a lot of processing for each row on the client side, or if the
output is sent to a screen on which the user may type a ^S (stop
scroll). This ties up the server and prevents other threads from
updating any tables from which the data is being fetched.
"""

mysql treats the table as in use until the result is freed.

If this behaviour was in place, then you wouldn't have even raised the
original query - the approach you were using was to iterate through a
result set and modify the table you are fetching from. With
mysql_use_result, you would have deadlocked that table in the mysql
server as soon as you tried to update it without first completing the
first query.

Yeah, the discussion has drifted a bit from its starting point

so, summarizing again:
 - mysql supports chunked fetch but will lock the table while fetching is in 
progress (likely causing deadlocks)
 - postgresql does not seem to suffer this issue and chunked fetch seems doable 
(not trivial) using named cursor
 - oracle does chunked fetch already (someone confirm this, please)
 - sqlite3 COULD do chunked fetch by using one connection per cursor (otherwise 
cursors will not be isolated)

Marco




Re: queryset caching note in docs

2011-11-02 Thread Tom Evans
On Wed, Nov 2, 2011 at 4:22 PM, Marco Paolini  wrote:
> On 02/11/2011 17:12, Tom Evans wrote:
>> If you do a database query that quickly returns a lot of rows from the
>> database, and each row returned from the database requires long
>> processing in django, and you use mysql_use_result, then other mysql
>> threads are unable to update any table being used, whereas if you do
>> the same thing with mysql_store_result, the tables are unlocked as
>> soon as the client has retrieved all the data from the server.
>>
> other connections in other transactions are locked too?

Yes. The exact wording from the C API:

"""
On the other hand, you shouldn't use mysql_use_result() if you are
doing a lot of processing for each row on the client side, or if the
output is sent to a screen on which the user may type a ^S (stop
scroll). This ties up the server and prevents other threads from
updating any tables from which the data is being fetched.
"""

mysql treats the table as in use until the result is freed.

If this behaviour was in place, then you wouldn't have even raised the
original query - the approach you were using was to iterate through a
result set and modify the table you are fetching from. With
mysql_use_result, you would have deadlocked that table in the mysql
server as soon as you tried to update it without first completing the
first query.

Cheers

Tom




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 17:12, Tom Evans wrote:

On Wed, Nov 2, 2011 at 11:28 AM, Marco Paolini  wrote:

mysql can do chunked row fetching from server, but only one row at a time

curs = connection.cursor(CursorUseResultMixIn)
curs.fetchmany(100) # fetches 100 rows, one by one

Marco



The downsides to mysql_use_result over mysql_store_result are that the
mysql thread is locked and unavailable to do anything until the query
is completed and mysql_free_result has been called.

If you do a database query that quickly returns a lot of rows from the
database, and each row returned from the database requires long
processing in django, and you use mysql_use_result, then other mysql
threads are unable to update any table being used, whereas if you do
the same thing with mysql_store_result, the tables are unlocked as
soon as the client has retrieved all the data from the server.


other connections in other transactions are locked too?

In other words, you trade off memory usage against scalability. If
some part of the ORM was reworked to use mysql_use_result, then we
would need to add appropriate docs to explain the dangers of this
approach.

yes indeed.

Scalability is also affected by python thread memory consumption,
it all depends on how big the queryset being fetched is and how often
that queryset is iterated

if you fetch a huge queryset in one chunk and python eats up say 1G
of heap, that's not going to scale well either.

caveats should be clearly documented for both approaches, I think

Marco




Re: queryset caching note in docs

2011-11-02 Thread Tom Evans
On Wed, Nov 2, 2011 at 11:28 AM, Marco Paolini  wrote:
> mysql can do chunked row fetching from server, but only one row at a time
>
> curs = connection.cursor(CursorUseResultMixIn)
> curs.fetchmany(100) # fetches 100 rows, one by one
>
> Marco
>

The downsides to mysql_use_result over mysql_store_result are that the
mysql thread is locked and unavailable to do anything until the query
is completed and mysql_free_result has been called.

If you do a database query that quickly returns a lot of rows from the
database, and each row returned from the database requires long
processing in django, and you use mysql_use_result, then other mysql
threads are unable to update any table being used, whereas if you do
the same thing with mysql_store_result, the tables are unlocked as
soon as the client has retrieved all the data from the server.

In other words, you trade off memory usage against scalability. If
some part of the ORM was reworked to use mysql_use_result, then we
would need to add appropriate docs to explain the dangers of this
approach.
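
For illustration, here is a minimal sketch (not Django code) of what streaming
with mysql_use_result looks like from Python, using MySQLdb's SSCursor, the
ready-made cursor class built on CursorUseResultMixIn; the connection
parameters and table name are placeholders:

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret",
                       db="mydb", cursorclass=MySQLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute("SELECT id, name FROM big_table")
try:
    while True:
        rows = cur.fetchmany(100)   # pulls rows from the open result set
        if not rows:
            break
        for row in rows:
            pass   # long per-row processing keeps the result set open
finally:
    cur.close()    # frees the result; until then the server treats the
    conn.close()   # tables as in use, as described above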

Cheers

Tom




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 15:18, Anssi Kääriäinen wrote:

On 11/02/2011 01:36 PM, Marco Paolini wrote:

maybe we could implement something like:

for obj in qs.all().chunked(100):
    pass

.chunked() will automatically issue LIMITed SELECTs

that should work with all backends

I don't think that will be a performance improvement - this will get rid of the 
memory overhead in Django, but would lead to a lot of overhead in the DB. 
Assuming you are fetching 10000 objects from the DB, you would issue these
commands to the DB (I hope I got the idea correctly):

SELECT ID, ...
ORDER BY order
LIMIT 100 OFFSET 0

SELECT ID, ...
ORDER BY order
LIMIT 100 OFFSET 100
...
SELECT ID, ...
ORDER BY order
LIMIT 100 OFFSET 9900


For each query the DB will need to do:
- query parse & plan
- If the order is not indexed, a top N sort.
- Fetch the items (even in indexed case you will need to travel the index for 
the OFFSET which is not free at all).

So, for the last fetch the DB would need to travel the first 9900 items in the 
index (or worse, do a top 10000 sort) and then return the 100 items wanted.
This is going to be expensive in the DB. The trade-off of saving some memory 
Django side at the expense of doing a lot more work at the DB side is not a 
good one. DB resources are in general much harder to scale than the Django
resources.

You are going to do in total 10000 + 9900 + 9800 + ... + 100 index travels in
the DB, which equals to somewhere around 0.5 million items traveled in the 
index. In addition, you will do 100 parse + plan stages. You really don't want 
to do that. In addition, if there are concurrent updates to the items, it might
be that you will miss some objects and see some objects as duplicates.


Yes, that's right,

qs.chunked() can be easily implemented this way:

i = 0
while True:
    chunk = list(qs[i:i + CHUNK_SIZE])
    if not chunk:
        break
    for obj in chunk:
        pass  # process obj
    i += CHUNK_SIZE

we should find another way of avoiding this memory-hungriness issue for huge
querysets

or at least we should document the issue ;)

currently oracle is the only backend that DOES chunked row fetch; all others,
for different reasons, load all rows into memory

Marco




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 14:36, Ian Kelly wrote:

On Wed, Nov 2, 2011 at 5:05 AM, Anssi Kääriäinen
  wrote:

For PostgreSQL this would be a nice feature. Any idea what MySQL and Oracle
do currently?


If I'm following the thread correctly, the oracle backend already does
chunked reads.  The default chunk size is 100 rows, IIRC.


yes, in Oracle it looks like rows are ALWAYS [1] fetched from the server in 
chunks when using fetchmany()

[1] http://cx-oracle.sourceforge.net/html/cursor.html#Cursor.arraysize
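
A minimal sketch of that arraysize behaviour using cx_Oracle directly (not
Django code; the DSN, credentials and table name are placeholders):

import cx_Oracle

conn = cx_Oracle.connect("user/password@localhost/XE")
cur = conn.cursor()
cur.arraysize = 100                  # rows fetched per round trip
cur.execute("SELECT id, name FROM big_table")
while True:
    rows = cur.fetchmany()           # defaults to cursor.arraysize rows
    if not rows:
        break
    for row in rows:
        pass   # process row
cur.close()
conn.close()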

summarizing (real) chunked fetch capability by backend:

 backend  | currently | supported | how to implement
 sqlite   | N         | Y (1)     | one connection per cursor + shared cache mode
 postgres | N         | Y (2)     | named cursors
 mysql    | N         | Y (3)     | custom cursor class
 oracle   | Y         | Y         | default behavior

(1) in sqlite cursors are not isolated, so we can't effectively use chunked fetchmany
(2) postgres supports chunks when using named cursors
(3) mysql supports only single-row chunks




Re: queryset caching note in docs

2011-11-02 Thread Anssi Kääriäinen

On 11/02/2011 01:36 PM, Marco Paolini wrote:

maybe we could implement something like:

for obj in qs.all().chunked(100):
   pass

.chunked() will automatically issue LIMITed SELECTs

that should work with all backends

I don't think that will be a performance improvement - this will get rid
of the memory overhead in Django, but would lead to a lot of overhead in
the DB. Assuming you are fetching 10000 objects from the DB, you would
issue these commands to the DB (I hope I got the idea correctly):


SELECT ID, ...
ORDER BY order
LIMIT 100 OFFSET 0

SELECT ID, ...
ORDER BY order
LIMIT 100 OFFSET 100
...
SELECT ID, ...
ORDER BY order
LIMIT 100 OFFSET 9900


For each query the DB will need to do:
  - query parse & plan
  - If the order is not indexed, a top N sort.
  - Fetch the items (even in indexed case you will need to travel the 
index for the OFFSET which is not free at all).


So, for the last fetch the DB would need to travel the first 9900 items 
in the index (or worse, do a top 10000 sort) and then return the 100
items wanted. This is going to be expensive in the DB. The trade-off of 
saving some memory Django side at the expense of doing a lot more work 
at the DB side is not a good one. DB resources are in general much
harder to scale than the Django resources.


You are going to do in total 10000 + 9900 + 9800 + ... + 100 index
travels in the DB, which equals to somewhere around 0.5 million items 
traveled in the index. In addition, you will do 100 parse + plan stages. 
You really don't want to do that. In addition, if there are concurrent
updates to the items, it might be that you will miss some objects and see
some objects as duplicates.
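
(As a quick sanity check of the "around 0.5 million" figure - my own
arithmetic, not part of the original estimate: the 100 queries touch roughly
100, 200, ..., 10000 index entries respectively.)

total = sum(100 * k for k in range(1, 101))   # 100 + 200 + ... + 10000
print total   # 505000, i.e. about 0.5 million index entries traveled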


 - Anssi




Re: queryset caching note in docs

2011-11-02 Thread Ian Kelly
On Wed, Nov 2, 2011 at 5:05 AM, Anssi Kääriäinen
 wrote:
> For PostgreSQL this would be a nice feature. Any idea what MySQL and Oracle
> do currently?

If I'm following the thread correctly, the oracle backend already does
chunked reads.  The default chunk size is 100 rows, IIRC.




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 12:05, Anssi Kääriäinen wrote:

On 11/02/2011 12:47 PM, Marco Paolini wrote:

if that option is true, sqlite should open one connection per cursor
and psycopg2 should use named cursors


The sqlite behavior leads to some problems with transaction management -
different connections, different transactions (or is there some sort of "shared
transaction" in sqlite?). I would just fetch all the data in one go when using
sqlite.

Yes, there is a "shared cache mode" that makes all connections share the
same transaction,

but this API is not exposed to Python by the sqlite3 module




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 12:05, Anssi Kääriäinen wrote:

On 11/02/2011 12:47 PM, Marco Paolini wrote:

if that option is true, sqlite should open one connection per cursor
and psycopg2 should use named cursors


The sqlite behavior leads to some problems with transaction management -
different connections, different transactions (or is there some sort of "shared
transaction" in sqlite?). I would just fetch all the data in one go when using
sqlite. I wouldn't worry about performance problems when using sqlite, it is
meant mostly for testing when using Django.


This will cause some overhead for small querysets but will save some memory
for huge ones

For PostgreSQL this would be a nice feature. Any idea what MySQL and Oracle
do currently?


maybe we could implement something like:

for obj in qs.all().chunked(100):
    pass

.chunked() will automatically issue LIMITed SELECTs

that should work with all backends

this could be a no-op for sqlite, where
cursors are not isolated and you get really bad performance for LIMITed SELECTs

Marco




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 12:05, Anssi Kääriäinen wrote:

On 11/02/2011 12:47 PM, Marco Paolini wrote:

if that option is true, sqlite should open one connection per cursor
and psycopg2 should use named cursors


The sqlite behavior leads to some problems with transaction management -
different connections, different transactions (or is there some sort of "shared
transaction" in sqlite?). I would just fetch all the data in one go when using
sqlite. I wouldn't worry about performance problems when using sqlite, it is
meant mostly for testing when using Django.


This will cause some overhead for small querysets but will save some memory
for huge ones

For PostgreSQL this would be a nice feature. Any idea what MySQL and Oracle
do currently?


mysql can do chunked row fetching from server, but only one row at a time

curs = connection.cursor(CursorUseResultMixIn)
curs.fetchmany(100) # fetches 100 rows, one by one

Marco




Re: queryset caching note in docs

2011-11-02 Thread Anssi Kääriäinen

On 11/02/2011 12:47 PM, Marco Paolini wrote:

if that option is true, sqlite should open one connection per cursor
and psycopg2 should use named cursors


The sqlite behavior leads to some problems with transaction management -
different connections, different transactions (or is there some sort of "shared
transaction" in sqlite?). I would just fetch all the data in one go when using
sqlite. I wouldn't worry about performance problems when using sqlite, it is
meant mostly for testing when using Django.


This will cause some overhead for small querysets but will save some memory
for huge ones

For PostgreSQL this would be a nice feature. Any idea what MySQL and Oracle
do currently?

 - Anssi




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 10:10, Luke Plant wrote:

On 02/11/11 08:48, Marco Paolini wrote:


thanks for pointing that to me, do you see this as an issue to be fixed?

If there is some interest, I might give it a try.

Maybe it's not fixable, at least I can investigate a bit


Apparently, the protocol between the Postgres client and server only
does partial sends when using named cursors, which Django doesn't use.
Using named cursors with psycopg2 is certainly possible, but probably
not trivial. That's as much as I know.

Source:

http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/

The author of that page did say that he was working on a patch.

ok, I'll try to contact the author

talking about sqlite3, it looks like the only way to isolate two cursors
is to use two different connections.

Let's imagine there is a way to implement this (I'm not sure at this point)

We could have an option somewhere that tells django to use or not use
chunked cursor read from db.

if that option is true, sqlite should open one connection per cursor
and psycopg2 should use named cursors

This will cause some overhead for small querysets but will save some memory
for huge ones

cheers,

Marco




Re: queryset caching note in docs

2011-11-02 Thread Luke Plant
On 02/11/11 08:48, Marco Paolini wrote:

> thanks for pointing that to me, do you see this as an issue to be fixed?
> 
> If there is some interest, I might give it a try.
> 
> Maybe it's not fixable, at least I can investigate a bit

Apparently, the protocol between the Postgres client and server only
does partial sends when using named cursors, which Django doesn't use.
Using named cursors with psycopg2 is certainly possible, but probably
not trivial. That's as much as I know.

Source:

http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/

The author of that page did say that he was working on a patch.
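
For reference, a minimal sketch of the named-cursor approach using psycopg2
directly (not Django's ORM; the DSN and table name are placeholders). A named
cursor is server-side, so rows are streamed in batches instead of being sent to
the client all at once:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor(name="big_fetch")  # the name makes it a server-side cursor
cur.itersize = 100                   # rows fetched per network round trip
cur.execute("SELECT id, name FROM big_table")
for row in cur:                      # iterates, pulling itersize rows at a time
    pass   # process row
cur.close()
conn.close()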

Luke

-- 
"The number you have dialled is imaginary.  Please rotate your
telephone by 90 degrees and try again."

Luke Plant || http://lukeplant.me.uk/




Re: queryset caching note in docs

2011-11-02 Thread Marco Paolini

On 02/11/2011 09:43, Luke Plant wrote:

On 02/11/11 00:41, Marco Paolini wrote:


so if you do this:

for obj in Entry.objects.all():
  pass

django does this:
  - creates a cursor
  - then calls fetchmany(100) until ALL rows are fetched
  - creates a list containing ALL fetched rows
  - passes this list to queryset instance for lazy model instance creation

I didn't know that. (maybe we should document it somewhere...)

Now that I do, I also know it's time to move to postgresql...


And you will then find that the behaviour of the psycopg2 adapter means
that you get very similar behaviour - all rows are fetched as soon as
you start iterating - even if you do .iterator().

thanks for pointing that out to me, do you see this as an issue to be fixed?

If there is some interest, I might give it a try.

Maybe it's not fixable, at least I can investigate a bit

Cheers,

Marco




Re: queryset caching note in docs

2011-11-02 Thread Luke Plant
On 02/11/11 00:41, Marco Paolini wrote:

> so if you do this:
> 
> for obj in Entry.objects.all():
>  pass
> 
> django does this:
>  - creates a cursor
>  - then calls fetchmany(100) until ALL rows are fetched
>  - creates a list containing ALL fetched rows
>  - passes this list to queryset instance for lazy model instance creation
> 
> I didn't know that. (maybe we should document it somewhere...)
> 
> Now that I do, I also know it's time to move to postgresql...

And you will then find that the behaviour of the psycopg2 adapter means
that you get very similar behaviour - all rows are fetched as soon as
you start iterating - even if you do .iterator().

Luke

-- 
"The number you have dialled is imaginary.  Please rotate your
telephone by 90 degrees and try again."

Luke Plant || http://lukeplant.me.uk/
