On 03/01/2012 10:51 PM, Marti Raudsepp wrote:
The problem with IN() and ARRAY[] is that the whole list of numbers
has to be parsed by the SQL syntax parser, which has significant
memory and CPU overhead (it has to accept arbitrary expressions in the
list). But there's a shortcut around the parser: you can pass in the
list as an array literal string, e.g:
select * from the_table where id = ANY('{1,2,3,4,5}')
OK, that explains the memory usage.
Luckily there is no need to do any tricks. The question I was trying to seek answer for was should there be some default batch size for inserts and deletes in Django, and the answer seems clear: the problems appear only when the batch sizes are enormous, so there doesn't seem to be a reason to have default limits. Actually, the batch sizes are so large that it is likely the Python process will OOM before you can trigger problems in the DB.The SQL parser considers the value one long string and passes it to the array input function, which is a much simpler routine. This should scale up much better.Even better if you could pass in the array as a query parameter, so the SQL parser doesn't even see the long string -- but I think you have to jump through some hoops to do that in psycopg2.
- Anssi -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
