Hi Jorban.

Thanks for taking the time to advise on this issue. The query I used
is not the one that originally caused the problem; I just chose it
because it returns a large number of results. So I can't really change
anything on this side.

Also, the query runs fine by itself, without the dictionaries loaded in
memory. So I think it would be useful to get a sense of how much large
data structures held in memory slow down large SQL queries.
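
To make that concrete, here is a minimal sketch of the kind of
measurement I have in mind. It is only an illustration: it uses
Python's sqlite3 module with an in-memory table as a stand-in for our
MySQL setup and run_sql, and the helper names (build_db, timed_query,
compare) and the shape of the dictionary are hypothetical. It times the
same SELECT before and after a large dictionary is built in memory.

```python
import sqlite3
import time


def build_db(n_rows):
    """Create an in-memory SQLite table standing in for bibrec_bib03x."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bibrec_bib03x (id_bibrec INTEGER)")
    conn.executemany(
        "INSERT INTO bibrec_bib03x (id_bibrec) VALUES (?)",
        ((i,) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def timed_query(conn, limit):
    """Run the SELECT and return (rows, wall-clock seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT id_bibrec FROM bibrec_bib03x LIMIT ?", (limit,)
    ).fetchall()
    return rows, time.perf_counter() - start


def compare(n_rows=200000, dict_size=200000):
    """Time the same query without and with a large dict in memory."""
    conn = build_db(n_rows)
    # Baseline: no large structure in memory yet.
    rows_before, t_before = timed_query(conn, n_rows)
    # Hypothetical stand-in for the citation dictionaries.
    big = {i: [i, i + 1] for i in range(dict_size)}
    rows_after, t_after = timed_query(conn, n_rows)
    conn.close()
    return len(rows_before), len(rows_after), t_before, t_after, len(big)


if __name__ == "__main__":
    n_before, n_after, t_before, t_after, _ = compare()
    print("without dict: %d rows in %.3f s" % (n_before, t_before))
    print("with dict:    %d rows in %.3f s" % (n_after, t_after))
```

Timings from SQLite are not expected to match MySQL, so the meaningful
test would be the same comparison against the real database with the
actual citation dictionaries loaded.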

Benoit.

On Thu, Dec 22, 2011 at 5:26 AM, Ferran Jorba <[email protected]> wrote:
> Hello Benoit,
>
> [...]
>>     In [4]: %time res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 1000000")
>>     CPU times: user 1.96 s, sys: 0.06 s, total: 2.02 s
>>     Wall time: 2.30 s
>>
>> Any idea about why we're seeing this and how we can fix it? It is
>> quite a big problem for us as our citation dictionaries are so big.
>
> I have noticed in more than one case that for some minimally complex
> (?!)  operations the bottleneck is MySQL, not Python, so if you can move
> part of the manipulation from one to the other you may be surprised.  I
> cannot remember the exact case, but the equivalent for yours would be
> changing:
>
>  res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 1000000")
>
> to:
>
>  res = run_sql("SELECT id_bibrec FROM bibrec_bib03x")
>  res = res[:1000000]
>
> I remember gains of 10x.  YMMV, but you can try it.
>
> Ferran



-- 
Benoit Thiell
The SAO/NASA Astrophysics Data System
http://adswww.harvard.edu/
