Hello Benoit,
[...]
> In [4]: %time res = run_sql("SELECT id_bibrec FROM bibrec_bib03x
> LIMIT 1000000")
> CPU times: user 1.96 s, sys: 0.06 s, total: 2.02 s
> Wall time: 2.30 s
>
> Any idea about why we're seeing this and how we can fix it? It is
> quite a big problem for us as our citation dictionaries are so big.
I have noticed in more than one case that for some minimally complex
(?!) operations the bottleneck is MySQL, not Python, so if you can move
part of the manipulation from one to the other you may get surprises. I
cannot remember the exact case, but the equivalent for yours would be
changing:
res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 1000000")
to:
res = run_sql("SELECT id_bibrec FROM bibrec_bib03x")
res = res[:1000000]
I remember gains of 10x. YMMV, but you can try it.
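If you want to measure both variants side by side before changing
anything, here is a minimal sketch. It is only an illustration: it uses
an in-memory SQLite table instead of MySQL, and the run_sql() and
timed() helpers are hypothetical stand-ins I made up for the example
(not Invenio's run_sql), so the timings it prints say nothing about
your real database. Point the two queries at your own connection to
get meaningful numbers.

```python
import sqlite3
import time


def run_sql(conn, query):
    """Hypothetical stand-in for run_sql: return all rows as a tuple of tuples."""
    return tuple(conn.execute(query).fetchall())


def timed(fn):
    """Call fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Assumption: a small in-memory table standing in for bibrec_bib03x.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibrec_bib03x (id_bibrec INTEGER)")
conn.executemany(
    "INSERT INTO bibrec_bib03x VALUES (?)",
    ((i,) for i in range(20000)),
)

n = 10000  # scaled down from the 1000000 in the original query

# Variant 1: let the database apply the limit.
in_sql, t_sql = timed(
    lambda: run_sql(conn, "SELECT id_bibrec FROM bibrec_bib03x LIMIT %d" % n)
)

# Variant 2: fetch everything, then slice in Python.
in_py, t_py = timed(
    lambda: run_sql(conn, "SELECT id_bibrec FROM bibrec_bib03x")[:n]
)

print("LIMIT in SQL:    %d rows in %.4f s" % (len(in_sql), t_sql))
print("slice in Python: %d rows in %.4f s" % (len(in_py), t_py))
```

Keep in mind that the second variant pulls the whole table into memory
before slicing, so it only makes sense if the full result fits.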
Ferran