Hi Jorban.

Thanks for taking the time to advise on this issue. The query I used is not the one that originally caused the problem; I just chose it because it returns a large number of results. So I can't really change anything on this side.
Also the query runs fine by itself, without dictionaries loaded in memory. So I think that it would be nice to try and get a sense of how much big data structures loaded in memory impact large SQL queries.

Benoit.

On Thu, Dec 22, 2011 at 5:26 AM, Ferran Jorba <[email protected]> wrote:
> Hello Benoit,
>
> [...]
>> In [4]: %time res = run_sql("SELECT id_bibrec FROM bibrec_bib03x
>> LIMIT 1000000")
>> CPU times: user 1.96 s, sys: 0.06 s, total: 2.02 s
>> Wall time: 2.30 s
>>
>> Any idea about why we're seeing this and how we can fix it? It is
>> quite a big problem for us as our citation dictionaries are so big.
>
> I have noticed in more than one case that for some minimally complex
> (?!) operations the bottleneck is MySQL, not Python, so if can move
> part of the manipulation from one the other you have surprises. I
> cannot remember the exact case, but the equivalent with yours should be
> changing:
>
> res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 1000000")
>
> to:
>
> res = run_sql("SELECT id_bibrec FROM bibrec_bib03x")
> res = res[:1000000]
>
> I remember gains of 10x. YMMV, but you can try it.
>
> Ferran

--
Benoit Thiell
The SAO/NASA Astrophysics Data System
http://adswww.harvard.edu/
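To get a sense of how much large in-memory structures affect the construction of a big result set, here is a rough sketch that takes MySQL out of the picture. Everything in it is an assumption for illustration: `build_rows` is a hypothetical stand-in for `run_sql` that only builds one-element tuples, the dictionary is a made-up substitute for the citation dictionaries, and the code targets Python 3 (`time.perf_counter`). It also times one run with the cyclic garbage collector disabled, since the collector is one possible contributor to the slowdown; nothing in this thread confirms that it is the cause.

```python
import gc
import time


def build_rows(n):
    """Hypothetical stand-in for run_sql(): build n one-element tuples."""
    return [(i,) for i in range(n)]


def time_call(func, *args):
    """Return (result, elapsed wall-clock seconds) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare(n_rows=1000000, dict_size=500000):
    """Time row construction with and without a large dictionary in memory."""
    timings = {}

    # 1. Baseline: nothing big loaded yet.
    _, timings["baseline"] = time_call(build_rows, n_rows)

    # 2. Made-up substitute for the citation dictionaries: int -> list of ints.
    big = {i: [i, i + 1] for i in range(dict_size)}
    _, timings["dict_loaded"] = time_call(build_rows, n_rows)

    # 3. Same measurement with the cyclic GC off, to see how much of the
    #    difference (if any) disappears.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        _, timings["dict_loaded_gc_off"] = time_call(build_rows, n_rows)
    finally:
        if was_enabled:
            gc.enable()

    del big  # the dictionary stays alive until this point
    return timings
```

Calling `compare()` and printing the returned dictionary gives three wall-clock timings in seconds. If `dict_loaded` is much larger than both `baseline` and `dict_loaded_gc_off`, that would point at the garbage collector rather than at MySQL; if all three are close, the cause is more likely elsewhere.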

