Here is some more detailed information from trying the query multiple ways:
Piping the command into psql and writing to a tmp file takes 12 seconds (the
tmp file is 241MB):
time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS
spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk =
spectrum.spectrum_header_pk WHERE spectrum_header.filename =
'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp
real 0m12.052s
user 0m2.501s
sys 0m0.274s
Running the same query through engine.execute took ~6 seconds:
spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS
spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS
spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk =
spectrum.spectrum_header_pk WHERE spectrum_header.filename =
'spPlate-3586-55181.fits';")
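One caveat with that ~6-second figure: engine.execute returns a result object before all rows have necessarily been pulled over, so execute time and fetch time are worth measuring separately. A minimal sketch of that idea, using the stdlib sqlite3 module as a stand-in DBAPI driver (the table and payload here are invented, not the real spectrum schema):

```python
# Sketch: sqlite3 stands in for the real Postgres driver, just to show
# that execute() returning quickly is not the same as the rows having
# been fetched -- time the two phases separately.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, vals TEXT)")
# 1000 rows with a wide text payload, standing in for the numeric[] arrays
conn.executemany(
    "INSERT INTO spectrum (vals) VALUES (?)",
    [("0.0," * 4000,) for _ in range(1000)],
)

t0 = time.perf_counter()
cur = conn.execute("SELECT pk, vals FROM spectrum")  # returns almost at once
t_execute = time.perf_counter() - t0

t0 = time.perf_counter()
rows = cur.fetchall()  # this is where the wide rows actually come across
t_fetch = time.perf_counter() - t0

print(f"execute: {t_execute:.4f}s  fetchall: {t_fetch:.4f}s  rows: {len(rows)}")
```

If the 6 seconds already includes a full fetchall(), then the remaining gap to the ORM timing below is happening above the driver.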
The equivalent ORM query clocked in at 489 seconds:
spectra =
session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename ==
"spPlate-3665-55247.fits").all()
Thanks,
Adrian
On Jul 19, 2010, at 12:24 PM, David Gardner wrote:
> Try running that query directly against the database and see how long that
> takes. Also try running EXPLAIN on that query to make sure it is using your
> indexes properly.
> Since you are only using a single filter, make sure that
> spectrum_header.filename has an index, and make sure your foreign key column
> spectrum.spectrum_header_pk is indexed as well.
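That check can be done mechanically. The sketch below uses stdlib sqlite3 as a stand-in (Postgres's EXPLAIN output looks different, and these index names are invented) to confirm that both the filename filter and the foreign-key join hit indexes:

```python
# Sketch with sqlite3 standing in for Postgres; index names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum_header (pk INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, spectrum_header_pk INTEGER)")
conn.execute("CREATE INDEX ix_spectrum_header_filename ON spectrum_header (filename)")
conn.execute("CREATE INDEX ix_spectrum_header_pk ON spectrum (spectrum_header_pk)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT spectrum.pk FROM spectrum "
    "JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk "
    "WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'"
).fetchall()

for row in plan:
    print(row[-1])  # each step should mention searching via an index
```

On Postgres the equivalent is `EXPLAIN ANALYZE SELECT ...` in psql, looking for index scans rather than sequential scans.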
>
> On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
>>> does it take a few seconds to fully fetch all the results and it only gets
>>> 1000 rows ? or is that just to get the initial result?
>>>
>> I'm not sure what you mean by this - the query does return 1000 rows.
>>
>>
>>> also if any of the individual columns are very large BLOBs or perhaps very
>>> large PG arrays that would add to the overhead of a fetch.
>>>
>> There definitely are columns of PG arrays with ~4000 elements each, so
>> going back to my first email it seems like the culprit here could be the
>> ARRAYs.
>>
>> Thanks for your help,
>>
>> Adrian
>>
>> On Jul 19, 10:10 am, Michael Bayer <[email protected]> wrote:
>>
>>> On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>
>>>> I was just wondering at first whether there was a known issue with
>>>> ARRAY types, but here is the query:
>>>> spectra =
>>>> session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
>>>> == spPlateFilename).all()
>>>>
>>>
>>>> It should return ~1000 objects equaling about 120MB of data. In
>>>> Python, this query takes >10 minutes to complete, but as a SQL query
>>>> (copying and pasting the echoed command) it takes a few seconds:
>>>> SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
>>>> spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
>>>> spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
>>>> spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
>>>> spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
>>>> spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
>>>> spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
>>>> WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
>>>>
>>>
>>>> autoflush and autocommit are both set to False.
>>>>
>>>
>>>> It seems like a straightforward query so I'm confused as to what could
>>>> be getting hung up.
>>>>
>>> does it take a few seconds to fully fetch all the results and it only gets
>>> 1000 rows ? or is that just to get the initial result? these are different
>>> things. also if any of the individual columns are very large BLOBs or
>>> perhaps very large PG arrays that would add to the overhead of a fetch.
>>> You can also try writing a DBAPI-only script that runs the query, as well
>>> as running engine.execute(myquery.statement) and fetching rows that way to
>>> see if some in-object process is the factor (which is unlikely).
>>>
>>>
>>>> Thanks for any insight,
>>>>
>>>
>>>> Adrian
>>>>
>>>
>>>> On Jul 16, 10:24 pm, Michael Bayer <[email protected]> wrote:
>>>>
>>>>> You absolutely need to turn on echoing and locate the specific SQL query
>>>>> that causes the issue. Queries can take excessive time for a very wide
>>>>> variety of reasons.
>>>>>
>>>
>>>>> On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
>>>>>
>>>
>>>>>> Hello --
>>>>>>
>>>
>>>>>> I'm working with a database populated with data that originally lived
>>>>>> in a structure of files ~150MB each. A lot of this data is stored in
>>>>>> the database using the ARRAY datatype, specifically numeric[]. After
>>>>>> loading some of the data into the database
>>>>>> I tried performing a query to get back some data, and comparing it with
>>>>>> code that reads directly from the file system - but the database query
>>>>>> took ~50 times longer to complete. For instance, to retrieve 100 records
>>>>>> that contain a few 4000 element arrays each using the code that reads
>>>>>> the filesystem it took less than a second, but the query on the database
>>>>>> took around 25 seconds to complete. Has anyone else had issues with
>>>>>> array types slowing down queries or does this sound more like another
>>>>>> issue?
>>>>>>
>>>
>>>>>> Thanks!
>>>>>> Adrian
>>>>>>
>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "sqlalchemy" group.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected].
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/sqlalchemy?hl=en.
>>>>>>
>>>
>>
>
>
> --
> David Gardner
> Pipeline Tools Programmer
> Jim Henson Creature Shop
> [email protected]
>
>