Here is some more detailed information from trying the query multiple ways:
Piping the command into psql and writing to a tmp file takes 12 seconds (the
tmp file is 241MB):
time echo "SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS
spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk =
spectrum.spectrum_header_pk WHERE spectrum_header.filename =
'spPlate-3586-55181.fits';" | psql spectradb -U postgres > tmp
real 0m12.052s
user 0m2.501s
sys 0m0.274s
Running the same query through engine.execute took ~6 seconds:
spectra = engine.execute("SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS
spectrum_ra, spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask, spectrum.or_mask AS
spectrum_or_mask FROM spectrum JOIN spectrum_header ON spectrum_header.pk =
spectrum.spectrum_header_pk WHERE spectrum_header.filename =
'spPlate-3586-55181.fits';")
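One caveat with that ~6-second figure: engine.execute returns a result object before all rows have necessarily been pulled over, so execute time and fetch time are worth measuring separately. A minimal sketch of that idea, using the stdlib sqlite3 module as a stand-in DBAPI driver (the table and payload here are invented, not the real spectrum schema):

```python
# Sketch: sqlite3 stands in for the real Postgres driver, just to show
# that execute() returning quickly is not the same as the rows having
# been fetched -- time the two phases separately.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, vals TEXT)")
# 1000 rows with a wide text payload, standing in for the numeric[] arrays
conn.executemany(
    "INSERT INTO spectrum (vals) VALUES (?)",
    [("0.0," * 4000,) for _ in range(1000)],
)

t0 = time.perf_counter()
cur = conn.execute("SELECT pk, vals FROM spectrum")  # returns almost at once
t_execute = time.perf_counter() - t0

t0 = time.perf_counter()
rows = cur.fetchall()  # this is where the wide rows actually come across
t_fetch = time.perf_counter() - t0

print(f"execute: {t_execute:.4f}s  fetchall: {t_fetch:.4f}s  rows: {len(rows)}")
```

If the 6 seconds already includes a full fetchall(), then the remaining gap to the ORM timing below is happening above the driver.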
The equivalent ORM query clocked in at 489 seconds:
spectra =
session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename ==
"spPlate-3665-55247.fits").all()
Thanks,
Adrian
On Jul 19, 2010, at 12:24 PM, David Gardner wrote:
> Try running that query directly against the database and see how long that
> takes. Also try running EXPLAIN on that query to make sure it is using your
> indexes properly.
> Since you are only using a single filter, make sure that
> spectrum_header.filename has an index, and make sure your foreign key column
> spectrum.spectrum_header_pk is indexed as well.
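That check can be done mechanically. The sketch below uses stdlib sqlite3 as a stand-in (Postgres's EXPLAIN output looks different, and these index names are invented) to confirm that both the filename filter and the foreign-key join hit indexes:

```python
# Sketch with sqlite3 standing in for Postgres; index names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spectrum_header (pk INTEGER PRIMARY KEY, filename TEXT)")
conn.execute("CREATE TABLE spectrum (pk INTEGER PRIMARY KEY, spectrum_header_pk INTEGER)")
conn.execute("CREATE INDEX ix_spectrum_header_filename ON spectrum_header (filename)")
conn.execute("CREATE INDEX ix_spectrum_header_pk ON spectrum (spectrum_header_pk)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT spectrum.pk FROM spectrum "
    "JOIN spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk "
    "WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'"
).fetchall()

for row in plan:
    print(row[-1])  # each step should mention searching via an index
```

On Postgres the equivalent is `EXPLAIN ANALYZE SELECT ...` in psql, looking for index scans rather than sequential scans.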
>
> On 07/19/2010 08:46 AM, Adrian Price-Whelan wrote:
>>> does it take a few seconds to fully fetch all the results and it only gets
>>> 1000 rows ? or is that just to get the initial result?
>>>
>> I'm not sure what you mean by this - the query does return 1000 rows.
>>
>>
>>> also if any of the individual columns are very large BLOBs or perhaps very
>>> large PG arrays that would add to the overhead of a fetch.
>>>
>> There definitely are columns of PG arrays with ~4000 elements each, so
>> going back to my first email it seems like the culprit here could be the
>> ARRAYs.
>>
>> Thanks for your help,
>>
>> Adrian
>>
>> On Jul 19, 10:10 am, Michael Bayer <[email protected]> wrote:
>>
>>> On Jul 19, 2010, at 9:52 AM, Adrian Price-Whelan wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>
>>>> I was just wondering at first whether there was a known issue with
>>>> ARRAY types, but here is the query:
>>>> spectra =
>>>> session.query(Spectrum).join(SpectrumHeader).filter(SpectrumHeader.filename
>>>> == spPlateFilename).all()
>>>>
>>>
>>>> It should return ~1000 objects equaling about 120MB of data. In
>>>> Python, this query takes >10 minutes to complete, but as a SQL query
>>>> (copying and pasting the echoed command) it takes a few seconds:
>>>> SELECT spectrum.pk AS spectrum_pk, spectrum.ra AS spectrum_ra,
>>>> spectrum.dec AS spectrum_dec, spectrum.values AS spectrum_values,
>>>> spectrum.spectrum_header_pk AS spectrum_spectrum_header_pk,
>>>> spectrum.fiber_number AS spectrum_fiber_number, spectrum.inv_var AS
>>>> spectrum_inv_var, spectrum.and_mask AS spectrum_and_mask,
>>>> spectrum.or_mask AS spectrum_or_mask FROM spectrum JOIN
>>>> spectrum_header ON spectrum_header.pk = spectrum.spectrum_header_pk
>>>> WHERE spectrum_header.filename = 'spPlate-3586-55181.fits'
>>>>
>>>
>>>> autoflush and autocommit are both set to False.
>>>>
>>>
>>>> It seems like a straightforward query so I'm confused as to what could
>>>> be getting hung up.
>>>>
>>> does it take a few seconds to fully fetch all the results and it only gets
>>> 1000 rows ? or is that just to get the initial result? these are different
>>> things. also if any of the individual columns are very large BLOBs or
>>> perhaps very large PG arrays that would add to the overhead of a fetch.
>>> You can also try writing a DBAPI-only script that runs the query, as well
>>> as running engine.execute(myquery.statement) and fetching rows that way to
>>> see if some in-object process is the factor (which is unlikely).
>>>
>>>
>>>> Thanks for any insight,
>>>>
>>>
>>>> Adrian
>>>>
>>>
>>>> On Jul 16, 10:24 pm, Michael Bayer <[email protected]> wrote:
>>>>
>>>>> You absolutely need to turn on echoing and locate the specific SQL query
>>>>> that causes the issue. Queries can take excessive time for a very wide
>>>>> variety of reasons.
>>>>>
>>>
>>>>> On Jul 16, 2010, at 12:56 PM, Adrian Price-Whelan wrote:
>>>>>
>>>
>>>>>> Hello --
>>>>>>
>>>
>>>>>> I'm working with a database populated with data that originally lived
>>>>>> in a structure of files ~150MB each. A lot of this data is stored in
>>>>>> the database using the ARRAY datatype, specifically numeric[]. After
>>>>>> loading some of the data into the database
>>>>>> I tried performing a query to get back some data, and comparing it with
>>>>>> code that reads directly from the file system - but the database query
>>>>>> took ~50 times longer to complete. For instance, to retrieve 100 records
>>>>>> that contain a few 4000 element arrays each using the code that reads
>>>>>> the filesystem it took less than a second, but the query on the database
>>>>>> took around 25 seconds to complete. Has anyone else had issues with
>>>>>> array types slowing down queries or does this sound more like another
>>>>>> issue?
>>>>>>
>>>
>>>>>> Thanks!
>>>>>> Adrian
>>>>>>
>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "sqlalchemy" group.
>>>>>> To post to this group, send email to [email protected].
>>>>>> To unsubscribe from this group, send email to
>>>>>> [email protected].
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/sqlalchemy?hl=en.
>>>>>>
>>>
>>
>
>
> --
> David Gardner
> Pipeline Tools Programmer
> Jim Henson Creature Shop
> [email protected]
>
>