Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-27 Thread Tomas Vondra
Hi,

On 27.8.2013 06:06, Adam Ma'ruf wrote:
 Hi
 
 Thanks for the response.  I reran the query but first ran the statement
 you provided and set work_mem to 2GB.  It ended up taking 133s and a
 GroupAggregate was still used.

OK.

 
 Here are the values you asked for:
 # - Planner Method Configuration -
 # - Planner Cost Constants -

All set to default, so seems fine to me.

 
 #seq_page_cost = 1.0           # measured on an arbitrary scale
 #random_page_cost = 4.0        # same scale as above
 #cpu_tuple_cost = 0.01         # same scale as above
 #cpu_index_tuple_cost = 0.005  # same scale as above
 #cpu_operator_cost = 0.0025    # same scale as above
 #effective_cache_size = 6000MB

Well, if effective_cache_size is commented out, then it's still 128MB
(default). But I don't think that matters here.
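You can check the value the server is actually using, e.g.:

    SHOW effective_cache_size;
    -- or, including where the value came from:
    SELECT name, setting, unit, source
      FROM pg_settings
     WHERE name = 'effective_cache_size';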

 The output of select * from pg_statistics is large...should I attach it
 as a separate file (not sure if that's allowed on these mailing lists)

I didn't ask for a pg_statistics dump. I asked for pg_settings (but I
already got most of the important pieces above).
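If the full output is too large to post, the non-default values are usually
enough, e.g.:

    SELECT name, setting, unit, source
      FROM pg_settings
     WHERE source <> 'default';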


 The data is ~2.5GB and I can't think of any place I can upload it.  I can

There are a zillion such places. E.g. Dropbox, Box, Wuala, Google
Drive, mega.co.nz or one of the many other alternatives. All of them
give you ~5GB of space for free.

Or I could give you access to my FTP server, if that's what you prefer.

 provide the columns and data types.  It's a subset of public data from
 usaspending.gov

Is there a simple way to download / filter the public data to get the
same dataset as you have?

Tomas




Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-26 Thread Pavel Stehule
Hello


It is a little bit strange - can you send info about your PostgreSQL
version, the query, and the table description?

In this case, PostgreSQL should use a hash aggregate, but for some
strange reason, pg didn't do it.

The second strange issue is the speed of the external sort - it is slower
than I would expect.

What I know - the usual advice for MS Windows is to set shared_buffers
to a minimal value - 512MB can be too much there.
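For illustration, the relevant postgresql.conf lines might look like this
(the values are only a sketch, not a recommendation):

    shared_buffers = 512MB   # keep small on Windows
    work_mem = 64MB          # used per sort/hash operation, not per server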

Regards

Pavel Stehule


2013/8/26 Adam Ma'ruf adam.ma...@gmail.com

 Hi,

 I wasn't sure whether to mail the novice mailing list or this one.
 Since this is performance related I'm posting it here, but I am definitely
 a novice at postgresql - converting from mssql just now.

 I have a ~2.5GB table with ~5M rows of data.  A query that groups by two
 fields and sums a floating point field takes approximately 122 seconds.  The
 equivalent query takes ~8 seconds in my previous sql server express
 installation.

 I've tried varying the parameters in postgresql.conf:
 I've tried shared_buffers from 512MB to 4000MB
 and work_mem from 64MB to 4000MB (I thought this might be the answer
 since the execution plan (referenced below) indicates that the sort relies
 on an external merge sort on disk).
 I've increased default_statistics_target to 1 and ran a full VACUUM
 ANALYZE.
 I realize there are no indexes on this table.  My main concern is why I
 can't get this to run as fast as in sql server express (which also has no
 indexes, and the same query takes about 8 seconds)

 My system:  Windows Professional 64-bit
 8 GB of RAM
 Intel i5-220M CPU @ 2.5GHz

 Here is the link to the execution plan:  http://explain.depesz.com/s/Ytx3

 Thanks a lot in advance and do let me know if you require any more
 information to make an informed opinion,
 A



Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-26 Thread Adam Ma'ruf
Sure

I just upgraded to 9.2.4.  The query is:
SELECT quebec_four
     , sierra
     , SUM(dollaramount) AS dollaramount
  FROM alpha_quebec_echo
 GROUP BY quebec_four
        , sierra

alpha_quebec_echo has 5,409,743 rows and 39 columns.  Quebec_four and
sierra are both varchar; dollaramount is a floating point field.  It has
no indexes (but neither did the mssql express table).  Any other details
you need?
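If it helps, I can also send the full table definition, e.g. the output of
psql's:

    \d alpha_quebec_echo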

Thanks,
A







Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-26 Thread Tomas Vondra
On 26 August 2013, 15:02, Adam Ma'ruf wrote:
 Sure

 I just upgraded to 9.2.4.  The query is:
 SELECT quebec_four
      , sierra
      , SUM(dollaramount) AS dollaramount
   FROM alpha_quebec_echo
  GROUP BY quebec_four
         , sierra

 alpha_quebec_echo has 5,409,743 rows and 39 columns.  Quebec_four and
 sierra are both varchar; dollaramount is a floating point field.  It has
 no indexes (but neither did the mssql express table).  Any other details
 you need?

 Thanks,
 A

Hi,

It's quite clear why the query is so slow - the plan is using an on-disk
sort with ~5M rows, and that's consuming a lot of time (almost 120 seconds).

I'm wondering why it chose the sort in the first place. I'd have expected
it to choose a hash aggregate, which does not require sorted input.

Can you try running "SET enable_sort = false" and then an EXPLAIN of the query?
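For example (SET only affects your current session, so it's safe to try):

    SET enable_sort = false;
    EXPLAIN
    SELECT quebec_four, sierra, SUM(dollaramount) AS dollaramount
      FROM alpha_quebec_echo
     GROUP BY quebec_four, sierra;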

If that does not change the plan to HashAggregate instead of
GroupAggregate, please check and post values of enable_* and cost_*
variables.
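For example:

    SELECT name, setting
      FROM pg_settings
     WHERE name LIKE 'enable_%' OR name LIKE '%_cost';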

Another question is why it's doing the sort on disk and not in memory. The
explain you've posted shows it requires ~430MB on disk, and in my
experience it usually requires ~3x that much to do the in-memory sort.
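(With the ~430MB on-disk sort from your explain, that rule of thumb suggests
roughly 3 x 430MB, i.e. ~1.3GB of work_mem, should be enough to do this sort
in memory.)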

I see you've set work_mem=4GB, is that correct? Can you try with a lower
value - say, 1 or 2GB? I'm not sure how this works on Windows, though.
Maybe there's some other limit (and SQL Server is not hitting it, because
it's a native Windows application).
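You can change it for your session only, e.g.:

    SET work_mem = '2GB';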

Can you prepare a testcase (table structure + data) and post it somewhere?
Or at least the structure, if it's not possible to share the data.
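A sketch with pg_dump (the database name "yourdb" is a placeholder):

    pg_dump -t alpha_quebec_echo --schema-only yourdb > structure.sql
    pg_dump -t alpha_quebec_echo yourdb > testcase.sql    # structure + data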

Also, output from select * from pg_settings would be helpful.

Tomas






Re: [PERFORM] Poor performance on simple queries compared to sql server express

2013-08-26 Thread Adam Ma'ruf
Hi

Thanks for the response.  I reran the query but first ran the statement you
provided and set work_mem to 2GB.  It ended up taking 133s and a
GroupAggregate was still used.

Here are the values you asked for:
# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#seq_page_cost = 1.0 # measured on an arbitrary scale
#random_page_cost = 4.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
#effective_cache_size = 6000MB


The output of select * from pg_statistics is large...should I attach it as
a separate file (not sure if that's allowed on these mailing lists)

The data is ~2.5GB and I can't think of any place I can upload it.  I can
provide the columns and data types.  It's a subset of public data from
usaspending.gov

column_name | datatype | ordinal position | nullable?
idx | integer | 1 | YES
obligatedamount | double precision | 2 | YES
baseandexercisedoptionsvalue | double precision | 3 | YES
baseandalloptionsvalue | double precision | 4 | YES
maj_fund_agency_cat | character varying | 5 | YES
contractingofficeagencyid | character varying | 6 | YES
contractingofficeid | character varying | 7 | YES
fundingrequestingagencyid | character varying | 8 | YES
fundingrequestingofficeid | character varying | 9 | YES
signeddate | date | 10 | YES
effectivedate | date | 11 | YES
currentcompletiondate | date | 12 | YES
ultimatecompletiondate | date | 13 | YES
lastdatetoorder | character varying | 14 | YES
typeofcontractpricing | character varying | 15 | YES
multiyearcontract | character varying | 16 | YES
vendorname | character varying | 17 | YES
dunsnumber | character varying | 18 | YES
parentdunsnumber | character varying | 19 | YES
psc_cat | character varying | 20 | YES
productorservicecode | character varying | 21 | YES
principalnaicscode | character varying | 22 | YES
piid | character varying | 23 | YES
modnumber | character varying | 24 | YES
fiscal_year | character varying | 25 | YES
idvpiid | character varying | 26 | YES
extentcompeted | character varying | 27 | YES
numberofoffersreceived | double precision | 28 | YES
competitiveprocedures | character varying | 29 | YES
solicitationprocedures | character varying | 30 | YES
evaluatedpreference | character varying | 31 | YES
firm8aflag | character varying | 32 | YES
sdbflag | character varying | 33 | YES
issbacertifiedsmalldisadvantagedbusiness | character varying | 34 | YES
womenownedflag | character varying | 35 | YES
veteranownedflag | character varying | 36 | YES
minorityownedbusinessflag | character varying | 37 | YES
data_source | text | 38 | YES
psc_cd | character varying | 39 | YES
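A listing like this can be pulled straight from the catalog, e.g.:

    SELECT column_name, data_type, ordinal_position, is_nullable
      FROM information_schema.columns
     WHERE table_name = 'alpha_quebec_echo'
     ORDER BY ordinal_position;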



