Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Jesper Krogh

On 2011-03-18 01:51, Oliver Charles wrote:

Hello,

At MusicBrainz we're looking to get a new database server, and are
hoping to buy this in the next couple of days. I'm mostly a software
guy, but I'm posting this on behalf of Rob, who's actually going to be
buying the hardware. Here's a quote of what we're looking to get:


I think most of it has been said already:
* Battery-backed write cache.
* See if you can get enough memory to make your entire active
  dataset fit in memory (typically not that hard in 2011).
* Depending on your workload, of course, you're typically not
  bottlenecked by the number of CPU cores, so strive for fewer,
  faster cores.
* As few sockets as you can squeeze your memory and CPU requirements
  onto.
* If you can live with (or design around) the trade-offs of SSDs, they
  will buy you far more performance than any reasonable number of
  rotating drives. (A good backup plan helps here, e.g. shipping the
  full WAL log to a second system; see the sketch below.)


--
Jesper
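
As an aside, the WAL shipping mentioned in the last bullet boils down to
two settings in postgresql.conf (8.3 and later); a minimal sketch, with
the rsync destination as a placeholder:

    archive_mode    = on
    archive_command = 'rsync -a %p standby:/wal_archive/%f'
    # %p is the path of the finished WAL segment, %f its file name; the
    # command must exit 0 only once the segment is safely stored remotely.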



[PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

Hello list,

I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL 
8.4.7. The basic problem is that when joining different entities multiple 
times, the planner thinks there are vastly fewer rows to join than there 
are in reality, and decides to use multiple nested loops for the join 
chain. This results in queries that take somewhere around 35 seconds with 
nested loops enabled, but somewhere around 100 ms with nested loops 
disabled. I don't think there is much hope of getting better statistics, 
as EAV is just not statistics-friendly. The values of an attribute depend 
on the type of the attribute, and different entities have different 
attributes defined. The planner has no idea of these correlations.


Now, my question is: if I disable nested loops completely for the users 
of the EAV database, what kind of worst-case performance loss can I 
expect? I don't mind if a query that normally runs in 100 ms now takes 
200 ms, but I am worried about cases where a query would take much more 
time to complete than with nested loops enabled. As far as I understand, 
such cases should be pretty rare, if not non-existent?


 - Anssi
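
For reference, the planner setting at issue is an ordinary GUC, so it does
not have to be changed globally; a minimal sketch:

    -- current session only:
    SET enable_nestloop = off;
    -- or globally, which is what the poster is weighing, in postgresql.conf:
    --   enable_nestloop = off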






Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Pavel Stehule
Hello

for example, queries with a LIMIT clause can be significantly faster with
a nested loop. But you don't need to disable nested loops globally.

You can wrap your queries in SQL functions and disable nested loops just
for those functions.
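
A minimal sketch of that approach (function, table, and column names are
placeholders; per-function SET clauses exist since 8.3):

    CREATE OR REPLACE FUNCTION eav_report(p_entity integer)
    RETURNS SETOF attr_value AS
    $$
        SELECT * FROM attr_value WHERE olio_id = p_entity;
    $$
    LANGUAGE sql STABLE
    SET enable_nestloop = off;  -- applies only while this function runs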

Regards

Pavel Stehule

2011/3/18 Anssi Kääriäinen anssi.kaariai...@thl.fi:
 Hello list,

 I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL
 8.4.7. The basic problem is that when joining different entities multiple
 times, the planner thinks there are vastly fewer rows to join than there
 are in reality, and decides to use multiple nested loops for the join
 chain. This results in queries that take somewhere around 35 seconds with
 nested loops enabled, but somewhere around 100 ms with nested loops
 disabled. I don't think there is much hope of getting better statistics,
 as EAV is just not statistics-friendly. The values of an attribute depend
 on the type of the attribute, and different entities have different
 attributes defined. The planner has no idea of these correlations.

 Now, my question is: if I disable nested loops completely for the users of
 the EAV database, what kind of worst-case performance loss can I expect? I
 don't mind if a query that normally runs in 100 ms now takes 200 ms, but I
 am worried about cases where a query would take much more time to complete
 than with nested loops enabled. As far as I understand, such cases should
 be pretty rare, if not non-existent?

  - Anssi








Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Arjen van der Meijden

On 18-3-2011 4:02 Scott Marlowe wrote:

On Thu, Mar 17, 2011 at 6:51 PM, Oliver Charles
postgresql-p...@ocharles.org.uk  wrote:

Another point.  My experience with 1U chassis and cooling is that they
don't move enough air across their cards to make sure they stay cool.
You'd be better off ordering a 2U chassis with 8 3.5" drive bays so
you can add drives later if you need to, and it'll provide more
cooling air across the card.

Our current big 48-core servers are running plain LSI SAS adapters
without HW RAID, because the LSIs we were using overheated and
cooked themselves to death after about 3 months.  Those are 1U chassis
machines, and our newer machines are all 2U boxes now.


We have several 1U boxes (mostly Dell and Sun) running, and have had 
several in the past. And we've never had any heating problems with them. 
That includes machines with more power-hungry processors than are 
currently available, all power-slurping FB-DIMM slots occupied and two 
RAID cards installed.


But then again, a 2U box will likely have more cooling capacity, no 
matter how you look at it.


Another tip that may be useful: look at 2.5" drives. AFAIK there is no 
really good reason to use 3.5" drives for new servers. The 2.5" drives 
save power and room - and thus may allow more air flowing through the 
enclosure - and offer the same performance and reliability (the first I 
know for sure; the second I'm pretty sure of but haven't seen much proof 
of lately).


You could even have an 8- or 10-disk 1U enclosure that way, or up to 24 
disks in 2U. But those configurations will require some attention to 
cooling again.


Best regards,

Arjen



Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
acmmail...@tweakers.net wrote:
 On 18-3-2011 4:02 Scott Marlowe wrote:

 On Thu, Mar 17, 2011 at 6:51 PM, Oliver Charles
 postgresql-p...@ocharles.org.uk  wrote:

 Another point.  My experience with 1U chassis and cooling is that they
 don't move enough air across their cards to make sure they stay cool.
 You'd be better off ordering a 2U chassis with 8 3.5" drive bays so
 you can add drives later if you need to, and it'll provide more
 cooling air across the card.

 Our current big 48-core servers are running plain LSI SAS adapters
 without HW RAID, because the LSIs we were using overheated and
 cooked themselves to death after about 3 months.  Those are 1U chassis
 machines, and our newer machines are all 2U boxes now.

 We have several 1U boxes (mostly Dell and Sun) running and have had several in
 the past. And we've never had any heating problems with them. That includes
 machines with more power-hungry processors than are currently available, all
 power-slurping FB-DIMM slots occupied and two RAID cards installed.

Note I am talking specifically about the ability to cool the RAID
card, not the CPUs etc.  Many 1U boxes have poor air flow across the
expansion slots for PCI etc. cards, while doing a great job cooling
the CPUs and memory.  If you don't use high-performance RAID cards
(LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
and look at the air flow for the expansion slots; it's often just not
very much.



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

On 03/18/2011 09:02 AM, Pavel Stehule wrote:

for example, queries with a LIMIT clause can be significantly faster with
a nested loop. But you don't need to disable nested loops globally.

You can wrap your queries in SQL functions and disable nested loops just
for those functions.


Thank you for your help, the LIMIT example was something I was not aware of.

The problem is we are replacing an old database, and we need to 
replicate certain views for external users. Minimal impact for these 
users is required. Maybe it would be best to create special user 
accounts for these external users and disable nested loops only for 
those accounts. Otherwise, we will disable nested loops only when 
absolutely necessary.


 - Anssi
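
A sketch of the per-account idea (the role name is a placeholder):

    ALTER ROLE external_user SET enable_nestloop = off;
    -- picked up at login, so it covers every query those accounts run,
    -- while all other users keep the default planner behaviour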



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Vitalii Tymchyshyn

On 18.03.11 09:15, Anssi Kääriäinen wrote:

Hello list,

I am working on an Entity-Attribute-Value (EAV) database using 
PostgreSQL 8.4.7. The basic problem is that when joining different 
entities multiple times, the planner thinks there are vastly fewer 
rows to join than there are in reality, and decides to use multiple 
nested loops for the join chain. This results in queries that take 
somewhere around 35 seconds with nested loops enabled, but somewhere 
around 100 ms with nested loops disabled. I don't think there is much 
hope of getting better statistics, as EAV is just not 
statistics-friendly. The values of an attribute depend on the type of 
the attribute, and different entities have different attributes 
defined. The planner has no idea of these correlations.


Hello.

If your queries work on a single attribute, you can try adding partial 
indexes for the different attributes. Note that in this case parameterized 
statements may prevent index usage, so check also with the attribute id inlined.


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Thomas Kellerer

Anssi Kääriäinen, 18.03.2011 08:15:

Hello list,

I am working on a Entity-Attribute-Value (EAV) database using
PostgreSQL 8.4.7. The basic problem is that when joining multiple
times different entities the planner thinks that there is vastly less
rows to join than there is in reality and decides to use multiple
nested loops for the join chain.


Did you consider using hstore instead?

I think in the PostgreSQL world, this is a better alternative to EAV and most 
probably faster as well.

Regards
Thomas
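
A sketch of the hstore alternative (table and column names are made up;
hstore is a contrib module, installed via CREATE EXTENSION from 9.1 on or
a SQL install script on 8.4, and the @> containment operator is the 9.0+
spelling):

    CREATE TABLE entity (
        id    serial PRIMARY KEY,
        attrs hstore              -- all attributes in one indexable column
    );
    -- a GIN index supports containment lookups on the attributes:
    CREATE INDEX entity_attrs_idx ON entity USING gin (attrs);
    SELECT * FROM entity WHERE attrs @> 'suhde_hyvaksytty=>t';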




Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

On 03/18/2011 12:52 PM, Vitalii Tymchyshyn wrote:

If your queries work on a single attribute, you can try adding partial
indexes for the different attributes. Note that in this case parameterized
statements may prevent index usage, so check also with the attribute id inlined.

Best regards, Vitalii Tymchyshyn


Unfortunately this does not help with the statistics, and (I guess) 
nested loops will still be used when joining:


hot2=# explain analyze select * from attr_value where attr_tunniste =
'suhde_hyvaksytty' and arvo_text = 't';

                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using attr_value_arvo_text_idx1 on attr_value  (cost=0.00..343.59 rows=152 width=118) (actual time=0.076..7.768 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
   Filter: ((attr_tunniste)::text = 'suhde_hyvaksytty'::text)
 Total runtime: 10.855 ms
(4 rows)

hot2=# create index suhde_hyvaksytty_idx on attr_value(arvo_text) where
attr_tunniste = 'suhde_hyvaksytty';
CREATE INDEX
hot2=# analyze attr_value;
hot2=# explain analyze select * from attr_value where attr_tunniste =
'suhde_hyvaksytty' and arvo_text = 't';

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using suhde_hyvaksytty_idx on attr_value  (cost=0.00..43.72 rows=152 width=118) (actual time=0.093..4.776 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
 Total runtime: 7.817 ms
(3 rows)

 - Anssi



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

On 03/18/2011 01:14 PM, Thomas Kellerer wrote:

Did you consider using hstore instead?

I think in the PostgreSQL world, this is a better alternative to EAV and most
probably faster as well.

No, we did not. The reason is that we want to track each attribute with 
bi-temporal timestamps. The actual database schema for the attribute 
value table is:


CREATE TABLE attr_value (
    id SERIAL PRIMARY KEY,
    olio_id INTEGER NOT NULL REFERENCES base_olio,            -- entity identifier
    attr_tunniste VARCHAR(20) NOT NULL REFERENCES base_attr,  -- attr identifier
    kieli_tunniste VARCHAR(20) REFERENCES kieli,              -- lang identifier
    arvo_number DECIMAL(18, 9),                               -- value number
    arvo_ts timestamptz,                                      -- value timestamp
    arvo_text TEXT,                                           -- value text
    arvo_valinta_tunniste VARCHAR(20),      -- for choice lists: value_choice_identifier
    real_valid_from TIMESTAMPTZ NOT NULL,   -- real_valid_from .. real_valid_until define
    real_valid_until TIMESTAMPTZ NOT NULL,  --   when things have been true in the real world
    db_valid_from TIMESTAMPTZ NOT NULL,     -- db_valid_* define when things
    db_valid_until TIMESTAMPTZ NOT NULL,    --   have been in the database
    tx_id_insert INTEGER default txid_current(),
    tx_id_delete INTEGER
    -- foreign keys & checks skipped
);

Naturally, we have other tables defining the objects, joins between 
objects and metadata for the EAV. All data modifications are done 
through procedures, which ensure uniqueness etc. for the attributes and 
joins.


The data set is small, and performance in general is not that important, 
as long as the UI is responsive and data can be transferred to other 
systems in reasonable time. Insert performance is at least 10x worse 
than with a traditional schema, but it doesn't matter (we have 
somewhere around 1000 inserts/updates a day max). The only real 
problem so far is the chained nested loop problem, which really kills 
performance for some queries.


Surprisingly (at least to me) this schema has worked really well, 
although sometimes there is a feeling that we are implementing a 
database using a database...


 - Anssi



Re: [PERFORM] Xeon twice the performance of opteron

2011-03-18 Thread Jeff


On Mar 17, 2011, at 9:39 PM, Scott Marlowe wrote:



My experience puts the 23xx series opterons in a same general
neighborhood as the E5300 and a little behind the E5400 series Xeons.
OTOH, the newer Magny Cours Opterons stomp both of those into the
ground.

Do any of those machines have vm.zone_reclaim_mode = 1 ???

i.e.:

sysctl -a|grep zone.reclaim
vm.zone_reclaim_mode = 0

I had a machine that had just high enough interzone communications
cost to get it turned on by default and it slowed it right to a crawl
under pgsql.



It is set to zero on this machine.

I've tried PG compiled on the box itself, same result.

As for power savings, according to cpuinfo all the cores are running  
at 2.1 GHz.


We had another machine, which typically runs as a web server, running on  
an AMD Opteron(tm) Processor 6128; after diddling the speed governor to  
performance (thus bumping the CPU speed to 2 GHz from 800 MHz), query  
time dropped to 100 ms - still not as fast as the Xeon, but close enough.
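
For reference, a sketch of the governor change described above (the sysfs
path and tool availability vary by distro and kernel):

    # per-CPU sysfs interface:
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # or, with the cpufrequtils package, for all related CPUs at once:
    cpufreq-set -r -g performance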


I think I'm just hitting some wall of the architecture. I tried  
getting some oprofile love from it, but oprofile seems not to work on  
that box. However, it worked on the xeon box:

33995 9.6859  postgres j2date
21925 6.2469  postgres ExecMakeFunctionResultNoSets
20500 5.8409  postgres slot_deform_tuple
17623 5.0212  postgres BitmapHeapNext
13059 3.7208  postgres dt2time
12271 3.4963  postgres slot_getattr
11509

Aside from j2date (probably coming up due to that join filter, I'd  
wager), nothing unexpected.



--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Arjen van der Meijden

On 18-3-2011 10:11, Scott Marlowe wrote:

On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
acmmail...@tweakers.net  wrote:

On 18-3-2011 4:02 Scott Marlowe wrote:
We have several 1U boxes (mostly Dell and Sun) running and had several in
the past. And we've never had any heating problems with them. That includes
machines with more power hungry processors than are currently available, all
power slurping FB-dimm slots occupied and two raid cards installed.


Note I am talking specifically about the ability to cool the RAID
card, not the CPUs etc.  Many 1U boxes have poor air flow across the
expansion slots for PCI etc. cards, while doing a great job cooling
the CPUs and memory.  If you don't use high-performance RAID cards
(LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
and look at the air flow for the expansion slots; it's often just not
very much.



I was referring, among others, to two machines that have both a Dell 
Perc 5/i for internal disks and a Perc 5/e for an external disk 
enclosure. Those also had processors that produce quite a bit of heat 
(2x X5160 and 2x X5355), combined with all FB-DIMM (8x 2GB) slots 
filled, which also produce a lot of heat. Those Dell Percs are similar 
to the LSIs from the same period.


So the heat produced by the other components was already pretty high. 
Still, I've seen no problems with heat for any component, including all 
four RAID controllers. But I agree, there are some 1U servers that skimp 
on fans and thus air flow in the system. We've not had that problem with 
any of our systems. Both Sun and Dell seem to add quite a few fans in 
the middle of the system, where others may do it a bit less heavy-duty 
and less over-dimensioned.


Best regards,

Arjen




Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread Reid Thompson


On 03/18/2011 12:17 AM, Adarsh Sharma wrote:

Thanks , it works now ..:-)

Here is the output :

pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like 
'%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 
and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || 
' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || 
' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

 count
--------
 137193
(1 row)

Time: 195441.894 ms


What is the type/content of column publishing_date?
Based on what you show above, I assume it's text? If so, what's the format of 
the date string?




[PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Red Maple
Hi all,

Our system has a Postgres database with a table for statistics, which is
updated every hour by about 10K clients. Each client only updates its
own row in the table. So far I am only seeing one core out of eight on
my server being active, which tells me that the updates are being done
serially instead of in parallel. Do you know if there is a way for me to
make these independent updates happen in parallel?

Thank you, your help is very much appreciated!


Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Kevin Grittner
Red Maple redmaplel...@gmail.com wrote:
 
 Our system has a Postgres database with a table for statistics,
 which is updated every hour by about 10K clients. Each client only
 updates its own row in the table. So far I am only seeing
 one core out of eight on my server being active, which tells
 me that the updates are being done serially instead of in
 parallel. Do you know if there is a way for me to make these
 independent updates happen in parallel?
 
It should be parallel by default.  Are you taking out any explicit
locks?
 
Also, it seems like you're only doing about three updates per
second.  I would expect a single-row update to run in a couple ms or
less, so it would be rare that two requests would be active at the
same time, so you wouldn't often see multiple cores active at the
same time.  (Of course, the background writer, autovacuum, etc.,
should occasionally show up concurrently with update queries.)
 
Is there some particular problem you're trying to solve?  (For
example, is something too slow?)
 
-Kevin



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Nicholson, Brad (Toronto, ON, CA)
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Red Maple
Sent: Friday, March 18, 2011 9:05 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Help: massive parallel update to the same table

Hi all,

Our system has a Postgres database with a table for statistics, which is 
updated every hour by about 10K clients. Each client only updates its own 
row in the table. So far I am only seeing one core out of eight on my 
server being active, which tells me that the updates are being done 
serially instead of in parallel. Do you know if there is a way for me to 
make these independent updates happen in parallel?

Thank you, your help is very much appreciated!

If they are all happening on one core, you are probably using one DB connection 
to do the updates.  To split them across multiple cores, you need to use 
multiple DB connections.  Be careful if/when you restructure things to filter 
these requests into a reasonable number of backend DB connections - turning a 
huge number of clients loose against a DB is not going to end well.

Brad.
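
One common way to do that filtering is a connection pooler such as
PgBouncer; a minimal ini sketch, with made-up names and sizes:

    [databases]
    stats = host=127.0.0.1 dbname=stats

    [pgbouncer]
    listen_port       = 6432
    auth_type         = md5
    auth_file         = userlist.txt
    pool_mode         = transaction   ; hand out a server connection per tx
    default_pool_size = 16            ; ~10K clients share 16 backends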



Re: [PERFORM] Fastest pq_restore?

2011-03-18 Thread Kevin Grittner
Andy Colson a...@squeakycode.net wrote:
 On 03/17/2011 09:25 AM, Michael Andreasen wrote:
 
 I've been looking around for information on doing a pg_restore as
 fast as possible.

 I am using a twin processor box with 2GB of memory
 
 shared_buffers = 496MB
 
Probably about right.
 
 maintenance_work_mem = 160MB
 
You might get a benefit from a bit more there; hard to say what's
best with so little RAM.
 
 checkpoint_segments = 30
 
This one is hard to call without testing.  Oddly, some machines do
better with the default of 3.  Nobody knows why.
 
 autovacuum = false
 full_page_writes=false
 
Good.
 
 fsync = off
 synchronous_commit = off
 
Absolutely.
 
 bgwriter_lru_maxpages = 0
 
I hadn't thought much about that last one -- do you have benchmarks
to confirm that it helped with a bulk load?
 
You might want to set max_connections to something lower to free up
more RAM for caching, especially considering that you have so little
RAM.
 
-Kevin
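
Worth adding that on 8.4 and later, pg_restore can also parallelize the
restore itself across cores; a minimal sketch (file name and job count are
placeholders, and it requires a custom-format dump):

    pg_restore -j 4 -d mydb mydump.custom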



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Claudio Freire
On Fri, Mar 18, 2011 at 7:52 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 On 18.03.11 09:15, Anssi Kääriäinen wrote:
 Hello.

 If your queries work on single attribute, you can try adding partial indexes
 for different attributes. Note that in this case parameterized statements
 may prevent index usage, so check also with attribute id inlined.

And if your queries work on a single entity instead, you can partition
the table per entity and thus teach the database engine about the
correlation.
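
A sketch of that idea using 8.4-era inheritance partitioning (the child
table name and CHECK constraint are illustrative only):

    -- one child per entity, or per group of entities:
    CREATE TABLE attr_value_olio_1 (
        CHECK (olio_id = 1)
    ) INHERITS (attr_value);

    -- with constraint_exclusion enabled, the planner skips children whose
    -- CHECK rules them out, and each child carries its own statistics,
    -- which is what captures the per-entity correlation.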



Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread tv
 Thanks , it works now .. :-)

 Here is the output :

 pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like
 '%2010%' and
 pdc_uima-# content_language='en' and content is not null and
 isprocessable = 1 and
 pdc_uima-# to_tsvector('english',content) @@
 to_tsquery('english','Mujahid' || ' | '
 pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' |
 '
 pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' ||
 ' | '
 pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' ||
 'crpf' || ' | '
 pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

  count
 --------
  137193
 (1 row)

 Time: 195441.894 ms


 But my original query is to use AND also, i.e.

Hi, just replace AND and OR (used with the LIKE operator) with & and |
(used with to_tsquery).

So this

(content like '%Militant%' OR content like '%jihad%') AND (content like
'%kill%' OR content like '%injure%')

becomes

to_tsvector('english',content) @@ to_tsquery('english', '(Militant |
jihad) & (kill | injure)')

BTW, it seems you somehow believe you'll get exactly the same results from
those two queries (LIKE vs. tsearch) - that's a false expectation. I believe
the full-text query is much better and more appropriate in this case, just
don't expect the same results.

regards
Tomas




Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Claudio Freire
On Fri, Mar 18, 2011 at 3:19 AM, Jesper Krogh jes...@krogh.cc wrote:
 * Depending on your workload, of course, you're typically not
   bottlenecked by the number of CPU cores, so strive for fewer,
   faster cores.

Depending on your workload again, but faster memory is even more
important than faster math.

So go for the architecture with the fastest memory bus.



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Kevin Grittner
[rearranged - please don't top-post]

[also, bringing this back to the list - please keep the list copied]
 
Red Maple redmaplel...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 It should be parallel by default.  Are you taking out any
 explicit locks?
 
 my clients use psql to remotely run an update function on the
 postgres server. Each client runs its own psql to connect to the
 server. What I have noticed is that if I comment out the update
 in the function, so that only the query is being done, then all
 the cores kick in and run at 100%. However, if I allow the update
 in the function, then only one core runs.
 
 Currently it takes 40 min to update all the client statistics.
 
Please show us the part you commented out to get the faster run
time, and the source code for the function you mentioned.
 
 Do you know if I have configured something incorrectly?
 
 I am running postgres 9.0.2 on fedora core 14. Here is my
 postgres.conf file
 
 
 [over 500 lines of configuration, mostly comments, wrapped]
 
If you're going to post that, please strip the comments or post the
results of this query:
 
  http://wiki.postgresql.org/wiki/Server_Configuration  
 
I don't think anything in your configuration will affect this
particular problem, but it seems likely that you could do some
overall tuning.  If you want to do that, you should probably start a
new thread after this issue is sorted out.
 
-Kevin




Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 6:44 AM, Arjen van der Meijden
acmmail...@tweakers.net wrote:
 On 18-3-2011 10:11, Scott Marlowe wrote:

 On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
 acmmail...@tweakers.net  wrote:

 On 18-3-2011 4:02 Scott Marlowe wrote:
 We have several 1U boxes (mostly Dell and Sun) running and had several in
 the past. And we've never had any heating problems with them. That
 includes
 machines with more power hungry processors than are currently available,
 all
 power slurping FB-dimm slots occupied and two raid cards installed.

 Note I am talking specifically about the ability to cool the RAID
 card, not the CPUS etc.  Many 1U boxes have poor air flow across the
 expansion slots for PCI / etc cards, while doing a great job cooling
 the CPUs and memory.  If you don't use high performance RAID cards
 (LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
 and look at the air flow for the expansion slots, it's often just not
 very much.


 I was referring to amongst others two machines that have both a Dell Perc
 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those
 also had processors that produce quite some heat (2x X5160 and 2x X5355)
 combined with all fb-dimm (8x 2GB) slots filled, which also produce a lot of
 heat. Those Dell Perc's are similar to the LSI's from the same period in
 time.

 So the heat produced by the other components was already pretty high.
 Still, I've seen no problems with heat for any component, including all four
 raid controllers. But I agree, there are some 1U servers that skimp on fans
 and thus air flow in the system. We've not had that problem with any of our
 systems. But both Sun and Dell seem to add quite a bit of fans in the middle
 of the system, where others may do it a bit less heavy duty and less
 over-dimensioned.

Most machines have different pathways for cooling airflow over their
RAID cards, and they don't share that air flow with the CPUs.  Also,
the PERC RAID controllers do not produce a lot of heat.  The CPUs on
the high-performance LSI or Areca controllers are often dual-core,
high-performance CPUs in their own right, and those cards have heat
sinks with fans on them to cool them.  The cards themselves are what
make so much heat and don't get enough cooling in many 1U servers.  It
has nothing to do with what else is in the server, again because the
airflow for the cards is usually separate.



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Red Maple
Hi,

Here is my function. If I comment out the update, all the cores run; if
not, only one core runs.


CREATE OR REPLACE FUNCTION my_update_device(this_mac text, number_of_devices integer, this_sysuptime integer)
  RETURNS integer AS
$BODY$
   DECLARE
      fake_mac macaddr;
      this_id integer;
      new_avgld integer;
   BEGIN
      new_avgld = (this_sysuptime / 120) % 100;
      for i in 1..number_of_devices loop
         fake_mac = substring(this_mac from 1 for 11) || ':' ||
            upper(to_hex((i-1)/256)) || ':' || upper(to_hex((i-1)%256));
         select into this_id id from ap where lan_mac = upper(fake_mac::text);
         if not found then
            return -1;
         end if;
         select into this_sysuptime sysuptime from ap_sysuptime
            where ap_id = this_id for update;
         -- ==============================================================
         -- if I comment out the next update then all cores will be
         -- running, else only one core will be running
         -- ==============================================================
         update ap_sysuptime set sysuptime = this_sysuptime,
            last_contacted = now() where ap_id = this_id;
         select into new_avgld avg_ld_1min from colubris_device
            where node_id = this_id for update;
         new_avgld = (this_avgld / 120) % 100;
      end loop;
      return this_id;
   END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;






On Fri, Mar 18, 2011 at 12:06 PM, Kevin Grittner 
kevin.gritt...@wicourts.gov wrote:

 [rearranged - please don't top-post]

 [also, bringing this back to the list - please keep the list copied]

 Red Maple redmaplel...@gmail.com wrote:
  Kevin Grittner kevin.gritt...@wicourts.gov wrote:

  It should be parallel by default.  Are you taking out any
  explicit locks?

  my clients use psql to remotely run an update function on the
  postgres server. Each client run its own psql to connect to the
  server. What I have noticed is that if I commented out the update
  in the function so that only query is being done then all the core
  would kick in and run at 100%. However if I allow the update on
  the function then only one core would run.

  Currently it take 40min to update all the client statistics

 Please show us the part you commented out to get the faster run
 time, and the source code for the function you mentioned.

  Do you know if I have configured something incorrectly?
 
  I am running postgres 9.0.2 on fedora core 14. Here is my
  postgres.conf file
 
 
  [over 500 lines of configuration, mostly comments, wrapped]

 If you're going to post that, please strip the comments or post the
 results of this query:

  http://wiki.postgresql.org/wiki/Server_Configuration

 I don't think anything in your configuration will affect this
 particular problem, but it seems likely that you could do some
 overall tuning.  If you want to do that, you should probably start a
 new thread after this issue is sorted out.

 -Kevin




Re: [PERFORM] Fastest pq_restore?

2011-03-18 Thread Andy Colson

On 3/18/2011 9:38 AM, Kevin Grittner wrote:

Andy Colsona...@squeakycode.net  wrote:

On 03/17/2011 09:25 AM, Michael Andreasen wrote:



I've been looking around for information on doing a pg_restore as
fast as possible.



bgwriter_lru_maxpages = 0


I hadn't thought much about that last one -- do you have benchmarks
to confirm that it helped with a bulk load?



Nope, I got it from the running with scissors thread (I think), maybe 
from Greg Smith, or here:

http://rhaas.blogspot.com/2010/06/postgresql-as-in-memory-only-database_24.html

I don't recall exactly.  I saw it and added a comment to my .conf just 
in case I ever needed it.


-Andy



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Kevin Grittner
Red Maple redmaplel...@gmail.com wrote:
 
 Here is my function. If I comment out the update then it would run
 all the cores, if not then only one core will run
 
 CREATE OR REPLACE FUNCTION
 
 [...]

   select sysuptime
 into this_sysuptime
 from ap_sysuptime
 where ap_id = this_id
 for update;
 
   -- ==
   --  if I comment out the next update
   --then all cores will be running,
   --else only one core will be running
   -- ==
   update ap_sysuptime
 set sysuptime  = this_sysuptime,
 last_contacted = now()
 where ap_id = this_id;
 
This proves that you're not showing us the important part.  The
update locks the same row previously locked by the SELECT FOR
UPDATE, so any effect at the row level would be a serialization
failure based on a write conflict, which doesn't sound like your
problem.  They get different locks at the table level, though:
 
http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html#LOCKING-TABLES
 
Somewhere in code you're not showing us you're acquiring a lock on
the ap_sysuptime table which conflicts with a ROW EXCLUSIVE lock but
not with a ROW SHARE lock.  The lock types which could do that are
SHARE and SHARE ROW EXCLUSIVE.  CREATE INDEX (without CONCURRENTLY)
could do that; otherwise it seems that you would need to be
explicitly issuing a LOCK statement at one of these levels somewhere
in your transaction.  That is what is causing the transactions to
run one at a time.
 
-Kevin
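
To illustrate the diagnosis, a sketch of the kind of statement, not shown
anywhere in the thread, that would serialize the updates in exactly this
way:

    BEGIN;
    -- SHARE conflicts with the UPDATE's ROW EXCLUSIVE table lock, but not
    -- with the SELECT ... FOR UPDATE's ROW SHARE lock:
    LOCK TABLE ap_sysuptime IN SHARE MODE;
    UPDATE ap_sysuptime SET last_contacted = now() WHERE ap_id = 1;
    COMMIT;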



Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 10:32 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 6:44 AM, Arjen van der Meijden
 acmmail...@tweakers.net wrote:
 On 18-3-2011 10:11, Scott Marlowe wrote:

 On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
 acmmail...@tweakers.net  wrote:

 On 18-3-2011 4:02 Scott Marlowe wrote:
 We have several 1U boxes (mostly Dell and Sun) running and had several in
 the past. And we've never had any heating problems with them. That
 includes
 machines with more power hungry processors than are currently available,
 all
 power slurping FB-dimm slots occupied and two raid cards installed.

 Note I am talking specifically about the ability to cool the RAID
 card, not the CPUS etc.  Many 1U boxes have poor air flow across the
 expansion slots for PCI / etc cards, while doing a great job cooling
 the CPUs and memory.  If you don't use high performance RAID cards
  (LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
 and look at the air flow for the expansion slots, it's often just not
 very much.


 I was referring to amongst others two machines that have both a Dell Perc
 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those
 also had processors that produce quite some heat (2x X5160 and 2x X5355)
 combined with all fb-dimm (8x 2GB) slots filled, which also produce a lot of
 heat. Those Dell Perc's are similar to the LSI's from the same period in
 time.

  So the heat produced by the other components was already pretty high.
 Still, I've seen no problems with heat for any component, including all four
 raid controllers. But I agree, there are some 1U servers that skimp on fans
 and thus air flow in the system. We've not had that problem with any of our
 systems. But both Sun and Dell seem to add quite a bit of fans in the middle
 of the system, where others may do it a bit less heavy duty and less
 over-dimensioned.

 Most machines have different pathways for cooling airflow over their
 RAID cards, and they don't share that air flow with the CPUs.  Also,
 the PERC RAID controllers do not produce a lot of heat.  The CPUs on
 the high performance LSI or Areca controllers are often dual core high
 performance CPUs in their own right, and those cards have heat sinks
 with fans on them to cool them.  The cards themselves are what make so
 much heat and don't get enough cooling in many 1U servers.  It has
 nothing to do with what else is in the server, again because the
 airflow for the cards is usually separate.

As a followup to this subject: the problem wasn't bad until the server
load increased, thus increasing the load on the LSI MegaRAID card, at
which point it started producing more heat than it had before.  When
the machine wasn't working too hard the LSI was fine.  It was once we
started hitting higher and higher load that the card had issues.



[PERFORM] REINDEX takes half a day (and still not complete!)

2011-03-18 Thread Phoenix Kiula
I have a large table, though not as large as the kind of numbers that get
discussed on this list. It has 125 million rows.

REINDEXing the table takes half a day, and it's still not finished.

To write this post I did SELECT COUNT(*), and here's the output -- it
took so long!

select count(*) from links;
   count
-----------
 125418191
(1 row)

Time: 1270405.373 ms

That's 1270 seconds!

I suppose vacuum analyze is not doing its job? As you can see from the
settings below, I have autovacuum set to ON, and there's also a cronjob
every 10 hours to do a manual vacuum analyze on this table, which is the
largest.

PG is version 8.2.9.

Any thoughts on what I can do to improve performance?

Below are my settings.



max_connections  = 300
shared_buffers   = 500MB
effective_cache_size = 1GB
max_fsm_relations= 1500
max_fsm_pages= 95

work_mem = 100MB
temp_buffers = 4096
authentication_timeout   = 10s
ssl  = off
checkpoint_warning   = 3600
random_page_cost = 1

autovacuum   = on
autovacuum_vacuum_cost_delay = 20

vacuum_cost_delay= 20
vacuum_cost_limit= 600

autovacuum_naptime   = 10
stats_start_collector= on
stats_row_level  = on
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor  = 0.02
autovacuum_vacuum_scale_factor   = 0.01

wal_buffers  = 64
checkpoint_segments  = 128
checkpoint_timeout   = 900
fsync= on
maintenance_work_mem = 512MB



Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 I have a large table but not as large as the kind of numbers that get
 discussed on this list. It has 125 million rows.

 REINDEXing the table takes half a day, and it's still not finished.

 To write this post I did SELECT COUNT(*), and here's the output -- so long!

    select count(*) from links;
       count
    ---
     125418191
    (1 row)

    Time: 1270405.373 ms

 That's 1270 seconds!

 I suppose the vaccuum analyze is not doing its job? As you can see
 from settings below, I have autovacuum set to ON, and there's also a
 cronjob every 10 hours to do a manual vacuum analyze on this table,
 which is largest.

 PG is version 8.2.9.

 Any thoughts on what I can do to improve performance!?

 Below are my settings.



 max_connections              = 300
 shared_buffers               = 500MB
 effective_cache_size         = 1GB
 max_fsm_relations            = 1500
 max_fsm_pages                = 95

 work_mem                     = 100MB

What is the output of running vacuum verbose as a superuser?  (You can
run it on the postgres database so it returns fast.)  We're looking
for the output that looks like this:

INFO:  free space map contains 1930193 pages in 749 relations
DETAIL:  A total of 1787744 page slots are in use (including overhead).
1787744 page slots are required to track all free space.
Current limits are:  1000 page slots, 3000 relations, using 58911 kB.

If the space needed exceeds the page slots, then you need to crank up
your free space map.  If the relations exceed the limit, then you'll
need to crank up max_fsm_relations.
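
A sketch of what cranking those up looks like (the values are placeholders;
size them from your own VACUUM VERBOSE output, and note these settings
require a restart and are gone in 8.4+, where the free space map is
managed automatically):

    # postgresql.conf on 8.2
    max_fsm_pages     = 2000000   # >= page slots required, reported above
    max_fsm_relations = 2000      # >= relations tracked, reported above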



Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 autovacuum                   = on
 autovacuum_vacuum_cost_delay = 20

 vacuum_cost_delay            = 20
 vacuum_cost_limit            = 600

 autovacuum_naptime           = 10

also, if vacuum can't keep up you can increase the vacuum cost limit
and lower the cost delay.  Anything above 1 ms is still quite a wait
compared to 0, and most systems don't have the timer granularity to go
that low anyway, so 5 ms is about as low as you can go and still see a
difference before just setting it to 0.  Also, if you've got a lot of
large relations you might need to increase the max number of vacuum
workers as well.
