Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Jesper Krogh

On 2011-03-18 01:51, Oliver Charles wrote:

Hello,

At MusicBrainz we're looking to get a new database server, and are
hoping to buy this in the next couple of days. I'm mostly a software
guy, but I'm posting this on behalf of Rob, who's actually going to be
buying the hardware. Here's a quote of what we're looking to get:


I think most of it has been said already:
* Battery-backed write cache.
* See if you can get enough memory to make your entire active
  dataset fit in memory (typically not that hard in 2011).
* Depending on your workload, of course, you're typically not
  bottlenecked by the number of CPU cores, so strive for fewer,
  faster cores.
* As few sockets as you can squeeze your memory and CPU requirements
  onto.
* If you can live with (or design around) the trade-offs of SSDs, they
  will buy you far more performance than any reasonable number of
  rotating drives. (A good backup plan helps here, e.g. shipping the
  full WAL log to a second system; see the sketch below.)


--
Jesper
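
As an aside, the WAL shipping mentioned in the last bullet boils down to
two settings in postgresql.conf (8.3 and later); a minimal sketch, with
the rsync destination as a placeholder:

    archive_mode    = on
    archive_command = 'rsync -a %p standby:/wal_archive/%f'
    # %p is the path of the finished WAL segment, %f its file name; the
    # command must exit 0 only once the segment is safely stored remotely.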



[PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

Hello list,

I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL 
8.4.7. The basic problem is that when joining different entities multiple 
times, the planner thinks there are vastly fewer rows to join than there 
are in reality, and decides to use multiple nested loops for the join 
chain. This results in queries that take somewhere around 35 seconds with 
nested loops enabled, but somewhere around 100 ms with nested loops 
disabled. I don't think there is much hope of getting better statistics, 
as EAV is just not statistics-friendly. The values of an attribute depend 
on the type of the attribute, and different entities have different 
attributes defined. The planner has no idea of these correlations.


Now, my question is: if I disable nested loops completely for the users 
of the EAV database, what kind of worst-case performance loss can I 
expect? I don't mind if a query that normally runs in 100 ms now takes 
200 ms, but I am worried about cases where a query would take much more 
time to complete than with nested loops enabled. As far as I understand, 
such cases should be pretty rare, if not non-existent?


 - Anssi
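
For reference, the planner setting at issue is an ordinary GUC, so it does
not have to be changed globally; a minimal sketch:

    -- current session only:
    SET enable_nestloop = off;
    -- or globally, which is what the poster is weighing, in postgresql.conf:
    --   enable_nestloop = off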






Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Pavel Stehule
Hello

for example, queries with a LIMIT clause can be significantly faster with
a nested loop. But you don't need to disable nested loops globally.

You can wrap your queries in SQL functions and disable nested loops just
for those functions.
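
A minimal sketch of that approach (function, table, and column names are
placeholders; per-function SET clauses exist since 8.3):

    CREATE OR REPLACE FUNCTION eav_report(p_entity integer)
    RETURNS SETOF attr_value AS
    $$
        SELECT * FROM attr_value WHERE olio_id = p_entity;
    $$
    LANGUAGE sql STABLE
    SET enable_nestloop = off;  -- applies only while this function runs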

Regards

Pavel Stehule

2011/3/18 Anssi Kääriäinen anssi.kaariai...@thl.fi:
 Hello list,

 I am working on an Entity-Attribute-Value (EAV) database using PostgreSQL
 8.4.7. The basic problem is that when joining different entities multiple
 times, the planner thinks there are vastly fewer rows to join than there
 are in reality, and decides to use multiple nested loops for the join
 chain. This results in queries that take somewhere around 35 seconds with
 nested loops enabled, but somewhere around 100 ms with nested loops
 disabled. I don't think there is much hope of getting better statistics,
 as EAV is just not statistics-friendly. The values of an attribute depend
 on the type of the attribute, and different entities have different
 attributes defined. The planner has no idea of these correlations.

 Now, my question is: if I disable nested loops completely for the users of
 the EAV database, what kind of worst-case performance loss can I expect? I
 don't mind if a query that normally runs in 100 ms now takes 200 ms, but I
 am worried about cases where a query would take much more time to complete
 than with nested loops enabled. As far as I understand, such cases should
 be pretty rare, if not non-existent?

  - Anssi








Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Arjen van der Meijden

On 18-3-2011 4:02 Scott Marlowe wrote:

On Thu, Mar 17, 2011 at 6:51 PM, Oliver Charles
postgresql-p...@ocharles.org.uk  wrote:

Another point.  My experience with 1U chassis and cooling is that they
don't move enough air across their cards to make sure they stay cool.
You'd be better off ordering a 2U chassis with 8 3.5" drive bays so
you can add drives later if you need to, and it'll provide more
cooling air across the card.

Our current big 48-core servers are running plain LSI SAS adapters
without HW RAID, because the LSIs we were using overheated and
cooked themselves to death after about 3 months.  Those are 1U chassis
machines, and our newer machines are all 2U boxes now.


We have several 1U boxes (mostly Dell and Sun) running, and have had 
several in the past. And we've never had any heating problems with them. 
That includes machines with more power-hungry processors than are 
currently available, all power-slurping FB-DIMM slots occupied and two 
RAID cards installed.


But then again, a 2U box will likely have more cooling capacity, no 
matter how you look at it.


Another tip that may be useful: look at 2.5" drives. AFAIK there is no 
really good reason to use 3.5" drives for new servers. The 2.5" drives 
save power and room - and thus may allow more air flowing through the 
enclosure - and offer the same performance and reliability (the first I 
know for sure; the second I'm pretty sure of but haven't seen much proof 
of lately).


You could even have an 8- or 10-disk 1U enclosure that way, or up to 24 
disks in 2U. But those configurations will require some attention to 
cooling again.


Best regards,

Arjen



Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
acmmail...@tweakers.net wrote:
 On 18-3-2011 4:02 Scott Marlowe wrote:

 On Thu, Mar 17, 2011 at 6:51 PM, Oliver Charles
 postgresql-p...@ocharles.org.uk  wrote:

 Another point.  My experience with 1U chassis and cooling is that they
 don't move enough air across their cards to make sure they stay cool.
 You'd be better off ordering a 2U chassis with 8 3.5" drive bays so
 you can add drives later if you need to, and it'll provide more
 cooling air across the card.

 Our current big 48-core servers are running plain LSI SAS adapters
 without HW RAID, because the LSIs we were using overheated and
 cooked themselves to death after about 3 months.  Those are 1U chassis
 machines, and our newer machines are all 2U boxes now.

 We have several 1U boxes (mostly Dell and Sun) running and have had several in
 the past. And we've never had any heating problems with them. That includes
 machines with more power-hungry processors than are currently available, all
 power-slurping FB-DIMM slots occupied and two RAID cards installed.

Note I am talking specifically about the ability to cool the RAID
card, not the CPUs etc.  Many 1U boxes have poor air flow across the
expansion slots for PCI etc. cards, while doing a great job cooling
the CPUs and memory.  If you don't use high-performance RAID cards
(LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
and look at the air flow for the expansion slots; it's often just not
very much.



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

On 03/18/2011 09:02 AM, Pavel Stehule wrote:

for example, queries with a LIMIT clause can be significantly faster with
a nested loop. But you don't need to disable nested loops globally.

You can wrap your queries in SQL functions and disable nested loops just
for those functions.


Thank you for your help, the LIMIT example was something I was not aware of.

The problem is we are replacing an old database, and we need to 
replicate certain views for external users. Minimal impact for these 
users is required. Maybe it would be best to create special user 
accounts for these external users and disable nested loops only for 
those accounts. Otherwise, we will disable nested loops only when 
absolutely necessary.


 - Anssi
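
A sketch of the per-account idea (the role name is a placeholder):

    ALTER ROLE external_user SET enable_nestloop = off;
    -- picked up at login, so it covers every query those accounts run,
    -- while all other users keep the default planner behaviour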



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Vitalii Tymchyshyn

On 18.03.11 09:15, Anssi Kääriäinen wrote:

Hello list,

I am working on an Entity-Attribute-Value (EAV) database using 
PostgreSQL 8.4.7. The basic problem is that when joining different 
entities multiple times, the planner thinks there are vastly fewer 
rows to join than there are in reality, and decides to use multiple 
nested loops for the join chain. This results in queries that take 
somewhere around 35 seconds with nested loops enabled, but somewhere 
around 100 ms with nested loops disabled. I don't think there is much 
hope of getting better statistics, as EAV is just not 
statistics-friendly. The values of an attribute depend on the type of 
the attribute, and different entities have different attributes 
defined. The planner has no idea of these correlations.


Hello.

If your queries work on a single attribute, you can try adding partial 
indexes for the different attributes. Note that in this case parameterized 
statements may prevent index usage, so check also with the attribute id inlined.


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Thomas Kellerer

Anssi Kääriäinen, 18.03.2011 08:15:

Hello list,

I am working on a Entity-Attribute-Value (EAV) database using
PostgreSQL 8.4.7. The basic problem is that when joining multiple
times different entities the planner thinks that there is vastly less
rows to join than there is in reality and decides to use multiple
nested loops for the join chain.


Did you consider using hstore instead?

I think in the PostgreSQL world, this is a better alternative to EAV and most 
probably faster as well.

Regards
Thomas
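
A sketch of the hstore alternative (table and column names are made up;
hstore is a contrib module, installed via CREATE EXTENSION from 9.1 on or
a SQL install script on 8.4, and the @> containment operator is the 9.0+
spelling):

    CREATE TABLE entity (
        id    serial PRIMARY KEY,
        attrs hstore              -- all attributes in one indexable column
    );
    -- a GIN index supports containment lookups on the attributes:
    CREATE INDEX entity_attrs_idx ON entity USING gin (attrs);
    SELECT * FROM entity WHERE attrs @> 'suhde_hyvaksytty=>t';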




Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

On 03/18/2011 12:52 PM, Vitalii Tymchyshyn wrote:

If your queries work on a single attribute, you can try adding partial
indexes for the different attributes. Note that in this case parameterized
statements may prevent index usage, so check also with the attribute id inlined.

Best regards, Vitalii Tymchyshyn


Unfortunately this does not help with the statistics, and (I guess) 
nested loops will still be used when joining:


hot2=# explain analyze select * from attr_value where attr_tunniste =
'suhde_hyvaksytty' and arvo_text = 't';

                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using attr_value_arvo_text_idx1 on attr_value  (cost=0.00..343.59 rows=152 width=118) (actual time=0.076..7.768 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
   Filter: ((attr_tunniste)::text = 'suhde_hyvaksytty'::text)
 Total runtime: 10.855 ms
(4 rows)

hot2=# create index suhde_hyvaksytty_idx on attr_value(arvo_text) where
attr_tunniste = 'suhde_hyvaksytty';
CREATE INDEX
hot2=# analyze attr_value;
hot2=# explain analyze select * from attr_value where attr_tunniste =
'suhde_hyvaksytty' and arvo_text = 't';

                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using suhde_hyvaksytty_idx on attr_value  (cost=0.00..43.72 rows=152 width=118) (actual time=0.093..4.776 rows=3096 loops=1)
   Index Cond: (arvo_text = 't'::text)
 Total runtime: 7.817 ms
(3 rows)

 - Anssi



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Anssi Kääriäinen

On 03/18/2011 01:14 PM, Thomas Kellerer wrote:

Did you consider using hstore instead?

I think in the PostgreSQL world, this is a better alternative to EAV and most
probably faster as well.

No, we did not. The reason is that we want to track each attribute with 
bi-temporal timestamps. The actual database schema for the attribute 
value table is:


CREATE TABLE attr_value (
    id SERIAL PRIMARY KEY,
    olio_id INTEGER NOT NULL REFERENCES base_olio,            -- entity identifier
    attr_tunniste VARCHAR(20) NOT NULL REFERENCES base_attr,  -- attr identifier
    kieli_tunniste VARCHAR(20) REFERENCES kieli,              -- lang identifier
    arvo_number DECIMAL(18, 9),                               -- value number
    arvo_ts timestamptz,                                      -- value timestamp
    arvo_text TEXT,                                           -- value text
    arvo_valinta_tunniste VARCHAR(20),      -- for choice lists: value_choice_identifier
    real_valid_from TIMESTAMPTZ NOT NULL,   -- real_valid_from .. real_valid_until define
    real_valid_until TIMESTAMPTZ NOT NULL,  --   when things have been true in the real world
    db_valid_from TIMESTAMPTZ NOT NULL,     -- db_valid_* define when things
    db_valid_until TIMESTAMPTZ NOT NULL,    --   have been in the database
    tx_id_insert INTEGER default txid_current(),
    tx_id_delete INTEGER
    -- foreign keys & checks skipped
);

Naturally, we have other tables defining the objects, joins between 
objects and metadata for the EAV. All data modifications are done 
through procedures, which ensure uniqueness etc. for the attributes and 
joins.


The data set is small, and performance in general is not that important, 
as long as the UI is responsive and data can be transferred to other 
systems in reasonable time. Insert performance is at least 10x worse 
than with a traditional schema, but it doesn't matter (we have 
somewhere around 1000 inserts/updates a day max). The only real 
problem so far is the chained nested loop problem, which really kills 
performance for some queries.


Surprisingly (at least to me) this schema has worked really well, 
although sometimes there is a feeling that we are implementing a 
database using a database...


 - Anssi



Re: [PERFORM] Xeon twice the performance of opteron

2011-03-18 Thread Jeff


On Mar 17, 2011, at 9:39 PM, Scott Marlowe wrote:



My experience puts the 23xx series opterons in a same general
neighborhood as the E5300 and a little behind the E5400 series Xeons.
OTOH, the newer Magny Cours Opterons stomp both of those into the
ground.

Do any of those machines have vm.zone_reclaim_mode = 1 ???

i.e.:

sysctl -a|grep zone.reclaim
vm.zone_reclaim_mode = 0

I had a machine that had just high enough interzone communications
cost to get it turned on by default and it slowed it right to a crawl
under pgsql.



It is set to zero on this machine.

I've tried PG compiled on the box itself, same result.

As for power savings, according to cpuinfo all the cores are running  
at 2.1 GHz.


We had another machine, which typically runs as a web server, running on  
an AMD Opteron(tm) Processor 6128; after diddling the speed governor to  
performance (thus bumping the CPU speed to 2 GHz from 800 MHz), query  
time dropped to 100 ms - still not as fast as the Xeon, but close enough.
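
For reference, a sketch of the governor change described above (the sysfs
path and tool availability vary by distro and kernel):

    # per-CPU sysfs interface:
    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    # or, with the cpufrequtils package, for all related CPUs at once:
    cpufreq-set -r -g performance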


I think I'm just hitting some wall of the architecture. I tried  
getting some oprofile love from it, but oprofile seems not to work on  
that box. However, it worked on the xeon box:

33995 9.6859  postgres j2date
21925 6.2469  postgres ExecMakeFunctionResultNoSets
20500 5.8409  postgres slot_deform_tuple
17623 5.0212  postgres BitmapHeapNext
13059 3.7208  postgres dt2time
12271 3.4963  postgres slot_getattr
11509

Aside from j2date (probably coming up due to that join filter, I'd  
wager), nothing unexpected.



--
Jeff Trout j...@jefftrout.com
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Arjen van der Meijden

On 18-3-2011 10:11, Scott Marlowe wrote:

On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
acmmail...@tweakers.net  wrote:

On 18-3-2011 4:02 Scott Marlowe wrote:
We have several 1U boxes (mostly Dell and Sun) running and had several in
the past. And we've never had any heating problems with them. That includes
machines with more power hungry processors than are currently available, all
power slurping FB-dimm slots occupied and two raid cards installed.


Note I am talking specifically about the ability to cool the RAID
card, not the CPUs etc.  Many 1U boxes have poor air flow across the
expansion slots for PCI etc. cards, while doing a great job cooling
the CPUs and memory.  If you don't use high-performance RAID cards
(LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
and look at the air flow for the expansion slots; it's often just not
very much.



I was referring, among others, to two machines that have both a Dell 
Perc 5/i for internal disks and a Perc 5/e for an external disk 
enclosure. Those also had processors that produce quite a bit of heat 
(2x X5160 and 2x X5355), combined with all FB-DIMM (8x 2GB) slots 
filled, which also produce a lot of heat. Those Dell Percs are similar 
to the LSIs from the same period.


So the heat produced by the other components was already pretty high. 
Still, I've seen no problems with heat for any component, including all 
four RAID controllers. But I agree, there are some 1U servers that skimp 
on fans and thus air flow in the system. We've not had that problem with 
any of our systems. Both Sun and Dell seem to add quite a few fans in 
the middle of the system, where others may do it a bit less heavy-duty 
and less over-dimensioned.


Best regards,

Arjen




Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread Reid Thompson


On 03/18/2011 12:17 AM, Adarsh Sharma wrote:

Thanks , it works now ..:-)

Here is the output :

pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like 
'%2010%' and
pdc_uima-# content_language='en' and content is not null and isprocessable = 1 
and
pdc_uima-# to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || 
' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' || ' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' || 'crpf' || 
' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

 count
--------
 137193
(1 row)

Time: 195441.894 ms


What is the type/content of column publishing_date?
Based on what you show above, I assume it's text? If so, what's the format of 
the date string?




[PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Red Maple
Hi all,

Our system has a Postgres database with a table for statistics, which is
updated every hour by about 10K clients. Each client only updates its
own row in the table. So far I am only seeing one core out of eight on
my server being active, which tells me that the updates are being done
serially instead of in parallel. Do you know if there is a way for me to
make these independent updates happen in parallel?

Thank you, your help is very much appreciated!


Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Kevin Grittner
Red Maple redmaplel...@gmail.com wrote:
 
 Our system has a Postgres database with a table for statistics,
 which is updated every hour by about 10K clients. Each client only
 updates its own row in the table. So far I am only seeing
 one core out of eight on my server being active, which tells
 me that the updates are being done serially instead of in
 parallel. Do you know if there is a way for me to make these
 independent updates happen in parallel?
 
It should be parallel by default.  Are you taking out any explicit
locks?
 
Also, it seems like you're only doing about three updates per
second.  I would expect a single-row update to run in a couple ms or
less, so it would be rare that two requests would be active at the
same time, so you wouldn't often see multiple cores active at the
same time.  (Of course, the background writer, autovacuum, etc.,
should occasionally show up concurrently with update queries.)
 
Is there some particular problem you're trying to solve?  (For
example, is something too slow?)
 
-Kevin



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Nicholson, Brad (Toronto, ON, CA)
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Red Maple
Sent: Friday, March 18, 2011 9:05 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Help: massive parallel update to the same table

Hi all,

Our system has a Postgres database with a table for statistics, which is 
updated every hour by about 10K clients. Each client only updates its own 
row in the table. So far I am only seeing one core out of eight on my 
server being active, which tells me that the updates are being done 
serially instead of in parallel. Do you know if there is a way for me to 
make these independent updates happen in parallel?

Thank you, your help is very much appreciated!

If they are all happening on one core, you are probably using one DB connection 
to do the updates.  To split them across multiple cores, you need to use 
multiple DB connections.  Be careful if/when you restructure things to filter 
these requests into a reasonable number of backend DB connections - turning a 
huge number of clients loose against a DB is not going to end well.

Brad.
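
One common way to do that filtering is a connection pooler such as
PgBouncer; a minimal ini sketch, with made-up names and sizes:

    [databases]
    stats = host=127.0.0.1 dbname=stats

    [pgbouncer]
    listen_port       = 6432
    auth_type         = md5
    auth_file         = userlist.txt
    pool_mode         = transaction   ; hand out a server connection per tx
    default_pool_size = 16            ; ~10K clients share 16 backends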



Re: [PERFORM] Fastest pq_restore?

2011-03-18 Thread Kevin Grittner
Andy Colson a...@squeakycode.net wrote:
 On 03/17/2011 09:25 AM, Michael Andreasen wrote:
 
 I've been looking around for information on doing a pg_restore as
 fast as possible.

 I am using a twin processor box with 2GB of memory
 
 shared_buffers = 496MB
 
Probably about right.
 
 maintenance_work_mem = 160MB
 
You might get a benefit from a bit more there; hard to say what's
best with so little RAM.
 
 checkpoint_segments = 30
 
This one is hard to call without testing.  Oddly, some machines do
better with the default of 3.  Nobody knows why.
 
 autovacuum = false
 full_page_writes=false
 
Good.
 
 fsync = off
 synchronous_commit = off
 
Absolutely.
 
 bgwriter_lru_maxpages = 0
 
I hadn't thought much about that last one -- do you have benchmarks
to confirm that it helped with a bulk load?
 
You might want to set max_connections to something lower to free up
more RAM for caching, especially considering that you have so little
RAM.
 
-Kevin
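
Worth adding that on 8.4 and later, pg_restore can also parallelize the
restore itself across cores; a minimal sketch (file name and job count are
placeholders, and it requires a custom-format dump):

    pg_restore -j 4 -d mydb mydump.custom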



Re: [PERFORM] Disabling nested loops - worst case performance

2011-03-18 Thread Claudio Freire
On Fri, Mar 18, 2011 at 7:52 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 On 18.03.11 09:15, Anssi Kääriäinen wrote:
 Hello.

 If your queries work on single attribute, you can try adding partial indexes
 for different attributes. Note that in this case parameterized statements
 may prevent index usage, so check also with attribute id inlined.

And if your queries work on a single entity instead, you can partition
the table per entity and thus teach the database engine about the
correlation.
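
A sketch of that idea using 8.4-era inheritance partitioning (the child
table name and CHECK constraint are illustrative only):

    -- one child per entity, or per group of entities:
    CREATE TABLE attr_value_olio_1 (
        CHECK (olio_id = 1)
    ) INHERITS (attr_value);

    -- with constraint_exclusion enabled, the planner skips children whose
    -- CHECK rules them out, and each child carries its own statistics,
    -- which is what captures the per-entity correlation.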



Re: [PERFORM] Help with Query Tuning

2011-03-18 Thread tv
 Thanks , it works now .. :-)

 Here is the output :

 pdc_uima=# SELECT count(*)  from page_content WHERE publishing_date like
 '%2010%' and
 pdc_uima-# content_language='en' and content is not null and
 isprocessable = 1 and
 pdc_uima-# to_tsvector('english',content) @@
 to_tsquery('english','Mujahid' || ' | '
 pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' |
 '
 pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' ||
 ' | '
 pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' ||
 'crpf' || ' | '
 pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );

  count
 --------
  137193
 (1 row)

 Time: 195441.894 ms


 But my original query is to use AND also, i.e.

Hi, just replace AND and OR (used with the LIKE operator) with & and |
(used with to_tsquery).

So this

(content like '%Militant%' OR content like '%jihad%') AND (content like
'%kill%' OR content like '%injure%')

becomes

to_tsvector('english',content) @@ to_tsquery('english', '(Militant |
jihad) & (kill | injure)')

BTW, it seems you somehow believe you'll get exactly the same results from
those two queries (LIKE vs. tsearch) - that's a false expectation. I believe
the full-text query is much better and more appropriate in this case, just
don't expect the same results.

regards
Tomas




Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Claudio Freire
On Fri, Mar 18, 2011 at 3:19 AM, Jesper Krogh jes...@krogh.cc wrote:
 * Depending on your workload, of course, you're typically not
   bottlenecked by the number of CPU cores, so strive for fewer,
   faster cores.

Depending on your workload again, but faster memory is even more
important than faster math.

So go for the architecture with the fastest memory bus.



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Kevin Grittner
[rearranged - please don't top-post]

[also, bringing this back to the list - please keep the list copied]
 
Red Maple redmaplel...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 It should be parallel by default.  Are you taking out any
 explicit locks?
 
 my clients use psql to remotely run an update function on the
 postgres server. Each client runs its own psql to connect to the
 server. What I have noticed is that if I comment out the update
 in the function, so that only the query is being done, then all
 the cores kick in and run at 100%. However, if I allow the update
 in the function, then only one core runs.
 
 Currently it takes 40 min to update all the client statistics.
 
Please show us the part you commented out to get the faster run
time, and the source code for the function you mentioned.
 
 Do you know if I have configured something incorrectly?
 
 I am running postgres 9.0.2 on fedora core 14. Here is my
 postgres.conf file
 
 
 [over 500 lines of configuration, mostly comments, wrapped]
 
If you're going to post that, please strip the comments or post the
results of this query:
 
  http://wiki.postgresql.org/wiki/Server_Configuration  
 
I don't think anything in your configuration will affect this
particular problem, but it seems likely that you could do some
overall tuning.  If you want to do that, you should probably start a
new thread after this issue is sorted out.
 
-Kevin




Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 6:44 AM, Arjen van der Meijden
acmmail...@tweakers.net wrote:
 On 18-3-2011 10:11, Scott Marlowe wrote:

 On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
 acmmail...@tweakers.net  wrote:

 On 18-3-2011 4:02 Scott Marlowe wrote:
 We have several 1U boxes (mostly Dell and Sun) running and had several in
 the past. And we've never had any heating problems with them. That
 includes
 machines with more power hungry processors than are currently available,
 all
 power slurping FB-dimm slots occupied and two raid cards installed.

 Note I am talking specifically about the ability to cool the RAID
 card, not the CPUS etc.  Many 1U boxes have poor air flow across the
 expansion slots for PCI / etc cards, while doing a great job cooling
 the CPUs and memory.  If you don't use high performance RAID cards
 (LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
 and look at the air flow for the expansion slots, it's often just not
 very much.


 I was referring to amongst others two machines that have both a Dell Perc
 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those
 also had processors that produce quite some heat (2x X5160 and 2x X5355)
 combined with all fb-dimm (8x 2GB) slots filled, which also produce a lot of
 heat. Those Dell Perc's are similar to the LSI's from the same period in
 time.

 So the heat produced by the other components was already pretty high.
 Still, I've seen no problems with heat for any component, including all four
 raid controllers. But I agree, there are some 1U servers that skimp on fans
 and thus air flow in the system. We've not had that problem with any of our
 systems. But both Sun and Dell seem to add quite a bit of fans in the middle
 of the system, where others may do it a bit less heavy duty and less
 over-dimensioned.

Most machines have different pathways for cooling airflow over their
RAID cards, and they don't share that air flow with the CPUs.  Also,
the PERC RAID controllers do not produce a lot of heat.  The CPUs on
the high-performance LSI or Areca controllers are often dual-core,
high-performance CPUs in their own right, and those cards have heat
sinks with fans on them to cool them.  The cards themselves are what
make so much heat and don't get enough cooling in many 1U servers.  It
has nothing to do with what else is in the server, again because the
airflow for the cards is usually separate.



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Red Maple
Hi,

Here is my function. If I comment out the update, all the cores run; if
not, only one core runs.


CREATE OR REPLACE FUNCTION my_update_device(this_mac text, number_of_devices integer, this_sysuptime integer)
  RETURNS integer AS
$BODY$
   DECLARE
      fake_mac macaddr;
      this_id integer;
      new_avgld integer;
   BEGIN
      new_avgld = (this_sysuptime / 120) % 100;
      for i in 1..number_of_devices loop
         fake_mac = substring(this_mac from 1 for 11) || ':' ||
            upper(to_hex((i-1)/256)) || ':' || upper(to_hex((i-1)%256));
         select into this_id id from ap where lan_mac = upper(fake_mac::text);
         if not found then
            return -1;
         end if;
         select into this_sysuptime sysuptime from ap_sysuptime
            where ap_id = this_id for update;
         -- ==============================================================
         -- if I comment out the next update then all cores will be
         -- running, else only one core will be running
         -- ==============================================================
         update ap_sysuptime set sysuptime = this_sysuptime,
            last_contacted = now() where ap_id = this_id;
         select into new_avgld avg_ld_1min from colubris_device
            where node_id = this_id for update;
         new_avgld = (this_avgld / 120) % 100;
      end loop;
      return this_id;
   END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;






On Fri, Mar 18, 2011 at 12:06 PM, Kevin Grittner 
kevin.gritt...@wicourts.gov wrote:

 [rearranged - please don't top-post]

 [also, bringing this back to the list - please keep the list copied]

 Red Maple redmaplel...@gmail.com wrote:
  Kevin Grittner kevin.gritt...@wicourts.gov wrote:

  It should be parallel by default.  Are you taking out any
  explicit locks?

  my clients use psql to remotely run an update function on the
  postgres server. Each client run its own psql to connect to the
  server. What I have noticed is that if I commented out the update
  in the function so that only query is being done then all the core
  would kick in and run at 100%. However if I allow the update on
  the function then only one core would run.

  Currently it take 40min to update all the client statistics

 Please show us the part you commented out to get the faster run
 time, and the source code for the function you mentioned.

  Do you know if I have configured something incorrectly?
 
  I am running postgres 9.0.2 on fedora core 14. Here is my
  postgres.conf file
 
 
  [over 500 lines of configuration, mostly comments, wrapped]

 If you're going to post that, please strip the comments or post the
 results of this query:

  http://wiki.postgresql.org/wiki/Server_Configuration

 I don't think anything in your configuration will affect this
 particular problem, but it seems likely that you could do some
 overall tuning.  If you want to do that, you should probably start a
 new thread after this issue is sorted out.

 -Kevin




Re: [PERFORM] Fastest pq_restore?

2011-03-18 Thread Andy Colson

On 3/18/2011 9:38 AM, Kevin Grittner wrote:

Andy Colsona...@squeakycode.net  wrote:

On 03/17/2011 09:25 AM, Michael Andreasen wrote:



I've been looking around for information on doing a pg_restore as
fast as possible.



bgwriter_lru_maxpages = 0


I hadn't thought much about that last one -- do you have benchmarks
to confirm that it helped with a bulk load?



Nope, I got it from the running with scissors thread (I think), maybe 
from Greg Smith, or here:

http://rhaas.blogspot.com/2010/06/postgresql-as-in-memory-only-database_24.html

I don't recall exactly.  I saw it and added a comment to my .conf just 
in case I ever needed it.


-Andy



Re: [PERFORM] Help: massive parallel update to the same table

2011-03-18 Thread Kevin Grittner
Red Maple redmaplel...@gmail.com wrote:
 
 Here is my function. If I comment out the update then it would run
 all the cores, if not then only one core will run
 
 CREATE OR REPLACE FUNCTION
 
 [...]

   select sysuptime
 into this_sysuptime
 from ap_sysuptime
 where ap_id = this_id
 for update;
 
   -- ==
   --  if I comment out the next update
   --then all cores will be running,
   --else only one core will be running
   -- ==
   update ap_sysuptime
 set sysuptime  = this_sysuptime,
 last_contacted = now()
 where ap_id = this_id;
 
This proves that you're not showing us the important part.  The
update locks the same row previously locked by the SELECT FOR
UPDATE, so any effect at the row level would be a serialization
failure based on a write conflict, which doesn't sound like your
problem.  They get different locks at the table level, though:
 
http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html#LOCKING-TABLES
 
Somewhere in code you're not showing us you're acquiring a lock on
the ap_sysuptime table which conflicts with a ROW EXCLUSIVE lock but
not with a ROW SHARE lock.  The lock types which could do that are
SHARE and SHARE ROW EXCLUSIVE.  CREATE INDEX (without CONCURRENTLY)
could do that; otherwise it seems that you would need to be
explicitly issuing a LOCK statement at one of these levels somewhere
in your transaction.  That is what is causing the transactions to
run one at a time.
 
-Kevin
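
To illustrate the diagnosis, a sketch of the kind of statement, not shown
anywhere in the thread, that would serialize the updates in exactly this
way:

    BEGIN;
    -- SHARE conflicts with the UPDATE's ROW EXCLUSIVE table lock, but not
    -- with the SELECT ... FOR UPDATE's ROW SHARE lock:
    LOCK TABLE ap_sysuptime IN SHARE MODE;
    UPDATE ap_sysuptime SET last_contacted = now() WHERE ap_id = 1;
    COMMIT;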



Re: [PERFORM] Request for feedback on hardware for a new database server

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 10:32 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Fri, Mar 18, 2011 at 6:44 AM, Arjen van der Meijden
 acmmail...@tweakers.net wrote:
 On 18-3-2011 10:11, Scott Marlowe wrote:

 On Fri, Mar 18, 2011 at 1:16 AM, Arjen van der Meijden
 acmmail...@tweakers.net  wrote:

 On 18-3-2011 4:02 Scott Marlowe wrote:
 We have several 1U boxes (mostly Dell and Sun) running and had several in
 the past. And we've never had any heating problems with them. That
 includes
 machines with more power hungry processors than are currently available,
 all
 power slurping FB-dimm slots occupied and two raid cards installed.

 Note I am talking specifically about the ability to cool the RAID
 card, not the CPUS etc.  Many 1U boxes have poor air flow across the
 expansion slots for PCI / etc cards, while doing a great job cooling
 the CPUs and memory.  If you don't use high performance RAID cards
  (LSI 9xxx & Areca 16xx/18xx) then it's not an issue.  Open up your 1U
 and look at the air flow for the expansion slots, it's often just not
 very much.


 I was referring to amongst others two machines that have both a Dell Perc
 5/i for internal disks and a Perc 5/e for an external disk enclosure. Those
 also had processors that produce quite some heat (2x X5160 and 2x X5355)
 combined with all fb-dimm (8x 2GB) slots filled, which also produce a lot of
 heat. Those Dell Perc's are similar to the LSI's from the same period in
 time.

  So the heat produced by the other components was already pretty high.
 Still, I've seen no problems with heat for any component, including all four
 raid controllers. But I agree, there are some 1U servers that skimp on fans
 and thus air flow in the system. We've not had that problem with any of our
 systems. But both Sun and Dell seem to add quite a bit of fans in the middle
 of the system, where others may do it a bit less heavy duty and less
 over-dimensioned.

 Most machines have different pathways for cooling airflow over their
 RAID cards, and they don't share that air flow with the CPUs.  Also,
 the PERC RAID controllers do not produce a lot of heat.  The CPUs on
 the high performance LSI or Areca controllers are often dual core high
 performance CPUs in their own right, and those cards have heat sinks
 with fans on them to cool them.  The cards themselves are what make so
 much heat and don't get enough cooling in many 1U servers.  It has
 nothing to do with what else is in the server, again because the
 airflow for the cards is usually separate.

As a followup to this subject: the problem wasn't bad until the server
load increased, thus increasing the load on the LSI MegaRAID card, at
which point it started producing more heat than it had before.  When
the machine wasn't working too hard the LSI was fine.  It was once we
started hitting higher and higher load that the card had issues.



[PERFORM] REINDEX takes half a day (and still not complete!)

2011-03-18 Thread Phoenix Kiula
I have a large table, though not as large as the kind of numbers that get
discussed on this list. It has 125 million rows.

REINDEXing the table takes half a day, and it's still not finished.

To write this post I did SELECT COUNT(*), and here's the output -- it
took so long!

select count(*) from links;
   count
-----------
 125418191
(1 row)

Time: 1270405.373 ms

That's 1270 seconds!

I suppose vacuum analyze is not doing its job? As you can see from the
settings below, I have autovacuum set to ON, and there's also a cronjob
every 10 hours to do a manual vacuum analyze on this table, which is the
largest.

PG is version 8.2.9.

Any thoughts on what I can do to improve performance?

Below are my settings.



max_connections  = 300
shared_buffers   = 500MB
effective_cache_size = 1GB
max_fsm_relations= 1500
max_fsm_pages= 95

work_mem = 100MB
temp_buffers = 4096
authentication_timeout   = 10s
ssl  = off
checkpoint_warning   = 3600
random_page_cost = 1

autovacuum   = on
autovacuum_vacuum_cost_delay = 20

vacuum_cost_delay= 20
vacuum_cost_limit= 600

autovacuum_naptime   = 10
stats_start_collector= on
stats_row_level  = on
autovacuum_vacuum_threshold  = 75
autovacuum_analyze_threshold = 25
autovacuum_analyze_scale_factor  = 0.02
autovacuum_vacuum_scale_factor   = 0.01

wal_buffers  = 64
checkpoint_segments  = 128
checkpoint_timeout   = 900
fsync= on
maintenance_work_mem = 512MB



Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 I have a large table but not as large as the kind of numbers that get
 discussed on this list. It has 125 million rows.

 REINDEXing the table takes half a day, and it's still not finished.

 To write this post I did SELECT COUNT(*), and here's the output -- so long!

    select count(*) from links;
       count
    ---
     125418191
    (1 row)

    Time: 1270405.373 ms

 That's 1270 seconds!

 I suppose the vaccuum analyze is not doing its job? As you can see
 from settings below, I have autovacuum set to ON, and there's also a
 cronjob every 10 hours to do a manual vacuum analyze on this table,
 which is largest.

 PG is version 8.2.9.

 Any thoughts on what I can do to improve performance!?

 Below are my settings.



 max_connections              = 300
 shared_buffers               = 500MB
 effective_cache_size         = 1GB
 max_fsm_relations            = 1500
 max_fsm_pages                = 95

 work_mem                     = 100MB

What is the output of running vacuum verbose as a superuser?  (You can
run it on the postgres database so it returns fast.)  We're looking
for the output that looks like this:

INFO:  free space map contains 1930193 pages in 749 relations
DETAIL:  A total of 1787744 page slots are in use (including overhead).
1787744 page slots are required to track all free space.
Current limits are:  1000 page slots, 3000 relations, using 58911 kB.

If the space needed exceeds the page slots, then you need to crank up
your free space map.  If the relations exceed the limit, then you'll
need to crank up max_fsm_relations.
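
A sketch of what cranking those up looks like (the values are placeholders;
size them from your own VACUUM VERBOSE output, and note these settings
require a restart and are gone in 8.4+, where the free space map is
managed automatically):

    # postgresql.conf on 8.2
    max_fsm_pages     = 2000000   # >= page slots required, reported above
    max_fsm_relations = 2000      # >= relations tracked, reported above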



Re: [PERFORM] REINDEX takes half a day (and still not complete!)

2011-03-18 Thread Scott Marlowe
On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula phoenix.ki...@gmail.com wrote:
 autovacuum                   = on
 autovacuum_vacuum_cost_delay = 20

 vacuum_cost_delay            = 20
 vacuum_cost_limit            = 600

 autovacuum_naptime           = 10

also, if vacuum can't keep up you can increase the vacuum cost limit
and lower the cost delay.  Anything above 1 ms is still quite a wait
compared to 0, and most systems don't have the timer granularity to go
that low anyway, so 5 ms is about as low as you can go and still see a
difference before just setting it to 0.  Also, if you've got a lot of
large relations you might need to increase the max number of vacuum
workers as well.
