Re: [PERFORM] Raid 10 chunksize

2009-04-03 Thread Hannes Dorbath

Ron Mayer wrote:

Greg Smith wrote:

On Wed, 1 Apr 2009, Scott Carey wrote:


Write caching on SATA is totally fine.  There were some old ATA drives
that, when paired with some file systems or OSes, would not be safe.  There
are some combinations that have unsafe write barriers.  But there is a
standard, well-supported ATA command to sync that only returns after the
data is on disk.  If you are running an OS that is anything recent at all,
and disks that are not really old, you're fine.

While I would like to believe this, I don't trust any claims in this
area that don't have matching tests that demonstrate things working as
expected.  And I've never seen this work.

My laptop has a 7200 RPM drive, which means that if fsync is being
passed through to the disk correctly I can only fsync 120
times/second.  Here's what I get when I run sysbench on it, starting
with the default ext3 configuration:


I believe it's ext3 that's cheating in this scenario.


I assume so too. Here is the same test using XFS, first with barriers (the
XFS default) and then without:


Linux 2.6.28-gentoo-r2 #1 SMP Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz 
GenuineIntel GNU/Linux


/dev/sdb /data2 xfs rw,noatime,attr2,logbufs=8,logbsize=256k,noquota 0 0

# sysbench --test=fileio --file-fsync-freq=1 --file-num=1 --file-total-size=16384 --file-test-mode=rndwr run

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 16Kb each
16Kb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Done.

Operations performed:  0 Read, 10000 Write, 10000 Other = 20000 Total
Read 0b  Written 156.25Mb  Total transferred 156.25Mb  (463.9Kb/sec)
   28.99 Requests/sec executed

Test execution summary:
total time:  344.9013s
total number of events:  10000
total time taken by event execution: 0.1453
per-request statistics:
 min:  0.01ms
 avg:  0.01ms
 max:  0.07ms
 approx.  95 percentile:   0.01ms

Threads fairness:
events (avg/stddev):   10000.0000/0.00
execution time (avg/stddev):   0.1453/0.00


And now without barriers:

/dev/sdb /data2 xfs 
rw,noatime,attr2,nobarrier,logbufs=8,logbsize=256k,noquota 0 0


# sysbench --test=fileio --file-fsync-freq=1 --file-num=1 --file-total-size=16384 --file-test-mode=rndwr run

sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 16Kb each
16Kb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Done.

Operations performed:  0 Read, 10000 Write, 10000 Other = 20000 Total
Read 0b  Written 156.25Mb  Total transferred 156.25Mb  (62.872Mb/sec)
 4023.81 Requests/sec executed

Test execution summary:
total time:  2.4852s
total number of events:  10000
total time taken by event execution: 0.1325
per-request statistics:
 min:  0.01ms
 avg:  0.01ms
 max:  0.06ms
 approx.  95 percentile:   0.01ms

Threads fairness:
events (avg/stddev):   10000.0000/0.00
execution time (avg/stddev):   0.1325/0.00


--
Best regards,
Hannes Dorbath



Re: [PERFORM] pgPool query cache

2008-05-04 Thread Hannes Dorbath

Rauan Maemirov wrote:

I want to ask if anyone has used the query_cache of pgPool. The problem
is that there are no detailed installation steps on how to configure it
correctly. I tried to follow them, but I guess it doesn't cache my
queries. So maybe someone can advise me or give me a link.


Nobody has used it??


I'd say caching is better done on a higher level -- like HTML fragments
or whatever you generate from those queries.


Just my two cents.


--
Best regards,
Hannes Dorbath



[PERFORM] Fastest way / best practice to calculate next birthdays

2008-05-04 Thread Hannes Dorbath
The subject basically says it all: I'm looking for the fastest
(indexable) way to calculate the next birthdays relative to NOW() from a
dataset of about 1 million users.


I'm currently using a function-based index, but leap year handling
(mapping February 29 to February 28) gives me some headaches.


Is there any best practice to do that in PostgreSQL?
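
To illustrate, a simplified sketch of what I mean. The users table and
its birthdate column are invented names here; the CASE is the February 29
to February 28 mapping:

-- Encode a birthday as an MMDD integer, mapping Feb 29 to Feb 28:
CREATE OR REPLACE FUNCTION birthday_mmdd(d date) RETURNS integer AS $$
    SELECT CASE
        WHEN EXTRACT(month FROM d) = 2 AND EXTRACT(day FROM d) = 29
        THEN 228
        ELSE (EXTRACT(month FROM d) * 100 + EXTRACT(day FROM d))::integer
    END;
$$ LANGUAGE sql IMMUTABLE;

CREATE INDEX users_birthday_mmdd_idx ON users (birthday_mmdd(birthdate));

-- Birthdays from today to the end of the year; a second query (or a
-- UNION ALL) with the inverted condition handles the wrap-around past
-- December 31:
SELECT *
  FROM users
 WHERE birthday_mmdd(birthdate) >= birthday_mmdd(current_date)
 ORDER BY birthday_mmdd(birthdate)
 LIMIT 100;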


--
Best regards,
Hannes Dorbath



Re: [PERFORM] 8.3 synchronous_commit

2008-01-22 Thread Hannes Dorbath

Guillaume Smet wrote:

Also with SATA? If your SATA disk is lying about effectively SYNCing
the data, I'm not that surprised you don't see any improvement. Being
slower is a bit surprising though.


The disc is not lying, but LVM does not support write barriers, so the
result is the same: nothing flushes the disc's write cache on fsync.
However, I could disable the disc's write cache entirely. That's one
reason why I've been a ZFS fanboy lately; it just gets all of this right
by *default*.


Anyway, with some further testing my benchmark results vary so much
that further discussion seems pointless. I'll repost when I have truly
reproducible values and have followed Greg Smith's advice.



--
Best regards,
Hannes Dorbath



Re: [PERFORM] 8.3 synchronous_commit

2008-01-22 Thread Hannes Dorbath

Greg Smith wrote:
Try something more in the range of 4 clients/CPU and set the scale to 
closer to twice that (so with a dual-core system you might do 8 clients 
and a scale of 16).  If you really want to simulate a large number of 
clients, do that on another system and connect to the server remotely.


With 4 clients and scale 10 I get 246 TPS with synchronous_commit
disabled and 634 TPS with synchronous_commit enabled, so the effect just
got even stronger. That was with CFQ.


With deadline they are now pretty close, but synchronous_commit disabled
is still slower: 690 vs. 727 TPS.


Values are averages of 3 runs each, with DROP/CREATE DATABASE and a
CHECKPOINT before each run.
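
In SQL, that reset sequence looks like this (a sketch; issued from
another database such as template1, and "bench" is an assumed database
name):

DROP DATABASE bench;
CREATE DATABASE bench;
CHECKPOINT;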



--
Best regards,
Hannes Dorbath



[PERFORM] 8.3 synchronous_commit

2008-01-21 Thread Hannes Dorbath
I might be completely misunderstanding this feature, but shouldn't
synchronous_commit = off improve performance?


Whatever I do, I find synchronous_commit = off degrades performance.
It especially dislikes the CFQ I/O scheduler; it's not as bad with
deadline. Synthetic load like


pgbench -i -s 10 -U pgsql -d bench
pgbench -t 1000 -c 100 -U pgsql -d bench


or the same with scale 100.

Maybe it's just my test box: a single SATA-II drive, XFS on top of LVM.

I'll retry without LVM once I have another drive; I've seen LVM mess
with other things in the past.
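
For reference, the setting can be flipped in postgresql.conf, per
session, or even per transaction in 8.3; a minimal sketch of the knob
itself (the bench_log table is an invented example):

CREATE TABLE bench_log (id integer, note text);  -- invented example table

-- session-wide:
SET synchronous_commit TO off;

-- or scoped to a single transaction; a crash may lose the last few
-- asynchronous commits, but it never corrupts the database:
BEGIN;
SET LOCAL synchronous_commit TO off;
INSERT INTO bench_log VALUES (1, 'asynchronously committed');
COMMIT;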



--
Best regards,
Hannes Dorbath



Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-05 Thread Hannes Dorbath

On 05.09.2007 01:15, Scott Marlowe wrote:

On 9/4/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

Carlo Stonebanks wrote:

A client is moving their PostgreSQL DB to a brand new Windows 2003 x64
server with 2 quad cores and 32GB of RAM. It is a dedicated server to run
8.2.4.

Large shared_buffers and Windows do not mix.  Perhaps you should leave
the shmem config low, so that the kernel can cache the file pages.


But yeah, the I/O, that's the big one.  If it's just a single or a
couple of IDE drives, it's not gonna be able to handle much load.


Right. Additionally, NTFS is really nothing you want to use on any
serious disc array.


--
Regards,
Hannes Dorbath



Re: [PERFORM] FORGOT TO CONFIGURE RAID! DELL POWEREDGE 2950

2007-07-14 Thread Hannes Dorbath
Joseph wrote:
 We just got a DELL POWEREDGE 2950.  So I was tasked with installing
 Red Hat Linux and dumping our software/packages onto it.  Contrary to
 common sense, I didn't bother reading the manuals that came with the
 2950.  I went right ahead and installed Red Hat server on it, then went
 and loaded the backup software/data etc. onto it and started having
 the team use it.

And this has to do with pgsql.performance exactly what?

Anyway, as someone who seems to administrate a PostgreSQL production
box, you surely have a good backup plan. So just call DELL's support,
fix your RAID and restore from backup.

From the DELL site it seems this `PERC 5/i' onboard controller
(assuming that's what you have) doesn't even have a BBU. If you don't
plan to post here again in a few weeks about data corruption, go out and
shop for a serious controller.

And before you move that box into production, check:

Is my hardware and software setup fsync/fua clean?
Is my backup plan working?


-- 
Best regards,
Hannes Dorbath



Re: [PERFORM] FORGOT TO CONFIGURE RAID! DELL POWEREDGE 2950

2007-07-14 Thread Hannes Dorbath
Gregory Stark wrote:
 From the DELL site it seems this `PERC 5/i' onboard controller
 (assuming that's what you have) doesn't even have a BBU. If you don't
 plan to post here again in a few weeks about data corruption, go out and
 shop for a serious controller.
 
 This is a bit of a strange comment. A BBU will improve performance but
 Postgres doesn't require one to guarantee data integrity.
 
 If your drives have write caching disabled (ie write-through) and your
 controller does write-through caching and you leave fsync=on and
 full_page_writes=on which is the default then you shouldn't have any data
 integrity issues.

That was my point: controllers without a BBU usually leave the drive
caches turned on, as with drive caches off performance would be
unbearably bad.


-- 
Best regards,
Hannes Dorbath



Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000

2007-04-18 Thread Hannes Dorbath

Arnau wrote:

Hi Thor,

Thor-Michael Støre wrote:

On 2007-04-04 Arnau wrote:

Josh Berkus wrote:

Arnau,


Is there anything similar in PostgreSQL? The idea behind this is: how
can I have tables in PostgreSQL that I can query very often, something
like every few seconds, and get results very fast without overloading
the postmaster?

If you're only querying the tables every few seconds, then you
don't really need to worry about performance.



Well, the idea behind this is to have an events table that a monitoring
system polls every few seconds. I'd like to have a kind of FIFO queue:
from the producer's point of view rows are pushed into that table, and
when it is full the oldest row is removed to make room for the newest
one. From the consumer's point of view, the entire contents of the table
are read.

So I'll not only be querying the tables, I'll also need to modify them.


Please try to refrain from doing this. This is the Database as an IPC
antipattern (antipatterns are commonly-reinvented bad solutions to
problems, i.e. you can be sure someone has tried this very thing before
and found it to be a bad solution):

http://en.wikipedia.org/wiki/Database_as_an_IPC

The best solution (as Ansgar hinted at) is to use a real IPC system.

Of course, I've done it myself (not on PostgreSQL, though) when working
at a large corporation where corporate politics prevented me from
introducing any new interdependency between systems (like having two
systems start talking with each other when they previously didn't); the
only common ground for the systems that needed to communicate was a
database, and one of the systems was only able to run simple SQL
statements and not stored procedures.



First of all, thanks for your interest, but let me explain what I need
to do.


We have a web application whose customers want to monitor how it's
performing, not in terms of speed but in terms of how many customers are
currently browsing the application, how many have paid browsing sessions,
how many payments have been made, and so on. It's more or less a control
panel. The difference is that they want the information displayed in the
web browser to be real-time, that is, a query every 1-10 seconds.



Though it has been suggested earlier: why not use pgmemcache and push
each event as a new key? As memcached is FIFO by design, that is exactly
what you are asking for. Besides, memcached is so fast that your OS is
busier handling all those TCP connections than running memcached.


And in case you'd like to display statistical data rather than tailing
events, let PG push that to memcached keys as well; think of memcached
as a materialized view in that case.
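
A rough sketch of that, assuming pgmemcache is installed and
initialized. memcache_set() comes from pgmemcache; the events table, its
columns and the trigger names are invented for illustration:

-- Push every new event into memcached under its own key:
CREATE OR REPLACE FUNCTION push_event_to_memcached() RETURNS trigger AS $$
BEGIN
    PERFORM memcache_set('event_' || NEW.event_id::text, NEW.payload);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_to_memcached
    AFTER INSERT ON events
    FOR EACH ROW
    EXECUTE PROCEDURE push_event_to_memcached();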


As middleware I'd recommend lighttpd with mod_magnet.

You should be able to deliver that admin page way more than 5000 times
per second with some outdated desktop hardware. If that's not enough,
read up on things like
http://blog.lighttpd.net/articles/2006/11/27/comet-meets-mod_mailbox



--
Best regards,
Hannes Dorbath



[PERFORM] Running PG on cluster files systems

2007-01-09 Thread Hannes Dorbath

GFS2, OCFS2, Lustre, CXFS, GPFS, Veritas, and whatever else there is...

...does anyone have experience with any of those? Is it bearable to run
PG on them from a performance point of view? I guess not, but are there
any positive reports?


Thanks

--
Regards,
Hannes Dorbath



[PERFORM] Yet another question on LIMIT performance :/

2006-11-06 Thread Hannes Dorbath

Though I've read recent threads, I'm unsure if any matches my case.

We have 2 tables: revisions and revisions_active. revisions contains 
117707 rows, revisions_active 17827 rows.


DDL: http://hannes.imos.net/ddl.sql.txt

Joining the 2 tables without an additional condition seems OK to me
(given our outdated hardware): http://hannes.imos.net/query_1.sql.txt


What worries me is the performance when limiting the recordset:
http://hannes.imos.net/query_2.sql.txt

Though it should only have to join a few rows, it seems to scan all
rows. From experience I thought that adding an ORDER BY on the indexed
columns would speed it up, but it has no effect:
http://hannes.imos.net/query_3.sql.txt


I'm on 8.1.5; statistics (ANALYZE) are up to date, the tables have each
been CLUSTERed by their PK, and the statistics target for the join
columns has been set to 100 (without any effect).
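
For reference, the statistics target change was done like this
(revision_id is a placeholder here; the real column names are in the DDL
linked above):

ALTER TABLE revisions ALTER COLUMN revision_id SET STATISTICS 100;
ALTER TABLE revisions_active ALTER COLUMN revision_id SET STATISTICS 100;
ANALYZE revisions;
ANALYZE revisions_active;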



Thanks in advance!


--
Regards,
Hannes Dorbath



Re: [PERFORM] Yet another question on LIMIT performance :/

2006-11-06 Thread Hannes Dorbath

On 06.11.2006 15:13, Heikki Linnakangas wrote:

Hannes Dorbath wrote:
Though it should only have to join a few rows, it seems to scan all rows.


What makes you think that's the case?


Sorry, not all rows, but 80753. It's not clear to me why this number is 
so high with LIMIT 10.



--
Regards,
Hannes Dorbath



[PERFORM] Opteron vs. Xeon benchmark

2006-09-22 Thread Hannes Dorbath

A colleague pointed me to this site today:

http://tweakers.net/reviews/642/13

I can't read the language, so I can't get a grip on what exactly the
benchmark was about.


Their diagrams show `Requests per second'. What is that supposed to
mean? How many connections PG accepted per second? Did they measure the
OS's fork performance, then? Should that value be of any interest?
Anyone with a heavy OLTP workload will use persistent connections or a
connection pool in front.


Do they mean TPS? That wouldn't make much sense in a CPU benchmark, as
OLTP workload is typically limited by the disc subsystem.


Can someone enlighten me as to what this site is about?


--
Regards,
Hannes Dorbath



[PERFORM] Arguments Pro/Contra Software Raid

2006-05-09 Thread Hannes Dorbath

Hi,

I've just had a discussion with colleagues regarding the use of hardware
or software RAID 1/10 for our Linux-based database servers.


I myself can't see much reason to spend $500 on a high-end controller
card for a simple RAID 1.


Any arguments pro or contra would be desirable.

From my experience and what I've read here:

+ Hardware RAID might be a bit easier to manage if you have never spent
the few hours it takes to learn the software RAID tools.


+ There are situations in which software RAID is faster, as CPU power
has advanced dramatically in recent years and even high-end controller
cards cannot keep up with that.


+ Using SATA drives is always a bit of a risk, as some drives lie about
whether they are caching or not.


+ With hardware controllers, the array becomes locked to a particular
vendor. You can't switch controller vendors, as the array metadata is
stored in a proprietary format. In case the RAID is broken to a level
the controller can't recover from automatically, this might complicate
manual recovery by specialists.


+ Even battery-backed controllers can't guarantee that data written to
the drives is consistent after a power outage, nor that the drive does
not corrupt something during the involuntary shutdown / power
irregularities. (This is theoretical, as any server will be UPS-backed.)



--
Regards,
Hannes Dorbath



Re: [PERFORM] Query Optimizer Failure / Possible Bug

2005-04-10 Thread Hannes Dorbath
Mhh. I have no clue about the internals of PostgreSQL and query
planning, but to me as a user this really seems like something the
optimizer should work out...

On 03.04.2005 10:01, PFC wrote:
Noticed this problem, too.
You can always have the calculation you want done once inside a
set-returning function so it'll behave like a table, but that's ugly.
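
A minimal sketch of that workaround (the composite type, the function
and the simplified query are invented for illustration; the table names
are borrowed from the EXPLAIN quoted below; needs 8.0+ for dollar
quoting):

CREATE TYPE q_row AS (replaced_serials integer);

CREATE OR REPLACE FUNCTION q_func() RETURNS SETOF q_row AS $$
    SELECT count(*)::integer
      FROM rma_ticket_serials rts
      JOIN serials s ON s.serial_id = rts.serial_id
     WHERE rts.replace = false;
$$ LANGUAGE sql STABLE;

-- The FROM item is evaluated once, so any number of derived columns can
-- reuse the result without re-running the aggregate:
SELECT q.replaced_serials,
       q.replaced_serials * 2 AS some_column
  FROM q_func() AS q;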

On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath  
[EMAIL PROTECTED] wrote:

hm, a few days and not a single reply :|
any more information needed? test data? simplified test case? anything?
thanks
Hannes Dorbath wrote:
The query and the corresponding EXPLAIN are at
http://hannes.imos.net/query.txt

I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way,
the whole query in the FROM clause returning q is executed again. This
doesn't make sense to me at all and eats performance.

If this wasn't clear enough: for every

q.replaced_serials <insert_random_calculation> AS some_column

in the SELECT clause there is a new block of

---
->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
      ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
            ->  Index Scan using ix_rma_ticket_serials_replace on
                rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
                  Index Cond: (replace = false)
            ->  Index Scan using pk_serials on serials s
                (cost=0.00..3.51 rows=1 width=4)
                  Index Cond: (s.serial_id = "outer".serial_id)
                  Filter: ((article_no = $0) AND (delivery_id = $1))
---

in the EXPLAIN result.

For those who wonder why I do this FROM (SELECT...): I was searching for
a way to use the result of a subselect for multiple calculations in the
SELECT clause and return the calculation results as individual columns.

I tested a bit further and found out that PG behaves the same in case q
is a view. This makes me wonder how efficiently the optimizer can work
with views - or even worse - nested views.

Tested and reproduced on PG 7.4.1 Linux and 8.0.0 Win32.

Thanks in advance,
Hannes Dorbath




Re: [PERFORM] Query Optimizer Failure / Possible Bug

2005-04-02 Thread Hannes Dorbath
hm, a few days and not a single reply :|
any more information needed? test data? simplified test case? anything?
thanks
Hannes Dorbath wrote:
The query and the corresponding EXPLAIN are at
http://hannes.imos.net/query.txt

I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way,
the whole query in the FROM clause returning q is executed again. This
doesn't make sense to me at all and eats performance.

If this wasn't clear enough: for every

q.replaced_serials <insert_random_calculation> AS some_column

in the SELECT clause there is a new block of

---
->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
      ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
            ->  Index Scan using ix_rma_ticket_serials_replace on
                rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
                  Index Cond: (replace = false)
            ->  Index Scan using pk_serials on serials s
                (cost=0.00..3.51 rows=1 width=4)
                  Index Cond: (s.serial_id = "outer".serial_id)
                  Filter: ((article_no = $0) AND (delivery_id = $1))
---

in the EXPLAIN result.

For those who wonder why I do this FROM (SELECT...): I was searching for
a way to use the result of a subselect for multiple calculations in the
SELECT clause and return the calculation results as individual columns.

I tested a bit further and found out that PG behaves the same in case q
is a view. This makes me wonder how efficiently the optimizer can work
with views - or even worse - nested views.

Tested and reproduced on PG 7.4.1 Linux and 8.0.0 Win32.

Thanks in advance,
Hannes Dorbath
--
imos  Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen  // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net


[PERFORM] Query Optimizer Failure / Possible Bug

2005-03-27 Thread Hannes Dorbath
The query and the corresponding EXPLAIN are at
http://hannes.imos.net/query.txt

I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way,
the whole query in the FROM clause returning q is executed again. This
doesn't make sense to me at all and eats performance.

If this wasn't clear enough: for every

q.replaced_serials <insert_random_calculation> AS some_column

in the SELECT clause there is a new block of

---
->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
      ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
            ->  Index Scan using ix_rma_ticket_serials_replace on
                rma_ticket_serials rts  (cost=0.00..122.35 rows=190 width=4)
                  Index Cond: (replace = false)
            ->  Index Scan using pk_serials on serials s
                (cost=0.00..3.51 rows=1 width=4)
                  Index Cond: (s.serial_id = "outer".serial_id)
                  Filter: ((article_no = $0) AND (delivery_id = $1))
---

in the EXPLAIN result.

For those who wonder why I do this FROM (SELECT...): I was searching for
a way to use the result of a subselect for multiple calculations in the
SELECT clause and return the calculation results as individual columns.

I tested a bit further and found out that PG behaves the same in case q
is a view. This makes me wonder how efficiently the optimizer can work
with views - or even worse - nested views.

Tested and reproduced on PG 7.4.1 Linux and 8.0.0 Win32.

Thanks in advance,
Hannes Dorbath