Re: [PERFORM] Optimize query for listing un-read messages

2014-05-02 Thread Vitalii Tymchyshyn
What statistics do you have on the data? I suppose most messages are read by low number of users, mostly 0 or one. I can see two options to consider: 1) Use arrays to store information on which users have already read the message. You may need GIN/GIST index to search fast. 2) Introduce some kind o
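Option 1 above can be sketched as follows; the table and column names are assumptions for illustration, not taken from the thread. One caveat worth stating: GIN accelerates the positive containment test (`@>`), while the negated form used to find unread messages cannot use the index directly, so the array mainly pays off for "has user X read this?" checks and for keeping the data compact when, as the statistics question suggests, most messages have zero or one reader.

```sql
-- Hypothetical schema: track readers directly on the message row.
CREATE TABLE message (
    id      bigserial PRIMARY KEY,
    subject text NOT NULL,
    read_by integer[] NOT NULL DEFAULT '{}'  -- ids of users who have read it
);

-- GIN index so array-containment tests avoid scanning the whole table.
CREATE INDEX message_read_by_gin ON message USING gin (read_by);

-- Has user 42 read message 1? (index-supported containment test)
SELECT EXISTS (
    SELECT 1 FROM message WHERE id = 1 AND read_by @> ARRAY[42]
);

-- Mark message 1 as read by user 42 (idempotent append).
UPDATE message
SET read_by = read_by || 42
WHERE id = 1 AND NOT (read_by @> ARRAY[42]);
```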

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-02 Thread Andreas Joseph Krogh
On Friday 2 May 2014 at 02:17:58, Craig James <cja...@emolecules.com> wrote: On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh <andr...@visena.com> wrote: I have a schema where I have lots of messages and some users who might have read some of them. When a message is read by a user I create an entry in a table message_property holding the property (is_read) for that user. The schem

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Craig James
On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh wrote: > I have a schema where I have lots of messages and some users who might > have read some of them. When a message is read by a user I create an entry > in a table message_property holding the property (is_read) for that user. > > The schem

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Friday 2 May 2014 at 01:58:04, David G Johnston <david.g.johns...@gmail.com> wrote: Per-User caching does seem to be something that is going to be needed... Depending on how many users are being tracked would storing the "reader_id" in an indexed arra

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread David G Johnston
> > > Per-User caching does seem to be something that is going to be needed... > > Depending on how many users are being tracked would storing the "reader_id" > in an indexed array improve matters? " SELECT ... FROM message WHERE NOT > (1 > = ANY(reader_ids)) ; UPDATE message SET reader_ids = read

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Friday 2 May 2014 at 00:55:25, David G Johnston <david.g.johns...@gmail.com> wrote: Andreas Joseph Krogh-2 wrote > I will end up with that only if > all users read all messages, which is not nearly the case. These observations probably won't help but.

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread David G Johnston
Andreas Joseph Krogh-2 wrote > I will end up with that only if > all users read all messages, which is not nearly the case. These observations probably won't help but... You have what amounts to a mathematical "sparse matrix" problem on your hands... Is there any way to expire messages so that d

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Friday 2 May 2014 at 00:34:34, Tomas Vondra <t...@fuzzy.cz> wrote: On 1.5.2014 23:58, Andreas Joseph Krogh wrote: > On Thursday 1 May 2014 at 23:45:49, Tomas Vondra wrote: > > On 1.5.2014 23:19, Andreas Joseph Krogh wrote: > > Just curious: >

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Tomas Vondra
On 1.5.2014 23:58, Andreas Joseph Krogh wrote: > On Thursday 1 May 2014 at 23:45:49, Tomas Vondra wrote: > > On 1.5.2014 23:19, Andreas Joseph Krogh wrote: > > Just curious: > > Is such a JOIN index possible in other DBs, if so - which? > > Can other DBs

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Thursday 1 May 2014 at 23:45:49, Tomas Vondra <t...@fuzzy.cz> wrote: On 1.5.2014 23:19, Andreas Joseph Krogh wrote: > On Thursday 1 May 2014 at 23:02:13, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > > 2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Tomas Vondra
On 1.5.2014 23:19, Andreas Joseph Krogh wrote: > On Thursday 1 May 2014 at 23:02:13, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > > 2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh >: > > On Thursday 1 May 2014 at 21:53:3

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Thursday 1 May 2014 at 23:19:55, David G Johnston <david.g.johns...@gmail.com> wrote: How does something like: WITH unreads AS ( SELECT messageid FROM message EXCEPT SELECT messageid FROM message_property WHERE personid=1 AND has_read ) SELECT ...

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Thursday 1 May 2014 at 23:02:13, Pavel Stehule <pavel.steh...@gmail.com> wrote: 2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh <andr...@visena.com>: On Thursday 1 May 2014 at 21:53:32, Pavel Stehule <pavel.steh...@gmail.com

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread David G Johnston
How does something like: WITH unreads AS ( SELECT messageid FROM message EXCEPT SELECT messageid FROM message_property WHERE personid=1 AND has_read ) SELECT ... FROM unreads JOIN messages USING (messageid) ; perform? David J. -- View this message in context: http://postgresql.1045698.n5.na
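The EXCEPT formulation above can be filled out as a runnable sketch. The `subject` column, ordering, and LIMIT are illustrative additions, not from the thread:

```sql
-- Unread = all message ids minus the ids this person has marked read.
WITH unreads AS (
    SELECT messageid FROM message
    EXCEPT
    SELECT messageid FROM message_property
    WHERE personid = 1 AND has_read
)
SELECT m.messageid, m.subject        -- subject column assumed for illustration
FROM unreads u
JOIN message m USING (messageid)
ORDER BY m.messageid DESC
LIMIT 20;
```

EXCEPT deduplicates and can be planned as a hashed set difference, which is why it was proposed as an alternative to the anti-join form; only EXPLAIN ANALYZE on the real data can say which wins.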

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh : > On Thursday 1 May 2014 at 21:53:32, Pavel Stehule < > pavel.steh...@gmail.com> wrote: > > > > 2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh : >> >> On Thursday 1 May 2014 at 21:30:39, Pavel Stehule < >> pavel.steh...@gmail.com> wrote: >> >>

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Thursday 1 May 2014 at 21:53:32, Pavel Stehule <pavel.steh...@gmail.com> wrote: 2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh <andr...@visena.com>: On Thursday 1 May 2014 at 21:30:39, Pavel Stehule <pavel.steh...@gmail.com

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh : > On Thursday 1 May 2014 at 21:30:39, Pavel Stehule < > pavel.steh...@gmail.com> wrote: > > Hello > [snip] > > I had perfect success on a similar use case with a descending-ordered partial > index > > http://www.postgresql.org/docs/9.3/interactive/sql

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Thursday 1 May 2014 at 21:30:39, Pavel Stehule <pavel.steh...@gmail.com> wrote: Hello [snip]   I had perfect success on a similar use case with a descending-ordered partial index http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html
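A minimal sketch of the descending-ordered partial index idea, assuming (unlike the original schema, where unread messages have no row at all) that a `message_property` row exists per (person, message) pair; all names are illustrative:

```sql
-- Only still-unread rows are indexed, newest message first, so the index
-- stays small and a "latest unread" query reads just its leading entries.
CREATE INDEX message_property_unread_idx
    ON message_property (person_id, message_id DESC)
    WHERE has_read = false;

-- A query of matching shape can be answered from the partial index alone:
SELECT message_id
FROM message_property
WHERE person_id = 1 AND has_read = false
ORDER BY message_id DESC
LIMIT 20;
```

The partial-index predicate must appear verbatim (or be provably implied) in the query's WHERE clause for the planner to consider the index.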

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
Hello 2014-05-01 21:17 GMT+02:00 Andreas Joseph Krogh : > On Thursday 1 May 2014 at 20:35:07, Jochem Berndsen < > joc...@functor.nl> wrote: > > > Hi Andreas, > > [New to this list, forgive my ignorance.] > [snip] > I'm getting better performance with: > > SELECT > m.id AS message_id, > 1 AS pe

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
On Thursday 1 May 2014 at 20:35:07, Jochem Berndsen <joc...@functor.nl> wrote: Hi Andreas, [New to this list, forgive my ignorance.] [snip] I'm getting better performance with: SELECT m.id AS message_id, 1 AS person_id, FALSE AS is_read, m.subject FROM message m WHERE 1 = 1

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Jochem Berndsen
Hi Andreas, [New to this list, forgive my ignorance.] On 05/01/2014 01:26 PM, Andreas Joseph Krogh wrote: I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu My machine has PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu. I have a schema where I have lots of messages and some users who might

[PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
Hi all,   I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu I have a schema where I have lots of messages and some users who might have read some of them. When a message is read by a user I create an entry in a table message_property holding the property (is_read) for that user.   The schem
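The schema description is truncated in the archive; a minimal sketch under assumed names (the table names `message` and `message_property` are from the post, the column names are illustrative), together with the natural anti-join query for un-read messages:

```sql
-- Hypothetical reconstruction of the schema described above.
CREATE TABLE message (
    id      bigserial PRIMARY KEY,
    subject text NOT NULL
);

CREATE TABLE message_property (
    message_id bigint  NOT NULL REFERENCES message (id),
    person_id  integer NOT NULL,
    is_read    boolean NOT NULL DEFAULT false,
    PRIMARY KEY (message_id, person_id)
);

-- Un-read messages for person 1: no is_read=true row exists for the pair.
SELECT m.id, m.subject
FROM message m
WHERE NOT EXISTS (
    SELECT 1
    FROM message_property p
    WHERE p.message_id = m.id
      AND p.person_id  = 1
      AND p.is_read
);
```

The difficulty discussed throughout the thread is that this anti-join must consider every message row for which the user has no property row, which is most of the table when most messages are unread.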

Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-06 Thread Jeff Janes
On Tue, Mar 5, 2013 at 4:07 PM, Joshua D. Drake wrote: > > On 03/05/2013 03:51 PM, Niels Kristian Schjødt wrote: > > 3ms isn't slow >>> >>> Sorry, it's 3323ms! >> >> Can I do anything to optimize that query or maybe the index or something? >>> >>> your index is already used >>> >> >> Okay

Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-06 Thread Julien Cigar
On 03/06/2013 00:51, Niels Kristian Schjødt wrote: Hi, thanks for answering. See comments inline. On 05/03/2013 at 15.26, Julien Cigar wrote: On 03/05/2013 15:00, Niels Kristian Schjødt wrote: Hi, I'm running a rails app, where I have a model called Car that has_many Images. Now when I te

Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Joshua D. Drake
On 03/05/2013 03:51 PM, Niels Kristian Schjødt wrote: 3ms isn't slow Sorry, it's 3323ms! Can I do anything to optimize that query or maybe the index or something? your index is already used Okay this leaves me with - "get better hardware" or? What does explain analyze say versus just

Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Niels Kristian Schjødt
Hi, thanks for answering. See comments inline. On 05/03/2013 at 15.26, Julien Cigar wrote: > On 03/05/2013 15:00, Niels Kristian Schjødt wrote: >> Hi, >> >> I'm running a rails app, where I have a model called Car that has_many >> Images. Now when I tell rails to include those images, when qu

Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Julien Cigar
On 03/05/2013 15:00, Niels Kristian Schjødt wrote: Hi, I'm running a rails app, where I have a model called Car that has_many Images. Now when I tell rails to include those images, when querying say 50 cars, then it often decides to use a SELECT * from images WHERE car_id IN (id1,id2,id3,id4…

[PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Niels Kristian Schjødt
Hi, I'm running a rails app, where I have a model called Car that has_many Images. Now when I tell rails to include those images, when querying say 50 cars, then it often decides to use a SELECT * from images WHERE car_id IN (id1,id2,id3,id4…) instead of doing a join. Now either way it uses t
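The two query shapes being compared above can be sketched as follows; table and column names (`images`, `cars`, `car_id`) follow the Rails conventions implied by the post, and the index is an assumption about what makes either form fast:

```sql
-- What the ORM emits (illustrative id list):
SELECT * FROM images WHERE car_id IN (1, 2, 3, 4);

-- The equivalent explicit-join form:
SELECT i.*
FROM images i
JOIN cars c ON c.id = i.car_id
WHERE c.id IN (1, 2, 3, 4);

-- Either way, an index on the foreign-key column is what matters;
-- without it both forms degenerate to sequential scans of images.
CREATE INDEX images_car_id_idx ON images (car_id);
```

With the index in place the planner typically resolves both forms to similar plans (an index or bitmap scan per key), so EXPLAIN ANALYZE on the real data should decide, not the surface syntax.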

Re: [PERFORM] Optimize update query

2012-12-03 Thread Niels Kristian Schjødt
On 30/11/2012 at 17.06, Shaun Thomas wrote: > On 11/30/2012 09:44 AM, Niels Kristian Schjødt wrote: > > Just a note on your iostat numbers. The first reading is actually just a > summary. You want the subsequent readings. > >> The pgsql_tmp dir is not changing at all it's constantly empty (a

Re: [PERFORM] Optimize update query

2012-12-02 Thread Mark Kirkwood
Yeah, this area is changing very fast! I agree - choosing carefully is important, as there are still plenty of older models around that are substantially slower. Also choice of motherboard chipset can strongly affect overall performance too. The 6 Gbit/s ports on Sandy and Ivy bridge Mobos [1]

Re: [PERFORM] Optimize update query

2012-12-02 Thread Vitalii Tymchyshyn
Well, it seems that my data can be outdated, sorry for that. I've just checked performance numbers on Tom's hardware and it seems that the best SSDs really do 500 MB/s. Some others do 100. So, I'd say one must choose wisely (as always :-) ). Best regards, Vitalii Tymchyshyn On 1 Dec 2012 00:43, "Mark Ki

Re: [PERFORM] Optimize update query

2012-11-30 Thread Mark Kirkwood
Hmm - not strictly true as stated: 1 SSD will typically do 500MB/s sequential read/write. 1 HDD will be lucky to get 1/3 of that. We are looking at replacing 4 to 6 disk RAID10 arrays of HDD with a RAID1 pair of SSD, as they perform about the same for sequential work and vastly better at random

Re: [PERFORM] Optimize update query

2012-11-30 Thread Shaun Thomas
On 11/30/2012 09:44 AM, Niels Kristian Schjødt wrote: Just a note on your iostat numbers. The first reading is actually just a summary. You want the subsequent readings. The pgsql_tmp dir is not changing at all it's constantly empty (a size of 4.0K). Good. Filesystem 1K-blocksUsed A

Re: [PERFORM] Optimize update query

2012-11-30 Thread Niels Kristian Schjødt
Hmm very very interesting. Currently I run at "medium" load compared to the very high loads in the night. This is what the CPU I/O on new relic show: https://rpm.newrelic.com/public/charts/8RnSOlWjfBy And this is what iostat shows: Linux 3.2.0-33-generic (master-db) 11/30/2012 _x86_64_

Re: [PERFORM] Optimize update query

2012-11-30 Thread Shaun Thomas
On 11/30/2012 08:48 AM, Niels Kristian Schjødt wrote: I forgot to run 'sudo mount -a' I feel so embarrassed now :-( - In other words no the drive was not mounted to the /ssd dir. Yeah, that'll get ya. I still see a lot of CPU I/O when doing a lot of writes, so the question is, what's next. S

Re: [PERFORM] Optimize update query

2012-11-30 Thread Niels Kristian Schjødt
On 30/11/2012 at 15.02, Shaun Thomas wrote: > On 11/29/2012 08:32 PM, Niels Kristian Schjødt wrote: > >> If I do a "sudo iostat -k 1" >> I get a lot of output like this: >> Device:  tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn >> sda  0.00  0.00  0.00

Re: [PERFORM] Optimize update query

2012-11-30 Thread Shaun Thomas
On 11/30/2012 02:37 AM, Vitalii Tymchyshyn wrote: Actually, what's the point in putting logs to ssd? SSDs are good for random access and logs are accessed sequentially. While this is true, Niels' problem is that his regular HDs are getting saturated. In that case, moving any activity off of t

Re: [PERFORM] Optimize update query

2012-11-30 Thread Shaun Thomas
On 11/29/2012 08:32 PM, Niels Kristian Schjødt wrote: If I do a "sudo iostat -k 1" I get a lot of output like this: Device:  tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn  sda  0.00  0.00  0.00  0  0  sdb  0.00  0.00

Re: [PERFORM] Optimize update query

2012-11-30 Thread Shaun Thomas
On 11/30/2012 07:31 AM, Niels Kristian Schjødt wrote: In theory what difference should it make to the performance, to have a pool in front of the database, that all my workers and web servers connect to instead of connecting directly? Where is the performance gain coming from in that situation?

Re: [PERFORM] Optimize update query

2012-11-30 Thread Niels Kristian Schjødt
Okay, So to understand this better before I go with that solution: In theory what difference should it make to the performance, to have a pool in front of the database, that all my workers and web servers connect to instead of connecting directly? Where is the performance gain coming from in tha

Re: [PERFORM] Optimize update query

2012-11-30 Thread Kevin Grittner
Niels Kristian Schjødt wrote: >> You said before that you were seeing high disk wait numbers. Now >> it is zero according to your disk utilization graph. That >> sounds like a change to me. > Hehe, I'm sorry if it somehow was misleading, I just wrote "a lot > of I/O" it was CPU I/O >>> A lot of

Re: [PERFORM] Optimize update query

2012-11-30 Thread Vitalii Tymchyshyn
SSDs are not faster for sequential IO as far as I know. That's why (with BBU or synchronous_commit=off) I prefer to have logs on regular HDDs. Best regards 2012/11/30 Willem Leenen > > Actually, what's the point in putting logs to ssd? SSDs are good for > random access and logs are accessed sequential

Re: [PERFORM] Optimize update query

2012-11-30 Thread Willem Leenen
Actually, what's the point in putting logs to ssd? SSDs are good for random access and logs are accessed sequentially. I'd put table spaces on ssd and leave logs on hdd On 30 Nov 2012 04:33, "Niels Kristian Schjødt" wrote: Because SSD's are considered faster. Then you have to put the most p

Re: [PERFORM] Optimize update query

2012-11-30 Thread Vitalii Tymchyshyn
Oh, yes. I don't imagine DB server without RAID+BBU :) When there is no BBU, SSD can be handy. But you know, SSD is worse in linear read/write than HDD. Best regards, Vitalii Tymchyshyn 2012/11/30 Mark Kirkwood > Most modern SSD are much faster for fsync type operations than a spinning > disk

Re: [PERFORM] Optimize update query

2012-11-30 Thread Mark Kirkwood
When I try your command sequence I end up with the contents of the new pg_xlog owned by root. Postgres will not start: PANIC: could not open file "pg_xlog/000100060080" (log file 6, segment 128): Permission denied While this is fixable, I suspect you have managed to leave the xlo

Re: [PERFORM] Optimize update query

2012-11-30 Thread Mark Kirkwood
Most modern SSD are much faster for fsync type operations than a spinning disk - similar performance to spinning disk + writeback raid controller + battery. However as you mention, they are great at random IO too, so Niels, it might be worth putting your postgres logs *and* data on the SSDs an

Re: [PERFORM] Optimize update query

2012-11-30 Thread Vitalii Tymchyshyn
Actually, what's the point in putting logs to ssd? SSDs are good for random access and logs are accessed sequentially. I'd put table spaces on ssd and leave logs on hdd On 30 Nov 2012 04:33, "Niels Kristian Schjødt" wrote: > Hmm I'm getting suspicious here. Maybe my new great setup with the SSD's

Re: [PERFORM] Optimize update query

2012-11-29 Thread Niels Kristian Schjødt
Hmm I'm getting suspicious here. Maybe my new great setup with the SSD's is not really working as it should, and maybe new relic is not monitoring as it should. If I do a "sudo iostat -k 1" I get a lot of output like this: Device:  tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn  s

Re: [PERFORM] Optimize update query

2012-11-29 Thread Niels Kristian Schjødt
On 30/11/2012 at 02.24, "Kevin Grittner" wrote: > Niels Kristian Schjødt wrote: > >> Okay, now I'm done with the updating as described above. I did the >> postgres.conf changes. I did the kernel changes, I added two >> SSD's in a software RAID1 where the pg_xlog is now located - >> unfortunately the

Re: [PERFORM] Optimize update query

2012-11-29 Thread Kevin Grittner
Niels Kristian Schjødt wrote: > Okay, now I'm done with the updating as described above. I did the > postgres.conf changes. I did the kernel changes, I added two > SSD's in a software RAID1 where the pg_xlog is now located - > unfortunately the picture is still the same :-( You said before that y

Re: [PERFORM] Optimize update query

2012-11-29 Thread Niels Kristian Schjødt
On 28/11/2012 at 17.54, Shaun Thomas wrote: > On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote: > >> https://rpm.newrelic.com/public/charts/h2dtedghfsv > > Doesn't this answer your question? > > That iowait is crushing your server into the ground. It's no surprise updates > are taking s

Re: [PERFORM] Optimize update query

2012-11-28 Thread Mark Kirkwood
In later kernels these have been renamed: Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-32-generic x86_64) $ sysctl -a|grep dirty vm.dirty_background_ratio = 5 vm.dirty_background_bytes = 0 vm.dirty_ratio = 10 vm.dirty_bytes = 0 vm.dirty_writeback_centisecs = 500 vm.dirty_expire_centisecs = 30

Re: [PERFORM] Optimize update query

2012-11-28 Thread Niels Kristian Schjødt
Hi, I have started to implement your suggestions. I have a small error so far though. The "vm.dirty_writeback_ratio = 1" command returns: error: "vm.dirty_writeback_ratio" is an unknown key. I'm on Ubuntu 12.04. On 28/11/2012 at 17.54, Shaun Thomas wrote: > On 11/28/2012 10:19 AM, Niels Kri

Re: [PERFORM] Optimize update query

2012-11-28 Thread Shaun Thomas
On 11/28/2012 11:44 AM, Niels Kristian Schjødt wrote: Thanks a lot - on the server I already have one additional SSD 250gb disk, that I don't use for anything at the moment. Good. An SSD would actually be better for your data, as it follows more random access patterns, and xlogs are more se

Re: [PERFORM] Optimize update query

2012-11-28 Thread Shaun Thomas
On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote: https://rpm.newrelic.com/public/charts/h2dtedghfsv Doesn't this answer your question? That iowait is crushing your server into the ground. It's no surprise updates are taking several seconds. That update you sent us *should* execute on the

Re: [PERFORM] Optimize update query

2012-11-28 Thread Bèrto ëd Sèra
max_connections = 1000 looks bad... why not a pooler in place? Cheers Bèrto On 28 November 2012 16:19, Niels Kristian Schjødt wrote: > max_connections = 1000 -- == If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and

Re: [PERFORM] Optimize update query

2012-11-28 Thread Niels Kristian Schjødt
Okay guys, Thanks for all the great help and advice already! Let me just clear some things, to make my question a little easier to answer :-) Now my site is a search engine for used cars - not just a car shop with a few hundred cars. The update query you look at, is an update that is executed on

Re: [PERFORM] Optimize update query

2012-11-28 Thread Willem Leenen
I assume that SQL databases (Banks? Telecom?) can handle a used car shop. No need for an unstructured data tool. > +1, SQL databases have a limited number of inserts/updates per second. Even > with highend hardware you won't have more than XXX operations per > second. As Thomas said, you shoul

Re: [PERFORM] Optimize update query

2012-11-28 Thread Kevin Grittner
Niels Kristian Schjødt wrote: > PS. I'm on postgres 9.2 on a server with 32gb ram, 8 cores and > two 3T disks in a software raid 1 setup. In addtion to the excellent advice from Shaun, I would like to point out a few other things. One query runs on one core. In a test of a single query, the othe

Re: [PERFORM] Optimize update query

2012-11-28 Thread Marcin Mirosław
On 28.11.2012 15:07, Shaun Thomas wrote: > On 11/28/2012 06:57 AM, Niels Kristian Schjødt wrote: > > Before I go crazy, here... you really need to tell us what "not enough" > means. You didn't provide an explain analyze, so we don't know what your > actual performance is. But I have my suspici

Re: [PERFORM] Optimize update query

2012-11-28 Thread Shaun Thomas
On 11/28/2012 06:57 AM, Niels Kristian Schjødt wrote: Before I go crazy, here... you really need to tell us what "not enough" means. You didn't provide an explain analyze, so we don't know what your actual performance is. But I have my suspicions. So as you can see, it's already pretty optimize

[PERFORM] Optimize update query

2012-11-28 Thread Niels Kristian Schjødt
Hi, i have these update queries, that run very often, and takes too long time, in order for us to reach the throughput we are aiming at. However, the update query is very simple, and I can't figure out any way to improve the situation. The query looks like this: UPDATE "adverts" SET "last_obser
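The update itself is truncated in the archive; a sketch of its likely shape, with the SET list and column names after `last_obser...` introduced purely as hypothetical placeholders:

```sql
-- Hypothetical reconstruction: a frequent single-row update keyed on id.
UPDATE adverts
SET last_observed_at = now()          -- column name assumed from the truncation
WHERE id = 12345;

-- For a high rate of such updates, two standard levers (both generic advice,
-- not from the thread): ensure the WHERE column is indexed, and leave free
-- space on each page so updates can be HOT (no index maintenance needed).
ALTER TABLE adverts SET (fillfactor = 80);
VACUUM FULL adverts;  -- one-time rewrite so the new fillfactor takes effect
```

HOT updates only apply when no indexed column changes, which is why keeping frequently-updated timestamp columns out of indexes matters for this workload.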

Re: [PERFORM] Optimize the database performance

2011-10-17 Thread Cédric Villemain
2011/10/17 Micka : > Hi, > > I've a postgres 9.1 database used for map generating ( tiles ). > The system has 24 GB RAM and 5 processors. > I'm using geoserver to generate the tiles. > > My data used 8486 MB  => psql -d gis -c "SELECT > pg_size_pretty(pg_database_size('gis'))" > > I've carefully ind

Re: [PERFORM] Optimize the database performance

2011-10-17 Thread Andy Colson
On 10/17/2011 04:48 AM, Micka wrote: Hi, I've a postgres 9.1 database used for map generating ( tiles ). The system has 24 GB RAM and 5 processors. I'm using geoserver to generate the tiles. My data used 8486 MB => psql -d gis -c "SELECT pg_size_pretty(pg_database_size('gis'))" I've carefully

Re: [PERFORM] Optimize the database performance

2011-10-17 Thread MirrorX
Hello Micka, I think that no one can tell you much without more information about your system. Roughly, I would say that you could change the following parameters: shared_buffers = 1024MB -> 6GB work_mem = 256MB -> 30-50 MB effective_cache_size = 5120MB -> 16GB (depends on whether it's a dedicated d

[PERFORM] Optimize the database performance

2011-10-17 Thread Micka
Hi, I've a postgres 9.1 database used for map generating ( tiles ). The system has 24 GB RAM and 5 processors. I'm using geoserver to generate the tiles. My data used 8486 MB => psql -d gis -c "SELECT pg_size_pretty(pg_database_size('gis'))" I've carefully indexed the table by the "the_geom" col

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-02 Thread David Jarvis
Hi, Hmm, that's nice, though I cannot but wonder whether the exclusive lock > required by CLUSTER is going to be a problem in the long run. > Not an issue; the inserts are one-time (or very rare; at most: once a year). > Hm, keep in mind that if the station clause alone is not selective > enou

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-02 Thread David Jarvis
Sorry, Alvaro. I was contemplating using a GIN or GiST index as a way of optimizing the query. Instead, I found that re-inserting the data in order of station ID (the primary look-up column) and then CLUSTER'ing on the station ID, taken date, and category index increased the speed by an order of

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from David Jarvis's message of mar jun 01 14:01:22 -0400 2010: > Sorry, Alvaro. > > I was contemplating using a GIN or GiST index as a way of optimizing the > query. My fault -- I didn't read the whole thread. > Instead, I found that re-inserting the data in order of station ID (the > p

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Alvaro Herrera
Excerpts from Matthew Wakeling's message of mar jun 01 05:55:35 -0400 2010: > On Sun, 23 May 2010, David Jarvis wrote: > > The measurement table indexes (on date and weather station) were not being > > used because the only given date ranges (e.g., 1900 - 2009) were causing the > > planner to do a

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Matthew Wakeling
On Sun, 23 May 2010, David Jarvis wrote: The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were causing the planner to do a full table scan, which is correct. I wonder if you might see some benefit from CLUSTER
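The CLUSTER suggestion above, together with the (station ID, taken date, category) ordering David later reported using, can be sketched as follows; the index name is an assumption, the table and column names follow the thread:

```sql
-- Order the heap physically by the primary look-up columns so rows for one
-- station land on adjacent pages instead of being scattered by insert order.
CREATE INDEX measurement_station_taken_category_idx
    ON climate.measurement (station_id, taken, category_id);

-- CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the table; it is a
-- one-shot operation, so re-run it after large bulk loads.
CLUSTER climate.measurement USING measurement_station_taken_category_idx;
ANALYZE climate.measurement;
```

This is exactly the trade-off raised later in the thread: the exclusive lock is painful for continuously-loaded data, but acceptable here because inserts are rare (at most yearly).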

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-23 Thread David Jarvis
Hi, The problem is now solved (in theory). Well, it's not the functions per se that's the problem, it's the lack of > a useful index on the expression. > The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were ca

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-22 Thread David Jarvis
Hi, certainly understand that you wouldn't want to partition by year. It > Definitely not. > does strike me that perhaps you could partition by day ranges, but you'd > I don't think that will work; users can choose any day range, with the most common as Jan 1 - Dec 31, followed by seasonal ra

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-22 Thread David Jarvis
Hi, CREATE INDEX measurement_01_001_y_idx >> ON climate.measurement_01_001 >> USING btree >> (date_part('year'::text, taken)); >> >> Is that equivalent to what you suggest? >> > > No. It is not the same function, so Postgres has no way to know it produces > the same results (if it does). > Thi
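The expression index quoted above, restated runnably, with a query of the only shape the planner will match it against (the point being made in the reply: the query must repeat the *same* expression, or Postgres cannot know the results coincide):

```sql
-- The expression index from the thread, on one child table.
CREATE INDEX measurement_01_001_y_idx
    ON climate.measurement_01_001
    USING btree (date_part('year'::text, taken));

-- Usable: the WHERE clause repeats the indexed expression verbatim.
SELECT count(*)
FROM climate.measurement_01_001
WHERE date_part('year'::text, taken) = 1950;

-- NOT usable by this index: extract(year FROM taken) = 1950 is a different
-- expression tree, even though it computes the same value.
```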

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Stephen Frost
* Yeb Havinga (yebhavi...@gmail.com) wrote: >> Normalizing by date parts was fast. Partitioning the tables by year >> won't do much good -- users will probably choose 1900 to 2009, >> predominately. > Ok, in that case it is a bad idea. Yeah, now that I understand what the user actually wants,

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
Regarding the leap year problem, you might consider creating a modified day of year field, which always assumes that the year contains a leap day. Then a given number always resolves to a given date, regardless of year. If you then partition (or index) on that field, then you may get a benefit.

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
David Jarvis wrote: Hi, Yeb. This is starting to go back to the design I used with MySQL: * YEAR_REF - Has year and station * MONTH_REF - Has month, category, and year reference * MEASUREMENT - Has month reference, amount, and day Normalizing by date parts was fast. Partitioning th

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
Matthew Wakeling wrote: On Fri, 21 May 2010, Yeb Havinga wrote: For time based data I would for sure go for year based indexing. On the contrary, most of the queries seem to be over many years, but rather restricting on the time of year. Therefore, partitioning by month or some other per-yea

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread David Jarvis
Hi, Yeb. This is starting to go back to the design I used with MySQL: - YEAR_REF - Has year and station - MONTH_REF - Has month, category, and year reference - MEASUREMENT - Has month reference, amount, and day Normalizing by date parts was fast. Partitioning the tables by year won't do

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
On Fri, 21 May 2010, Yeb Havinga wrote: For time based data I would for sure go for year based indexing. On the contrary, most of the queries seem to be over many years, but rather restricting on the time of year. Therefore, partitioning by month or some other per-year method would seem sensi

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
There is a thing that might lead to confusion in the previous post: create or replace function yearmod(int) RETURNS int as 'select (($1 >> 2) %32);' language sql immutable strict; is equivalent to create or replace function yearmod(int) RETURNS int as 'select (($1 / 4) %32);' language sql imm
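The two definitions above, completed as stated, with a check that they agree:

```sql
-- Shift form: a right shift by 2 is integer division by 4 for non-negative ints.
CREATE OR REPLACE FUNCTION yearmod(int) RETURNS int AS
  'SELECT (($1 >> 2) % 32);'
LANGUAGE sql IMMUTABLE STRICT;

-- Division form, equivalent for the year values in question:
CREATE OR REPLACE FUNCTION yearmod(int) RETURNS int AS
  'SELECT (($1 / 4) % 32);'
LANGUAGE sql IMMUTABLE STRICT;

-- IMMUTABLE is what allows the function inside index expressions and
-- CHECK constraints used for constraint exclusion.
SELECT yearmod(2010);  -- (2010 / 4) % 32 = 502 % 32 = 22 in both forms
```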

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Yeb Havinga
David Jarvis wrote: Also, you're trying to do constraint_exclusion, but have you made sure that it's turned on? And have you made sure that those constraints are really the right ones and that they make sense? You're using a bunch of extract()'s there too, why not just

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, (An EXPLAIN ANALYSE would be better here). Look at the expected number of > stations "Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual time=0.053..41.173 rows=78 loops=1)" " Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Thu, 20 May 2010, David Jarvis wrote: I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: Yes, because there is only one station_id selected. That's exactly what an index is for. Then c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: "Result (cost=0.00..21781.18 rows=8090 width=28)" " -> Append (cost=0.00..21781.18 rows=8090 width=28)" "-> Seq Scan on measur

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, Something in here really smells fishy to me. Those extract's above are working on values which are from the table.. Why aren't you using these functions to figure out how to construct the actual dates based on the values provided by the *user*..? Because I've only been using Postgre

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, check (taken >= '1913-12-01' and taken <= '1913-12-31') > I don't think I want to constrain by year, for a few reasons: 1. There are a lot of years -- over 110. 2. There will be more years added (both in the future for 2010 and in the past as I get data from other sources). Currently
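The per-month CHECK constraint quoted above can be generated mechanically for any month; a hedged Python sketch (the table and function names are illustrative, not from the thread) using calendar.monthrange to find the last day of the month:

```python
import calendar

def month_check(table: str, year: int, month: int) -> str:
    """Render a per-month CHECK constraint in the style quoted above,
    e.g. check (taken >= '1913-12-01' and taken <= '1913-12-31')."""
    last = calendar.monthrange(year, month)[1]  # last day of the month
    return (f"alter table {table} add check "
            f"(taken >= '{year:04d}-{month:02d}-01' and "
            f"taken <= '{year:04d}-{month:02d}-{last:02d}');")

print(month_check("measurement_1913_12", 1913, 12))
# → alter table measurement_1913_12 add check
#   (taken >= '1913-12-01' and taken <= '1913-12-31');
```

monthrange handles leap years, so February partitions get the correct upper bound automatically.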

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, ~300 million measurements ~12000 stations (not 7 as I mentioned before) ~5500 cities some serious data, though. Basically, PG is sequentially scanning through all of the tables in your partitioning setup. What is constraint_exclusion set to? What version of PG is this? Do th

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > I was still referring to the measurement table. You have an index on stationid, but still seem to be getting a sequential scan. Maybe the planner does not realise that you are selecting a small number of stations. Posting an EXPLAIN ANALYSE

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > There are 72 child tables, each having a year index and a station index, which are defined as follows: So, my thoughts: Partition by something that makes sense... Typically, I'd say that you would do it by the category id and when the measuremen

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Yeb Havinga
Tom Lane wrote: David Jarvis writes: I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
The greatest() expression reduces to either the current year (year + 0) or the next year (year + 1) by taking the sign of the difference in start/end days. This allows me to derive an end date, such as: Dec 22, 1900 to Mar 22, 1901 Then I check if the measured date falls between those two dates.
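The year arithmetic described above (the end year is either the start year or start year + 1, depending on whether the window wraps past Dec 31) can be sketched outside SQL. A hedged Python illustration, where the helper name is mine and a (month, day) tuple comparison stands in for the greatest()/sign() trick in the quoted query:

```python
from datetime import date

def range_end(year: int, start_md: tuple, end_md: tuple) -> date:
    """Derive the end date of a month/day window starting in `year`.
    If the window wraps past Dec 31 (the start falls after the end
    within a year), the end lands in the following year."""
    wraps = start_md > end_md  # (month, day) tuple comparison
    return date(year + (1 if wraps else 0), *end_md)

# Dec 22, 1900 .. Mar 22 ends on Mar 22, 1901, as in the example above:
assert range_end(1900, (12, 22), (3, 22)) == date(1901, 3, 22)
# May 1 .. Aug 1 stays within the same year:
assert range_end(1900, (5, 1), (8, 1)) == date(1900, 8, 1)
```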

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, I was still referring to the measurement table. You have an index on stationid, but still seem to be getting a sequential scan. Maybe the planner does not realise that you are selecting a small number of stations. Posting an EXPLAIN ANALYSE would really help here. Here is the result f

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS t

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date That uses

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > What if I were to have the application pass in two sets of date ranges? > For the condition of Dec 22 to Mar 22: > Dec 22 would become: >- Dec 22 - Dec 31 > Mar 22 would become: >- Jan 1 - Mar 22 I think what you're essentially describing here is removing the OR fro

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
What if I were to have the application pass in two sets of date ranges? For the condition of Dec 22 to Mar 22: Dec 22 would become: - Dec 22 - Dec 31 Mar 22 would become: - Jan 1 - Mar 22 The first range would always be for the current year; the second range would always be for the year
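The two-range rewrite proposed above can be sketched as a small helper; a hedged Python illustration (the helper is mine; per the Dec 22, 1900 to Mar 22, 1901 example earlier in the thread, the second range falls in the following year when the window wraps):

```python
from datetime import date

def split_window(year, start_md, end_md):
    """Split a month/day window into date ranges that never cross a
    year boundary. Dec 22 .. Mar 22 becomes Dec 22-Dec 31 of `year`
    plus Jan 1-Mar 22 of the following year."""
    if start_md <= end_md:  # no wrap: a single in-year range
        return [(date(year, *start_md), date(year, *end_md))]
    return [
        (date(year, *start_md), date(year, 12, 31)),
        (date(year + 1, 1, 1), date(year + 1, *end_md)),
    ]

first, second = split_window(1900, (12, 22), (3, 22))
assert first == (date(1900, 12, 22), date(1900, 12, 31))
assert second == (date(1901, 1, 1), date(1901, 3, 22))
```

Each resulting range is a plain BETWEEN on the date column, which is what lets the planner prune year-based partitions instead of evaluating extract() expressions per row.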

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > I was thinking that I could add three more columns to the measurement table: > year_taken, month_taken, day_taken > Then index those. That should allow me to avoid extracting years, months, > and days from the *m.taken* date column. You could, but I don't think there's any

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
When using MySQL, the performance was okay (~5 seconds per query) using: date( concat_ws( '-', y.year, m.month, d.day ) ) between -- Start date. date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND -- End date. Calculated by checking to see if the end date wraps -- into the
