Re: [GENERAL] Linux vs FreeBSD

2014-04-03 Thread Amitabh Kant
On Fri, Apr 4, 2014 at 9:33 AM, François Beausoleil wrote:

> Hi all!
>
> Does PG perform that much better on FreeBSD? I have some performance
> issues on a Ubuntu 12.04 which I'd like to resolve. iowait varies a lot,
> between 5 and 50%. Does FreeBSD better schedule I/O, which could alleviate
> some of the issues, or not at all? I have no experience administering
> FreeBSD, but I'm willing to learn if I'll get some performance enhancements
> out of the switch.
>
> Our workload is lots of data import, followed by many queries to summarize
> (daily and weekly reports). Our main table is a wide table that represents
> Twitter and Facebook interactions. Most of our reports work on a week's
> worth of data (table is partitioned by week), and the tables are
> approximately 25 GB plus 5 GB of indices, per week. Of course, while
> reports are ongoing, we're also importing next week's data.
>
> The host is a dedicated hardware machine at online.fr: 128 GB RAM, 2 x
> 3TB disk in RAID 1 configuration.
>
> I started thinking of this after reading "PostgreSQL pain points" at
> https://lwn.net/Articles/591723/. In the comments, bronson says FreeBSD
> does not exhibit the same problems (slow fsync, double buffering). On the
> list here, I've read about problems with certain kernel versions on Ubuntu.
>
> I'm not expecting anything magical, just some general guidelines and
> hints. Did anybody do the migration and was happier after?
>
> Thanks for any hints!
> François Beausoleil
>
> $ uname -a
> Linux munn.ca.seevibes.com 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3
> 17:37:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> $ psql -U postgres -c "select version()"
>                                                    version
> --------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
>
> /proc/cpuinfo says: 8 CPUs, identified as "Intel(R) Xeon(R) CPU E5-2609 0
> @ 2.40GHz"
>
>

I can't give you a direct comparison, but I'm running a couple of dedicated PG
servers (9.1 & 9.2) on FreeBSD 9.x and haven't seen much of a problem, apart
from tuning some sysctl variables for higher memory usage. My hardware uses
either SAS or SSD disks, and RAM varies between 32 and 128 GB across the
servers. My workload is mostly lots of small reads and writes.

Amitabh


Re: [GENERAL] Linux vs FreeBSD

2014-04-03 Thread John R Pierce

On 4/3/2014 9:03 PM, François Beausoleil wrote:

The host is a dedicated hardware machine at online.fr: 128 GB RAM, 2 x 3TB disk 
in RAID 1 configuration.


just a passing comment...

3TB disks are 7200 rpm and suited to nearline bulk storage (or desktop 
use), not high-performance database random access. You'd get far more I/O 
throughput with a RAID 10 array of 15k rpm SAS disks, maybe 10 x 600GB or 
whatever, or with a suitable SSD configuration, although 3TB of SSD 
might get expensive.




--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Linux vs FreeBSD

2014-04-03 Thread François Beausoleil
Hi all!

Does PG perform that much better on FreeBSD? I have some performance issues on 
a Ubuntu 12.04 which I'd like to resolve. iowait varies a lot, between 5 and 
50%. Does FreeBSD better schedule I/O, which could alleviate some of the 
issues, or not at all? I have no experience administering FreeBSD, but I'm 
willing to learn if I'll get some performance enhancements out of the switch.

Our workload is lots of data import, followed by many queries to summarize 
(daily and weekly reports). Our main table is a wide table that represents 
Twitter and Facebook interactions. Most of our reports work on a week's worth 
of data (table is partitioned by week), and the tables are approximately 25 GB 
plus 5 GB of indices, per week. Of course, while reports are ongoing, we're 
also importing next week's data.

The host is a dedicated hardware machine at online.fr: 128 GB RAM, 2 x 3TB disk 
in RAID 1 configuration.

I started thinking of this after reading "PostgreSQL pain points" at 
https://lwn.net/Articles/591723/. In the comments, bronson says FreeBSD does 
not exhibit the same problems (slow fsync, double buffering). On the list here, 
I've read about problems with certain kernel versions on Ubuntu.

I'm not expecting anything magical, just some general guidelines and hints. Did 
anybody do the migration and was happier after?

Thanks for any hints!
François Beausoleil

$ uname -a
Linux munn.ca.seevibes.com 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ psql -U postgres -c "select version()"
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

/proc/cpuinfo says: 8 CPUs, identified as "Intel(R) Xeon(R) CPU E5-2609 0 @ 
2.40GHz"





Re: [GENERAL] Is it safe to stop postgres in between pg_start_backup and pg_stop_backup?

2014-04-03 Thread AI Rumman
What we did for this kind of higher-performance storage migration was to set
up a standby on the new mounts and then execute a failover.


On Thu, Apr 3, 2014 at 3:58 PM, Alan Hodgson  wrote:

> On Thursday, April 03, 2014 02:48:03 PM Steven Schlansker wrote:
> > On Apr 2, 2014, at 3:08 PM, Jacob Scott  wrote:
> >   * pg_start_backup
> >   * Take a filesystem snapshot (of a volume containing postgres data
> but not
> > pg_xlog) * pg_stop_backup
> >   * pg_ctl stop
> >   * Bring a new higher performing disk online from snapshot
> >   * switch disks (umount/remount at same mountpoint)
> >   * pg_ctl start
>
> ... with a recovery.conf in place when starting the new instance.
>
> >
> > Assuming you ensure that your archived xlogs are available to the new
> > instance just as they were to the old
>
> And make sure they're archived to a different disk.
>
> > Another option you could consider is rsync.  I have often transferred
> > databases by running rsync concurrently with the database to get a "dirty
> > backup" of it.  Then once the server is shutdown you run a cleanup rsync
> > which is much faster than the initial run to ensure that the destination
> > disk is consistent and up to date.  This way your downtime is limited to
> > how long it takes rsync to compare fs trees / fix the inconsistencies.
> >
>
> This would be simpler.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Is it safe to stop postgres in between pg_start_backup and pg_stop_backup?

2014-04-03 Thread Alan Hodgson
On Thursday, April 03, 2014 02:48:03 PM Steven Schlansker wrote:
> On Apr 2, 2014, at 3:08 PM, Jacob Scott  wrote:
>   • pg_start_backup
>   • Take a filesystem snapshot (of a volume containing postgres data but 
> not
> pg_xlog) • pg_stop_backup
>   • pg_ctl stop
>   • Bring a new higher performing disk online from snapshot
>   • switch disks (umount/remount at same mountpoint)
>   • pg_ctl start

... with a recovery.conf in place when starting the new instance.

> 
> Assuming you ensure that your archived xlogs are available to the new
> instance just as they were to the old

And make sure they're archived to a different disk.

> Another option you could consider is rsync.  I have often transferred
> databases by running rsync concurrently with the database to get a “dirty
> backup” of it.  Then once the server is shutdown you run a cleanup rsync
> which is much faster than the initial run to ensure that the destination
> disk is consistent and up to date.  This way your downtime is limited to
> how long it takes rsync to compare fs trees / fix the inconsistencies.
> 

This would be simpler.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread Scott Marlowe
On Thu, Apr 3, 2014 at 1:44 PM, Brent Wood  wrote:
>
> Hi David,
>
> Does the RAID 1 array give any performance benefits over a single drive? I'd 
> guess that writes may be slower, reads may be faster (if balanced) but data 
> security is improved.

I did some testing on machines with 3xMLC FusionIO Drive2s with 1.2TB.
Comparing 1 drive and 2 drives in RAID-1 the difference in performance
was minimal. However, a 3 drive mirror was noticeably slower. This was
all with ubuntu 12.04 using 3.8.latest kernel and software RAID.
RAID-0 was by far the fastest, about 30% faster than either a single
or a pair of drives in RAID-1


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread Scott Marlowe
On Thu, Apr 3, 2014 at 3:28 PM, Merlin Moncure  wrote:
> On Thu, Apr 3, 2014 at 2:53 PM, Scott Marlowe  wrote:
>> On a machine with 16 cores with HT (appears as 32 cores) and 8 of the
>> 3700 series Intel SSDs in a RAID-10 under an LSI MegaRAID with BBU, I
>> was able to get 6300 to 7500 tps on a decent sized pgbench db
>> (-s1000).
>
> Did you happen to grab any 'select only' numbers?

Darnit. Nope. I'll try to grab some on a spare box if I get one again.
Now they're all in production so running pgbench is kind of frowned
upon.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Spring JDBC and the PostgreSQL JDBC driver

2014-04-03 Thread Matthew Chambers


On 04/04/14 10:22, John R Pierce wrote:

On 4/3/2014 1:31 PM, Matthew Chambers wrote:
This removes all the boilerplate associated with old style JDBC.  It 
also means you get great reuse of your SQL code since the transaction 
starts at the entry point.  Most of my SQL related code is just 1 
liners using Springs JdbcTemplate class.  I've written some massive 
applications with this style.  My current postgres project sees about 
4gbps of traffic during peak times and there is not an explicit 
begin/commit in the entire code base.


so how does the OP use Jdbc4Array.getArray()  on an object returned 
from a Spring execute if Spring has automagically released/closed the 
connection?






Well, you wouldn't be calling Jdbc4Array.getArray() anywhere in your 
code where you don't have a connection, you would be doing that where 
the connection is active. The connection doesn't go away until the 
function that checked the connection out returns.  Assuming you have a 
ResultSet object you can do this:


String[] arrayOfStrings = (String[]) resultSet.getArray("col_name").getArray();


To put data into an array field, you have to use a 
PreparedStatementCreator which gives you access to the 
java.sql.Connection, so you can call "createArrayOf" from that.


jdbc.update(new PreparedStatementCreator() {
    public PreparedStatement createPreparedStatement(final Connection conn) throws SQLException {
        final PreparedStatement ret = conn.prepareStatement(UPDATE_TAGS);
        ret.setObject(1, conn.createArrayOf("text", tags));
        ret.setObject(2, id);
        return ret;
    }
});
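
UPDATE_TAGS isn't shown here; just to illustrate, against a hypothetical table 
with a text[] column it could be prepared as something like the following (the 
JDBC string itself would use ? placeholders rather than $1/$2):

-- Purely illustrative; none of these names come from the original post.
CREATE TABLE items (
    id   uuid PRIMARY KEY,
    tags text[]          -- read back via resultSet.getArray(...).getArray()
);

-- Two parameters, matching ret.setObject(1, ...) and ret.setObject(2, ...).
PREPARE update_tags(text[], uuid) AS
    UPDATE items SET tags = $1 WHERE id = $2;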



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is it safe to stop postgres in between pg_start_backup and pg_stop_backup?

2014-04-03 Thread Steven Schlansker

On Apr 2, 2014, at 3:08 PM, Jacob Scott  wrote:

> Hi, 

Hello there ;)

> 
> 
> Does upgrading a disk being used by postgres (9.1, on Ubuntu) with the 
> following process sound safe?
>   • pg_start_backup
>   • Take a filesystem snapshot (of a volume containing postgres data but 
> not pg_xlog)
>   • Bring a new higher performing disk online from snapshot
>   • pg_ctl stop
>   • switch disks (umount/remount at same mountpoint)
>   • pg_ctl start
>   • pg_stop_backup
> This seems like an odd use case because pg_start_backup is designed for 
> performing on-line backups, but I think it will give me minimum downtime.

At the very least you need to move your pg_stop_backup earlier in the process.  
Online backups do not survive server shutdowns; any backup in process at 
shutdown is aborted.

• pg_start_backup
• Take a filesystem snapshot (of a volume containing postgres data but 
not pg_xlog)
• pg_stop_backup
• pg_ctl stop
• Bring a new higher performing disk online from snapshot
• switch disks (umount/remount at same mountpoint)
• pg_ctl start

Assuming you ensure that your archived xlogs are available to the new instance 
just as they were to the old, I believe this should work.  But run it on a test 
instance first!
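
Spelled out at the SQL level, the reordered sequence is just the following (a 
minimal sketch; the backup label is made up, and the non-SQL steps appear only 
as comments):

-- Run from psql as a superuser; 'disk-migration' is an arbitrary label.
SELECT pg_start_backup('disk-migration');

-- take the filesystem snapshot of the data volume here (excluding pg_xlog)

SELECT pg_stop_backup();   -- finish the backup *before* shutting the server down

-- pg_ctl stop
-- bring the new disk online from the snapshot; umount/remount at the same mountpoint
-- pg_ctl start   (with a recovery.conf in place, as noted above)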

It sounds like an odd use case but really it’s no different from “the data is 
gone, restore from backup” — just that you intentionally trashed the data by 
switching disks :)

Another option you could consider is rsync.  I have often transferred databases 
by running rsync concurrently with the database to get a “dirty backup” of it.  
Then once the server is shutdown you run a cleanup rsync which is much faster 
than the initial run to ensure that the destination disk is consistent and up 
to date.  This way your downtime is limited to how long it takes rsync to 
compare fs trees / fix the inconsistencies.

Good luck!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread Brent Wood
Hi David,

Does the RAID 1 array give any performance benefits over a single drive? I'd 
guess that writes may be slower, reads may be faster (if balanced) but data 
security is improved.

Brent Wood

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | 
www.niwa.co.nz

From: pgsql-general-ow...@postgresql.org [pgsql-general-ow...@postgresql.org] 
on behalf of David Rees [dree...@gmail.com]
Sent: Friday, April 4, 2014 8:32 AM
To: Merlin Moncure
Cc: bret_st...@machinemanagement.com; PostgreSQL General
Subject: Re: [GENERAL] SSD Drives

On Thu, Apr 3, 2014 at 12:13 PM, Merlin Moncure  wrote:
> On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern
>  wrote:
>> Any opinions/comments on using SSD drives with postgresql?
>
> Here's a single S3700 smoking an array of 16 15k drives (poster didn't
> realize that; was too focused on synthetic numbers):
> http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd

I just ran a quick test earlier this week on an old Dell 2970 (2
Opteron 2387, 16GB RAM) comparing a 6-disk RAID10 with 10k 147GB SAS
disks to a 2-disk RAID1 with 480GB Intel S3500 SSDs and found the SSDs
are about 4-6x faster using pgbench and a scaling factor of 1100. Some
sort of MegaRAID controller according to lspci and has BBU. TPS
numbers below are approximate.

RAID10 disk array:
8 clients: 350 tps
16 clients: 530 tps
32 clients: 800 tps

RAID1 SSD array:
8 clients: 2100 tps
16 clients: 2500 tps
32 clients: 3100 tps

So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
write workload isn't too high, the S3500 can work well. We'll see how
the SMART drive lifetime numbers do once we get into production, but
right now we estimate they should last at least 5 years and from what
we've seen it seems that SSDs seem to wear much better than expected.
If not, we'll pony up and go for the S3700 or perhaps move the xlog
back on to spinning disks.

-Dave


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general




[GENERAL] Is it safe to stop postgres in between pg_start_backup and pg_stop_backup?

2014-04-03 Thread Jacob Scott
Hi,


Does upgrading a disk being used by postgres (9.1, on Ubuntu) with the
following process sound safe?

   1. pg_start_backup
   2. Take a filesystem snapshot (of a volume containing postgres data but
   not pg_xlog)
   3. Bring a new higher performing disk online from snapshot
   4. pg_ctl stop
   5. switch disks (umount/remount at same mountpoint)
   6. pg_ctl start
   7. pg_stop_backup

This seems like an odd use case because pg_start_backup is designed for
performing on-line backups, but I think it will give me minimum downtime.

Thanks,

Jacob


[GENERAL] Need some help in postgres locking mechanism

2014-04-03 Thread santhosh kumar
Hi
I have two problems to discuss. Can you please guide me on how to proceed with
them?

*Problem 1:*
I have table X ('a' is the primary key) and table Y ('b' is the primary key).
Table Y has 'a' (X's primary key) as a foreign key. I have one job which runs
once an hour, and I want it to lock rows in table Y. But that will also lock
the referenced rows in table X to prevent concurrent actions, and table X is
also updated by other jobs.

The solution I thought of is to take a KEY SHARE lock on the 'a' column in
table Y. That should let the other jobs update the corresponding rows in
table X without any issue.

My doubt is *how to take a KEY SHARE lock on the 'a' column in table Y*; by
default the lock is on column 'b', the primary key of table Y. Table X is
huge and I don't want any other locks on it. We are using Postgres 9.2.

*One more doubt:* *If I implement a KEY SHARE lock on the 'a' column, will the
default locks on table X and table Y be removed or not?*

*PLEASE PROVIDE SYNTAX AS WELL*

*Problem 2:*
This is a different issue. I have a table Q ('w' is the primary key). When a
job runs, I want to lock some rows so that a parallel job won't consider
those rows.

What is the simplest and best lock I can implement on these rows? I want the
NOWAIT option.

Kindly give solutions to the above issues; I would be grateful.
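
For reference, here is roughly what I have been trying so far (a sketch only,
probably wrong; also, I believe FOR KEY SHARE only exists in PostgreSQL 9.3 and
later, while we are on 9.2, where the closest row lock is FOR SHARE):

-- Problem 1: lock rows in Y with the weakest possible row lock.  FOR KEY SHARE
-- (9.3+) blocks only DELETEs and key-changing UPDATEs of the locked rows, so
-- other jobs can still update non-key columns.
BEGIN;
SELECT * FROM Y WHERE a = 42 FOR KEY SHARE;   -- 42 is a made-up key value
-- ... hourly job works on these rows ...
COMMIT;

-- Problem 2: let a parallel job fail fast instead of waiting on a locked row.
BEGIN;
SELECT * FROM Q WHERE w = 7 FOR UPDATE NOWAIT;   -- errors immediately if locked
-- ... process the row ...
COMMIT;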



Thanks
K.Santhosh


Re: [GENERAL] SSD Drives

2014-04-03 Thread Merlin Moncure
On Thu, Apr 3, 2014 at 2:53 PM, Scott Marlowe  wrote:
> On a machine with 16 cores with HT (appears as 32 cores) and 8 of the
> 3700 series Intel SSDs in a RAID-10 under an LSI MegaRAID with BBU, I
> was able to get 6300 to 7500 tps on a decent sized pgbench db
> (-s1000).

Did you happen to grab any 'select only' numbers?

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Spring JDBC and the PostgreSQL JDBC driver

2014-04-03 Thread John R Pierce

On 4/3/2014 1:31 PM, Matthew Chambers wrote:
This removes all the boilerplate associated with old style JDBC.  It 
also means you get great reuse of your SQL code since the transaction 
starts at the entry point.  Most of my SQL related code is just 1 
liners using Springs JdbcTemplate class.  I've written some massive 
applications with this style.  My current postgres project sees about 
4gbps of traffic during peak times and there is not an explicit 
begin/commit in the entire code base.


so how does the OP use Jdbc4Array.getArray()  on an object returned from 
a Spring execute if Spring has automagically released/closed the connection?




--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Spring JDBC and the PostgreSQL JDBC driver

2014-04-03 Thread Matthew Chambers


On 04/04/14 05:39, John R Pierce wrote:

On 4/3/2014 9:10 AM, Miller, Michael W wrote:


The issue I'm running into is the interaction between the Spring 
Framework JDBC functionality and the PostgreSQL JDBC driver. The 
specific issue is that I'm using SimpleJdbcCall.execute() to call the 
database and getting back a Jdbc4Array. When I then try to do 
something like Jdbc4Array.getArray() I get a SQL error that can be 
tracked down to the Jdbc driver trying to use a connection object 
which has already been closed by the Spring Framework.


One of the benefits of the Spring Framework is that I don't have to 
bother with the drudge work like creating\closing connections. The 
drawback seems to be that the JDBC driver is making assumptions like 
the connection is still open.




if that framework is opening/closing connections on every query, it's 
broken from a performance perspective.  And if it has methods that 
return objects that are then broken, THAT'S also broken behavior.


but yeah, this belongs on the jdbc list.


The framework itself (Spring) has nothing to do with connections being 
opened/closed; that would be whatever JDBC connection pool you're using 
under the hood.  Spring doesn't ship with a production-quality 
connection pool as far as I know.


I think what the OP meant to say was that with Spring you're freed from 
the need to always check your open connection back into the pool.  
Spring uses aspect-oriented magic to wrap your entry-point functions 
with some code that checks out a connection, starts a transaction (sets 
these values as thread-local objects that the rest of the framework 
uses), and then has a finally block that cleans it all up for you.  So all 
you do is mark the functions that should open a transaction, and the 
type of transaction you want (read only, serializable, etc.).


This removes all the boilerplate associated with old style JDBC.  It 
also means you get great reuse of your SQL code since the transaction 
starts at the entry point.  Most of my SQL related code is just 1 liners 
using Springs JdbcTemplate class.  I've written some massive 
applications with this style.  My current postgres project sees about 
4gbps of traffic during peak times and there is not an explicit 
begin/commit in the entire code base.


-Matt









Re: [GENERAL] SSD Drives

2014-04-03 Thread David Boreham

On 4/3/2014 2:00 PM, John R Pierce wrote:


an important thing in getting decent wear leveling life with SSDs is 
to keep them under about 70% full.




This depends on the drive: drives with higher specified write endurance 
already have significant overprovisioning before the user sees the space.







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] New OpenSource columnstore from CitusData for Postgresql

2014-04-03 Thread Dann Corbit
I wonder how this approach:
https://github.com/citusdata/cstore_fdw

compares to this one:
https://github.com/knizhnik/imcs


From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Dorian Hoxha
Sent: Thursday, April 3, 2014 12:58 PM
To: PostgreSql-general
Subject: [GENERAL] New OpenSource columnstore from CitusData for Postgresql

Link to hackernews which also has some comments from the devs

https://news.ycombinator.com/item?id=7523950
Very interesting: They use foreign data tables as an abstraction to separate 
the storage layer from the rest of the database.


Re: [GENERAL] SSD Drives

2014-04-03 Thread John R Pierce

On 4/3/2014 12:32 PM, David Rees wrote:

So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
write workload isn't too high, the S3500 can work well. We'll see how
the SMART drive lifetime numbers do once we get into production, but
right now we estimate they should last at least 5 years and from what
we've seen it seems that SSDs seem to wear much better than expected.
If not, we'll pony up and go for the S3700 or perhaps move the xlog
back on to spinning disks.


an important thing in getting decent wear leveling life with SSDs is to 
keep them under about 70% full.


--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] New OpenSource columnstore from CitusData for Postgresql

2014-04-03 Thread Dorian Hoxha
Link to hackernews which also has some comments from the devs

https://news.ycombinator.com/item?id=7523950

Very interesting: They use foreign data tables as an abstraction to
separate the storage layer from the rest of the database.
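
A rough idea of what that FDW-style interface looks like, going by their README
(the server name, column layout, and options below are my assumptions, not
verified):

-- Columnar storage exposed through the standard foreign data wrapper DDL.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE events (
    event_time timestamptz,
    user_id    bigint,
    payload    text
) SERVER cstore_server
  OPTIONS (compression 'pglz');   -- option name per the README, if I recall correctly

-- Loaded like any other table, but stored column-by-column on disk.
COPY events FROM '/tmp/events.csv' WITH (FORMAT csv);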


Re: [GENERAL] SSD Drives

2014-04-03 Thread Bret Stern
On Thu, 2014-04-03 at 12:32 -0700, David Rees wrote:
> On Thu, Apr 3, 2014 at 12:13 PM, Merlin Moncure  wrote:
> > On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern
> >  wrote:
> >> Any opinions/comments on using SSD drives with postgresql?
> >
> > Here's a single S3700 smoking an array of 16 15k drives (poster didn't
> > realize that; was too focused on synthetic numbers):
> > http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd
> 
> I just ran a quick test earlier this week on an old Dell 2970 (2
> Opteron 2387, 16GB RAM) comparing a 6-disk RAID10 with 10k 147GB SAS
> disks to a 2-disk RAID1 with 480GB Intel S3500 SSDs and found the SSDs
> are about 4-6x faster using pgbench and a scaling factor of 1100. Some
> sort of MegaRAID controller according to lspci and has BBU. TPS
> numbers below are approximate.
> 
> RAID10 disk array:
> 8 clients: 350 tps
> 16 clients: 530 tps
> 32 clients: 800 tps
> 
> RAID1 SSD array:
> 8 clients: 2100 tps
> 16 clients: 2500 tps
> 32 clients: 3100 tps
> 
> So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
> write workload isn't too high, the S3500 can work well. 

Is a write cycle anywhere on the drive different than a re-write?

Or is a write a write!

The feedback/comments are awesome. I'm shopping..


> We'll see how
> the SMART drive lifetime numbers do once we get into production, but
> right now we estimate they should last at least 5 years and from what
> we've seen it seems that SSDs seem to wear much better than expected.
> If not, we'll pony up and go for the S3700 or perhaps move the xlog
> back on to spinning disks.
> 
> -Dave




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread Scott Marlowe
On Thu, Apr 3, 2014 at 1:32 PM, David Rees  wrote:
> On Thu, Apr 3, 2014 at 12:13 PM, Merlin Moncure  wrote:
>> On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern
>>  wrote:
>>> Any opinions/comments on using SSD drives with postgresql?
>>
>> Here's a single S3700 smoking an array of 16 15k drives (poster didn't
>> realize that; was too focused on synthetic numbers):
>> http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd
>
> I just ran a quick test earlier this week on an old Dell 2970 (2
> Opteron 2387, 16GB RAM) comparing a 6-disk RAID10 with 10k 147GB SAS
> disks to a 2-disk RAID1 with 480GB Intel S3500 SSDs and found the SSDs
> are about 4-6x faster using pgbench and a scaling factor of 1100. Some
> sort of MegaRAID controller according to lspci and has BBU. TPS
> numbers below are approximate.
>
> RAID10 disk array:
> 8 clients: 350 tps
> 16 clients: 530 tps
> 32 clients: 800 tps
>
> RAID1 SSD array:
> 8 clients: 2100 tps
> 16 clients: 2500 tps
> 32 clients: 3100 tps
>
> So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
> write workload isn't too high, the S3500 can work well. We'll see how
> the SMART drive lifetime numbers do once we get into production, but
> right now we estimate they should last at least 5 years and from what
> we've seen it seems that SSDs seem to wear much better than expected.
> If not, we'll pony up and go for the S3700 or perhaps move the xlog
> back on to spinning disks.

On a machine with 16 cores with HT (appears as 32 cores) and 8 of the
3700 series Intel SSDs in a RAID-10 under an LSI MegaRAID with BBU, I
was able to get 6300 to 7500 tps on a decent sized pgbench db
(-s1000).


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread David Rees
On Thu, Apr 3, 2014 at 12:44 PM, Brent Wood  wrote:
> Does the RAID 1 array give any performance benefits over a single drive? I'd 
> guess
> that writes may be slower, reads may be faster (if balanced) but data 
> security is improved.

Unfortunately I didn't test a single drive as that's not a
configuration we would run our systems in. I expect that it would
reduce read performance and thus pgbench results some, but I can't
tell you how much in this case.

-Dave


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Rob Sargent

On 04/03/2014 01:28 PM, Jeff Janes wrote:
On Thu, Apr 3, 2014 at 9:04 AM, Rob Sargent wrote:


I have to straighten out my environment, which I admit I was
hoping to avoid. I reset checkpoint_segments to 12 and restarted
my server.
I kicked off the COPY at 19:00. That generated a couple of the "too
frequent" statements but 52 "WARNING:  pgstat wait timeout" lines
during the next 8 hours starting at 00:37 (5 hours in) 'til
finally keeling over at 03:04 on line 37363768.


Those things are not necessarily problems.  If there is a problem, 
those tell you places to look, nothing more.  In particular, "pgstat 
wait timeout" just means "Someone is beating the snot out of your hard 
drives, and the stat collector just happened to notice that fact". 
 This is uninformative, because you already know you are beating the 
snot out of your hard drives.  That, after all, is the point of the 
exercise, right?  If you saw this message when you weren't doing 
anything particularly strenuous, then that would be interesting.


That's the last line of the input so obviously I didn't flush my
last println properly. I'm beyond getting embarrassed at this point.

Is turning auto-vacuum off a reasonable way through this?


No, no, no, no!  First of all, what is the "this" you are trying to 
get through?  Previously you said you were not trying to get the data 
in as fast as possible, but only to see what you can expect.  Well, 
now you see what you can expect.  You can expect to load at a certain 
speed given a certain table size, and you can expect to see some log 
messages about unusual activity.  Is it fast enough, or is it not fast 
enough?


If it is fast enough, and if you can ignore a few dozen messages in 
the log file, then you are done.  (Although you will still want to 
assess how queries against your tables are affected by the load 
process, assuming your database is used for interactive queries)


If it is not fast enough, then randomly disabling important parts of 
the system which have nothing to do with the bulk load is probably not 
the way to improve things, but is an excellent way to shoot yourself 
in the foot.


Cheers,

Jeff

Points well taken.

Others in this thread have suggested that I should in fact expect higher 
through-put so I've been angling at that for a bit.





Re: [GENERAL] SSD Drives

2014-04-03 Thread David Rees
On Thu, Apr 3, 2014 at 12:13 PM, Merlin Moncure  wrote:
> On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern
>  wrote:
>> Any opinions/comments on using SSD drives with postgresql?
>
> Here's a single S3700 smoking an array of 16 15k drives (poster didn't
> realize that; was too focused on synthetic numbers):
> http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd

I just ran a quick test earlier this week on an old Dell 2970 (2
Opteron 2387, 16GB RAM) comparing a 6-disk RAID10 with 10k 147GB SAS
disks to a 2-disk RAID1 with 480GB Intel S3500 SSDs and found the SSDs
are about 4-6x faster using pgbench and a scaling factor of 1100. Some
sort of MegaRAID controller according to lspci and has BBU. TPS
numbers below are approximate.

RAID10 disk array:
8 clients: 350 tps
16 clients: 530 tps
32 clients: 800 tps

RAID1 SSD array:
8 clients: 2100 tps
16 clients: 2500 tps
32 clients: 3100 tps

So yeah, even the slower, cheaper S3500 SSDs are way fast. If your
write workload isn't too high, the S3500 can work well. We'll see how
the SMART drive lifetime numbers do once we get into production, but
right now we estimate they should last at least 5 years and from what
we've seen it seems that SSDs seem to wear much better than expected.
If not, we'll pony up and go for the S3700 or perhaps move the xlog
back on to spinning disks.

-Dave


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Jeff Janes
On Thu, Apr 3, 2014 at 9:04 AM, Rob Sargent  wrote:



> I have to straighten out my environment, which I admit I was hoping to
> avoid. I reset checkpoint_segments to 12 and restarted my server.
> I kicked off the COPY at 19:00. That generated a couple of the "too
> frequent" statements but 52 "WARNING:  pgstat wait timeout" lines during
> the next 8 hours starting at 00:37 (5 hours in) 'til finally keeling over
> at 03:04 on line 37363768.
>

Those things are not necessarily problems.  If there is a problem, those
tell you places to look, nothing more.  In particular, "pgstat wait
timeout" just means "Someone is beating the snot out of your hard drives,
and the stat collector just happened to notice that fact".  This is
uninformative, because you already know you are beating the snot out of
your hard drives.  That, after all, is the point of the exercise, right?
 If you saw this message when you weren't doing anything particularly
strenuous, then that would be interesting.



> That's the last line of the input so obviously I didn't flush my last
> println properly. I'm beyond getting embarrassed at this point.
>
> Is turning auto-vacuum off a reasonable way through this?
>

No, no, no, no!  First of all, what is the "this" you are trying to get
through?  Previously you said you were not trying to get the data in as
fast as possible, but only to see what you can expect.  Well, now you see
what you can expect.  You can expect to load at a certain speed given a
certain table size, and you can expect to see some log messages about
unusual activity.  Is it fast enough, or is it not fast enough?

If it is fast enough, and if you can ignore a few dozen messages in the log
file, then you are done.  (Although you will still want to assess how
queries against your tables are affected by the load process, assuming your
database is used for interactive queries)

If it is not fast enough, then randomly disabling important parts of the
system which have nothing to do with the bulk load is probably not the way
to improve things, but is an excellent way to shoot yourself in the foot.

Cheers,

Jeff


Re: [GENERAL] SSD Drives

2014-04-03 Thread Merlin Moncure
On Wed, Apr 2, 2014 at 2:37 PM, Bret Stern
 wrote:
> Any opinions/comments on using SSD drives with postgresql?

Here's a single S3700 smoking an array of 16 15k drives (poster didn't
realize that; was too focused on synthetic numbers):
http://dba.stackexchange.com/questions/45224/postgres-write-performance-on-intel-s3700-ssd

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Jeff Janes
On Wed, Apr 2, 2014 at 5:11 PM, Rob Sargent  wrote:

>  On 04/02/2014 04:36 PM, Jeff Janes wrote:
>
>  On Wed, Apr 2, 2014 at 12:37 PM, Rob Sargent wrote:
>
>>  I'm playing with various data models to compare performance and
>> practicalities and not sure if I should be surprised by the numbers I'm
>> getting. I hope this report isn't too wishy-washy for reasoned comment.
>>
>> One model says a genotype is defined as follows:
>>
>>  Table "public.oldstyle"
>> +-------------+--------------+-----------+
>> |   Column    |     Type     | Modifiers |
>> +-------------+--------------+-----------+
>> | id          | uuid         | not null  |
>> | sample_name | text         | not null  |
>> | marker_name | text         | not null  |
>> | allele1     | character(1) |           |
>> | allele2     | character(1) |           |
>> +-------------+--------------+-----------+
>> (0. id is a Primary Key)
>> (1. Take what you will from the table name.)
>> (2. I hadn't thought of "char" type at this point)
>> (3. Ultimately the names would become ids, RI included)
>> (4. We're loading 39 samples and ~950K markers)
>>
>> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours
>> (800+ records/sec).  Then I tried COPY and killed that after 11.25 hours
>> when I realised that I had added a non-unique index on the name fields
>> after the first load. By that point it was on line 28301887, so ~0.75 done,
>> which implies it would have taken ~15 hours to complete.
>>
>> Would the overhead of the index likely explain this decrease in
>> throughput?
>>
>
>  Absolutely.
>
>
>>
>> Impatience got the better of me and I killed the second COPY.  This time
>> it had done 54% of the file in 6.75 hours, extrapolating to roughly 12
>> hours to do the whole thing.
>>
>
>  Are you sure you actually dropped the indices?  (And the primary key?)
>
>  I get about 375,000 lines per second with no indexes, triggers,
> constraints.
>
>  perl -le 'my $x=""; foreach(1..37e6) {$x++; print join "\t",
> "a0eebc99-9c0b-4ef8-bb6d-$x",$_,$_,"A","T"}'|time psql -c 'truncate
> oldstyle; copy oldstyle from stdin;'
>
>  (More if I truncate it in the same transaction as the copy)
>
>  If you can't drop the pk constraint, can you at least generate the
> values in sort-order?
>
>  Cheers,
>
>  Jeff
>
> I restarted the java-based loading.  Still with pk in place,  (and
> actually generating the UUIDs etc) I got ~1.1M rows in one minute. And one 
> "LOG:
> checkpoints are occurring too frequently (24 seconds apart)" with
> checkpoint_segment now at 6 (v. default 3). In  plotting the records v.
> time in that minute at least it's pretty linear.  The time per batch is
> relatively constant with a few hiccups. (.5 sec per 10K lines).  So I've
> improved things immensely but not entirely sure that the simple config
> change is the reason.  If this continued I would be done inside 40 minutes.
>
> With copy interruptus I now get 2.9M records per minute so the load would
> take only 12 or so minutes. I did get four reports of too-frequent
> checkpoints 2 at 15 seconds 2 at 9 seconds.
>
> I'll need to let each on go to completion.
>
> If these numbers are at all accurate and realistic, I'm still impressed
> with jOOQ, though COPY is rightfully back to its proper place as fastest
> way to load.
>

JOOQ is probably hooking into the same API that COPY uses (or if not, it
should be), so it isn't surprising that they perform similarly.

Cheers,

Jeff


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Jeff Janes
On Wed, Apr 2, 2014 at 3:46 PM, Rob Sargent  wrote:

>  On 04/02/2014 04:36 PM, Jeff Janes wrote:
>


>  Are you sure you actually dropped the indices?  (And the primary key?)
>
>  I get about 375,000 lines per second with no indexes, triggers,
> constraints.
>
>  perl -le 'my $x=""; foreach(1..37e6) {$x++; print join "\t",
> "a0eebc99-9c0b-4ef8-bb6d-$x",$_,$_,"A","T"}'|time psql -c 'truncate
> oldstyle; copy oldstyle from stdin;'
>
>  (More if I truncate it in the same transaction as the copy)
>
>  If you can't drop the pk constraint, can you at least generate the
> values in sort-order?
>
>  Cheers,
>
>  Jeff
>
> No I'll leave the pk in at the very least.  My example load (37M records)
> will not be the last word by any means.  That's one experiment, if you
> will.  My goal is not to see how fast I can get records in, rather to see
> what I can expect going forward.
>

You will probably want to pre-load the unindexed (including no PK) table
with dummy values until you anticipate at least one index will be larger than
RAM.  Then build the indexes and PK, and then load some more values and
time that load.

If you just test on a small table, you will get answers that are
unrealistic for the long term.  If you try to build up the table from
scratch with the indexes in place, it could take 6 months to simulate 12
months of growth.
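
Something along these lines, in other words (a sketch only; the file paths and
index are made up, and the table layout is the "oldstyle" one from upthread):

-- Load first with no PK or indexes, so nothing has to be maintained row by row.
CREATE TABLE oldstyle (
    id          uuid NOT NULL,
    sample_name text NOT NULL,
    marker_name text NOT NULL,
    allele1     character(1),
    allele2     character(1)
);
COPY oldstyle FROM '/tmp/dummy_genotypes.tsv';

-- Only then build the PK and indexes, once, over the pre-loaded data.
ALTER TABLE oldstyle ADD PRIMARY KEY (id);
CREATE INDEX oldstyle_name_idx ON oldstyle (sample_name, marker_name);

-- Now time an additional load: this is the realistic, indexes-in-place case.
COPY oldstyle FROM '/tmp/next_batch.tsv';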

Cheers,

Jeff


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Andy Colson

On 4/3/2014 12:57 PM, Andy Colson wrote:

On 4/3/2014 12:15 PM, Rob Sargent wrote:

Is the java app cpu bound?



 > Also watch "vmstat 3" for a minute or two.  The last two numbers (wa
 > & id) (some vmstat's have a steal, ignore that) will tell you if you
 > are io bound.
 >
 > -Andy
 >
 >
During COPY, with autovacuum off (server restarted, manual vacuum to
get things going).  Immediately hit the "too frequently" gripe.
checkpoint_segments still at 12.

Not cpu bound, so waiting for IO, but surely one expects that?

vmstat 3





Did you watch top?  Is anything at 100%?

How many cpus are you running?  (well, cpus*cores I guess)

-Andy





From Rob, off list:

Two quad-core
model name  : Intel(R) Xeon(R) CPU   X5570  @ 2.93GHz


So, 8 cores, so I'd bet one of your cores, probably the PG doing the 
copy, is io bound.


>> Not cpu bound, so waiting for IO, but surely one expects that?

Not with good IO sub system.  We've seen the COPY process be cpu bound 
before.


These lines:

 1  1 100148 3267020 763272 33778732    0    0     0 43125 10968 20848  5  2 90  3  0
 0  1 100148 3250528 763272 33792804    0    0     0 63567 10467 20121  2  1 81 16  0
 0  1 100148 3258584 763276 33784308    0    0     0 17121 10553 20146  1  1 86 12  0


Tell an interesting story.  On the first line, one process is running, 
the cpu's are 90% idle, and 3% io wait.  On the next two lines (for the 
next 6 seconds), there are no running processes, and one cpu is totally 
in io wait.  (100% / 8 cores = 12%)


Its also interesting that there are zero bi (blocks in), and lots of bo 
(blocks out).  I'm guessing the java app is reading completely from cache.


From your original post of vmstat the last 5 lines have no processes 
running ... I think that means for at least 15 seconds (5 lines * 3 
seconds), everything was blocked waiting for IO.


Have you tried the dd benchmark tests?

http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm

-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [ANNOUNCE] == PostgreSQL Weekly News - April 01 2014 ==

2014-04-03 Thread Dev Kumkar
On Thu, Apr 3, 2014 at 11:52 PM, Andy Colson  wrote:

>
> You realize its an April fools joke, right?
>
> I think I can hear David's evil laugh from here :-)
>
> -Andy
>

Ya, actually did realize the same time when it got posted as was online
that time. ;)

Regards...


Re: [GENERAL] SSD Drives

2014-04-03 Thread Ben Chobot

On Apr 3, 2014, at 12:47 PM, John R Pierce  wrote:

> On 4/3/2014 9:26 AM, Joe Van Dyk wrote:
>> Related, anyone have any thoughts on using postgresql on Amazon's EC2 SSDs?  
>> Been looking at 
>> http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance
>> 
> 
> if your data isn't very important, by all means, keep it on someone elses 
> virtualized infrastructure with no performance or reliability guarantees.

Well that’s not quite fair. AWS guarantees performance for those instances 
(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/i2-instances.html#i2-instances-diskperf).
 They also guarantee their instances will fail sooner or later, with or without 
warning (at which point you will lose all your data unless you've been putting 
copies onto a different system).

Re: [GENERAL] [ANNOUNCE] == PostgreSQL Weekly News - April 01 2014 ==

2014-04-03 Thread Andy Colson

On 4/3/2014 1:14 PM, Dev Kumkar wrote:

On Tue, Apr 1, 2014 at 7:10 PM, David Fetter  wrote:

== PostgreSQL Weekly News - April 01 2014 ==

PostgreSQL 10.0 Released.
This release includes built-in, tradeoff-free multi-master
replication, full integration with all other data stores, and a broad
choice of SQL query dialects including Cassandra, Hadoop, Oracle,
MS-SQL Server, MySQL, and mSQL.
http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf

== PostgreSQL Jobs for April ==

http://archives.postgresql.org/pgsql-jobs/2014-04/threads.php

== PostgreSQL Local ==

The Open Data Summit will be held Friday April 11, 2014 in Denver,
Colorado, USA.
http://www.opendatasummit.com

PGCon 2014, the world-wide developer conference for PostgreSQL, will
be in Ottawa, Ontario, Canada May 20-24, 2014.
http://www.pgcon.org/2014/

The sixth PGDay Cubano will be held on 13 and 14 October 2014 in Habana.
https://postgresql.uci.cu/?p=380

== PostgreSQL in the News ==

Planet PostgreSQL: http://planet.postgresql.org/

PostgreSQL Weekly News is brought to you this week by David Fetter

Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to da...@fetter.org, German language
to p...@pgug.de, Italian language to p...@itpug.org.  Spanish language
to p...@arpug.com.ar.

== Applied Patches ==

== Rejected Patches (for now) ==

Everyone was disappointed this week :-(


It talks about multi-master replication. Are there more details available??

Regards...



You realize its an April fools joke, right?

I think I can hear David's evil laugh from here :-)

-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] [ANNOUNCE] == PostgreSQL Weekly News - April 01 2014 ==

2014-04-03 Thread Dev Kumkar
On Tue, Apr 1, 2014 at 7:10 PM, David Fetter  wrote:

> == PostgreSQL Weekly News - April 01 2014 ==
>
> PostgreSQL 10.0 Released.
> This release includes built-in, tradeoff-free multi-master
> replication, full integration with all other data stores, and a broad
> choice of SQL query dialects including Cassandra, Hadoop, Oracle,
> MS-SQL Server, MySQL, and mSQL.
> http://db.cs.berkeley.edu/papers/ERL-M85-95.pdf
>
> == PostgreSQL Jobs for April ==
>
> http://archives.postgresql.org/pgsql-jobs/2014-04/threads.php
>
> == PostgreSQL Local ==
>
> The Open Data Summit will be held Friday April 11, 2014 in Denver,
> Colorado, USA.
> http://www.opendatasummit.com
>
> PGCon 2014, the world-wide developer conference for PostgreSQL, will
> be in Ottawa, Ontario, Canada May 20-24, 2014.
> http://www.pgcon.org/2014/
>
> The sixth PGDay Cubano will be held on 13 and 14 October 2014 in Habana.
> https://postgresql.uci.cu/?p=380
>
> == PostgreSQL in the News ==
>
> Planet PostgreSQL: http://planet.postgresql.org/
>
> PostgreSQL Weekly News is brought to you this week by David Fetter
>
> Submit news and announcements by Sunday at 3:00pm Pacific time.
> Please send English language ones to da...@fetter.org, German language
> to p...@pgug.de, Italian language to p...@itpug.org.  Spanish language
> to p...@arpug.com.ar.
>
> == Applied Patches ==
>
> == Rejected Patches (for now) ==
>
> Everyone was disappointed this week :-(
>

It talks about multi-master replication. Are there more details available??

Regards...


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Andy Colson

On 4/3/2014 12:15 PM, Rob Sargent wrote:

Is the java app cpu bound?



 > Also watch "vmstat 3" for a minute or two.  The last two numbers (wa
 > & id) (some vmstat's have a steal, ignore that) will tell you if you
 > are io bound.
 >
 > -Andy
 >
 >
During COPY, with autovacuum off (server restarted, manual vacuum to
get things going).  Immediately hit the "too frequently" gripe.
checkpoint_segments still at 12.

Not cpu bound, so waiting for IO, but surely one expects that?

vmstat 3





Did you watch top?  Is anything at 100%?

How many cpus are you running?  (well, cpus*cores I guess)

-Andy



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] window function help

2014-04-03 Thread David Johnston
Andy Colson wrote
> On 4/3/2014 11:09 AM, David Johnston wrote:
>> Andy Colson wrote
>>> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
 I'm trying to figure out how to count the number of rows within a fixed
 range of the current row value.  My table looks like this:

 SELECT chr_pos
 FROM mutations_crosstab_9615_99
 WHERE bta = 38
 LIMIT 10

 chr_pos
 138
 140
 163
 174
 187
 187
 188
 208
 210
 213

>>>
>>> This is the answer I got, which is different than yours, but I think its
>>> right.
>>>
>>>
>>>chr_pos | count
>>> -+---
>>>138 | 2
>>>140 | 2
>>>163 | 2
>>>174 | 4
>>>187 | 3
>>>188 | 4
>>>208 | 5
>>>210 | 4
>>>212 | 4
>>>213 | 4
>>> (10 rows)
>>
>> Same concept as mine - but I'm not sure where the "212" came from and you
>> did not duplicate the "187" that was present in the original.
>>
>> The OP wanted to show the duplicate row - which yours does and mine does
>> not
>> - but depending on how many duplicates there are having to run the same
>> effective query multiple times knowing you will always get the same
>> result
>> seems inefficient.  Better to query over a distinct set of values and
>> then,
>> if needed, join that back onto the original dataset.
>>
>> David J.
>>
> 
> 
> 
>  > Same concept as mine - but I'm not sure where the "212" came from and
> you
>  > did not duplicate the "187" that was present in the original.
> 
> Ah, data entry error.  I didn't even notice.  Oops.
> 
>  > The OP wanted to show the duplicate row - which yours does and mine 
> does not
> 
> Did you post a sql statement?  I didn't seem to get it.
> 
>  > - but depending on how many duplicates there are having to run the same
> 
> Agreed.  If there are a lot of dups, we could probably speed this up.
> 
> -Andy

My original seems to be held up for some reason...

Let me try again:

WITH val (value) AS ( 
VALUES (138),(140),(163),(174),(187),(187),(188),(208),(210),(213) 
) 
SELECT value, (SELECT count(*) FROM val AS valcheck WHERE valcheck.value
BETWEEN src.value - 20 AND src.value + 20) 
FROM ( 
SELECT DISTINCT value FROM val 
) src 
ORDER BY 1; 




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/window-function-help-tp5798526p5798565.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Rob Sargent

Is the java app cpu bound?



> Also watch "vmstat 3" for a minute or two.  The last two numbers (wa
> & id) (some vmstat's have a steal, ignore that) will tell you if you
> are io bound.
>
> -Andy
>
>
During COPY, with autovacuum off (server restarted, manual vacuum to 
get things going).  Immediately hit the "too frequently" gripe. 
checkpoint_segments still at 12.


Not cpu bound, so waiting for IO, but surely one expects that?

vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2 100148 3317248 763264 33731032    0    0     0 33355 10555 20122  1  1 84 14  0
 1  0 100148 3311040 763272 33736972    0    0     0 18128     8 20601  5  1 87  6  0
 1  0 100148 3277808 763272 33769008    0    0     0 12536 11373 20551  7  2 90  1  0
 1  1 100148 3267020 763272 33778732    0    0     0 43125 10968 20848  5  2 90  3  0
 0  1 100148 3250528 763272 33792804    0    0     0 63567 10467 20121  2  1 81 16  0
 0  1 100148 3258584 763276 33784308    0    0     0 17121 10553 20146  1  1 86 12  0
 1  2 100148 3237616 763276 33802024    0    0     0 36327 11198 20151  4  2 85  9  0
 1  0 100148 3223224 763276 33816556    0    0     0 32189 10763 19900  2  1 84 13  0
 0  0 100148 3206740 763276 33831580    0    0     0 13233 11080 20469  7  3 89  1  0
 0  2 100148 3204872 763276 33829580    0    0     0 75205 10500 20912  2  1 84 13  0
 0  2 100148 3188256 763276 33847544    0    0     0 35448 11028 20788  3  2 86  9  0
 0  2 100148 3190248 763276 33844888    0    0     0 21808 11938 20848  2  1 82 15  0
 0  1 100148 3184656 763280 33848968    0    0     0 18592 11562 20574  1  1 84 14  0





Re: [GENERAL] window function help

2014-04-03 Thread Andy Colson

On 4/3/2014 11:09 AM, David Johnston wrote:

Andy Colson wrote

On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:

I'm trying to figure out how to count the number of rows within a fixed
range of the current row value.  My table looks like this:

SELECT chr_pos
FROM mutations_crosstab_9615_99
WHERE bta = 38
LIMIT 10

chr_pos
138
140
163
174
187
187
188
208
210
213



This is the answer I got, which is different than yours, but I think its
right.


   chr_pos | count
-+---
   138 | 2
   140 | 2
   163 | 2
   174 | 4
   187 | 3
   188 | 4
   208 | 5
   210 | 4
   212 | 4
   213 | 4
(10 rows)


Same concept as mine - but I'm not sure where the "212" came from and you
did not duplicate the "187" that was present in the original.

The OP wanted to show the duplicate row - which yours does and mine does not
- but depending on how many duplicates there are having to run the same
effective query multiple times knowing you will always get the same result
seems inefficient.  Better to query over a distinct set of values and then,
if needed, join that back onto the original dataset.

David J.





> Same concept as mine - but I'm not sure where the "212" came from and you
> did not duplicate the "187" that was present in the original.

Ah, data entry error.  I didn't even notice.  Oops.

> The OP wanted to show the duplicate row - which yours does and mine 
does not


Did you post a sql statement?  I didn't seem to get it.

> - but depending on how many duplicates there are having to run the same

Agreed.  If there are a lot of dups, we could probably speed this up.

-Andy



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread John R Pierce

On 4/3/2014 9:26 AM, Joe Van Dyk wrote:
Related, anyone have any thoughts on using postgresql on Amazon's EC2 
SSDs?  Been looking at 
http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance




if your data isn't very important, by all means, keep it on someone 
elses virtualized infrastructure with no performance or reliability 
guarantees.



--
john r pierce  37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Spring JDBC and the PostgreSQL JDBC driver

2014-04-03 Thread John R Pierce

On 4/3/2014 9:10 AM, Miller, Michael W wrote:


The issue I'm running into is the interaction between the Spring 
Framework JDBC functionality and the PostgreSQL JDBC driver. The 
specific issue is that I'm using SimpleJdbcCall.execute() to call the 
database and getting back a Jdbc4Array. When I then try to do 
something like Jdbc4Array.getArray() I get a SQL error that can be 
tracked down to the Jdbc driver trying to use a connection object 
which has already been closed by the Spring Framework.


One of the benefits of the Spring Framework is that I don't have to 
bother with the drudge work like creating\closing connections. The 
drawback seems to be that the JDBC driver is making assumptions like 
the connection is still open.




if that framework is opening/closing connections on every query, it's 
broken from a performance perspective.  And if it has methods that 
return objects that are then broken, THAT'S also broken behavior.


but yeah, this belongs on the jdbc list.


--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [GENERAL] window function help

2014-04-03 Thread Schnabel, Robert D.
-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David Johnston
Sent: Thursday, April 03, 2014 11:09 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] window function help

Andy Colson wrote
> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
>> I'm trying to figure out how to count the number of rows within a 
>> fixed range of the current row value.  My table looks like this:
>>
>> SELECT chr_pos
>> FROM mutations_crosstab_9615_99
>> WHERE bta = 38
>> LIMIT 10
>>
>> chr_pos
>> 138
>> 140
>> 163
>> 174
>> 187
>> 187
>> 188
>> 208
>> 210
>> 213
>>
> 
> This is the answer I got, which is different than yours, but I think 
> its right.
> 
> 
>   chr_pos | count
> -+---
>   138 | 2
>   140 | 2
>   163 | 2
>   174 | 4
>   187 | 3
>   188 | 4
>   208 | 5
>   210 | 4
>   212 | 4
>   213 | 4
> (10 rows)

Same concept as mine - but I'm not sure where the "212" came from and you did 
not duplicate the "187" that was present in the original.

The OP wanted to show the duplicate row - which yours does and mine does not
- but depending on how many duplicates there are having to run the same 
effective query multiple times knowing you will always get the same result 
seems inefficient.  Better to query over a distinct set of values and then, if 
needed, join that back onto the original dataset.

David J.

Thanks. I had considered this strategy initially but didn't actually try it 
because I figured it would be too slow; from previous experience, window 
functions are much faster than queries of this nature.  My largest chromosome 
has about 6M positions and this ran in 69 seconds, which is acceptable since 
I'll only be doing this infrequently.  I should be able to handle it from 
here.  Thanks again for pointing me in the right direction.

Bob




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Torsten Förtsch
On 03/04/14 17:44, Leonardo M. Ramé wrote:
> Nice!, do you know if this will work on 8.4?.
> 
no way


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SSD Drives

2014-04-03 Thread Joe Van Dyk
On Wed, Apr 2, 2014 at 12:37 PM, Bret Stern <
bret_st...@machinemanagement.com> wrote:

> Any opinions/comments on using SSD drives with postgresql?
>

Related, anyone have any thoughts on using postgresql on Amazon's EC2 SSDs?
 Been looking at
http://aws.amazon.com/about-aws/whats-new/2013/12/19/announcing-the-next-generation-of-amazon-ec2-high-i/o-instance


Re: [GENERAL] Spring JDBC and the PostgreSQL JDBC driver

2014-04-03 Thread Tom Lane
"Miller, Michael W"  writes:
> I apologize if this is not the correct place to post this question but who do 
> I talk to about modifying the functionality of the JDBC driver?

pgsql-jdbc list would be the place for that.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Spring JDBC and the PostgreSQL JDBC driver

2014-04-03 Thread Miller, Michael W
I apologize if this is not the correct place to post this question but who do I 
talk to about modifying the functionality of the JDBC driver?

The issue I'm running into is the interaction between the Spring Framework JDBC 
functionality and the PostgreSQL JDBC driver. The specific issue is that I'm 
using SimpleJdbcCall.execute() to call the database and getting back a 
Jdbc4Array. When I then try to do something like Jdbc4Array.getArray() I get a 
SQL error that can be tracked down to the Jdbc driver trying to use a 
connection object which has already been closed by the Spring Framework.

One of the benefits of the Spring Framework is that I don't have to bother with 
the drudge work like creating/closing connections. The drawback seems to be 
that the JDBC driver is making assumptions, such as that the connection is still open.

What I 'm looking for is someone who is familiar with the JDBC driver who can 
say "Yes, it's worthwhile to add this functionality to the driver." Or "No, we 
should be looking at alternative ways to solve the problem."


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Rob Sargent

On 04/03/2014 09:01 AM, Thomas Kellerer wrote:

Rob Sargent, 02.04.2014 21:37:

I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+
hours (800+ records/sec).  Then I tried COPY and killed that after
11.25 hours when I realised that I had added a non-unique index on
the name fields after the first load. By that point it was on line
28301887, so ~0.75 done, which implies it would have taken ~15 hours to
complete.

Would the overhead of the index likely explain this decrease in
throughput?

Impatience got the better of me and I killed the second COPY.  This
time it had done 54% of the file in 6.75 hours, extrapolating to
roughly 12 hours to do the whole thing.

That matches up with the java speed. Not sure if I should be elated
with jOOQ or disappointed with COPY.


This is not what I see with COPY FROM STDIN

When I load 2million rows using a batch size of 1000 with plain JDBC that takes 
about 4 minutes

Loading the same file through Java and COPY FROM STDIN takes about 4 seconds

The table looks like this:

 Table "public.products"
   Column   |  Type  | Modifiers
---++---
  product_id| integer| not null
  ean_code  | bigint | not null
  product_name  | character varying(100) | not null
  manufacturer_name | character varying  | not null
  price | numeric(10,2)  | not null
  publish_date  | date   | not null
Indexes:
 "products_pkey" PRIMARY KEY, btree (product_id)
 "idx_publish_date" btree (publish_date, product_id)


During the load both indexes are present.

Regards
Thomas
  


Thomas thanks for these numbers.

I have to straighten out my environment, which I admit I was hoping to 
avoid. I reset checkpoint_segments to 12 and restarted my server.
I kicked off the COPY at 19:00. That generated a couple of the "too 
frequent" statements, plus 52 "WARNING:  pgstat wait timeout" lines during 
the next 8 hours starting at 00:37 (5 hours in), 'til it finally keeled 
over at 03:04 on line 37363768.  That's the last line of the input, so 
obviously I didn't flush my last println properly. I'm beyond getting 
embarrassed at this point.


Is turning auto-vacuum off a reasonable way through this?
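
For reference, autovacuum can also be switched off for just the table being
loaded rather than server-wide, which is usually the less risky way to try
this. A rough sketch, with a made-up table name:

ALTER TABLE big_import_table SET (autovacuum_enabled = false);  -- per-table storage parameter
-- ... run the bulk COPY here ...
ALTER TABLE big_import_table SET (autovacuum_enabled = true);
ANALYZE big_import_table;  -- refresh planner statistics after the load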



Re: [GENERAL] window function help

2014-04-03 Thread David Johnston
Andy Colson wrote
> On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:
>> I'm trying to figure out how to count the number of rows within a fixed
>> range of the current row value.  My table looks like this:
>>
>> SELECT chr_pos
>> FROM mutations_crosstab_9615_99
>> WHERE bta = 38
>> LIMIT 10
>>
>> chr_pos
>> 138
>> 140
>> 163
>> 174
>> 187
>> 187
>> 188
>> 208
>> 210
>> 213
>>
> 
> This is the answer I got, which is different than yours, but I think its 
> right.
> 
> 
>   chr_pos | count
> -+---
>   138 | 2
>   140 | 2
>   163 | 2
>   174 | 4
>   187 | 3
>   188 | 4
>   208 | 5
>   210 | 4
>   212 | 4
>   213 | 4
> (10 rows)

Same concept as mine - but I'm not sure where the "212" came from and you
did not duplicate the "187" that was present in the original.

The OP wanted to show the duplicate row - which yours does and mine does not
- but depending on how many duplicates there are, having to run the same
effective query multiple times, knowing you will always get the same result,
seems inefficient.  Better to query over a distinct set of values and then,
if needed, join that back onto the original dataset.
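
A rough sketch of that distinct-then-join shape, reusing the table and columns
from the original post (untested, just to illustrate the idea):

-- count once per distinct chr_pos, then join back so duplicate positions reappear
SELECT t.chr_pos, d.num_variants
FROM mutations_crosstab_9615_99 t
JOIN (
  SELECT a.chr_pos,
         (SELECT count(*)
          FROM mutations_crosstab_9615_99 b
          WHERE b.bta = 38
            AND b.chr_pos BETWEEN a.chr_pos - 20 AND a.chr_pos + 20) AS num_variants
  FROM (SELECT DISTINCT chr_pos
        FROM mutations_crosstab_9615_99
        WHERE bta = 38) a
) d ON d.chr_pos = t.chr_pos
WHERE t.bta = 38;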

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/window-function-help-tp5798526p5798542.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Andrew Sullivan
On Thu, Apr 03, 2014 at 12:44:23PM -0300, Leonardo M. Ramé wrote:
> Sounds nice, is it possible to modify my "count(*) over()" to what you
> suggest?.

I think the window_definition inside over() can contain a LIMIT, can't
it?  I didn't check just now, but I can't think any reason why not.
(ISTR when I did this in the past we didn't have window functions, so
I simulated it another way.)

A

-- 
Andrew Sullivan
a...@crankycanuck.ca


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] window function help

2014-04-03 Thread Andy Colson

On 4/3/2014 10:27 AM, Schnabel, Robert D. wrote:

Hi,

I'm trying to figure out how to count the number of rows within a fixed range 
of the current row value.  My table looks like this:

SELECT chr_pos
FROM mutations_crosstab_9615_99
WHERE bta = 38
LIMIT 10

chr_pos
138
140
163
174
187
187
188
208
210
213

chr_pos is integer and represents the base pair position along a chromosome.

It looks to me like a window function would be appropriate but I cannot figure 
out the correct syntax.  What I want to do is count the number of rows within 
+/- 20 of chr_pos (the current row).  Given the above example, for chr_pos = 
138 I want the count of rows between 118 and 158.  For chr_pos 187 I want the 
count of rows between 167 and 207 etc for all rows.  The result I'm looking for 
should look like the following:

chr_pos,num_variants
138,2
140,2
163,2
174,4
187,4
187,4
188,4
208,6
210,3
213,1

Is there a way to do this with a window function? Any help would be appreciated.

Thanks
Bob






Don't think a window function is needed, how about this:

select chr_pos, (
  select count(*)
  from mutant b
  where b.chr_pos between a.chr_pos-20 and a.chr_pos+20
  )
from mutant a;

Here's what I get.  I don't remember if "between" is inclusive on both 
sides or not, but you can change it to suit your needs.


This is the answer I got, which is different than yours, but I think its 
right.



 chr_pos | count
---------+-------
     138 |     2
     140 |     2
     163 |     2
     174 |     4
     187 |     3
     188 |     4
     208 |     5
     210 |     4
     212 |     4
     213 |     4
(10 rows)


-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Andy Colson

On 4/2/2014 7:30 PM, Rob Sargent wrote:







Well things slow down over time, and lots of "too frequent"s:

Have done 500 batches in 24219 ms
Have done 1000 batches in 52362 ms
Have done 1500 batches in 82256 ms
Have done 2000 batches in 113754 ms
Have done 2500 batches in 149637 ms
Have done 3000 batches in 211314 ms
Have done 3500 batches in 301989 ms
Have done 4000 batches in 430817 ms
Have done 4500 batches in 596043 ms
Have done 5000 batches in 804250 ms

where a batch is 500,000 lines.  This on the java side of course.






Have you watched cpu usage of the java process vs the PG process in top? 
 I had a perl program importing data that was the bottleneck because it 
was calling rand().  I'll bet generating uuid's is cpu intensive too.


Is the java app cpu bound?

Also watch "vmstat 3" for a minute or two.  The last two numbers (wa & 
id) (some vmstat's have a steal, ignore that) will tell you if you are 
io bound.


-Andy


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Leonardo M . Ramé
On 2014-04-03 17:19:56 +0200, Torsten Förtsch wrote:
> On 03/04/14 15:34, Leonardo M. Ramé wrote:
> > Hi, in one of our systems, we added a kind of pagination feature, that
> > shows N records of Total records.
> > 
> > To do this, we added a "count(*) over() as Total" field in our queries
> > in replacement of doing two queries, one for fetching the records, and
> > other for getting the count. This improved the performance, but we are't
> > happy with the results yet, by removing the count, the query takes
> > 200ms vs 2000ms with it.
> > 
> > We are thinking of removing the count, but if we do that, the system
> > will lack an interesting feature.
> > 
> > What strategy for showing the total number of records returned do you
> > recommend?.
> 
> If you need only an estimated number and if your planner statistics are
> up to date, you can use the planner.
> 
> Here is my implementation of the explain function. The COMMENT below
> shows how to use it:
> 
> CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[])
> RETURNS JSON AS $$
> DECLARE
> tmp TEXT;
> BEGIN
> EXECUTE 'EXPLAIN ('
>  || array_to_string(array_append($1[2:array_upper($1, 1)],
> 'FORMAT JSON'), ', ')
>  || ') '
>  || $1[1] INTO tmp;
> RETURN tmp::JSON;
> END;
> $$ LANGUAGE plpgsql VOLATILE;
> 
> COMMENT ON FUNCTION explain(VARIADIC TEXT[])
> IS $def$
> This function is a SQL interface to the planner. It returns the plan
> (result of EXPLAIN) of the query passed as TEXT string as the first
> parameter as JSON object.
> 
> The remaining parameters are EXPLAIN-modifiers, like ANALYZE or
> BUFFERS.
> 
> The function can be used to store plans in the database.
> 
> Another interesting usage is when you need only an estimated row
> count for a query. You can use
> 
>   SELECT count(*) ...
> 
> This gives you an exact number but is usually slow. If your planner
> statistics are up to date and the query is not too complicated, the
> planner usually gives a good estimate and is much faster.
> 
>   SELECT explain('SELECT 1 FROM tb WHERE id>8000')
>->0->'Plan'->'Plan Rows';
> $def$;
> 
> 
> Torsten

Nice!, do you know if this will work on 8.4?.

-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Leonardo M . Ramé
On 2014-04-03 10:00:18 -0400, Andrew Sullivan wrote:
> On Thu, Apr 03, 2014 at 10:34:32AM -0300, Leonardo M. Ramé wrote:
> > 
> > What strategy for showing the total number of records returned do you
> > recommend?.
> 
> The best answer for this I've ever seen is to limit the number of rows
> you're counting (at least at first) to some reasonably small number --
> say 5000.  This is usually reasonably fast for a well-indexed query,
> and your pagination can say something like "First n of at least 5000
> results", unless you have fewer than 5000 results, in which case you
> know the number (and the count returned quickly anyway).  As you're
> displaying those first 5000 results, you can work in the background
> getting a more accurate number.  This is more work for your
> application, but it provides a much better user experience (and you
> can delay getting the detailed number until the user pages through to
> the second page of results, so you don't count everything needlessly
> in case the user just uses the first page, which IME happens a lot).
> Note that even Google doesn't give you an accurate number -- they just
> say "about ten trillion" or whatever.
> 
> Hope that's useful,
> 
> A
> 

Sounds nice, is it possible to modify my "count(*) over()" to what you
suggest?.

-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] size of interval type

2014-04-03 Thread Kasahara Tatsuhito
2014/04/03 23:53 "Tom Lane" :
> It's a documentation bug; the size used to be 12 bytes, but that was quite
> a long time ago.  I see somebody fixed this in HEAD recently but failed to
> back-patch it.
I see.
Thank you for your reply and quick fix!

Best regards,


[GENERAL] window function help

2014-04-03 Thread Schnabel, Robert D.
Hi,

I'm trying to figure out how to count the number of rows within a fixed range 
of the current row value.  My table looks like this:

SELECT chr_pos
FROM mutations_crosstab_9615_99
WHERE bta = 38
LIMIT 10

chr_pos
138
140
163
174
187
187
188
208
210
213

chr_pos is integer and represents the base pair position along a chromosome.

It looks to me like a window function would be appropriate but I cannot figure 
out the correct syntax.  What I want to do is count the number of rows within 
+/- 20 of chr_pos (the current row).  Given the above example, for chr_pos = 
138 I want the count of rows between 118 and 158.  For chr_pos 187 I want the 
count of rows between 167 and 207 etc for all rows.  The result I'm looking for 
should look like the following:

chr_pos,num_variants
138,2
140,2
163,2
174,4
187,4
187,4
188,4
208,6
210,3
213,1

Is there a way to do this with a window function? Any help would be appreciated.

Thanks
Bob




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Torsten Förtsch
On 03/04/14 15:34, Leonardo M. Ramé wrote:
> Hi, in one of our systems, we added a kind of pagination feature, that
> shows N records of Total records.
> 
> To do this, we added a "count(*) over() as Total" field in our queries
> in replacement of doing two queries, one for fetching the records, and
> other for getting the count. This improved the performance, but we are't
> happy with the results yet, by removing the count, the query takes
> 200ms vs 2000ms with it.
> 
> We are thinking of removing the count, but if we do that, the system
> will lack an interesting feature.
> 
> What strategy for showing the total number of records returned do you
> recommend?.

If you need only an estimated number and if your planner statistics are
up to date, you can use the planner.

Here is my implementation of the explain function. The COMMENT below
shows how to use it:

CREATE OR REPLACE FUNCTION explain(VARIADIC TEXT[])
RETURNS JSON AS $$
DECLARE
tmp TEXT;
BEGIN
EXECUTE 'EXPLAIN ('
 || array_to_string(array_append($1[2:array_upper($1, 1)],
'FORMAT JSON'), ', ')
 || ') '
 || $1[1] INTO tmp;
RETURN tmp::JSON;
END;
$$ LANGUAGE plpgsql VOLATILE;

COMMENT ON FUNCTION explain(VARIADIC TEXT[])
IS $def$
This function is a SQL interface to the planner. It returns the plan
(result of EXPLAIN) of the query passed as TEXT string as the first
parameter as JSON object.

The remaining parameters are EXPLAIN-modifiers, like ANALYZE or
BUFFERS.

The function can be used to store plans in the database.

Another interesting usage is when you need only an estimated row
count for a query. You can use

  SELECT count(*) ...

This gives you an exact number but is usually slow. If your planner
statistics are up to date and the query is not too complicated, the
planner usually gives a good estimate and is much faster.

  SELECT explain('SELECT 1 FROM tb WHERE id>8000')
   ->0->'Plan'->'Plan Rows';
$def$;
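
On 9.3, where json has the -> and ->> operators, the estimate can be pulled
out as a plain number; a small usage sketch against the hypothetical table tb
from the comment above:

SELECT (explain('SELECT 1 FROM tb WHERE id > 8000')
        -> 0 -> 'Plan' ->> 'Plan Rows')::numeric AS estimated_rows;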


Torsten


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any way to insert rows with ID used in another column

2014-04-03 Thread Alban Hertroys
On 3 April 2014 16:41, Ben Hoyt  wrote:
> Hi folks,

> I tried using currval() to see if that'd work, but it gave an error, I guess
> because I was using it multiple times per session.

currval() requires that nextval() was called before it (either
automatically or explicitly) in the same transaction.

Usually what you want is achieved using nextval(). You request n new
ID's using nextval(), which you can then use to both name your n image
files and for the ID with which you will be inserting them into your
table.

Unfortunately there doesn't appear to be a variant of nextval() that
you pass a number which then subsequently returns a set of values,
that would be ideal for such usage, but that can be worked around by
calling nextval() in conjunction with generate_series().
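
A minimal sketch of that workaround, assuming the images table from the
original post with a serial id column (so the sequence would be named
images_id_seq; the 'foo-' prefix is just a placeholder):

-- reserve three ids up front and use them both for the id and for the filename
INSERT INTO images (id, filename)
SELECT new_id, 'foo-' || new_id::text || '.jpg'
FROM (SELECT nextval('images_id_seq') AS new_id
      FROM generate_series(1, 3)) s
RETURNING id, filename;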

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any way to insert rows with ID used in another column

2014-04-03 Thread Francisco Olarte
( Forgot to hit reply all, so probably someone will get this twice, sorry ).

Hi:

On Thu, Apr 3, 2014 at 4:41 PM, Ben Hoyt  wrote:
.
> Is there some way to do something like this:
> INSERT INTO images (filename) VALUES
> ('foo' || image_id_about_to_used_for_this_row::text || '.jpg')
> ('bar' || image_id_about_to_used_for_this_row::text || '.jpg')
> I tried using currval() to see if that'd work, but it gave an error, I guess
> because I was using it multiple times per session.

You normally need to call nextval before currval. Anyway, subqueries
are your friend:

psql (9.3.2)
Type "help" for help.

postgres=# create table files ( id serial primary key, file varchar);
CREATE TABLE
postgres=# \d+ files
  Table "public.files"
 Column |   Type| Modifiers
  | Storage  | Stats target | Description
+---++--+--+-
 id | integer   | not null default
nextval('files_id_seq'::regclass) | plain|  |
 file   | character varying |
  | extended |  |
Indexes:
"files_pkey" PRIMARY KEY, btree (id)
Has OIDs: no

postgres=# select nextval('files_id_seq'::regclass) as id  from
generate_series(1,3);
 id
----
  1
  2
  3
(3 rows)
postgres=# select newid as id, 'image_'||cast(newid as text)||'.jpg'
as file from ( select nextval('files_id_seq'::regclass) as newid  from
generate_series(1,3)) as newids;
 id |    file
----+-------------
  4 | image_4.jpg
  5 | image_5.jpg
  6 | image_6.jpg
(3 rows)

postgres=# insert into files (id, file) select newid as id,
'image_'||cast(newid as text)||'.jpg' as file from ( select
nextval('files_id_seq'::regclass) as newid  from generate_series(1,3))
as newids returning *;
 id |    file
----+-------------
  7 | image_7.jpg
  8 | image_8.jpg
  9 | image_9.jpg
(3 rows)

INSERT 0 3
postgres=# select * from files;
 id |    file
----+-------------
  7 | image_7.jpg
  8 | image_8.jpg
  9 | image_9.jpg
(3 rows)



Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Any way to insert rows with ID used in another column

2014-04-03 Thread Adrian Klaver

On 04/03/2014 07:41 AM, Ben Hoyt wrote:

Hi folks,

We have a table images in our db with id (serial primary key) and
filename columns, where the filename is a unique text column that looks
something like "pool-1234.jpg".

The catch is that the "1234" in the filename is the image ID. We want
the filename to include the image ID because it's a nice way of making
it unique and gives the benefit of being able to easily map from the
filename back to the ID for debugging and the like.

Currently I insert new image rows in multiple steps:

1) begin transaction
2) insert a whole bunch of image rows in one multiple-row INSERT, using
a temporary random filename
3) use the RETURNING clause on the above insert to get a mapping between
the database IDs and filenames just inserted
4) loop through all images just inserted, and for each image, execute
UPDATE to set the filename to the real filename which includes the new
image ID
5) commit

This works, but it's pretty cumbersome, and requires N UPDATE statements
which is also slow.

Is there some way to do something like this:

INSERT INTO images (filename) VALUES
('foo' || image_id_about_to_used_for_this_row::text || '.jpg')
('bar' || image_id_about_to_used_for_this_row::text || '.jpg')

I tried using currval() to see if that'd work, but it gave an error, I
guess because I was using it multiple times per session.


Write a BEFORE INSERT trigger function?
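
Something along these lines, perhaps (only a sketch; it assumes rows are
inserted with a filename prefix and the trigger appends the generated id):

CREATE OR REPLACE FUNCTION images_set_filename() RETURNS trigger AS $$
BEGIN
    -- the serial default has already populated NEW.id when a BEFORE ROW trigger fires
    NEW.filename := NEW.filename || NEW.id::text || '.jpg';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER images_filename_trg
    BEFORE INSERT ON images
    FOR EACH ROW EXECUTE PROCEDURE images_set_filename();

An INSERT INTO images (filename) VALUES ('foo-'), ('bar-') would then come out
as foo-1.jpg, bar-2.jpg (or whatever ids the sequence hands out).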



Thanks,
Ben




--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Thomas Kellerer
Rob Sargent, 02.04.2014 21:37:
> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+
> hours (800+ records/sec).  Then I tried COPY and killed that after
> 11.25 hours when I realised that I had added on non-unque index on
> the name fields after the first load. By that point is was on line
> 28301887, so ~0.75 done which implies it would have take ~15hours to
> complete.
> 
> Would the overhead of the index likely explain this decrease in
> throughput?
> 
> Impatience got the better of me and I killed the second COPY.  This
> time it had done 54% of the file in 6.75 hours, extrapolating to
> roughly 12 hours to do the whole thing.
> 
> That matches up with the java speed. Not sure if I should be elated
> with jOOQ or disappointed with COPY.
> 

This is not what I see with COPY FROM STDIN

When I load 2million rows using a batch size of 1000 with plain JDBC that takes 
about 4 minutes

Loading the same file through Java and COPY FROM STDIN takes about 4 seconds

The table looks like this:

Table "public.products"
  Column   |  Type  | Modifiers
---++---
 product_id| integer| not null
 ean_code  | bigint | not null
 product_name  | character varying(100) | not null
 manufacturer_name | character varying  | not null
 price | numeric(10,2)  | not null
 publish_date  | date   | not null
Indexes:
"products_pkey" PRIMARY KEY, btree (product_id)
"idx_publish_date" btree (publish_date, product_id)


During the load both indexes are present.

Regards
Thomas
 




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COPY v. java performance comparison

2014-04-03 Thread Rob Sargent

On 04/02/2014 08:40 PM, Adrian Klaver wrote:

On 04/02/2014 05:30 PM, Rob Sargent wrote:

On 04/02/2014 06:06 PM, Adrian Klaver wrote:

On 04/02/2014 02:27 PM, Rob Sargent wrote:

On 04/02/2014 03:11 PM, Adrian Klaver wrote:

On 04/02/2014 02:04 PM, Rob Sargent wrote:

On 04/02/2014 02:36 PM, Adrian Klaver wrote:

On 04/02/2014 01:14 PM, Rob Sargent wrote:

On 04/02/2014 01:56 PM, Steve Atkins wrote:




Have you tried moving the input file to the same disk as the server,
to factor out the controller?

I labour under the delusion that it is through the controller one 
reads

and writes and that there might be some slight advantage to not doing
both against one drive if avoidable. Wrong again?


Well there is one way to find out:)

Might try with something less then the whole file to get come up an
approximate row/sec rate.






Well things slow down over time, and lots of "too frequent"s:

Have done 500 batches in 24219 ms
Have done 1000 batches in 52362 ms
Have done 1500 batches in 82256 ms
Have done 2000 batches in 113754 ms
Have done 2500 batches in 149637 ms
Have done 3000 batches in 211314 ms
Have done 3500 batches in 301989 ms
Have done 4000 batches in 430817 ms
Have done 4500 batches in 596043 ms
Have done 5000 batches in 804250 ms

where a batch is 500,000 lines.  This on the java side of course.


This is drive to drive or on single drive?










Same as first go round, drive to drive.



Re: [GENERAL] size of interval type

2014-04-03 Thread Tom Lane
Kasahara Tatsuhito  writes:
> The document says the size of interval type is 12 bytes.
> http://www.postgresql.org/docs/9.3/static/datatype-datetime.html
> But ISTM the size is actually 16 bytes.
> # I checked src/include/datatype/timestamp.h and pg_type system view.

> Is it a document bug? Or am I missing something?

It's a documentation bug; the size used to be 12 bytes, but that was quite
a long time ago.  I see somebody fixed this in HEAD recently but failed to
back-patch it.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Any way to insert rows with ID used in another column

2014-04-03 Thread Ben Hoyt
Hi folks,

We have a table images in our db with id (serial primary key) and filename
columns, where the filename is a unique text column that looks something
like "pool-1234.jpg".

The catch is that the "1234" in the filename is the image ID. We want the
filename to include the image ID because it's a nice way of making it
unique and gives the benefit of being able to easily map from the filename
back to the ID for debugging and the like.

Currently I insert new image rows in multiple steps:

1) begin transaction
2) insert a whole bunch of image rows in one multiple-row INSERT, using a
temporary random filename
3) use the RETURNING clause on the above insert to get a mapping between
the database IDs and filenames just inserted
4) loop through all images just inserted, and for each image, execute
UPDATE to set the filename to the real filename which includes the new
image ID
5) commit

This works, but it's pretty cumbersome, and requires N UPDATE statements
which is also slow.

Is there some way to do something like this:

INSERT INTO images (filename) VALUES
('foo' || image_id_about_to_used_for_this_row::text || '.jpg')
('bar' || image_id_about_to_used_for_this_row::text || '.jpg')

I tried using currval() to see if that'd work, but it gave an error, I
guess because I was using it multiple times per session.

Thanks,
Ben


[GENERAL] size of interval type

2014-04-03 Thread Kasahara Tatsuhito
Hi,

The document says the size of interval type is 12 bytes.

http://www.postgresql.org/docs/9.3/static/datatype-datetime.html

But ISTM the size is actually 16 bytes.
# I checked src/include/datatype/timestamp.h and pg_type system view.

Is it a document bug? Or am I missing something?

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Dorian Hoxha
Cache the total ?


On Thu, Apr 3, 2014 at 3:34 PM, Leonardo M. Ramé  wrote:

> Hi, in one of our systems, we added a kind of pagination feature, that
> shows N records of Total records.
>
> To do this, we added a "count(*) over() as Total" field in our queries
> in replacement of doing two queries, one for fetching the records, and
> other for getting the count. This improved the performance, but we are't
> happy with the results yet, by removing the count, the query takes
> 200ms vs 2000ms with it.
>
> We are thinking of removing the count, but if we do that, the system
> will lack an interesting feature.
>
> What strategy for showing the total number of records returned do you
> recommend?.
>
> Regards,
> --
> Leonardo M. Ramé
> Medical IT - Griensu S.A.
> Av. Colón 636 - Piso 8 Of. A
> X5000EPT -- Córdoba
> Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
> Cel.: +54 9 (011) 40871877
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Alban Hertroys
On 3 April 2014 15:34, Leonardo M. Ramé  wrote:
> Hi, in one of our systems, we added a kind of pagination feature, that
> shows N records of Total records.
>
> To do this, we added a "count(*) over() as Total" field in our queries
> in replacement of doing two queries, one for fetching the records, and
> other for getting the count. This improved the performance, but we are't
> happy with the results yet, by removing the count, the query takes
> 200ms vs 2000ms with it.
>
> We are thinking of removing the count, but if we do that, the system
> will lack an interesting feature.
>
> What strategy for showing the total number of records returned do you
> recommend?.

Assuming your results are unique (and what would be the point of
showing duplicate results?) in a specific order, it should be possible
to (uniquely) identify the last record shown on a previous page and
display n results from there on.

To add a result counter for displaying purposes to that, since you
need to remember the last displayed result anyway, you might as well
store the relative record number of that result with it and count from
there on.

This works well if you only have a next/previous results link, but not
so well when people can pick arbitrary page numbers. It can work
relative to the current page regardless of which page the user chose
to navigate to next, but you'll have to go through all the results in
between that page and the current page... That should still be faster
than always counting from the start though (and you can be smart about
from which end you start counting).
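
In SQL terms, something like this (table and column names invented for
illustration):

-- first page
SELECT id, created_at, summary
FROM interactions
ORDER BY id
LIMIT 20;

-- "next" page: remember the last id shown (say 1040) and continue from it
SELECT id, created_at, summary
FROM interactions
WHERE id > 1040
ORDER BY id
LIMIT 20;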

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Pagination count strategies

2014-04-03 Thread Andrew Sullivan
On Thu, Apr 03, 2014 at 10:34:32AM -0300, Leonardo M. Ramé wrote:
> 
> What strategy for showing the total number of records returned do you
> recommend?.

The best answer for this I've ever seen is to limit the number of rows
you're counting (at least at first) to some reasonably small number --
say 5000.  This is usually reasonably fast for a well-indexed query,
and your pagination can say something like "First n of at least 5000
results", unless you have fewer than 5000 results, in which case you
know the number (and the count returned quickly anyway).  As you're
displaying those first 5000 results, you can work in the background
getting a more accurate number.  This is more work for your
application, but it provides a much better user experience (and you
can delay getting the detailed number until the user pages through to
the second page of results, so you don't count everything needlessly
in case the user just uses the first page, which IME happens a lot).
Note that even Google doesn't give you an accurate number -- they just
say "about ten trillion" or whatever.

Hope that's useful,

A

-- 
Andrew Sullivan
a...@crankycanuck.ca


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Pagination count strategies

2014-04-03 Thread Leonardo M . Ramé
Hi, in one of our systems, we added a kind of pagination feature, that
shows N records of Total records.

To do this, we added a "count(*) over() as Total" field in our queries
instead of doing two queries, one for fetching the records and
another for getting the count. This improved the performance, but we aren't
happy with the results yet: without the count the query takes
200ms, vs 2000ms with it.
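
(For context, that pattern looks roughly like this, with names invented:)

SELECT id, created_at, summary,
       count(*) OVER () AS total   -- total rows matching the WHERE clause, repeated on every row
FROM interactions
WHERE account_id = 42
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;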

We are thinking of removing the count, but if we do that, the system
will lack an interesting feature.

What strategy for showing the total number of records returned do you
recommend?.

Regards,
-- 
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Unattended Installation

2014-04-03 Thread peter . schumm

Many thanks to Igor

The hint about the "bitrock" installer log helped me find the problem.
It was an incorrect path setting (a double backslash). The UI installer
apparently corrects such settings, which would explain why the interactive install worked.


Regards, Peter.

Quoting Igor Neyman :


-Original Message-
From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
ow...@postgresql.org] On Behalf Of shetty65
Sent: Tuesday, April 01, 2014 3:13 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Unattended Installation

Hello

I am using Postgres 9.3.3.1 on Windows (32-Bit Windows 7 Professional).
I use the installer executable postgresql-9.3.3-1-windows.exe with the
option "--optionfile "

The option file has the following content (the ${..} are replaced  
with correct

values before execution):
 #mode=unattended
 datadir=${program.base}data
 prefix=${program.base}
 serverport=${postgres.port}
 superaccount=postgres
 superpassword=${postgres.passwd}
 unattendedmodeui=minimalWithDialogs
 servicename=${postgres.service.name}

1) If the "mode" property is enabled to mode=unattended, the installation
works silent but the whole 'data' folder is missing after  
installation (the first

error message will occur when the server-startup failes after installation).

2) If the "mode" property is disabled, the Installer interface appears.
Everything is correctly set, I need only to press the  button. The
Installation completes correctly.

Why does my unattended installation as described in 1) fail, but 2)  
succeeds?


Thank you in advance for the help
Peter.



Here are options that I use for unattended install (and it creates  
cluster with Postgres, template0, and template1 databases, which  
means data directory and all its subdirectories are created too):


SET INSTALLOPTIONS=--unattendedmodeui none
SET INSTALLOPTIONS=%INSTALLOPTIONS% --mode unattended

Also, you are missing settings for servicesaccount, servicepassword,  
and locale, e.g.:

SET INSTALLOPTIONS=%INSTALLOPTIONS% --locale C
SET INSTALLOPTIONS=%INSTALLOPTIONS% --serviceaccount postgres
SET INSTALLOPTIONS=%INSTALLOPTIONS% --servicepassword pg_password123

In general, to diagnose (silent or not) installation problems find  
"bitrock" installer log somewhere under DocumentsandSettings for OS  
user that runs installation.


Regards,
Igor Neyman



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general