Re: [PERFORM] Unexpected expensive index scan

2016-09-27 Thread Mike Sofen
From: Jake Nielsen   Sent: Tuesday, September 27, 2016 5:22 PM


the query

SELECT * FROM SyncerEvent WHERE ID > 12468 AND propogatorId NOT IN
('"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"') AND conflicted != 1 AND userId =
'57dc984f1c87461c0967e228' ORDER BY ID LIMIT 4000;

 

On Tue, Sep 27, 2016 at 5:02 PM, Jake Nielsen <jake.k.niel...@gmail.com> wrote:

I've got a query that takes a surprisingly long time to run, and I'm having a 
really rough time trying to figure it out.

 

Before I get started, here are the specifics of the situation:

 

Here is the table that I'm working with (apologies for spammy indices, I've 
been throwing shit at the wall)

Table "public.syncerevent"

Column|  Type   |Modifiers  
   

--+-+--

 id   | bigint  | not null default 
nextval('syncerevent_id_seq'::regclass)

 userid   | text| 

 event| text| 

 eventid  | text| 

 originatorid | text| 

 propogatorid | text| 

 kwargs   | text| 

 conflicted   | integer | 

Indexes:

"syncerevent_pkey" PRIMARY KEY, btree (id)

"syncereventidindex" UNIQUE, btree (eventid)

"anothersyncereventidindex" btree (userid)

"anothersyncereventidindexwithascending" btree (userid, id)

"asdfasdgasdf" btree (userid, id DESC)

"syncereventuseridhashindex" hash (userid)

 

To provide some context, as per the wiki:
there are 3,290,600 rows in this table.
It gets added to frequently, but never deleted from.
The "kwargs" column often contains mid-size JSON strings (roughly 30K characters on average).
As of right now, the table has 53 users in it. About 20% of those have a negligible number of events, but the rest of the users have a fairly even smattering.

 

EXPLAIN (ANALYZE, BUFFERS) says:

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..1218.57 rows=4000 width=615) (actual time=3352.390..3403.572 rows=4000 loops=1)
   Buffers: shared hit=120244 read=160198
   ->  Index Scan using syncerevent_pkey on syncerevent  (cost=0.43..388147.29 rows=1274560 width=615) (actual time=3352.386..3383.100 rows=4000 loops=1)
         Index Cond: (id > 12468)
         Filter: ((propogatorid <> '"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"'::text) AND (conflicted <> 1) AND (userid = '57dc984f1c87461c0967e228'::text))
         Rows Removed by Filter: 1685801
         Buffers: shared hit=120244 read=160198
 Planning time: 0.833 ms
 Execution time: 3407.633 ms
(9 rows)

If it matters/interests you, here is my underlying confusion:

From some internet sleuthing, I've decided that having a table per user (which would totally make this problem a non-issue) isn't a great idea. Because there is a file per table, having a table per user would not scale. My next thought was partial indexes (which would also totally help), but since there is also a file per index, this really doesn't side-step the problem. My rough mental model says: if there exists a way that a table-per-user scheme would make this more efficient, then there should also exist an index that could achieve the same effect (or close enough to not matter). I would think that "userid = '57dc984f1c87461c0967e228'" could utilize at least one of the two indexes on the userId column, but clearly I'm not understanding something.
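(For illustration, the partial-index idea above might look like this sketch; the table and column names come from the posted definition, and the userid literal is just the one from the example query, so treat it as an assumption rather than a recommendation:

    CREATE INDEX syncerevent_one_user_idx
        ON syncerevent (id)
        WHERE userid = '57dc984f1c87461c0967e228';

With an index like that, the ORDER BY ID LIMIT 4000 query for that one user can walk a small per-user index in id order instead of filtering the primary-key scan, at the cost of maintaining one such index per user.)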

Any help in making this query more efficient would be greatly appreciated, and 
any conceptual insights would be extra awesome.

Thanks for reading.

-Jake

--

 

This stands out:  WHERE ID > 12468 AND propogatorId NOT IN 
('"d8130ab9!-66d0!-4f13!-acec!-a9556362f0ad"')

As does this from the analyze:  Rows Removed by Filter: 1685801

 

The propogatorid is practically the only column NOT indexed, and it's used in a 
"not in".  It looks like it's having to scan every row above the id cutoff to see 
whether it meets the filter requirement.  "not in" can be very expensive.  An index 
might help on this column.  Have you tried that?
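(As a sketch, the index being suggested here would be something like the following, using the posted table and column names:

    CREATE INDEX syncerevent_propogatorid_idx
        ON syncerevent (propogatorid);

Whether the planner actually uses it for a NOT IN / <> filter depends on how selective the excluded value is; the existing (userid, id) index is the other natural candidate for this query, since it matches the userid equality and the ORDER BY id.)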

 

Your rowcounts aren’t high enough to require partitioning or any other changes 
to your table that I can see right now.

 

Mike Sofen  (Synthetic Genomics)

 



Re: [PERFORM] Millions of tables

2016-09-27 Thread Mike Sofen
From: Mike Sofen   Sent: Tuesday, September 27, 2016 8:10 AM



From: Greg Spiegelberg   Sent: Monday, September 26, 2016 7:25 AM
I've gotten more responses than anticipated and have answered some questions 
and gotten some insight but my challenge again is what should I capture along 
the way to prove or disprove this storage pattern?  Alternatives to the storage 
pattern aside, I need ideas to test rig, capture metrics and suggestions to 
tune it.

 

In the next 24 hours, I will be sending ~1 trillion records to the test 
database.  Because of time to set up, I'd rather have things set up properly 
the first go.

 

Thanks!

-Greg 

-

Greg, I ran another quick test on a wider table than you’ve described, but this 
time with 80 million rows, with core counts, ram and ssd storage similar to 
what you’d have on that AWS EC2 instance.  This table had 7 columns (3 
integers, 3 text, 1 timestamptz) with an average width of 157 chars, one btree 
index on the pk int column.  Using explain analyze, I picked one id value out 
of the 80m and ran a select * where id = x.  It did an index scan, had a 
planning time of 0.077ms, and an execution time of 0.254 seconds.  I ran the 
query for a variety of widely spaced values (so the data was uncached) and the 
timing never changed. This has been mirroring my general experience with PG – 
very fast reads on indexed queries. 
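(A sketch of that kind of timing check; the table name and id value are hypothetical:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM wide_test_table WHERE id = 57344123;

Repeating it for several widely spaced, previously untouched id values shows the uncached single-row lookup cost on the Index Scan node.)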

 

Summary:  I think your buckets can be WAY bigger than you are envisioning for 
the simple table design you’ve described.  I’m betting you can easily do 500 
million rows per bucket before approaching anything close to the 30ms max query 
time.

 

Mike Sofen (Synthetic Genomics)

 

Totally typo’d the execution time:  it was 0.254 MILLISECONDS, not SECONDS.  
Thus my comment about going up 10x in bucket size instead of appearing to be 
right at the limit.  Sorry!

 

Mike



Re: [PERFORM] Millions of tables

2016-09-26 Thread Mike Sofen
From: Rick Otten   Sent: Monday, September 26, 2016 3:24 AM
Are the tables constantly being written to, or is this a mostly read scenario? 

 

With regards to consistent query performance, I think you need to get out of 
AWS.  That environment is terrible if you are going for consistency unless you 
buy dedicated hardware, and then you are paying so much money it is ridiculous.

 

Also I think having 10M rows in a table is not a problem for the query times 
you are referring to.  So instead of millions of tables, unless I'm doing my 
math wrong, you probably only need thousands of tables.

--

Excellent thoughts:  the read/write behavior will/should drive a lot of the 
design;  AWS does not guarantee consistency or latency;  and 10m rows is 
nothing to PG.

 

Re AWS:  we’re on it, at least for now.  In my profiling of our performance 
there, I consistently get low latencies…I just know that there will be random 
higher latencies, but the statistical average will be low.  I just ran a quick 
test against a modest sized table on a modest sized EC2 instance (m4.xlarge – 4 
core/16gb ram, 3 tb ssd):  the table has 15m rows but is huge (it represents 
nearly 500m rows compressed in jsonb documents), with 5 indexed key columns and 
a total of 12 columns.  I queried for a single, non-PK, indexed value using 
“select *” (so it included the json) and it took 22ms, without the json it took 
11ms.  Especially with the db/memory-optimized EC2 instances now available 
(with guaranteed IOPS), performance against even 100m row tables should still 
stay within your requirements.
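(The two timings quoted above correspond to queries of roughly this shape; the table and column names are invented for illustration:

    -- with the jsonb payload (the ~22 ms case described above)
    SELECT * FROM sample_docs WHERE sample_key = 'some-indexed-value';

    -- without the payload (the ~11 ms case described above)
    SELECT id, sample_key, created_at FROM sample_docs WHERE sample_key = 'some-indexed-value';
)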

 

So Rick’s point about not needing millions of tables is right on.  If there’s a 
way to create table “clumps”, at least you’ll have a more modest table count.

 

Mike Sofen (Synthetic Genomics)



Re: [PERFORM] Millions of tables

2016-09-25 Thread Mike Sofen
From: Greg Spiegelberg  Sent: Sunday, September 25, 2016 7:50 PM
… Over the weekend, I created 8M tables with 16M indexes on those tables. 

… A system or database crash could take potentially hours to days to recover.  
There are likely other issues ahead.

 

You may wonder, "why is Greg attempting such a thing?"  I looked at DynamoDB, 
BigTable, and Cassandra.  I like Greenplum but, let's face it, it's antiquated 
and don't get me started on "Hadoop".  Problem with the "one big table" 
solution is I anticipate 1,200 trillion records.  Random access is expected and 
the customer expects <30ms reads for a single record fetch.

 

I'm not looking for alternatives yet but input to my test.

_

 

Holy guacamole, batman!  Ok, here’s my take:  you’ve traded the 
risks/limitations of the known for the risks of the unknown.  The unknown 
being, in the numerous places where postgres historical development may have 
cut corners, you may be the first to exercise those corners and flame out like 
the recent SpaceX rocket.

 

Put it another way – you’re going to bet your career (perhaps) or a client’s 
future on an architectural model that just doesn’t seem feasible.  I think 
you’ve got a remarkable design problem to solve, and am glad you’ve chosen to 
share that problem with us.

 

And I do think it will boil down to this: it's not that you CAN do it on 
Postgres (which you clearly can), but once in production, assuming things are 
actually stable, how will you handle the data management aspects like 
inevitable breakage, data integrity issues, backups, restores, user contention 
for resources, fault tolerance and disaster recovery.  Just listing the tables 
will take forever.  Add a column?  Never.  I do think the amount of testing 
you'll need to do to prove that every normal data management function still works 
at that table count…that in itself is going to be not a lot of fun.
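(For a sense of scale, even a simple catalog query has to walk pg_class at that table count; a sketch of one of those "normal data management functions" worth timing:

    SELECT count(*)
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND n.nspname NOT IN ('pg_catalog', 'information_schema');
)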

 

This one hurts my head.  Ironically, the most logical destination for this type 
of data may actually be Hadoop – auto-scale, auto-shard, fault tolerant, 
etc…and I’m not a Hadoopie.

 

I am looking forward to hearing how this all plays out, it will be quite an 
adventure!  All the best,

 

Mike Sofen (Synthetic Genomics…on Postgres 9.5x)



Re: [PERFORM] Storing large documents - one table or partition by doc?

2016-09-23 Thread Mike Sofen
From: Dev Nop  Sent: Friday, September 23, 2016 3:12 AM
I’m storing thousands of independent documents each containing around 20k rows. 
The larger the document, the more likely it is to be active with inserts and 
updates (1000s/day). The most common read query is to get all the rows for a 
single document (100s/day). It will be supporting real-time collaboration but 
with strong-consistency for a simple schema so not well-suited to dedicated 
"document databases" that assume schema-less & eventual consistency. I won’t 
have great hardware/budget so need to squeeze the most out of the least.

 

My question is whether to put all documents into a single huge table or 
partition by document?

 

The documents are independent so it's purely a performance question. It's too 
many tables for postgresql partitioning support but I don't get any benefit 
from a master table and constraints. Handling partitioning in application logic 
is effectively zero cost.

 

I know that 1000s of tables is regarded as an anti-pattern but I can only see 
the performance and maintenance benefits of one table per independent document 
e.g. fast per-table vacuum, incremental schema updates, easy future sharding. A 
monster table will require additional key columns and indexes that don’t have 
any value beyond allowing the documents to sit in the same table.

 

The only downsides seem to be the system level per-table overhead but I only 
see that as a problem if I have a very long tail of tiny documents. I'd rather 
solve that problem if it occurs than manage an all-eggs-in-one-basket monster 
table.


Is there anything significant I am missing in my reasoning? Is it mostly a 
“relational purist” perspective that argues against multiple tables? Should I 
be looking at alternative tech for this problem?

 

The one factor I haven't fully resolved is how much a caching layer in front of 
the database changes things.

 

Thanks for your help.

-

This is, to me, a very standard, almost classic, relational pattern, and one 
that a relational engine handles extremely well, especially the consistency and 
locking needed to support lots of updates.  Inserts are irrelevant unless the 
parent record must be locked to do so…that would be a bad design.

 

Imagine a normal parent-child table pair, 1:M, with the 20k rows per parent 
document in the child table.  Unless there's something very bizarre about the 
access patterns against that child table, those 20k rows per document would not 
normally all be in play for every user on every access throughout that access 
(it's too much data to show on a web page, for instance).  Even so, at "100s" 
of large queries per day, it's a trivial load unless each child row contains a 
large json blob…which doesn't jibe with your table description.
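(A minimal sketch of that parent-child layout, with hypothetical names; the composite index on (document_id, id) is what keeps "fetch the rows for one document" cheap:

    CREATE TABLE document (
        id   bigserial PRIMARY KEY,
        name text
    );

    CREATE TABLE document_row (
        id          bigserial PRIMARY KEY,
        document_id bigint NOT NULL REFERENCES document(id),
        payload     text
    );

    CREATE INDEX document_row_doc_idx ON document_row (document_id, id);
)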

 

So with proper indexing, I can’t see where there will be a performance issue.   
Worst case, you create a few partitions based on some category, but the row 
counts you’re describing don’t yet warrant it.  I’m running a few hundred 
million rows in a new “child” table on a dev server (4 cores/16gb ram) with 
large json documents in each row and it’s still web page performant on normal 
queries, using a paging model (say 20 full rows per web page request).  The 
critical pieces, hardware-wise, are memory (buy as much as you can afford) and 
using SSDs (required, IMO).  It’s much harder to create measurable loads on the 
CPUs.  Amazon has memory optimized EC2 instances that support that pattern 
(with SSD storage).
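(The paging model mentioned above, sketched as a keyset query against the same hypothetical tables as the earlier sketch:

    SELECT *
    FROM document_row
    WHERE document_id = 42
      AND id > 101234   -- last id seen on the previous page
    ORDER BY id
    LIMIT 20;
)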

 

Are there other issues/requirements that are creating other performance 
concerns that aren’t obvious in your initial post?

 

Mike Sofen (Synthetic Genomics)



Re: [PERFORM] Postgres bulk insert/ETL performance on high speed servers - test results

2016-09-07 Thread Mike Sofen
From: Jim Nasby [mailto:jim.na...@bluetreble.com]  Sent: Wednesday, September 
07, 2016 12:22 PM
On 9/4/16 7:34 AM, Mike Sofen wrote:

> You raise a good point.  However, other disk activities involving 

> large data (like backup/restore and pure large table copying), on both 

> platforms, do not seem to support that notion.  I did have both our IT 

> department and Cisco turn on instrumentation for my last test, 

> capturing all aspects of both tests on both platforms, and I’m hoping 

> to see the results early next week and will reply again.

 

Something important to remember about Postgres is that it makes virtually no 
effort to optimize IO; it throws the entire problem into the OS's lap. So 
differences in OS config or in IO *latency* can have a massive impact on 
performance. Because of the sensitivity to IO latency, you can also end up with 
a workload that only reports, say, 60% IO utilization but is essentially IO bound 
(would be 100% IO utilization if enough read-ahead was happening).
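(One way to see how much of a query's runtime is spent waiting on IO in the way Jim describes; these are standard PostgreSQL settings, sketched against a hypothetical table:

    SET track_io_timing = on;   -- superuser, or set it in postgresql.conf
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM some_big_table;

With track_io_timing enabled, the plan nodes report "I/O Timings" that can be compared against the total execution time.)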

--

Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in 
Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble!  
http://BlueTreble.com

855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

=

Hi Jim,

 

Thanks for that info regarding the sensitivity to IO latency.  As it turns out, 
our network guys have determined that while the AWS Direct Connect pipe is running 
at "normal" speed, the end-to-end latency is quite high, and they are working with 
AWS support to see if there are any optimizations to be done.  To me, the 
performance differences have to be tied to networking, especially since it does 
appear that for these EC2 instances, all data – both SSD and network – is 
consuming bandwidth in their network "connection", possibly adding to PG IO 
pressure.  I'll keep your note in mind as we evaluate next steps.

 

Mike

 

 



Re: [PERFORM] Postgres bulk insert/ETL performance on high speed servers - test results

2016-09-04 Thread Mike Sofen
From: Claudio Freire   Sent: Friday, September 02, 2016 1:27 PM
On Thu, Sep 1, 2016 at 11:30 PM, Mike Sofen <mso...@runbox.com> wrote:

> It's obvious the size of the batch exceeded the AWS server memory, 

> resulting in a profoundly slower processing time.  This was a true, 

> apples to apples comparison between Pass 1 and Pass 2: average row 

> lengths were within 7% of each other (1121 vs 1203) using identical 

> table structures and processing code, the only difference was the target 
> server.

> 

> I'm happy to answer questions about these results.

 

Are you sure it's a memory thing and not an EBS bandwidth thing?

 

EBS has significantly less bandwidth than direct-attached flash.

 

You raise a good point.  However, other disk activities involving large data 
(like backup/restore and pure large table copying), on both platforms, do not 
seem to support that notion.  I did have both our IT department and Cisco turn 
on instrumentation for my last test, capturing all aspects of both tests on 
both platforms, and I’m hoping to see the results early next week and will 
reply again.

 

Mike



[PERFORM] Postgres bulk insert/ETL performance on high speed servers - test results

2016-09-01 Thread Mike Sofen
High level summary:  server ram has a significant impact on batch processing
performance (no surprise), and AWS processing can largely compete with local
servers IF the AWS network connection is optimized.

With the recent threads about insert performance (and performance in
general), I thought I'd share some numbers that could assist some other
Postgres users in planning their environments.

I am currently running a Postgres dev server in AWS and we are evaluating a
high powered physical server for our data center, for which we received a
demo unit from Cisco for testing.  Great opportunity to test a range of
pinch points that could restrict scalability and performance, comparing how
2 very different servers behave under a high bulk loading/transform
scenario.  The scenario is that I'm migrating mysql data ("v1", eventually
20tb of genomics data) over to a new Postgres server ("v2").

[As a side note, I'm attempting to get a third server spun up, being a high
powered AWS EC2 instance (an r3.4xlarge with 122gb ram, 16 cores, 6tb SSD
EBS Optimized with 16k guaranteed IOPS).  When I finish the testing against
the 3rd server, I'll report again.]

Landscape:
Source mysql server:  Dell physical 24 cores at 2.8ghz, 32gb ram, 1gbe
networking, Percona/mysql v5.5.3 on linux in our data center
AWS:  EC2 m4.xlarge instance with 16 gb ram, 4 cores at 2.4ghz, 3tb SSD.  PG
v9.5.1 on Red Hat 4.8.5-4 64 bit, on a 10gb Direct Connect link from our
data center to AWS.
Cisco:  Hyperflex HX240c M4 node with UCS B200 M4 blade, with 256gb ram, 48
cores at 2.2ghz, 4tb direct attached Intel flash (SSD) for the OS, 10tb of
NetApp Filer SSD storage via 4gb HBA cards.  PG v9.5.1 on Red Hat 4.8.5-4 64
bit, 10gbe networking but has to throttle down to 1gbe when talking to the
mysql source server.

PASS 1:
Process:  Extract (pull the raw v1 data over the network to the 32 v2
staging tables) 
Num Source Rows:  8,232,673 (Small Test) 
Rowcount Compression:  1.0 (1:1 copy) 
AWS Time in Secs:  1,516** 
Cisco Time in Secs:  376 
Difference:  4.0x
Comment:  AWS:  5.7k rows/sec    Cisco:  21.9k rows/sec
(**network speed appears to be the factor, see notes below)

Process:  Transform/Load (all work local to the server - read,
transform, write as a single batch) 
Num Source Rows:  5,575,255 (many smaller batches from the source
tables, all writes going to a single target table) 
Avg Rowcount Compression:  10.3 (jsonb row compression resulting in 10x
fewer rows) 
AWS Time in Secs:  408 
Cisco Time in Secs:  294 
Difference:  1.4x  (the Cisco is 40% faster...not a huge difference)
Comment:  AWS:  13.6k rows/sec   Cisco:  19k rows/sec

Notes:  The testing has revealed an issue with the networking in our data
center, which appears to be causing abnormally slow transfer speed to AWS.
That is being investigated.  So if we look at just the Transform/Load
process, we can see that both AWS and the local Cisco server have comparable
processing speeds on the small dataset.

However, when I moved to a medium sized dataset of 204m rows, a different
pattern emerged.  I'm including just the Transform/Load process here, and
testing just ONE table out of the batch:

PASS 2:
Process:  Transform/Load (all work local to the server - read,
transform, write as a single batch) 
Num Source Rows:  10,554,800 (one batch from just a single source table
going to a single target table) 
Avg Rowcount Compression:  31.5 (jsonb row compression resulting in
31.5x fewer rows) 
AWS Time in Secs:  2,493 (41.5 minutes) 
Cisco Time in Secs:  661 (10 minutes) 
Difference:  3.8x
Comment:  AWS:  4.2k rows/sec   Cisco:  16k rows/sec

It's obvious the size of the batch exceeded the AWS server memory, resulting
in a profoundly slower processing time.  This was a true, apples to apples
comparison between Pass 1 and Pass 2: average row lengths were within 7% of
each other (1121 vs 1203) using identical table structures and processing
code, the only difference was the target server.

I'm happy to answer questions about these results.

Mike Sofen (USA)





Re: [PERFORM] Slow query with big tables

2016-08-26 Thread Mike Sofen
 

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tommi K
Sent: Friday, August 26, 2016 7:25 AM
To: Craig James 
Cc: andreas kretschmer ; 
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query with big tables

 

Ok, sorry that I did not add the original message. I thought that it would be 
automatically added to the message thread.

 

Here is the question again:

 

Is there a way to keep query time constant as the database size grows? Should I 
use partitioning or partial indexes?

 

Thanks,

Tommi Kaksonen

 

 

 

> Hello, 

> 

> I have the following tables and query. I would like to get some help to find 
> out why it is slow and how its performance could be improved.

> 

> Thanks,

> Tommi K.

> 

> 

> --Table definitions---

> CREATE TABLE "Measurement"

> (

>   id bigserial NOT NULL,

>   product_id bigserial NOT NULL,

>   nominal_data_id bigserial NOT NULL,

>   description text,

>   serial text,

>   measurement_time timestamp without time zone,

>   status smallint,

>   system_description text,

>   CONSTRAINT "Measurement_pkey" PRIMARY KEY (id),

>   CONSTRAINT "Measurement_nominal_data_id_fkey" FOREIGN KEY (nominal_data_id)

>   REFERENCES "Nominal_data" (id) MATCH SIMPLE

>   ON UPDATE NO ACTION ON DELETE NO ACTION,

>   CONSTRAINT "Measurement_product_id_fkey" FOREIGN KEY (product_id)

>   REFERENCES "Product" (id) MATCH SIMPLE

>   ON UPDATE NO ACTION ON DELETE NO ACTION

> )

> WITH (

>   OIDS=FALSE

> );

> 

> CREATE INDEX measurement_time_index

>   ON "Measurement"

>   USING btree

>   (measurement_time);

> ALTER TABLE "Measurement" CLUSTER ON measurement_time_index;

> 

> CREATE TABLE "Product"

> (

>   id bigserial NOT NULL,

>   name text,

>   description text,

>   info text,

>   system_name text,

>   CONSTRAINT "Product_pkey" PRIMARY KEY (id)

> )

> WITH (

>   OIDS=FALSE

> );

> 

> 

> CREATE TABLE "Extra_info"

> (

>   id bigserial NOT NULL,

>   measurement_id bigserial NOT NULL,

>   name text,

>   description text,

>   info text,

>   type text,

>   value_string text,

>   value_double double precision,

>   value_integer bigint,

>   value_bool boolean,

>   CONSTRAINT "Extra_info_pkey" PRIMARY KEY (id),

>   CONSTRAINT "Extra_info_measurement_id_fkey" FOREIGN KEY (measurement_id)

>   REFERENCES "Measurement" (id) MATCH SIMPLE

>   ON UPDATE NO ACTION ON DELETE NO ACTION

> )

> WITH (

>   OIDS=FALSE

> );

> 

> CREATE INDEX extra_info_measurement_id_index

>   ON "Extra_info"

>   USING btree

>   (measurement_id);

> 

> CREATE TABLE "Feature"

> (

>   id bigserial NOT NULL,

>   measurement_id bigserial NOT NULL,

>   name text,

>   description text,

>   info text,

>   CONSTRAINT "Feature_pkey" PRIMARY KEY (id),

>   CONSTRAINT "Feature_measurement_id_fkey" FOREIGN KEY (measurement_id)

>   REFERENCES "Measurement" (id) MATCH SIMPLE

>   ON UPDATE NO ACTION ON DELETE NO ACTION

> )

> WITH (

>   OIDS=FALSE

> );

> 

> CREATE INDEX feature_measurement_id_and_name_index

>   ON "Feature"

>   USING btree

>   (measurement_id, name COLLATE pg_catalog."default");

> 

> CREATE INDEX feature_measurement_id_index

>   ON "Feature"

>   USING hash

>   (measurement_id);

> 

> 

> CREATE TABLE "Point"

> (

>   id bigserial NOT NULL,

>   feature_id bigserial NOT NULL,

>   x double precision,

>   y double precision,

>   z double precision,

>   status_x smallint,

>   status_y smallint,

>   status_z smallint,

>   difference_x double precision,

>   difference_y double precision,

>   difference_z double precision,

>   CONSTRAINT "Point_pkey" PRIMARY KEY (id),

>   CONSTRAINT "Point_feature_id_fkey" FOREIGN KEY (feature_id)

>   REFERENCES "Feature" (id) MATCH SIMPLE

>   ON UPDATE NO ACTION ON DELETE NO ACTION

> )

> WITH (

>   OIDS=FALSE

> );

> 

> CREATE INDEX point_feature_id_index

>   ON "Point"

>   USING btree

>   (feature_id);

> 

> CREATE TABLE "Warning"

> (

>   id bigserial NOT NULL,

>   feature_id bigserial NOT NULL,

>   "number" smallint,

>   info text,

>   CONSTRAINT "Warning_pkey" PRIMARY KEY (id),

>   CONSTRAINT "Warning_feature_id_fkey" FOREIGN KEY (feature_id)

>   REFERENCES "Feature" (id) MATCH SIMPLE

>   ON UPDATE NO ACTION ON DELETE NO ACTION

> )

> WITH (

>   OIDS=FALSE

> );

> 

> CREATE INDEX warning_feature_id_index

>   ON "Warning"

>   USING btree

>   (feature_id);

> 

> 

> ---Query---

> SELECT

> f.name  , 

> f.description,

> SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND 
> warning.id   IS NULL THEN 1 ELSE 0 END) AS green_count, 

> SUM(CASE WHEN p.status_x = 0 AND p.status_y = 0 AND p.status_z = 0 AND 
> warning.id   IS NOT NULL THEN 1 ELSE 0 END) AS 
> green_warned_count,

> SUM(CASE WHEN NOT (p.status_x = 0 AND p.status_y = 0 AND p.status_z = 

Re: [PERFORM] Big number of connections

2016-04-04 Thread Mike Sofen
From: Jim Nasby Sent: Sunday, April 03, 2016 10:19 AM

>>On 4/1/16 2:54 AM, jarek wrote:
>> I'll be happy to hear from users of big PostgreSQL installations: how 
>> many users do you have and what kind of problems may we expect?
>> Is there any risk that a huge number of roles will slow down overall 
>> performance?

>Assuming you're on decent sized hardware though, 3000-4000 open connections 
>shouldn't be much of an issue *as long as very few are active at once*. If 
>you get into a situation where there's a surge of activity and you suddenly 
>have 2x more active connections than cores, you won't be happy. I've seen that 
>push servers into a state where the only way to recover was to disconnect 
>everyone.
>--
>Jim Nasby

Jim - I don't quite understand the math here: on a server with 20 cores, it can 
only support 40 active users?

I come from the SQL Server world where a single 20 core server could support 
hundreds/thousands of active users and/or many dozens of background/foreground 
data processes.  Is there something fundamentally different between the two 
platforms relative to active user loads?  How would we be able to use Postgres 
for larger web apps?

Mike Sofen

 





Re: [PERFORM] Architectural question

2016-03-23 Thread Mike Sofen
> -Original Message-
> Thomas Kellerer Wednesday, March 23, 2016 2:51 AM
> 
> Jim Nasby schrieb am 11.03.2016 um 17:37:
> > If the blob is in the database then you have nothing extra to do. It's 
> > handled
> just like all your other data.
> >
> > If it's a file in a file system then you need to:
> >
> > - Have application code that knows how and where to get at the file
> > - Have a way to make those files available on all your webservers
> > - Have completely separate backup and recovery plans for those files
> >
> > That's a lot of extra work. Sometimes it's necessary, but many times it's 
> > not.
> 
> Don't forget the code you need to write to properly handle transactional 
> access
> (writing, deleting) to the files
> 
> You usually also need to distribute the files over many directories.
> Having millions of files in a single directory is usually not such a good 
> idea.
> 
> In my experience you also need some cleanup job that removes orphaned files
> from the file system.
> Because no matter how hard you try, to get updates/writes to the file system
> right, at some point this fails.
> 
> Also from a security point of view having this in the database is more robust
> than in the file system.
> 
> The downside of bytea is that you can't stream them to the client. The
> application always needs to read the whole blob into memory before it can be
> used. This might put some memory pressure on the application server.
> 
> Thomas

This is really an excellent conversation, and highlights the never-ending 
contemplation
of blob storage.  I've had to go through this dialog in two different 
industries - healthcare
and now genomics, creating a new EMR (electronic medical record) system and 
storing
and manipulating huge genomic data sets.

I have, in both cases, ended up leaving the blob-type data outside of the 
database.  Even
though, as Thomas mentioned, it requires more database and app code to manage, 
it
ends up allowing for both systems to be optimized for their respective duties.

In addition, the vastly smaller database sizes result in far faster backups and 
restores, 
transactional replication maintains its speed, and in general, I find the 
fault tolerant
behaviors to be excellent.  

Yes, losing track of a file would be very bad, and...we're only storing things 
like xray photos
or ct scans (healthcare), or genomic processing results.  In both cases, 
usually, the results
can be recreated.  That said, I've never lost a file so haven't needed to pull 
on that lever.

My latest model is placing large genomic data onto the AWS S3 file system, 
keeping all of
the metadata inside the database.  It's working very well so far, but we're 
still in development.

Mike





Re: [PERFORM] Disk Benchmarking Question

2016-03-19 Thread Mike Sofen
Hi Dave,

 

Database disk performance has to take IOPs into account and, IMO, IOPs matter more 
than MBps, since what counts is the ability of the disk subsystem to write lots of 
little bits (usually) versus writing giant globs, especially in direct attached storage 
(like yours, versus a SAN).  Most db disk benchmarks revolve around IOPs…and 
this is where SSDs utterly crush spinning disks.

 

You can get maybe 200 IOPs out of each disk, you have 4 in raid  10 so you get 
a whopping 400 IOPs.  A single quality SSD (like the Samsung 850 pro) will 
support a minimum of 40k IOPs on reads and 80k IOPs on writes.  That’s why SSDs 
are eliminating spinning disks when performance is critical and budget allows.

 

Back to your question – the MBps figure is the capacity of the interface, so it makes 
sense that it's the same for both reads and writes.  The PERC raid controller 
will be saving your bacon on writes, with 2gb cache (assuming it's caching 
writes), so it becomes the equivalent of an SSD up to the capacity limit of the 
write cache.  With only 400 IOPs of write speed, on a busy server you can 
easily saturate the cache and then your system will drop to a crawl.

 

If I didn’t answer the intent of your question, feel free to clarify for me.

 

Mike

 

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Dave Stibrany
Sent: Thursday, March 17, 2016 1:45 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Disk Benchmarking Question

 

I'm pretty new to benchmarking hard disks and I'm looking for some advice on 
interpreting the results of some basic tests.

 

The server is:

- Dell PowerEdge R430

- 1 x Intel Xeon E5-2620 2.4GHz

- 32 GB RAM

- 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10

- PERC H730P Raid Controller with 2GB cache in write back mode.

 

The OS is Ubuntu 14.04, I'm using LVM and I have an ext4 volume for /, and an 
xfs volume for PGDATA.

 

I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I ran 
'bonnie++ -n0 -f' on the root volume.

 

Here's a link to the bonnie test results

https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0

 

The vendor stats say sustained throughput of 215 to 108 MBps, so I guess I'd 
expect around 400-800 MBps read and 200-400 MBps write. In any case, I'm pretty 
confused as to why the read and write sequential speeds are almost identical. 
Does this look wrong?

 

Thanks,

 

Dave

 

 

 



Re: [PERFORM] Disk Benchmarking Question

2016-03-18 Thread Mike Sofen
Sorry for the delay, long work day!

 

Ok, I THINK I understand where you’re going.  Do it this way:

4 drives in Raid10 = 2 pairs of mirrored drives, aka still 2 active drives (2 
are failover).  They are sharing the 12gbps SAS interface, but that speed is 
quite irrelevant…it’s just a giant pipe for filling lots of drives.  

 

Each of your 2 drives has a max seq read/write spec 200 MBPs (WAY max).  When I 
say max, I mean, under totally edge laboratory conditions, writing to the outer 
few tracks with purely sequential data (never happens in the real world).  With 
2 drives running perfectly in raid 10, the theoretical max would be 400mbps.  
Real world, less than half, on sequential.

 

But random writes are the rulers of most activity in the data world (think of 
writing a single row to a table – a few thousand bytes that might be plopped 
anywhere on the disk and then randomly retrieved).  So the MBps throughput 
number becomes mostly meaningless (because the data chunks are small and 
random), and IOPs and drive seek times become king (thus my earlier comments).

 

So – if you’re having disk performance issues with a database, you either add 
more spinning disks (to increase IOPs/distribute them) or switch to SSDs and 
forget about almost everything…

 

Mike

 

--

From: Dave Stibrany [mailto:dstibr...@gmail.com] 
Sent: Friday, March 18, 2016 7:48 AM



Hey Mike,

 

Thanks for the response. I think where I'm confused is that I thought vendor 
specified MBps was an estimate of sequential read/write speed. Therefore if 
you're in RAID10, you'd have 4x the sequential read speed and 2x the sequential 
write speed. Am I misunderstanding something?

 

Also, when you mention that MBPs is the capacity of the interface, what do you 
mean exactly. I've been taking interface speed to be the electronic transfer 
speed, not the speed from the actual physical medium, and more in the 6-12 
gigabit range.

 

Please let me know if I'm way off on any of this, I'm hoping to have my mental 
model updated.

 

Thanks!

 

Dave

 
