Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-02-05 Thread Віталій Тимчишин
2011/2/4 Robert Haas robertmh...@gmail.com

 On Fri, Feb 4, 2011 at 4:19 AM, Vitalii Tymchyshyn tiv...@gmail.com
 wrote:
  Why do you expect such invasive code changes? I know little about
  postgresql code layering, but what I propose (with changing delete to
  truncate) is:
  1) Leave tuple addressing as it is now

 i.e. a block number and a slot position within the block?

 Seems like you'd need file,block,slot.


No, that's what I mean: leave it as it is. Each file would have a logical length
(fixed for all but the last one, currently 1GB) and an actual length that can be
shorter (if the file has been truncated). In the latter case the empty blocks at
the end simply don't exist on disk. The simplest implementation could even be to
tell the file system to drop that part of the file and pretend it's all zeros,
but I don't think many file systems (or OSes) support this. So each file still
has its fixed N blocks, and the file number is still block number / N.


-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-02-04 Thread Vitalii Tymchyshyn

On 03.02.11 20:42, Robert Haas wrote:

2011/1/30 Віталій Тимчишинtiv...@gmail.com:

I was thinking about whether a table file could be deleted if it contains not a
single live row, and whether this could be done by vacuum. In that case, vacuum
on a table that was recently fully updated could be almost as good as CLUSTER:
any scan would skip such non-existing files really fast, and almost no disk
space would be wasted.

VACUUM actually already does something along these lines.  If there
are 1 or any larger number of entirely-free pages at the end of a
table, VACUUM will truncate them away.  In the degenerate case where
ALL pages are entirely-free, this results in zeroing out the file.

The problem with this is that it rarely does much.  Consider a table
with 1,000,000 pages, 50% of which contain live rows.  On average, how
many pages will this algorithm truncate away?  Answer: if the pages
containing live rows are randomly distributed, approximately one.
Yes, but take into account operations on tables that are (for various reasons) 
clustered, like removing archived data (yes, I know this is best done with 
partitioning, but one must still reach the point of deciding to use 
partitioning :) ).

Your idea of having a set of heaps rather than a single heap is an
interesting one, but it's pretty much catering to the very specific
case of a full-table update.  I think the code changes needed would be
far too invasive to seriously contemplate doing it just for that one
case - although it is an important case that I would like to see us
improve.
Why do you expect such invasive code changes? I know little about 
postgresql code layering, but what I propose (with changing delete to 
truncate) is:

1) Leave tuple addressing as it is now.
2) Allow truncated files, treating the non-existing part as if it contained 
unused tuples.

3) Make vacuum truncate a file if it has unused tuples at the end.

The only (relatively) tricky thing I can see is synchronizing truncation 
with a parallel ongoing scan.


Best regards, Vitalii Tymchyshyn





Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-02-04 Thread Robert Haas
On Fri, Feb 4, 2011 at 4:19 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:
 Why do you expect such invasive code changes? I know little about
 postgresql code layering, but what I propose (with changing delete to
 truncate) is:
 1) Leave tuple addressing as it is now

i.e. a block number and a slot position within the block?

Seems like you'd need file,block,slot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-02-03 Thread Robert Haas
2011/1/30 Віталій Тимчишин tiv...@gmail.com:
 I was thinking about whether a table file could be deleted if it contains not a
 single live row, and whether this could be done by vacuum. In that case, vacuum
 on a table that was recently fully updated could be almost as good as CLUSTER:
 any scan would skip such non-existing files really fast, and almost no disk
 space would be wasted.

VACUUM actually already does something along these lines.  If there
are 1 or any larger number of entirely-free pages at the end of a
table, VACUUM will truncate them away.  In the degenerate case where
ALL pages are entirely-free, this results in zeroing out the file.

The problem with this is that it rarely does much.  Consider a table
with 1,000,000 pages, 50% of which contain live rows.  On average, how
many pages will this algorithm truncate away?  Answer: if the pages
containing live rows are randomly distributed, approximately one.
(Proof: There is a 50% chance that the last page will contain live
rows.  If so, we can't truncate anything.  If not, we can truncate one
page, and maybe more.  Now the chances of the next page being free are
499,999 in 999,999, or roughly one-half.  So we have an almost-25%
chance of being able to truncate at least two pages.  And so on.   So
you get roughly 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ... = 1 page.)
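
Stated as a formula, under the same independence assumption (each trailing page
is free with probability roughly 1/2):

  E[\text{pages truncated}] \approx \sum_{k=1}^{\infty} \Pr[\text{last } k \text{ pages all free}]
                            \approx \sum_{k=1}^{\infty} 2^{-k} = 1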

Your idea of having a set of heaps rather than a single heap is an
interesting one, but it's pretty much catering to the very specific
case of a full-table update.  I think the code changes needed would be
far too invasive to seriously contemplate doing it just for that one
case - although it is an important case that I would like to see us
improve.  Tom Lane previously objected to the idea of on-line table
compaction on the grounds that people's apps might break if CTIDs
changed under them, but I think a brawl between all the people who
want on-line table compaction and all the people who want to avoid
unexpected CTID changes would be pretty short.  A bigger problem - or
at least another problem - is that moving tuples this way is
cumbersome and expensive.  You basically have to move some tuples
(inserting new index entries for them), vacuum away the old index
entries (requiring a full scan of every index), and then repeat as
many times as necessary to shrink the table.  This is not exactly a
smooth maintenance procedure, or one that can be done without
significant disruption, but AFAIK nobody's come up with a better idea
yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-02-03 Thread Dave Crooke
There is a process in Oracle which essentially allows you to do the
equivalent of a CLUSTER in Postgres, but without locking the table, and so
updates can continue throughout the process. It requires a bit of manual
setup and fiddling (at least in Oracle 10g); this would probably scratch
a lot of people's itches in this area. Of course, it's not trivial at all to
implement :-(

The Oracle equivalent of "too many dead rows" is "too many chained rows", and
that's where I've seen it used.

Cheers
Dave

2011/2/3 Robert Haas robertmh...@gmail.com

 2011/1/30 Віталій Тимчишин tiv...@gmail.com:
  I was thinking about whether a table file could be deleted if it contains not
  a single live row, and whether this could be done by vacuum. In that case,
  vacuum on a table that was recently fully updated could be almost as good as
  CLUSTER: any scan would skip such non-existing files really fast, and almost
  no disk space would be wasted.

 VACUUM actually already does something along these lines.  If there
 are 1 or any larger number of entirely-free pages at the end of a
 table, VACUUM will truncate them away.  In the degenerate case where
 ALL pages are entirely-free, this results in zeroing out the file.

 The problem with this is that it rarely does much.  Consider a table
 with 1,000,000 pages, 50% of which contain live rows.  On average, how
 many pages will this algorithm truncate away?  Answer: if the pages
 containing live rows are randomly distributed, approximately one.
 (Proof: There is a 50% chance that the last page will contain live
 rows.  If so, we can't truncate anything.  If not, we can truncate one
 page, and maybe more.  Now the chances of the next page being free are
 499,999 in 999,999, or roughly one-half.  So we have an almost-25%
 chance of being able to truncate at least two pages.  And so on.   So
 you get roughly 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ... = 1 page.)

 Your idea of having a set of heaps rather than a single heap is an
 interesting one, but it's pretty much catering to the very specific
 case of a full-table update.  I think the code changes needed would be
 far too invasive to seriously contemplate doing it just for that one
 case - although it is an important case that I would like to see us
 improve.  Tom Lane previously objected to the idea of on-line table
 compaction on the grounds that people's apps might break if CTIDs
 changed under them, but I think a brawl between all the people who
 want on-line table compaction and all the people who want to avoid
 unexpected CTID changes would be pretty short.  A bigger problem - or
 at least another problem - is that moving tuples this way is
 cumbersome and expensive.  You basically have to move some tuples
 (inserting new index entries for them), vacuum away the old index
 entries (requiring a full scan of every index), and then repeat as
 many times as necessary to shrink the table.  This is not exactly a
 smooth maintenance procedure, or one that can be done without
 significant disruption, but AFAIK nobody's come up with a better idea
 yet.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-30 Thread Віталій Тимчишин
2011/1/28 Scott Carey sc...@richrelevance.com



 On 1/28/11 9:28 AM, Stephen Frost sfr...@snowman.net wrote:

 * Scott Marlowe (scott.marl...@gmail.com) wrote:
  There's nothing wrong with whole table updates as part of an import
  process, you just have to know to clean up after you're done, and
  regular vacuum can't fix this issue, only vacuum full or reindex or
  cluster.
 
 Just to share my experiences- I've found that creating a new table and
 inserting into it is actually faster than doing full-table updates, if
 that's an option for you.

 I wonder if postgres could automatically optimize that, if it thought that
 it was going to update more than X% of a table, and HOT was not going to
 help, then just create a new table file for XIDs equal to or higher than the one
 making the change, and leave the old one for old XIDs, then regular VACUUM
 could toss out the old one if no more transactions could see it.


I was thinking about whether a table file could be deleted if it contains not a
single live row, and whether this could be done by vacuum. In that case, vacuum
on a table that was recently fully updated could be almost as good as CLUSTER:
any scan would skip such non-existing files really fast, and almost no disk
space would be wasted.

-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Stephen Frost
Robert,

* Robert Schnabel (schnab...@missouri.edu) wrote:
 Once the bulk data is inserted into the tables I generally
 do some updates on columns to set values which characterize the
 data.  

Please tell me you're not running actual full-table UPDATE statements...
You would be *much* better off either:
a) munging the data on the way in (if possible/reasonable)
b) loading the data into temp tables first, and then using INSERT
   statements to move the data into the 'final' tables WITH the new
   columns/info you want
c) considering if you can normalize the data into multiple tables and/or
   to cut down the columns to only what you need as you go through the
   above, too

A full-table UPDATE means you're basically making the table twice as big
as it needs to be, and it'll never get smaller..
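
As a rough sketch of option (b), reusing the snp_number / sample_id / genotype
columns described elsewhere in this thread (the staging table, csv path, target
table, and derived freq column are all made up here):

CREATE TEMPORARY TABLE genotypes_stage (
  snp_number integer NOT NULL,
  sample_id integer NOT NULL,
  genotype smallint NOT NULL
);

COPY genotypes_stage FROM '/tmp/sample100.csv' CSV;

-- write the 'final' rows exactly once, computing the derived value on the way in
INSERT INTO genotypes_sample100 (snp_number, sample_id, genotype, freq)
SELECT snp_number, sample_id, genotype,
       avg(genotype) OVER (PARTITION BY snp_number)::real AS freq
FROM genotypes_stage;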

 These columns then get indexed.  Basically once the initial
 manipulation is done the table is then static and what I'm looking
 for is query speed.

Sadly, this is the same type of DW needs that I've got (though with
telecomm data and phone calls, not genetic stuffs ;), and PG ends up
being limited by the fact that it can only use one core/thread to go
through the data with.

You might consider investing some time trying to figure out how to
parallelize your queries.  My approach to this has been to partition the
data (probably something you're doing already) into multiple tables and
then have shell/perl scripts which will run a given query against all of
the tables, dumping the results of that aggregation/analysis into other
tables, and then having a final 'merge' query.
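
Roughly like this, with made-up table names; the per-partition statements are
what the shell/perl scripts fire off concurrently, one connection per child
table, followed by a single merge at the end:

-- run once per child table, from parallel connections
INSERT INTO geno_counts_partial (snp_number, genotype, n)
SELECT snp_number, genotype, count(*)
FROM genotypes_sample100   -- the next script instance runs against another child
GROUP BY snp_number, genotype;

-- final 'merge' query once every partition has been processed
SELECT snp_number, genotype, sum(n) AS n
FROM geno_counts_partial
GROUP BY snp_number, genotype;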

 The data is sorted by snp_number, sample_id.  So if I want the data
 for a given sample_id it would be a block of ~58k rows.  The size of
 the table depends on how many sample_id's there are.  My largest has
 ~30k sample_id by 58k snp_number per sample.  The other big table
 (with children) is mutations and is set up similarly so that I can
 access individual tables (samples) based on constraints.  Each of
 these children have between 5-60M records.

Understand that indexes are only going to be used/useful, typically, if
the amount of records being returned is small relative to the size of
the table (eg: 5%).
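
A quick way to check is to EXPLAIN both kinds of query against one of the child
tables (names and values below are invented):

-- ~58k rows out of tens of millions: the planner should pick the index
EXPLAIN SELECT * FROM genotypes_sample100 WHERE sample_id = 10042;

-- a predicate matching a large fraction of the table: expect a sequential scan
-- instead, no matter what indexes exist
EXPLAIN SELECT * FROM genotypes_sample100 WHERE genotype >= 0;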

 This is all direct attach storage via SAS2 so I'm guessing it's
 probably limited to the single port link between the controller and
 the expander.  Again, geneticist here not computer scientist. ;-)

That link certainly isn't going to help things..  You might consider how
or if you can improve that.

 All of the data could be reloaded.  Basically, once I get the data
 into the database and I'm done manipulating it I create a backup
 copy/dump which then gets stored at a couple different locations.

You might consider turning fsync off while you're doing these massive
data loads..  and make sure that you issue your 'CREATE TABLE' and your
'COPY' statements in the same transaction, and again, I suggest loading
into temporary (CREATE TEMPORARY TABLE) tables first, then doing the
CREATE TABLE/INSERT statement for the 'real' table.  Make sure that you
create *both* your constraints *and* your indexes *after* the table is
populated.

If you turn fsync off, make sure you turn it back on. :)
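
In sketch form (the child-table name, csv path, and constraint range are
hypothetical; fsync is a postgresql.conf setting, so changing it needs a reload
and it must be switched back on once the load is done):

-- postgresql.conf, during the bulk load only:  fsync = off

BEGIN;
CREATE TABLE genotypes_sample200 () INHERITS (genotypes);
COPY genotypes_sample200 FROM '/tmp/sample200.csv' CSV;
COMMIT;

-- constraints and indexes only after the data is in place
ALTER TABLE genotypes_sample200
  ADD CONSTRAINT check200 CHECK (sample_id > 20000 AND sample_id < 20100);
CREATE INDEX genotypes_sample200_idx ON genotypes_sample200 (snp_number, sample_id);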

 My goal is to 1) have a fairly robust system so that I don't have to
 spend my time rebuilding things and 2) be able to query the data
 quickly.  Most of what I do are ad hoc queries.  I have an idea...
 how many X have Y in this set of Z samples and write the query to
 get the answer.  I can wait a couple minutes to get an answer but
 waiting an hour is becoming tiresome.

Have you done any analysis to see what the bottleneck actually is?  When
you run top, is your PG process constantly in 'D' state, or is it in 'R'
state, or what?  Might help figure some of that out.  Note that
parallelizing the query will help regardless of if it's disk bound or
CPU bound, when you're running on the kind of hardware you're talking
about (lots of spindles, multiple CPUs, etc).

Thanks,

Stephen




Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Grant Johnson



Putting the WAL on a second controller does help, if you're write-heavy.

I tried separating indexes and data once on one server and didn't 
really notice that it helped much. Managing the space was problematic. 
I would suggest putting those together on a single RAID-10 of all the 
300GB drives (minus a spare). It will probably outperform separate 
arrays most of the time, and be much easier to manage.


--


I like to use RAID 1, and let LVM do the striping.   That way I can add 
more drives later too.




Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Robert Schnabel


On 1/28/2011 7:14 AM, Stephen Frost wrote:

  Robert,

* Robert Schnabel (schnab...@missouri.edu) wrote:

Once the bulk data is inserted into the tables I generally
do some updates on columns to set values which characterize the
data.  

Please tell me you're not running actual full-table UPDATE statements...
You would be *much* better off either:
a) munging the data on the way in (if possible/reasonable)
b) loading the data into temp tables first, and then using INSERT
   statements to move the data into the 'final' tables WITH the new
   columns/info you want
c) considering if you can normalize the data into multiple tables and/or
   to cut down the columns to only what you need as you go through the
   above, too

A full-table UPDATE means you're basically making the table twice as big
as it needs to be, and it'll never get smaller..


Depends on what you mean by that. The tables that I'm concerned
with look something like bigint x2, char var x13, int x24, real x8,
smallint x4 by about 65M rows, each. I only do the updates on one
table at a time. The real columns are actually null in the input
csv file. I run an update which basically uses some of the integer
columns and calculates frequencies which go into the real columns.
Ditto with some of the other columns. I don't do this before I
upload the data because 1) it's easier this way and 2) I can't
because some of the updates involve joins to other tables to grab
info that I can't do outside the database. So yes, once the upload
is done I run queries that update every row for certain columns, not
every column. After I'm done with a table I run a VACUUM ANALYZE.
I'm really not worried about what my table looks like on disk. I
actually take other steps also to avoid what you're talking about.
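
To give the idea, the updates look something like this (table and column names
here are invented; the real calculations are more involved and some of them join
to other tables):

UPDATE genotypes_sample100 g
SET freq = s.hom::real / s.total
FROM (SELECT snp_number,
             sum(CASE WHEN genotype = 2 THEN 1 ELSE 0 END) AS hom,
             count(*) AS total
      FROM genotypes_sample100
      GROUP BY snp_number) s
WHERE g.snp_number = s.snp_number;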


These columns then get indexed.  Basically once the initial
manipulation is done the table is then static and what I'm looking
for is query speed.

Sadly, this is the same type of DW needs that I've got (though with
telecomm data and phone calls, not genetic stuffs ;), and PG ends up
being limited by the fact that it can only use one core/thread to go
through the data with.

You might consider investing some time trying to figure out how to
parallelize your queries.  My approach to this has been to partition the
data (probably something you're doing already) into multiple tables and
then have shell/perl scripts which will run a given query against all of
the tables, dumping the results of that aggregation/analysis into other
tables, and then having a final 'merge' query.


Thanks for the advice, but parallelizing/automating doesn't really do
anything for me. The data is already partitioned. Think of it this
way: you just got 65M new records with about 30 data points per
record on an individual sample. You put it in a new table of its
own and now you want to characterize those 65M data points. The
first update flags about 60M of the rows as uninteresting so you
move them to their own *uninteresting* table and basically never
really touch them again (but you can't get rid of them). Now you're
working with 5M that you're going to characterize into about 20
categories based on what is in those 30 columns of data. Do all the
querying/updating then index and you're done. Too long to describe
but I cannot automate this. I only update one partition at a time
and only about every couple weeks or so.



The data is sorted by snp_number, sample_id.  So if I want the data
for a given sample_id it would be a block of ~58k rows.  The size of
the table depends on how many sample_id's there are.  My largest has
~30k sample_id by 58k snp_number per sample.  The other big table
(with children) is "mutations" and is set up similarly so that I can
access individual tables (samples) based on constraints.  Each of
these children have between 5-60M records.

Understand that indexes are only going to be used/useful, typically, if
the amount of records being returned is small relative to the size of
the table (eg: 5%).


Yep, I understand that. Even though they occupy a lot of space, I
keep them around because there are times when I need them.



This is all direct attach storage via SAS2 so I'm guessing it's
probably limited to the single port link between the controller and
the expander.  Again, geneticist here not computer scientist. ;-)

That link certainly isn't going to help things..  You might consider how
or if you can improve that.


Suggestions??? It was previously suggested to split the drives on
each array across the two controller ports rather than have all the
data drives on one port which makes sense. Maybe I'm getting my
terminology 

Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Scott Marlowe
On Fri, Jan 28, 2011 at 9:39 AM, Robert Schnabel schnab...@missouri.edu wrote:
 I can't do outside the database.  So yes, once the upload is done I run
 queries that update every row for certain columns, not every column.  After
 I'm done with a table I run a VACUUM ANALYZE.  I'm really not worried about
 what my table looks like on disk.  I actually take other steps also to avoid
 what you're talking about.

It will still get bloated.  If you update one column in one row in pg,
you now have two copies of that row in the database.  If you update 1
column in 1M rows, you now have 2M rows in the database (1M dead
rows, 1M live rows).  vacuum analyze will not get rid of them, but
will free them up to be used in future updates / inserts.  Vacuum full
or cluster will free up the space, but will lock the table while it
does so.

There's nothing wrong with whole table updates as part of an import
process, you just have to know to clean up after you're done, and
regular vacuum can't fix this issue, only vacuum full or reindex or
cluster.
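
You can watch it happen with something like this (table and index names made
up):

SELECT pg_size_pretty(pg_total_relation_size('genotypes_sample100'));

UPDATE genotypes_sample100 SET genotype = genotype;   -- touches every row

SELECT pg_size_pretty(pg_total_relation_size('genotypes_sample100'));  -- roughly doubled

VACUUM ANALYZE genotypes_sample100;   -- dead rows become reusable, file stays big

CLUSTER genotypes_sample100 USING genotypes_sample100_idx;  -- rewrites the table, space comes back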



Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Robert Schnabel


On 1/28/2011 11:00 AM, Scott Marlowe wrote:

On Fri, Jan 28, 2011 at 9:39 AM, Robert Schnabelschnab...@missouri.edu  wrote:

I can't do outside the database.  So yes, once the upload is done I run
queries that update every row for certain columns, not every column.  After
I'm done with a table I run a VACUUM ANALYZE.  I'm really not worried about
what my table looks like on disk.  I actually take other steps also to avoid
what you're talking about.

It will still get bloated.  If you update one column in one row in pg,
you now have two copies of that row in the database.  If you update 1
column in 1M rows, you now have 2M rows in the database (1M dead
rows, 1M live rows).  vacuum analyze will not get rid of them, but
will free them up to be used in future updates / inserts.  Vacuum full
or cluster will free up the space, but will lock the table while it
does so.

There's nothing wrong with whole table updates as part of an import
process, you just have to know to clean up after you're done, and
regular vacuum can't fix this issue, only vacuum full or reindex or
cluster.


Those are exactly what I was referring to with my other steps.  I just 
don't always do them as soon as I'm done updating because sometimes I 
want to query the table right away to find out something.  Yep, I found 
out the hard way that regular VACUUM didn't help.






Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Stephen Frost
Robert,

* Robert Schnabel (schnab...@missouri.edu) wrote:
Depends on what you mean by that.  The tables that I'm concerned with look
something like bigint x2, char var x13, int x24, real x8, smallint x4 by
about 65M rows, each.  I only do the updates on one table at a time.  The
real columns are actually null in the input csv file.  I run an update
which basically uses some of the integer columns and calculates
frequencies which go into the real columns.  

Erm, I'm pretty sure you're still increasing the size of the resulting
tables by quite a bit by doing this process- which will slow down later
queries.

 Ditto with some of the other
columns.  I don't do this before I upload the data because 1) it's easier
this way and 2) I can't because some of the updates involve joins to other
tables to grab info that I can't do outside the database.  

That's fine- just first load the data into temporary tables and then do
INSERT INTO new_table SELECT your query;

instead.

 So yes, once
the upload is done I run queries that update every row for certain
columns, not every column.  After I'm done with a table I run a VACUUM
ANALYZE.  I'm really not worried about what my table looks like on disk. 

I thought you wanted it fast..?  If not, I'm not sure why you're
bothering to post to this list.  What it looks like on disk certainly
impacts how fast it is...

I actually take other steps also to avoid what you're talking about.

If you really don't feel like changing your process, you could just run
'CLUSTER' on the table, on whatever index you use most frequently, and
PG will rewrite the entire table for you, dropping all the dead rows,
etc.  You should then run VACUUM FREEZE on it.
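
i.e. something along the lines of (index name hypothetical):

CLUSTER genotypes_sample100 USING genotypes_sample100_idx;
VACUUM FREEZE genotypes_sample100;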

  These columns then get indexed.  Basically once the initial
  manipulation is done the table is then static and what I'm looking
  for is query speed.

Yes, I gathered that, making the table smaller on disk will improve
query speed.

Thanks for the advice, but parallelizing/automating doesn't really do
anything for me.  The data is already partitioned.  Think of it this way:
you just got 65M new records with about 30 data points per record on an
individual sample.  You put it in a new table of its own and now you want
to characterize those 65M data points.  The first update flags about 60M
of the rows as uninteresting so you move them to their own *uninteresting*
table and basically never really touch them again (but you can't get rid of
them).  Now you're working with 5M that you're going to characterize into
about 20 categories based on what is in those 30 columns of data.  Do all
the querying/updating then index and you're done.  Too long to describe
but I cannot automate this.  I only update one partition at a time and
only about every couple weeks or so.

I was referring to parallelizing queries *after* the data is all loaded,
etc.  I wasn't talking about the queries that you use during the load.

I presume that after the load you run some queries.  You can probably
parallelize those queries (most DW queries can be, be ime...).

  That link certainly isn't going to help things..  You might consider how
  or if you can improve that.
 
Suggestions???  It was previously suggested to split the drives on each
array across the two controller ports rather than have all the data drives
on one port which makes sense.  Maybe I'm getting my terminology wrong
here but I'm talking about a single SFF-8088 link to each 16 drive
enclosure.  What about two controllers, one for each enclosure?  Don't
know if I have enough empty slots though.

I don't know that you'd need a second controller (though it probably
wouldn't hurt if you could).  If there's only one way to attach the
enclosure, then so be it.  The issue is if the enclosures end up
multi-plexing the individual drives into fewer channels than there are
actual drives, hence creating a bottle-neck.  You would need different
enclosures to deal with that, if that's the case.

I haven't messed with fsync but maybe I'll try.  In general, I create my
indexes and constraints after I'm done doing all the updating I need to
do.  I made the mistake *once* of copying millions of rows into a table
that already had indexes.

Yeah, I bet that took a while.  As I said above, if you don't want to
change your process (which, tbh, I think would be faster if you were
doing INSERTs into a new table than full-table UPDATEs...), then you
should do a CLUSTER after you've created whatever is the most popular
INDEX, and then create your other indexes after that.

It got lost from the original post but my database (9.0.0) is currently on
my Windows XP 64-bit workstation in my office on a 16 drive Seagate 15k.5
RAID5, no comments needed, I know, I'm moving it :-).  I'm moving it to my
server which is Windows Ent Server 2008 R2 64-bit, 8 AMD cores & 32G ram
and 

Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Stephen Frost
* Scott Marlowe (scott.marl...@gmail.com) wrote:
 There's nothing wrong with whole table updates as part of an import
 process, you just have to know to clean up after you're done, and
 regular vacuum can't fix this issue, only vacuum full or reindex or
 cluster.

Just to share my experiences- I've found that creating a new table and
inserting into it is actually faster than doing full-table updates, if
that's an option for you.

Thanks,

Stephen




Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Scott Carey


On 1/27/11 4:11 PM, "Alan Hodgson" ahodg...@simkin.ca wrote:

On January 27, 2011, Robert Schnabel schnab...@missouri.edu wrote:

 So my questions are 1) am I crazy for doing this, 2) would you change
 anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp
 filespace) on a different controller than everything else? Please keep
 in mind I'm a geneticist who happens to know a little bit about
 bioinformatics and not the reverse. :-)

Putting the WAL on a second controller does help, if you're write-heavy.

I tried separating indexes and data once on one server and didn't really notice 
that it helped much. Managing the space was problematic. I would suggest 
putting those together on a single RAID-10 of all the 300GB drives (minus a 
spare). It will probably outperform separate arrays most of the time, and be 
much easier to manage.

If you go this route, I suggest two equally sized RAID 10's on different 
controllers for index + data, with software RAID-0 on top of that.  RAID 10 
will max out a controller after 6 to 10 drives, usually.  Using the OS RAID 0 
to aggregate the throughput of two controllers works great.

WAL only has to be a little bit faster than your network in most cases.  I've 
never seen it be a bottleneck on large bulk loads if it is on its own 
controller with 120MB/sec write throughput.  I suppose a bulk load from COPY 
might stress it a bit more, but CPU ends up the bottleneck in postgres once you 
have I/O hardware this capable.



--

A hybrid Escalade is missing the point much in the same way that having a diet 
soda with your extra large pepperoni pizza is missing the point.


Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Scott Carey


On 1/28/11 9:00 AM, Scott Marlowe scott.marl...@gmail.com wrote:

On Fri, Jan 28, 2011 at 9:39 AM, Robert Schnabel schnab...@missouri.edu
wrote:
 I can't do outside the database.  So yes, once the upload is done I run
 queries that update every row for certain columns, not every column.
After
 I'm done with a table I run a VACUUM ANALYZE.  I'm really not worried
about
 what my table looks like on disk.  I actually take other steps also to
avoid
 what you're talking about.

It will still get bloated.  If you update one column in one row in pg,
you now have two copies of that row in the database.  If you update 1
column in 1M rows, you now have 2M rows in the database (1M dead
rows, 1M live rows).  vacuum analyze will not get rid of them, but
will free them up to be used in future updates / inserts.  Vacuum full
or cluster will free up the space, but will lock the table while it
does so.

There's nothing wrong with whole table updates as part of an import
process, you just have to know to clean up after you're done, and
regular vacuum can't fix this issue, only vacuum full or reindex or
cluster.


Also note that HOT will come into play if you have FILLFACTOR set
appropriately, so you won't get two copies of the row.  This is true if
the column being updated is small enough and not indexed.  It wastes some
space, but a lot less than the factor of two.
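
Something like the following (the fillfactor value is just an example; it only
affects pages written afterwards, so set it before loading or rewrite the table):

-- leave ~30% of each heap page free so an updated row version can stay on the
-- same page (HOT), provided the updated columns are not indexed
ALTER TABLE genotypes_sample100 SET (fillfactor = 70);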




Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Scott Carey


On 1/28/11 9:28 AM, Stephen Frost sfr...@snowman.net wrote:

* Scott Marlowe (scott.marl...@gmail.com) wrote:
 There's nothing wrong with whole table updates as part of an import
 process, you just have to know to clean up after you're done, and
 regular vacuum can't fix this issue, only vacuum full or reindex or
 cluster.

Just to share my experiences- I've found that creating a new table and
inserting into it is actually faster than doing full-table updates, if
that's an option for you.

I wonder if postgres could automatically optimize that, if it thought that
it was going to update more than X% of a table, and HOT was not going to
help, then just create a new table file for XIDs equal to or higher than the one
making the change, and leave the old one for old XIDs, then regular VACUUM
could toss out the old one if no more transactions could see it.



Thanks,

Stephen




Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Scott Marlowe
On Fri, Jan 28, 2011 at 10:44 AM, Scott Carey sc...@richrelevance.com wrote:
 If you go this route, I suggest two equally sized RAID 10's on different
 controllers for index + data, with software RAID-0 on top of that.  RAID 10
 will max out a controller after 6 to 10 drives, usually.  Using the OS RAID
 0 to aggregate the throughput of two controllers works great.

I often go one step further and just create a bunch of RAID-1 pairs
and use OS level RAID-0 on top of that.  On the LSI cards that was
by far the fastest setup I tested.



Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Robert Schnabel


On 1/28/2011 11:14 AM, Stephen Frost wrote:


It got lost from the original post but my database (9.0.0) is currently on
my Windows XP 64-bit workstation in my office on a 16 drive Seagate 15k.5
RAID5, no comments needed, I know, I'm moving it :-).  I'm moving it to my
server which is Windows Ent Server 2008 R2 64-bit, 8 AMD cores & 32G ram
and these new drives/controller.

Ughh...  No chance to get a Unix-based system (Linux, BSD, whatever) on
there instead?  I really don't think Windows Server is going to help
your situation one bit.. :(

Almost zero chance.  I basically admin the server myself so I can do 
whatever I want but all permissions are controlled through campus active 
directory and our departmental IT person doesn't do *nix.  So let's just 
assume I'm stuck with Windows. The main purpose of the server at the 
moment is to house our backup images.  I have two 9 TB arrays which I 
use robocopy to mirror images once a day between our other server and my 
workstation.  There's really not much of anything else ever eating up 
CPUs on the server which is why I'm moving my database onto it.



I appreciate the comments thus far.

Let's hope you'll always appreciate them. :)

Thanks,

Stephen
Umm, that didn't quite read the way I meant it to when I wrote it.  All 
comments are appreciated.  :-)


Seriously though, there have been points made that have made me rethink 
how I go about processing data which I'm sure will help.  I'm in a 
fairly fortunate position in that I can put these new drives on the 
server and play around with different configurations while I maintain my 
current setup on my workstation.  I guess I just need to experiment and 
see what works.


Thanks again,
Bob





Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-28 Thread Robert Schnabel


On 1/28/2011 11:44 AM, Scott Carey wrote:

  On 1/27/11 4:11 PM, "Alan Hodgson" ahodg...@simkin.ca wrote:

    On January 27, 2011, Robert Schnabel schnab...@missouri.edu wrote:
     So my questions are 1) am I crazy for doing this, 2) would you change
     anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp
     filespace) on a different controller than everything else? Please keep
     in mind I'm a geneticist who happens to know a little bit about
     bioinformatics and not the reverse. :-)

    Putting the WAL on a second controller does help, if you're write-heavy.
    I tried separating indexes and data once on one server and didn't really
    notice that it helped much. Managing the space was problematic. I would
    suggest putting those together on a single RAID-10 of all the 300GB drives
    (minus a spare). It will probably outperform separate arrays most of the
    time, and be much easier to manage.

  If you go this route, I suggest two equally sized RAID 10's on different
  controllers for index + data, with software RAID-0 on top of that. RAID 10
  will max out a controller after 6 to 10 drives, usually. Using the OS RAID 0
  to aggregate the throughput of two controllers works great.

  WAL only has to be a little bit faster than your network in most cases. I've
  never seen it be a bottleneck on large bulk loads if it is on its own
  controller with 120MB/sec write throughput. I suppose a bulk load from COPY
  might stress it a bit more, but CPU ends up the bottleneck in postgres once
  you have I/O hardware this capable.

Do you mean 14 drives in one box as RAID10's on one controller, then
14 drives in the other box on a second controller, then software
RAID0 each of the two RAID10's together essentially as a single 4 TB
array? Would you still recommend doing this with Windows?

Bob


Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-27 Thread david

On Thu, 27 Jan 2011, Robert Schnabel wrote:


HI,

I use PostgreSQL basically as a data warehouse to store all the genetic data 
that our lab generates.  The only person that accesses the database is myself 
and therefore I've had it housed on my workstation in my office up till now. 
However, it's getting time to move it to bigger hardware.  I currently have a 
server that is basically only storing backup images of all our other 
workstations so I'm going to move my database onto it.  The server looks like 
this: Windows Server Enterprise 2008 R2 64-bit, AMD 2350 quad-core x2, 32GB 
RAM.  For my purposes the CPUs and RAM are fine.  I currently have an Adaptec 
52445+BBU controller that has the OS (4 drive RAID5), FTP (4 drive RAID5) and 
two backup arrays (8 drive each RAID0).  The backup arrays are in a 16 drive 
external enclosure through an expander so I actually have 16 ports free on 
the 52445 card.  I plan to remove 3 of the drives from my backup arrays to 
make room for 3 - 73GB 15k.5 drives (re-purposed from my workstation).  Two 
16 drive enclosures with SAS2 expanders just arrived as well as 36 Seagate 
15k.7 300GB drives (ST3300657SS).  I also intend on getting an Adaptec 6445 
controller with the flash module when it becomes available in about a month 
or two.  I already have several Adaptec cards so I'd prefer to stick with 
them.


Here's the way I was planning using the new hardware:
xlog & wal: 3 - 73G 15k.5 RAID1+hot spare in enclosure A on 52445 controller
data: 22 - 300G 15k.7 RAID10 enclosure B&C on 6445 controller
indexes: 8 - 300G 15k.7 RAID10 enclosure C on 6445 controller
2 - 300G 15k.7 as hot spares enclosure C
4 spare 15k.7 for on the shelf

With this configuration I figure I'll have ~3TB for my main data tables and 
1TB for indexes.  Right now my database is 500GB total.  The 3:1 split 
reflects my current table structure and what I foresee coming down the road 
in terms of new data.


So my questions are 1) am I crazy for doing this, 2) would you change 
anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp 
filespace) on a different controller than everything else?  Please keep in 
mind I'm a geneticist who happens to know a little bit about bioinformatics 
and not the reverse. :-)


a number of questions spring to mind

how much of the time are you expecting to spend inserting data into this 
system vs querying data from the system?


is data arriving continuously, or is it a matter of receiving a bunch of 
data, inserting it, then querying it?


which do you need to optimize for, insert speed or query speed?

do you expect your queries to be searching for a subset of the data 
scattered randomly throughout the input data, or do you expect it to be 
'grab this (relatively) sequential chunk of input data and manipulate it to 
generate a report' type of thing


what is your connectivity to the raid enclosures? (does 
putting 22 drives on one cable mean that you will be limited due to the 
bandwidth of this cable rather than the performance of the drives)


can you do other forms of raid on these drives or only raid 10?

how critical is the data in this database? if it were to die would it just 
be a matter of recreating it and reloading the data? or would you lose 
irreplaceable data?


David Lang



Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-27 Thread Alan Hodgson
On January 27, 2011, Robert Schnabel schnab...@missouri.edu wrote:
 So my questions are 1) am I crazy for doing this, 2) would you change
 anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp
 filespace) on a different controller than everything else?  Please keep
 in mind I'm a geneticist who happens to know a little bit about
 bioinformatics and not the reverse. :-)
 

Putting the WAL on a second controller does help, if you're write-heavy.

I tried separating indexes and data once on one server and didn't really 
notice that it helped much. Managing the space was problematic. I would 
suggest putting those together on a single RAID-10 of all the 300GB drives 
(minus a spare). It will probably outperform separate arrays most of the 
time, and be much easier to manage.

-- 
A hybrid Escalade is missing the point much in the same way that having a 
diet soda with your extra large pepperoni pizza is missing the point.


Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-27 Thread Robert Schnabel


On 1/27/2011 5:19 PM, da...@lang.hm wrote:

On Thu, 27 Jan 2011, Robert Schnabel wrote:


HI,

I use PostgreSQL basically as a data warehouse to store all the genetic data
that our lab generates.  The only person that accesses the database is myself
and therefore I've had it housed on my workstation in my office up till now.
However, it's getting time to move it to bigger hardware.  I currently have a
server that is basically only storing backup images of all our other
workstations so I'm going to move my database onto it.  The server looks like
this: Windows Server Enterprise 2008 R2 64-bit, AMD 2350 quad-core x2, 32GB
RAM.  For my purposes the CPUs and RAM are fine.  I currently have an Adaptec
52445+BBU controller that has the OS (4 drive RAID5), FTP (4 drive RAID5) and
two backup arrays (8 drive each RAID0).  The backup arrays are in a 16 drive
external enclosure through an expander so I actually have 16 ports free on
the 52445 card.  I plan to remove 3 of the drives from my backup arrays to
make room for 3 - 73GB 15k.5 drives (re-purposed from my workstation).  Two
16 drive enclosures with SAS2 expanders just arrived as well as 36 Seagate
15k.7 300GB drives (ST3300657SS).  I also intend on getting an Adaptec 6445
controller with the flash module when it becomes available in about a month
or two.  I already have several Adaptec cards so I'd prefer to stick with
them.

Here's the way I was planning using the new hardware:
xlog & wal: 3 - 73G 15k.5 RAID1+hot spare in enclosure A on 52445 controller
data: 22 - 300G 15k.7 RAID10 enclosure B&C on 6445 controller
indexes: 8 - 300G 15k.7 RAID10 enclosure C on 6445 controller
2 - 300G 15k.7 as hot spares enclosure C
4 spare 15k.7 for on the shelf

With this configuration I figure I'll have ~3TB for my main data tables and
1TB for indexes.  Right now my database is 500GB total.  The 3:1 split
reflects my current table structure and what I foresee coming down the road
in terms of new data.

So my questions are 1) am I crazy for doing this, 2) would you change
anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp
filespace) on a different controller than everything else?  Please keep in
mind I'm a geneticist who happens to know a little bit about bioinformatics
and not the reverse. :-)

a number of questions spring to mind

how much of the time are you expecting to spend inserting data into this
system vs querying data from the system?

is data arriving continuously, or is it a matter of receiving a bunch of
data, inserting it, then querying it?

which do you need to optimize for, insert speed or query speed?

Bulk loads of GB of data via COPY from csv files once every couple 
weeks.  I basically only have a couple different table types based on 
the data going into them.  Each type is set up as inherited tables so 
there is a new child table for each sample that is added.  Once the 
bulk data is inserted into the tables I generally do some updates on 
columns to set values which characterize the data.  These columns then 
get indexed.  Basically once the initial manipulation is done the table 
is then static and what I'm looking for is query speed.



do you expect your queries to be searching for a subset of the data
scattered randomly throughout the input data, or do you expect it to be
'grab this (relatively) sequential chunk of input data and manipulate it to
generate a report' type of thing
Generally it is grab a big sequential chunk of data and either dump it 
to a csv or insert into another table.  I use external scripts to format 
data.  My two big table structures look like this:


CREATE TABLE genotypes
(
  snp_number integer NOT NULL,
  sample_id integer NOT NULL,
  genotype smallint NOT NULL
)

There are ~58k unique snp_number.  Other tables will have upwards of 
600-700k snp_number.  The child tables have a constraint based on 
sample_id such as:

CONSTRAINT check100 CHECK (sample_id > 1 AND sample_id < 10100)

The data is sorted by snp_number, sample_id.  So if I want the data for 
a given sample_id it would be a block of ~58k rows.  The size of the 
table depends on how many sample_id's there are.  My largest has ~30k 
sample_id by 58k snp_number per sample.  The other big table (with 
children) is mutations and is set up similarly so that I can access 
individual tables (samples) based on constraints.  Each of these 
children have between 5-60M records.
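
So adding a new sample batch looks roughly like this (the name and range here
are made up):

CREATE TABLE genotypes_sample101 (
  CONSTRAINT check101 CHECK (sample_id > 10100 AND sample_id < 10200)
) INHERITS (genotypes);

-- with constraint_exclusion enabled, a query such as
--   SELECT * FROM genotypes WHERE sample_id = 10150;
-- only has to touch the child tables whose CHECK constraint can match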



what is your connectivity to the raid enclosures? (does
putting 22 drives on one cable mean that you will be limited due to the
bandwidth of this cable rather than the performance of the drives)

can you do other forms of raid on these drives or only raid 10?
This is all direct attach storage via SAS2 so I'm guessing it's probably 
limited to the single port link between the controller and the 
expander.  Again, geneticist here not computer scientist. ;-)  The 
enclosures have Areca ARC-8026-16 expanders.  I can basically do 
whatever RAID level I want.



how 

Re: [PERFORM] How to best use 32 15k.7 300GB drives?

2011-01-27 Thread david
sorry for not replying properly to your response, I managed to delete the 
mail.


as I understand your data access pattern it's the following:

for the main table space:

bulk loads every couple of weeks. if the data is lost you can just reload 
it.


searches tend to be extracting large sequential chunks of data, either to 
external files or into different table spaces.


for this table space, you are basically only inserting every couple of 
weeks, and it sounds as if you do not really care how long it takes to 
load the data.



first the disclaimer, I'm not a postgres expert, but I do have good 
experience with large amounts of data on linux systems (and especially 
running into the limitations when doing it on the cheap ;-)



with this data pattern your WAL is meaningless (as it's only relevant for 
inserts), and you may as well use raid6 as raid10 (both allow you to 
utilize all drives for reads, but raid6 gives you 2 drives worth of 
redundancy while the wrong two drives on raid10 could kill the entire 
array). You may even want to disable fsync on imports. It will save you a 
lot of time, and if the system crashes during the load you can just 
reinitialize and reload the data.


however, since you are going to be doing large sequential data transfers, you 
want to be utilizing multiple SAS links, preferably as evenly as 
possible, so rather than putting all your data drives on one port, you may 
want to spread them between ports so that your aggregate bandwidth to the 
drives is higher (with this many high speed drives, this is a significant 
limitation)



the usual reason for keeping the index drives separate is to avoid having 
writes interact with index reads. Since you are not going to be doing both 
at the same time, I don't know if it helps to separate your indexes.



now, if you pull the data from this main table into a smaller table for 
analysis, you may want to do more interesting things with the drives that 
you use for this smaller table as you are going to be loading data into 
them more frequently.


David Lang


On Thu, 27 Jan 2011, da...@lang.hm wrote:


Date: Thu, 27 Jan 2011 15:19:32 -0800 (PST)
From: da...@lang.hm
To: Robert Schnabel schnab...@missouri.edu
Cc: pgsql-performance pgsql-performance@postgresql.org
Subject: Re: [PERFORM] How to best use 32 15k.7 300GB drives?

On Thu, 27 Jan 2011, Robert Schnabel wrote:


HI,

I use PostgreSQL basically as a data warehouse to store all the genetic 
data that our lab generates.  The only person that accesses the database is 
myself and therefore I've had it housed on my workstation in my office up 
till now. However, it's getting time to move it to bigger hardware.  I 
currently have a server that is basically only storing backup images of all 
our other workstations so I'm going to move my database onto it.  The 
server looks like this: Windows Server Enterprise 2008 R2 64-bit, AMD 2350 
quad-core x2, 32GB RAM.  For my purposes the CPUs and RAM are fine.  I 
currently have an Adaptec 52445+BBU controller that has the OS (4 drive 
RAID5), FTP (4 drive RAID5) and two backup arrays (8 drive each RAID0). 
The backup arrays are in a 16 drive external enclosure through an expander 
so I actually have 16 ports free on the 52445 card.  I plan to remove 3 of 
the drives from my backup arrays to make room for 3 - 73GB 15k.5 drives 
(re-purposed from my workstation).  Two 16 drive enclosures with SAS2 
expanders just arrived as well as 36 Seagate 15k.7 300GB drives 
(ST3300657SS).  I also intend on getting an Adaptec 6445 controller with 
the flash module when it becomes available in about a month or two.  I 
already have several Adaptec cards so I'd prefer to stick with them.


Here's the way I was planning using the new hardware:
xlog & wal: 3 - 73G 15k.5 RAID1+hot spare in enclosure A on 52445 
controller

data: 22 - 300G 15k.7 RAID10 enclosure B&C on 6445 controller
indexes: 8 - 300G 15k.7 RAID10 enclosure C on 6445 controller
2 - 300G 15k.7 as hot spares enclosure C
4 spare 15k.7 for on the shelf

With this configuration I figure I'll have ~3TB for my main data tables and 
1TB for indexes.  Right now my database is 500GB total.  The 3:1 split 
reflects my current table structure and what I foresee coming down the road 
in terms of new data.


So my questions are 1) am I crazy for doing this, 2) would you change 
anything and 3) is it acceptable to put the xlog & wal (and perhaps tmp 
filespace) on a different controller than everything else?  Please keep in 
mind I'm a geneticist who happens to know a little bit about bioinformatics 
and not the reverse. :-)


a number of questions spring to mind

how much of the time are you expecting to spend inserting data into this 
system vs querying data from the system?


is data arriving continuously, or is it a matter of receiving a bunch of 
data, inserting it, then querying it?


which do you need to optimize for, insert speed or query speed?

do you expect your queries to be searching