Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread Richard_D_Levine
Jignesh,

Juan says the following below:

I figured the number of cores on the T1000/2000 processors would be
utilized by the forked copies of the postgresql server.  From the comments
I have seen so far it does not look like this is the case.

I think this needs to be refuted.  Doesn't Solaris switch processes as well
as threads (LWPs, whatever) equally well amongst cores?  I realize the
process context switch is more expensive than the thread switch, but
Solaris will utilize all cores as processes or threads become ready to run,
correct?

BTW, it's great to see folks with your email address on the list.  I feel
it points to a brighter future for all involved.

Thanks,

Rick


   
 Jignesh K. Shah 
 [EMAIL PROTECTED] 
   To 
 Sent by:  Juan Casero [EMAIL PROTECTED]   
 pgsql-performance  cc 
 [EMAIL PROTECTED] pgsql-performance@postgresql.org
 .org  Subject 
   Re: [PERFORM] PostgreSQL and
   Ultrasparc T1   
 12/19/2005 11:19  
 PM
   
   
   
   




I guess it depends on what you term as your metric for measurement.
If it is just one query execution time .. It may not be the best on
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously,
UltraSPARC T1 can do well compared comparatively provided the
application can scale also along with it.

The best way to approach is to figure out your peak workload, find an
accurate way to measure the true metric and then design a  benchmark
for it and run it on both servers.

Regards,
Jignesh


Juan Casero wrote:

Ok.  That  is what I wanted to know.  Right now this database is a
PostgreSQL
7.4.8 system.  I am using it in a sort of DSS role.  I have weekly
summaries
of the sales for our division going back three years.  I have a PHP based
webapp that I wrote to give the managers access to this data.  The webapp
lets them make selections for reports and then it submits a parameterized
query to the database for execution.  The returned data rows are displayed

and formatted in their web browser.  My largest sales table is about 13
million rows along with all the indexes it takes up about 20 gigabytes.  I

need to scale this application up to nearly 100 gigabytes to handle daily
sales summaries.  Once we start looking at daily sales figures our
database
size could grow ten to twenty times.  I use postgresql because it gives me

the kind of enterprise database features I need to program the complex
logic
for the queries.I also need the transaction isolation facilities it
provides so I can optimize the queries in plpgsql without worrying about
multiple users temp tables colliding with each other.  Additionally, I
hope
to rewrite the front end application in JSP so maybe I could use the
multithreaded features of the Java to exploit a multicore multi-cpu
system.
There are almost no writes to the database tables.   The bulk of the
application is just executing parameterized queries and returning huge
amounts of data.  I know bizgres is supposed to be better at this but I
want
to stay away from anything that is beta.  I cannot afford for this thing
to
go wrong.  My reasoning for looking at the T1000/2000 was simply the large

number of cores.  I  know postgresql uses a super server that forks copies
of
itself to handle incoming requests on port 5432.  But I figured the number
of
cores on the T1000/2000 processors would be utilized by the forked copies
of
the postgresql server.  From the comments I have seen so far it does not
look
like this is the case.  We had originally sized up a dual processor dual
core
AMD opteron system from HP for this but I thought I could get more bang
for
the buck on a T1000/2000.  It now seems I may have been wrong.  I am
stronger
in Linux than Solaris so I am not upset I am just trying to find the best
hardware for the anticipated needs of this application.

Thanks,
Juan

On Monday 19 December 2005 01:25, Scott Marlowe wrote:


From: [EMAIL PROTECTED] on behalf of Juan Casero

QUOTE:

Hi -


Can anyone tell me how well PostgreSQL 8.x performs on the new Sun
Ultrasparc T1 processor and architecture on 

Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Richard_D_Levine
 you'd be much better served by
 putting a big NVRAM cache in front of a fast disk array

I agree with the point below, but I think price was the issue of the
original discussion.  That said, it seems that a single high speed spindle
would give this a run for its money in both price and performance, and for
the same reasons Mike points out.  Maybe a SCSI 160 or 320 at 15k, or maybe
even something slower.

Rick

[EMAIL PROTECTED] wrote on 07/26/2005 01:33:43 PM:

 On Tue, Jul 26, 2005 at 11:23:23AM -0700, Luke Lonergan wrote:
 Yup - interesting and very niche product - it seems like it's only
obvious
 application is for the Postgresql WAL problem :-)

 On the contrary--it's not obvious that it is an ideal fit for a WAL. A
 ram disk like this is optimized for highly random access applications.
 The WAL is a single sequential writer. If you're in the kind of market
 that needs a really high performance WAL you'd be much better served by
 putting a big NVRAM cache in front of a fast disk array than by buying a
 toy like this.

 Mike Stone

 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Richard_D_Levine
 exploring the option of buying 10 cheapass
 machines for $300 each.  At the moment, that $300 buys you, from Dell, a
 2.5Ghz Pentium 4

Buy cheaper ass Dells with an AMD 64 3000+.  Beats the crap out of the 2.5
GHz Pentium, especially for PostgreSQL.

See the thread Whence the Opterons for more

Rick

[EMAIL PROTECTED] wrote on 05/10/2005 10:02:50 AM:


 I think that perhaps he was trying to avoid having to buy Big Iron at
all.

 With all the Opteron v. Xeon around here, and talk of $30,000 machines,
 perhaps it would be worth exploring the option of buying 10 cheapass
 machines for $300 each.  At the moment, that $300 buys you, from Dell, a
 2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit
ethernet.
 The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
 harness as a single machine.

 For those of us looking at batch and data warehousing applications, it
would
 be really handy to be able to partition databases, tables, and processing
 load across banks of cheap hardware.

 Yes, clustering solutions can distribute the data, and can even do it on
a
 per-table basis in some cases.  This still leaves it up to the
application's
 logic to handle reunification of the data.

 Ideas:
1. Create a table/storage type that consists of a select statement
 on another machine.  While I don't think the current executor is capable
of
 working on multiple nodes of an execution tree at the same time, it would
be
 great if it could offload a select of tuples from a remote table to an
 entirely different server and merge the resulting data into the current
 execution.  I believe MySQL has this, and Oracle may implement it in
another
 way.

2. There is no #2 at this time, but I'm sure one can be
 hypothesized.

 ...Google and other companies have definitely proved that one can harness
 huge clusters of cheap hardware.  It can't be _that_ hard, can it.  :)


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of John A
Meinel
 Sent: Tuesday, May 10, 2005 7:41 AM
 To: Alex Stapleton
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Partitioning / Clustering

 Alex Stapleton wrote:
  What is the status of Postgres support for any sort of multi-machine
  scaling support? What are you meant to do once you've upgraded your
  box and tuned the conf files as much as you can? But your query load
  is just too high for a single machine?
 
  Upgrading stock Dell boxes (I know we could be using better machines,
  but I am trying to tackle the real issue) is not a hugely price
  efficient way of getting extra performance, nor particularly scalable
  in the long term.

 Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is
far
 away from Big Iron. I don't know what performance you are looking for,
but
 you can easily get into inserting 10M rows/day with quality hardware.

 But actually is it your SELECT load that is too high, or your INSERT
load,
 or something inbetween.

 Because Slony is around if it is a SELECT problem.
 http://gborg.postgresql.org/project/slony1/projdisplay.php

 Basically, Slony is a Master/Slave replication system. So if you have
INSERT
 going into the Master, you can have as many replicated slaves, which can
 handle your SELECT load.
 Slony is an asynchronous replicator, so there is a time delay from the
 INSERT until it will show up on a slave, but that time could be pretty
 small.

 This would require some application level support, since an INSERT goes
to a
 different place than a SELECT. But there has been some discussion about
 pg_pool being able to spread the query load, and having it be aware of
the
 difference between a SELECT and an INSERT and have it route the query to
the
 correct host. The biggest problem being that functions could cause a
SELECT
 func() to actually insert a row, which pg_pool wouldn't know about. There
 are 2 possible solutions, a) don't do that when you are using this
system,
 b) add some sort of comment hint so that pg_pool can understand that the
 select is actually an INSERT, and needs to be done on the master.

 
  So, when/is PG meant to be getting a decent partitioning system?
  MySQL is getting one (eventually) which is apparently meant to be
  similiar to Oracle's according to the docs. Clusgres does not appear
  to be widely/or at all used, and info on it seems pretty thin on the
  ground, so I am not too keen on going with that. Is the real solution
  to multi- machine partitioning (as in, not like MySQLs MERGE tables)
  on  PostgreSQL actually doing it in our application API? This seems
  like  a less than perfect solution once we want to add redundancy and
  things into the mix.

 There is also PGCluster
 http://pgfoundry.org/projects/pgcluster/

 Which is trying to be more of a Synchronous multi-master system. I
haven't
 heard of Clusgres, so I'm guessing it is an older attempt, which has been
 overtaken by pgcluster.

 Just realize that clusters don't 

[PERFORM] Disk Edge Partitioning

2005-04-22 Thread Richard_D_Levine
I saw an interesting thought in another thread about placing database data
in a partition that uses cylinders at the outer edge of the disk.  I want
to try this.  Are the lower number cylinders closer to the edge of a SCSI
disk or is it the other way around?  What about ATA?

Cheers,

Rick


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] How to improve db performance with $7K?

2005-04-19 Thread Richard_D_Levine


[EMAIL PROTECTED] wrote on 04/19/2005 11:10:22 AM:

 What is 'multiple initiators' used for in the real world?

I asked this same question and got an answer off list:  Somebody said their
SAN hardware used multiple initiators.  I would try to check the archives
for you, but this thread is becoming more of a rope.

Multiple initiators means multiple sources on the bus issuing I/O
instructions to the drives.  In theory you can have two computers on the
same SCSI bus issuing I/O requests to the same drive, or to anything else
on the bus, but I've never seen this implemented.  Others have noted this
feature as being a big deal, so somebody is benefiting from it.

Rick

 --
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania
19073

 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-15 Thread Richard_D_Levine
Dave wrote An interesting test would be to stick several drives in a
cabinet and
graph how performance is affected at the different price points/
technologies/number of drives.

From the discussion on the $7k server thread, it seems the RAID controller
would
be an important data point also.  And RAID level.  And application
load/kind.

Hmmm.  I just talked myself out of it.  Seems like I'd end up with
something
akin to those database benchmarks we all love to hate.

Rick

[EMAIL PROTECTED] wrote on 04/15/2005 08:40:13 AM:

  -Original Message-
  From: Alex Turner [mailto:[EMAIL PROTECTED]
  Sent: Thursday, April 14, 2005 6:15 PM
  To: Dave Held
  Cc: pgsql-performance@postgresql.org
  Subject: Re: [PERFORM] Intel SRCS16 SATA raid?
 
  Looking at the numbers, the raptor with TCQ enabled was close or
  beat the Atlas III 10k drive on most benchmarks.

 And I would be willing to bet that the Atlas 10k is not using the
 same generation of technology as the Raptors.

  Naturaly a 15k drive is going to be faster in many areas, but it
  is also much more expensive.  It was only 44% better on the server
  tests than the raptor with TCQ, but it costs nearly 300% more ($538
  cdw.com, $180 newegg.com).

 State that in terms of cars.  Would you be willing to pay 300% more
 for a car that is 44% faster than your competitor's?  Of course you
 would, because we all recognize that the cost of speed/performance
 does not scale linearly.  Naturally, you buy the best speed that you
 can afford, but when it comes to hard drives, the only major feature
 whose price tends to scale anywhere close to linearly is capacity.

  Note also that the 15k drive was the only drive that kept up with
  the raptor on raw transfer speed, which is going to matter for WAL.

 So get a Raptor for your WAL partition. ;)

  [...]
  The Raptor drives can be had for as little as $180/ea, which is
  quite a good price point considering they can keep up with their
  SCSI 10k RPM counterparts on almost all tests with NCQ enabled
  (Note that 3ware controllers _don't_ support NCQ, although they
  claim their HBA based queueing is 95% as good as NCQ on the drive).

 Just keep in mind the points made by the Seagate article.  You're
 buying much more than just performance for that $500+.  You're also
 buying vibrational tolerance, high MTBF, better internal
 environmental controls, and a pretty significant margin on seek time,
 which is probably your most important feature for disks storing tables.
 An interesting test would be to stick several drives in a cabinet and
 graph how performance is affected at the different price points/
 technologies/number of drives.

 __
 David B. Held
 Software Engineer/Array Services Group
 200 14th Ave. East,  Sartell, MN 56377
 320.534.3637 320.253.7800 800.752.8129

 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-15 Thread Richard_D_Levine
This is a different thread that the $7k server thread.
Greg Stark started it and wrote:

 I'm also wondering about whether I'm better off with one of these 
 SATA raid  
 controllers or just going with SCSI drives.   



Rick

[EMAIL PROTECTED] wrote on 04/15/2005 10:01:56 AM:

 The original thread was how much can I get for $7k

 You can't fit a 15k RPM SCSI solution into $7K ;)  Some of us are ona
budget!

 10k RPM SATA drives give acceptable performance at a good price, thats
 really the point here.

 I have never really argued that SATA is going to match SCSI
 performance on multidrive arrays for IO/sec.  But it's all about the
 benjamins baby.  If I told my boss we need $25k for a database
 machine, he'd tell me that was impossible, and I have $5k to do it.
 If I tell him $7k - he will swallow that.  We don't _need_ the amazing
 performance of a 15k RPM drive config.  Our biggest hit is reads, so
 we can buy 3xSATA machines and load balance.  It's all about the
 application, and buying what is appropriate.  I don't buy a Corvette
 if all I need is a malibu.

 Alex Turner
 netEconomist

 On 4/15/05, Dave Held [EMAIL PROTECTED] wrote:
   -Original Message-
   From: Alex Turner [mailto:[EMAIL PROTECTED]
   Sent: Thursday, April 14, 2005 6:15 PM
   To: Dave Held
   Cc: pgsql-performance@postgresql.org
   Subject: Re: [PERFORM] Intel SRCS16 SATA raid?
  
   Looking at the numbers, the raptor with TCQ enabled was close or
   beat the Atlas III 10k drive on most benchmarks.
 
  And I would be willing to bet that the Atlas 10k is not using the
  same generation of technology as the Raptors.
 
   Naturaly a 15k drive is going to be faster in many areas, but it
   is also much more expensive.  It was only 44% better on the server
   tests than the raptor with TCQ, but it costs nearly 300% more ($538
   cdw.com, $180 newegg.com).
 
  State that in terms of cars.  Would you be willing to pay 300% more
  for a car that is 44% faster than your competitor's?  Of course you
  would, because we all recognize that the cost of speed/performance
  does not scale linearly.  Naturally, you buy the best speed that you
  can afford, but when it comes to hard drives, the only major feature
  whose price tends to scale anywhere close to linearly is capacity.
 
   Note also that the 15k drive was the only drive that kept up with
   the raptor on raw transfer speed, which is going to matter for WAL.
 
  So get a Raptor for your WAL partition. ;)
 
   [...]
   The Raptor drives can be had for as little as $180/ea, which is
   quite a good price point considering they can keep up with their
   SCSI 10k RPM counterparts on almost all tests with NCQ enabled
   (Note that 3ware controllers _don't_ support NCQ, although they
   claim their HBA based queueing is 95% as good as NCQ on the drive).
 
  Just keep in mind the points made by the Seagate article.  You're
  buying much more than just performance for that $500+.  You're also
  buying vibrational tolerance, high MTBF, better internal
  environmental controls, and a pretty significant margin on seek time,
  which is probably your most important feature for disks storing tables.
  An interesting test would be to stick several drives in a cabinet and
  graph how performance is affected at the different price points/
  technologies/number of drives.
 
  __
  David B. Held
  Software Engineer/Array Services Group
  200 14th Ave. East,  Sartell, MN 56377
  320.534.3637 320.253.7800 800.752.8129
 
  ---(end of
broadcast)---
  TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
 

 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-14 Thread Richard_D_Levine
Greg,

I posted this link under a different thread (the $7k server thread).  It is
a very good read on why SCSI is better for servers than ATA.  I didn't note
bias, though it is from a drive manufacturer.  YMMV.  There is an
interesting, though dated appendix on different manufacturers' drive
characteristics.

http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

Enjoy,

Rick

[EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM:


 Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller
 instead of the 3ware one.

 Poking around it seems this does come with Linux drivers and there is a
 battery backup option. So it doesn't seem to be completely insane.

 Anyone have any experience with these controllers?

 I'm also wondering about whether I'm better off with one of these SATA
raid
 controllers or just going with SCSI drives.

 --
 greg


 ---(end of broadcast)---
 TIP 8: explain analyze is your friend


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-14 Thread Richard_D_Levine
Nice research Alex.

Your data strongly support the information in the paper.  Your SCSI drives
blew away the others in all of the server benchmarks.  They're only
marginally better in desktop use.

I do find it somewhat amazing that a 15K SCSI 320 drive isn't going to help
me play Unreal Tournament much faster.  That's okay.  I suck at it anyway.
My kid has never lost to me.  She enjoys seeing daddy as a bloody smear and
bouncing body parts anyway.  It promotes togetherness.

Here's a quote from the paper:

[SCSI] interfaces support multiple initiators or hosts. The
drive must keep track of separate sets of information for each
host to which it is attached, e.g., maintaining the processor
pointer sets for multiple initiators and tagged commands.
The capability of SCSI/FC to efficiently process commands
and tasks in parallel has also resulted in a higher overhead
kernel structure for the firmware.

Has anyone ever seen a system with multiple hosts or initiators on a SCSI
bus?  Seems like it would be a very cool thing in an SMP architecture, but
I've not seen an example implemented.

Rick

Alex Turner [EMAIL PROTECTED] wrote on 04/14/2005 12:13:41 PM:

 I have put together a little head to head performance of a 15k SCSI,
 10k SCSI 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive
 comparison at storage review

 http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?

typeID=10testbedID=3osID=4raidconfigID=1numDrives=1devID_0=232devID_1=40devID_2=259devID_3=267devID_4=261devID_5=248devCnt=6


 It does illustrate some of the weaknesses of SATA drives, but all in
 all the Raptor drives put on a good show.

 Alex Turner
 netEconomist

 On 4/14/05, Alex Turner [EMAIL PROTECTED] wrote:
  I have read a large chunk of this, and I would highly recommend it to
  anyone who has been participating in the drive discussions.  It is
  most informative!!
 
  Alex Turner
  netEconomist
 
  On 4/14/05, [EMAIL PROTECTED]
 [EMAIL PROTECTED] wrote:
   Greg,
  
   I posted this link under a different thread (the $7k server
 thread).  It is
   a very good read on why SCSI is better for servers than ATA.  I
 didn't note
   bias, though it is from a drive manufacturer.  YMMV.  There is an
   interesting, though dated appendix on different manufacturers' drive
   characteristics.
  
   http://www.seagate.

com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

  
   Enjoy,
  
   Rick
  
   [EMAIL PROTECTED] wrote on 04/14/2005 09:54:45
AM:
  
   
Our vendor is trying to sell us on an Intel SRCS16 SATA raid
controller
instead of the 3ware one.
   
Poking around it seems this does come with Linux drivers and there
is a
battery backup option. So it doesn't seem to be completely insane.
   
Anyone have any experience with these controllers?
   
I'm also wondering about whether I'm better off with one of these
SATA
   raid
controllers or just going with SCSI drives.
   
--
greg
   
   
---(end of
broadcast)---
TIP 8: explain analyze is your friend
  
   ---(end of
broadcast)---
   TIP 9: the planner will ignore your desire to choose an index scan if
your
 joining column's datatypes do not match
  
 
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Richard_D_Levine
Another simple question: Why is SCSI more expensive?  After the
eleventy-millionth controller is made, it seems like SCSI and SATA are
using a controller board and a spinning disk.  Is somebody still making
money by licensing SCSI technology?

Rick

[EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:

 You asked for it!  ;-)

 If you want cheap, get SATA.  If you want fast under
 *load* conditions, get SCSI.  Everything else at this
 time is marketing hype, either intentional or learned.
 Ignoring dollars, expect to see SCSI beat SATA by 40%.

  * * * What I tell you three times is true * * *

 Also, compare the warranty you get with any SATA
 drive with any SCSI drive.  Yes, you still have some
 change leftover to buy more SATA drives when they
 fail, but... it fundamentally comes down to some
 actual implementation and not what is printed on
 the cardboard box.  Disk systems are bound by the
 rules of queueing theory.  You can hit the sales rep
 over the head with your queueing theory book.

 Ultra320 SCSI is king of the hill for high concurrency
 databases.  If you're only streaming or serving files,
 save some money and get a bunch of SATA drives.
 But if you're reading/writing all over the disk, the
 simple first-come-first-serve SATA heuristic will
 hose your performance under load conditions.

 Next year, they will *try* bring out some SATA cards
 that improve on first-come-first-serve, but they ain't
 here now.  There are a lot of rigged performance tests
 out there...  Maybe by the time they fix the queueing
 problems, serial Attached SCSI (a/k/a SAS) will be out.
 Looks like Ultra320 is the end of the line for parallel
 SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the
 water.

 Ultra320 SCSI.
 Ultra320 SCSI.
 Ultra320 SCSI.

 Serial Attached SCSI.
 Serial Attached SCSI.
 Serial Attached SCSI.

 For future trends, see:
 http://www.incits.org/archive/2003/in031163/in031163.htm

 douglas

 p.s. For extra credit, try comparing SATA and SCSI drives
 when they're 90% full.

 On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:

  I guess I'm setting myself up here, and I'm really not being ignorant,
  but can someone explain exactly how is SCSI is supposed to better than
  SATA?
 
  Both systems use drives with platters.  Each drive can physically only
  read one thing at a time.
 
  SATA gives each drive it's own channel, but you have to share in SCSI.
   A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
  SCSI can only do 320MB/sec across the entire array.
 
  What am I missing here?
 
  Alex Turner
  netEconomist


 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if
your
   joining column's datatypes do not match


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Richard_D_Levine
Yep, that's it, as well as increased quality control.  I found this from
Seagate:

http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

With this quote (note that ES stands for Enterprise System and PS stands
for Personal System):

There is significantly more silicon on ES products. The following
comparison comes from a study done in 2000:
 the ES ASIC gate count is more than 2x a PS drive,
 the embedded SRAM space for program code is 2x,
 the permanent flash memory for program code is 2x,
 data SRAM and cache SRAM space is more than 10x.
The complexity of the SCSI/FC interface compared to the
IDE/ATA interface shows up here due in part to the more
complex system architectures in which ES drives find themselves.
ES interfaces support multiple initiators or hosts. The
drive must keep track of separate sets of information for each
host to which it is attached, e.g., maintaining the processor
pointer sets for multiple initiators and tagged commands.
The capability of SCSI/FC to efficiently process commands
and tasks in parallel has also resulted in a higher overhead
kernel structure for the firmware. All of these complexities
and an overall richer command set result in the need for a
more expensive PCB to carry the electronics.

Rick

Alex Turner [EMAIL PROTECTED] wrote on 04/07/2005 10:46:31 AM:

 Based on the reading I'm doing, and somebody please correct me if I'm
 wrong, it seems that SCSI drives contain an on disk controller that
 has to process the tagged queue.  SATA-I doesn't have this.  This
 additional controller, is basicaly an on board computer that figures
 out the best order in which to process commands.  I believe you are
 also paying for the increased tolerance that generates a better speed.
  If you compare an 80Gig 7200RPM IDE drive to a WD Raptor 76G 10k RPM
 to a Seagate 10k.6 drive to a Seagate Cheatah 15k drive, each one
 represents a step up in parts and technology, thereby generating a
 cost increase (at least thats what the manufactures tell us).  I know
 if you ever held a 15k drive in your hand, you can notice a
 considerable weight difference between it and a 7200RPM IDE drive.

 Alex Turner
 netEconomist

 On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED]
 [EMAIL PROTECTED] wrote:
  Another simple question: Why is SCSI more expensive?  After the
  eleventy-millionth controller is made, it seems like SCSI and SATA are
  using a controller board and a spinning disk.  Is somebody still making
  money by licensing SCSI technology?
 
  Rick
 
  [EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:
 
   You asked for it!  ;-)
  
   If you want cheap, get SATA.  If you want fast under
   *load* conditions, get SCSI.  Everything else at this
   time is marketing hype, either intentional or learned.
   Ignoring dollars, expect to see SCSI beat SATA by 40%.
  
* * * What I tell you three times is true * * *
  
   Also, compare the warranty you get with any SATA
   drive with any SCSI drive.  Yes, you still have some
   change leftover to buy more SATA drives when they
   fail, but... it fundamentally comes down to some
   actual implementation and not what is printed on
   the cardboard box.  Disk systems are bound by the
   rules of queueing theory.  You can hit the sales rep
   over the head with your queueing theory book.
  
   Ultra320 SCSI is king of the hill for high concurrency
   databases.  If you're only streaming or serving files,
   save some money and get a bunch of SATA drives.
   But if you're reading/writing all over the disk, the
   simple first-come-first-serve SATA heuristic will
   hose your performance under load conditions.
  
   Next year, they will *try* bring out some SATA cards
   that improve on first-come-first-serve, but they ain't
   here now.  There are a lot of rigged performance tests
   out there...  Maybe by the time they fix the queueing
   problems, serial Attached SCSI (a/k/a SAS) will be out.
   Looks like Ultra320 is the end of the line for parallel
   SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the
   water.
  
   Ultra320 SCSI.
   Ultra320 SCSI.
   Ultra320 SCSI.
  
   Serial Attached SCSI.
   Serial Attached SCSI.
   Serial Attached SCSI.
  
   For future trends, see:
   http://www.incits.org/archive/2003/in031163/in031163.htm
  
   douglas
  
   p.s. For extra credit, try comparing SATA and SCSI drives
   when they're 90% full.
  
   On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:
  
I guess I'm setting myself up here, and I'm really not being
ignorant,
but can someone explain exactly how is SCSI is supposed to better
than
SATA?
   
Both systems use drives with platters.  Each drive can physically
only
read one thing at a time.
   
SATA gives each drive it's own channel, but you have to share in
SCSI.
 A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive,
but
SCSI can only do 320MB/sec across the entire array.
   
What am I missing here?
   
Alex 

Re: [PERFORM] Reading recommendations

2005-03-31 Thread Richard_D_Levine


Steve Wampler [EMAIL PROTECTED] wrote on 03/30/2005 03:58:12 PM:

 [EMAIL PROTECTED] wrote:

 Mohan, Ross wrote:
 
 VOIP over BitTorrent?
 
 Now *that* I want to see.  Aught to be at least as interesting
 as the TCP/IP over carrier pigeon experiment - and more
 challenging to boot!
 
 
 
  It was very challenging.  I worked on the credit window sizing and
  retransmission timer estimation algorithms.  We took into account
weather
  patterns, size and age of the bird, feeding times, and the average
number
  of times a bird circles before determining magnetic north.
Interestingly,
  packet size had little effect in the final algorithms.
 
  I would love to share them with all of you, but they're classified.

 Ah, but VOIPOBT requires many people all saying the same thing at the
 same time.  The synchronization alone (since you need to distribute
 these people adequately to avoid overloading a trunk line...) is probably
 sufficiently hard to make it interesting.  Then there are the problems of
 different accents, dilects, and languages ;)

Interestingly, we had a follow on contract to investigate routing
optimization using flooding techniques.  Oddly, it was commissioned by a
consortium of local car washes.  Work stopped when the park service sued us
for the cost of cleaning all the statuary, and the company went out of
business.  We were serving cornish game hens at our frequent dinner
parties for months.


 --
 Steve Wampler -- [EMAIL PROTECTED]
 The gods that smiled on your birth are now laughing out loud.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Reading recommendations

2005-03-31 Thread Richard_D_Levine


[EMAIL PROTECTED] wrote on 03/31/2005 10:48:09 AM:

 Stefan Weiss wrote:
  On 2005-03-31 15:19, [EMAIL PROTECTED] wrote:
 
 Now *that* I want to see.  Aught to be at least as interesting
 as the TCP/IP over carrier pigeon experiment - and more
 challenging to boot!
 
  ..
 
 Interestingly, we had a follow on contract to investigate routing
 optimization using flooding techniques.  Oddly, it was commissioned by
a
 consortium of local car washes.  Work stopped when the park service
sued us
 for the cost of cleaning all the statuary, and the company went out of
 business.  We were serving cornish game hens at our frequent dinner
 parties for months.
 
 
  This method might have been safer (and it works great with Apaches):
  http://eagle.auc.ca/~dreid/

 Aha - VOIPOBD as well as VOIPOBT!  What more can one want?

 VOIPOCP, I suppose...

Start collecting recipes for small game birds now.  We ran out pretty
quickly.  Finally came up with Pigeon Helper and sold it to homeless
shelters in New York.  Sales were slow until we added a wine sauce.



 --
 Steve Wampler -- [EMAIL PROTECTED]
 The gods that smiled on your birth are now laughing out loud.

 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Richard_D_Levine


[EMAIL PROTECTED] wrote on 03/30/2005 10:58:21 AM:


 Allow telecommute from across the pond and I might be interested :-)

Please post phone bills to this list.


 --
 Michael Fuhr
 http://www.fuhr.org/~mfuhr/

 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Richard_D_Levine
It was very challenging.  I worked on the credit window sizing and
retransmission timer estimation algorithms.  We took into account weather
patterns, size and age of the bird, feeding times, and the average number
of times a bird circles before determining magnetic north.  Interestingly,
packet size had little effect in the final algorithms.

[EMAIL PROTECTED] wrote on 03/30/2005 11:52:13 AM:

 Mohan, Ross wrote:
  VOIP over BitTorrent?

 Now *that* I want to see.  Aught to be at least as interesting
 as the TCP/IP over carrier pigeon experiment - and more
 challenging to boot!


It was very challenging.  I worked on the credit window sizing and
retransmission timer estimation algorithms.  We took into account weather
patterns, size and age of the bird, feeding times, and the average number
of times a bird circles before determining magnetic north.  Interestingly,
packet size had little effect in the final algorithms.

I would love to share them with all of you, but they're classified.


 --
 Steve Wampler -- [EMAIL PROTECTED]
 The gods that smiled on your birth are now laughing out loud.

 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Questions about 2 databases.

2005-03-11 Thread Richard_D_Levine
 this seems
 like a dead waste of effort :-(.  The work to put the data into the main
 database isn't lessened at all; you've just added extra work to manage
 the buffer database.

True from the view point of the server, but not from the throughput in the
client session (client viewpoint).  The client will have a blazingly fast
session with the buffer database.  I'm assuming the buffer database table
size is zero or very small.  Constraints will be a problem if there are
PKs, FKs that need satisfied on the server that are not adequately testable
in the buffer.  Might not be a problem if the full table fits on the RAM
disk, but you still have to worry about two clients inserting the same PK.

Rick



 
  Tom Lane  
 
  [EMAIL PROTECTED]To:   [EMAIL 
PROTECTED]
  Sent by:   cc:   
pgsql-performance@postgresql.org  
  [EMAIL PROTECTED]Subject:  Re: [PERFORM] 
Questions about 2 databases.
  tgresql.org   
 

 

 
  03/11/2005 03:33 PM   
 

 

 




jelle [EMAIL PROTECTED] writes:
 1) on a single 7.4.6 postgres instance does each database have it own WAL
 file or is that shared? Is it the same on 8.0.x?

Shared.

 2) what's the high performance way of moving 200 rows between similar
 tables on different databases? Does it matter if the databases are
 on the same or seperate postgres instances?

COPY would be my recommendation.  For a no-programming-effort solution
you could just pipe the output of pg_dump --data-only -t mytable
into psql.  Not sure if it's worth developing a custom application to
replace that.

 My web app does lots of inserts that aren't read until a session is
 complete. The plan is to put the heavy insert session onto a ramdisk
based
 pg-db and transfer the relevant data to the master pg-db upon session
 completion. Currently running 7.4.6.

Unless you have a large proportion of sessions that are abandoned and
hence never need be transferred to the main database at all, this seems
like a dead waste of effort :-(.  The work to put the data into the main
database isn't lessened at all; you've just added extra work to manage
the buffer database.

 regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Richard_D_Levine

I think maybe a SAN in conjunction with tablespaces might be the answer.
Still need one honking server.

Rick



  
  Stephen Frost 
  
  [EMAIL PROTECTED]   To:   Christopher 
Kings-Lynne [EMAIL PROTECTED]  
  Sent by:   cc:   Hervé 
Piedvache [EMAIL PROTECTED], pgsql-performance@postgresql.org
  [EMAIL PROTECTED]Subject:  Re: [PERFORM] 
PostgreSQL clustering VS MySQL clustering
  tgresql.org   
  

  

  
  01/20/2005 10:08 AM   
  

  

  




* Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote:
 PostgreSQL has replication, but not partitioning (which is what you
want).

It doesn't have multi-server partitioning..  It's got partitioning
within a single server (doesn't it?  I thought it did, I know it was
discussed w/ the guy from Cox Communications and I thought he was using
it :).

 So, your only option is Oracle or another very expensive commercial
 database.

Or partition the data at the application layer.

 Stephen
(See attached file: signature.asc)


signature.asc
Description: Binary data

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-11 Thread Richard_D_Levine
Jim wrote: you'd be hard-pressed to find too many real-world examples where
you could do
something with a PostgreSQL procedural language that you couldn't do
with PL/SQL.

Rick mumbled: You can't get it for nothing! %)



 
  Jim C. Nasby
 
  [EMAIL PROTECTED]  To:   [EMAIL 
PROTECTED] 
  Sent by:   cc:   Frank Wiles 
[EMAIL PROTECTED], Yann Michel [EMAIL PROTECTED],   
  [EMAIL PROTECTED] [EMAIL PROTECTED], 
pgsql-performance@postgresql.org 
  tgresql.orgSubject:  Re: 
[PERFORM] PostgreSQL vs. Oracle vs. Microsoft 

 

 
  01/10/2005 06:29 PM   
 

 

 




On Mon, Jan 10, 2005 at 12:46:01PM -0500, Alex Turner wrote:
 You sir are correct!  You can't use perl in MS-SQL or Oracle ;).

On the other hand, PL/SQL is incredibly powerful, especially combined
with all the tools/utilities that come with Oracle. I think you'd be
hard-pressed to find too many real-world examples where you could do
something with a PostgreSQL procedural language that you couldn't do
with PL/SQL.
--
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Low Performance for big hospital server ..

2005-01-06 Thread Richard_D_Levine
In my younger days I denormalized a database for performance reasons and
have been paid for it dearly with increased maintenance costs.  Adding
enhanced capabilities and new functionality will render denormalization
worse than useless quickly.  --Rick



 
  Frank Wiles   
 
  [EMAIL PROTECTED]  To:   Josh 
Berkus josh@agliodbs.com   
  Sent by:   cc:   
pgsql-performance@postgresql.org  
  [EMAIL PROTECTED]Subject:  Re: [PERFORM] Low 
Performance for big hospital server ..  
  tgresql.org   
 

 

 
  01/06/2005 12:12 PM   
 

 

 




On Thu, 6 Jan 2005 09:06:55 -0800
Josh Berkus josh@agliodbs.com wrote:

 I can't tell you how many times I've seen this sort of thing.   And
 the developers always tell me Well, we denormalized for performance
 reasons ... 

  Now that's rich.  I don't think I've ever seen a database perform
  worse after it was normalized.  In fact, I can't even think of a
  situation where it could!

 -
   Frank Wiles [EMAIL PROTECTED]
   http://www.wiles.org
 -


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query Performance and IOWait

2004-11-18 Thread Richard_D_Levine
Andrew,

It seems that you could combine the subquery's WHERE clause with the main
query's to produce a simpler query, i.e. one without a subquery.

Rick





 
  Andrew Janian   
 
  [EMAIL PROTECTED]To:   
[EMAIL PROTECTED]
  Sent by:   cc:
 
  [EMAIL PROTECTED]Subject:  [PERFORM] Query 
Performance and IOWait
  tgresql.org   
 

 

 
  11/18/2004 08:42 AM   
 

 

 




Hello All,

I have a setup with a Dell Poweredge 2650 with Red Hat and Postgres 7.4.5
with a database with about 27GB of data.  The table in question has about
35 million rows.

I am running the following query:

SELECT *
FROM mb_fix_message
WHERE msg_client_order_id IN (
 SELECT msg_client_order_id
 FROM mb_fix_message
 WHERE msg_log_time = '2004-06-01'
 AND msg_log_time  '2004-06-01 13:30:00.000'
 AND msg_message_type IN ('D','G')
 AND mb_ord_type = '1'
 )
 AND msg_log_time  '2004-06-01'
 AND msg_log_time  '2004-06-01 23:59:59.999'
 AND msg_message_type = '8'
 AND (mb_raw_text LIKE '%39=1%' OR mb_raw_text LIKE '%39=2%');

with the following plan:

QUERY PLAN
Nested Loop IN Join  (cost=0.00..34047.29 rows=1 width=526)
  -  Index Scan using mfi_log_time on mb_fix_message  (cost=0.00..22231.31
rows=2539 width=526)
   Index Cond: ((msg_log_time  '2004-06-01 00:00:00'::timestamp
without time zone) AND (msg_log_time  '2004-06-01 23:59:59.999'::timestamp
without time zone))
   Filter: (((msg_message_type)::text = '8'::text) AND
(((mb_raw_text)::text ~~ '%39=1%'::text) OR ((mb_raw_text)::text ~~
'%39=2%'::text)))
  -  Index Scan using mfi_client_ordid on mb_fix_message
(cost=0.00..445.56 rows=1 width=18)
   Index Cond: ((outer.msg_client_order_id)::text =
(mb_fix_message.msg_client_order_id)::text)
   Filter: ((msg_log_time = '2004-06-01 00:00:00'::timestamp without
time zone) AND (msg_log_time  '2004-06-01 13:30:00'::timestamp without
time zone) AND ((msg_message_type)::text = 'D'::text) OR
((msg_message_type)::text = 'G'::text)) AND ((mb_ord_type)::text =
'1'::text))

While running, this query produces 100% iowait usage on its processor and
takes a ungodly amount of time (about an hour).

The postgres settings are as follows:

shared_buffers = 32768  # min 16, at least max_connections*2, 8KB
each
sort_mem = 262144   # min 64, size in KB

And the /etc/sysctl.conf has:
kernel.shmall = 274235392
kernel.shmmax = 274235392

The system has 4GB of RAM.

I am pretty sure of these settings, but only from my reading of the docs
and others' recommendations online.

Thanks,

Andrew Janian
OMS Development
Scottrade Financial Services
(314) 965-1555 x 1513
Cell: (314) 369-2083

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] Does PostgreSQL run with Oracle?

2004-10-15 Thread Richard_D_Levine
My basic question to the community is is PostgreSQL approximately as fast
as Oracle?

I don't want benchmarks, they're BS.  I want a gut feel from this community
because I know many of you are in mixed shops that run both products, or
have had experience with both.

I fully intend to tune, vacuum, analyze, size buffers, etc.  I've read what
people have written on the topic, and from that my gut feel is that using
PostgreSQL will not adversely affect performance of my application versus
Oracle.  I know it won't adversely affect my pocket book.  I also know that
requests for help will be quick, clear, and multifaceted.

I'm currently running single processor UltraSPARC workstations, and intend
to use Intel Arch laptops and Linux.  The application is a big turnkey
workstation app.  I know the hardware switch alone will enhance
performance, and may do so to the point where even a slower database will
still be adequate.

Whadyall think?

Thanks,

Rick



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]