Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread PFC

It would also be reasonable to allow clustering individual partitions;
creating table or column constraints on some partitions and not others;
I have a session-management setup that works like that, using views for now.
	sessions.online is a table of the online sessions. It has a UNIQUE on  
user_id.
	sessions.archive contains all the closed sessions. Obviously it does not  
have a UNIQUE on user_id.
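A minimal sketch of the scheme described above; the schema and table names come from the post, but the column list beyond user_id is invented for illustration:

```sql
CREATE SCHEMA sessions;

-- Online sessions: at most one live session per user.
CREATE TABLE sessions.online (
    user_id    integer NOT NULL UNIQUE,
    started_at timestamp NOT NULL
);

-- Closed sessions: duplicates on user_id are expected here.
CREATE TABLE sessions.archive (
    user_id    integer NOT NULL,
    started_at timestamp NOT NULL,
    closed_at  timestamp NOT NULL
);

-- A view presents both "partitions" as one logical table.
CREATE VIEW sessions.all_sessions AS
    SELECT user_id, started_at FROM sessions.online
    UNION ALL
    SELECT user_id, started_at FROM sessions.archive;
```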



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> So I think Phase I should look like:

>   An ALTER TABLE command to make an inherited table "abstract" in the object
>   oriented sense. That is, no records can be inserted in the parent table. If
>   you follow the oracle model this is also where you specify the partition
>   key. There's no index associated with this partition key though.

Check.

>   A command to create a new partition, essentially syntactic sugar for a
>   CREATE TABLE with an implied INHERITS clause and a constraint on the
>   partition key. If you follow the oracle model then you explicitly specify
>   which range or specific value of the partition key this partition holds.

Check.

>   A command to remove a partition from the partitioned table and turn it into
>   a regular table.

Ugh.  Why?  You can access the table directly anyway.

>   A command to take a regular table and turn it into a partition.

Double ugh.  Verifying that the table matches the partition scheme seems
like a lot of ugly, bug-prone, unnecessary code.  What's the use case
for this anyway?

Those last two are *certainly* not Phase I requirements, and I don't
think we need them at all ever.
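For reference, the two pieces checked off above can already be approximated by hand with inheritance plus a CHECK constraint on the partition key; a minimal sketch with invented names and a monthly range key:

```sql
-- Parent ("abstract") table: ideally holds no rows itself.
CREATE TABLE measurements (
    logged_on date NOT NULL,
    value     numeric
);

-- "Create partition" as sugar for CREATE TABLE + INHERITS + a CHECK
-- constraint spelling out the range this partition holds:
CREATE TABLE measurements_2005_03 (
    CHECK (logged_on >= DATE '2005-03-01' AND logged_on < DATE '2005-04-01')
) INHERITS (measurements);
```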

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Stacy White
From: "Tom Lane" <[EMAIL PROTECTED]>
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > We probably also need multi-table indexes.
> As Josh says, that seems antithetical to the main point of partitioning,
> which is to be able to rapidly remove (and add) partitions of a table.
> If you have to do index cleaning before you can drop a partition, what's
> the point of partitioning?

Global indexes (as opposed to partition local indexes) are useful in cases
where you have a large number of partitions, index columns different than
the partition key, and index values that limit the query to just a subset of
the partitions.

The two domains that I'm most familiar with are warehouse management, and
the film industry. In both these cases it's logical to partition on
day/week/month, it's frequently important to keep a lot of history, and it's
common to have products that only show activity for a few months.  In one of
our production systems we have 800 partitions (by week, with a lot of
history), but a popular product might have only 20 weeks worth of activity.
Selecting records for the product requires at least 800 random-access reads
if you have local indexes on 'product_no', 780 of which just tell the
executor that the partition doesn't include any information on the product.

This is definitely a phase II item, but as I said before it's worth
considering since good DBAs can do a lot with global indexes.

FWIW, we see large benefits from partitioning other than the ability to
easily drop data, for example:

- We can vacuum only the active portions of a table
- Postgres automatically keeps related records clustered together on disk,
which makes it more likely that the blocks used by common queries can be
found in cache
- The query engine uses full table scans on the relevant sections of data,
and quickly skips over the irrelevant sections
- 'CLUSTER'ing a single partition is likely to be significantly more
performant than clustering a large table
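The per-partition maintenance wins listed above can be sketched as follows, assuming weekly child tables with invented names (CLUSTER shown in its 8.0-era `CLUSTER indexname ON tablename` form):

```sql
-- Touch only the active partition instead of the whole logical table:
VACUUM ANALYZE activity_2005_w12;

-- Re-cluster a single partition on its product index; the rest of the
-- history is left untouched:
CLUSTER activity_2005_w12_product_no_idx ON activity_2005_w12;
```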

In fact, we have yet to drop a partition on any of our Oracle or Postgres
production systems.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Josh Berkus
Alvaro, Greg, Tom,

> Hmm.  You are right, but without that we won't be able to enforce
> uniqueness on the partitioned table (we could only enforce it on each
> partition, which would mean we can't partition on anything else than
> primary keys if the tables have one).  IMHO this is something to
> consider.

Sure.  However, for most partitioned use cases, the partition column will be 
part of the real key of the table (for example, for a security log, the real 
key might be (timestamp, machine, application, event_type) with the partition 
on extract(hour from timestamp)).   As a result, there is no need to enforce 
inter-partition uniqueness; the partitioning scheme enforces it already.

The only need for inter-partition uniqueness is on surrogate integer keys.  
This can already be enforced de facto simply by using a sequence.  While it 
would be possible to create a uniqueness check that spans partitions, it 
would be very expensive to do so, thus eliminating some of the advantage of 
partitioning in the first place.   I'm not saying that we won't want this 
some day as an option, I just see it as a Phase III refinement.
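The sequence-based de facto uniqueness above looks like this in practice, reusing the security-log example; names are illustrative:

```sql
-- One sequence shared by every partition.
CREATE SEQUENCE security_log_id_seq;

CREATE TABLE security_log_2005_03_20 (
    id          bigint NOT NULL DEFAULT nextval('security_log_id_seq'),
    logged_at   timestamp NOT NULL,
    machine     text NOT NULL,
    application text NOT NULL,
    event_type  text NOT NULL
);
-- Each partition draws id from the same sequence, so values never
-- collide across partitions even though no index spans them.
```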

Greg, first of all, thanks for helping clean up my muddy thinking about 
implementing partitions.  Comments below:

> Phase I should be an entirely manual system where you add and remove
> partitions manually and create and drop indexes you want manually. You need
> these low level interfaces anyways for a complete system, it doesn't make
> sense to have everything automatic and then later try to wedge in a low
> level interface. Only once you have that do you then start offering options
> to do these things automatically.

This makes sense.  Thanks!

> whole system make a lot more sense: individual partitions are really
> tables. The partitioned tables themselves are just meta-objects like views.

So, like the current pseudo-partitioning implementation, partitions would be 
"full tables" just with some special rules for query-rewriting when they are 
pulled.  This makes sense, I think I just got carried away in another 
direction.

> It's also clear that the last thing you want is an index on the partition
> key. A big part of the advantage of partitioned tables is precisely that
> you get the advantage of an index on a column without the extra expense.

Well, you need it with the current pseudo-partitioning.   What would allow us 
to eliminate indexing the partition key is special re-writing rules that only 
pull the partitions compliant with the outer query.  Until that step takes 
place, the indexes are very much needed.  So maybe the advanced planner 
rewriting is a Phase I item, not a Phase II item?

> The $64 question is how to specify the partitioning rules. That is, the
> rule for determining which partition an insert should go into and which
> partitions to look for records in. Oracle handles this by specifying a list
> of columns when creating the partitioned table and then specifying either a
> range or specific values for each individual partition. I can imagine other
> approaches but none that allow for the planner and optimizer to take as
> much advantage of the information.

Well, I would think that specifying an expression that defines a new partition 
at each change in value (like EXTRACT(day FROM timestamp) on a time-based 
partitioning) would cover 90% of implementations and be a lot simpler to 
administer.   The Oracle approach has the advantage of allowing "custom 
partitioning" at the expense of greater complexity.

>   A command to remove a partition from the partitioned table and turn it
> into a regular table.
>
>   A command to take a regular table and turn it into a partition. Again
> here you specify the range or value of the partition key. There has to be
> some verification that the table really holds the correct data though.
> Perhaps this could be skipped by providing a table with a properly
> constructed constraint in place.

Like Tom, I don't see the point in these.  What do they do that CREATE TABLE 
AS and/or INSERT INTO do not?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Oleg Bartunov
On Sun, 20 Mar 2005, Josh Berkus wrote:

whole system make a lot more sense: individual partitions are really
tables. The partitioned tables themselves are just meta-objects like views.
If a partition is a table, could I define different indexes for each one?
In our prototype of scalable full-text search we create another index
which is optimized for "archived" (unchanging) data - a sort of
standard inverted index which is proven to scale - while tsearch2's index
is good for "online" data. All the interfaces (dictionaries, parsers, ranking)
are the same, so it's possible to combine search results.
This is rather easy to implement using table inheritance, but I'd like
to do this with partitioning.
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> >   A command to remove a partition from the partitioned table and turn it 
> > into
> >   a regular table.
> 
> Ugh.  Why?  You can access the table directly anyway.
> 
> >   A command to take a regular table and turn it into a partition.
> 
> Double ugh.  Verifying that the table matches the partition scheme seems
> like a lot of ugly, bug-prone, unnecessary code.  What's the use case
> for this anyway?
> 
> Those last two are *certainly* not Phase I requirements, and I don't
> think we need them at all ever.

These are effectively equivalent to "ALTER TABLE RENAME". Without these
commands you would be in pretty much the same position as a DBA without the
ability to rename tables.

The heart of partitioned tables is being able to load and unload entire
partitions quickly. You have to have somewhere to "unload" them to. Most
people aren't happy just watching their data disappear entirely. They want to
move them to other tables or even other databases.

Similarly, they have to have somewhere to load them from. They're usually not
happy loading data directly into their production data warehouse tables
without manipulating the data, or doing things like clustering or indexing.

You could argue for some sort of setup where you could take a partition
"offline" during which you could safely do things like export or manipulate
the data. But that's awfully limiting. What if I want to do things like add
columns, or change data types, or any other manipulation that breaks the
symmetry with the production partitioned table.

I don't think it's really hard at all to check that the table matches the
partition scheme. You can just require that there be an existing table
constraint in place that matches the partitioning scheme. I think you can even
be fascist about the exact syntax of the constraint fitting precisely a
specified format.
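The check Greg describes could stay cheap along these lines; the names are invented, and the final statement uses ALTER TABLE ... INHERIT, which PostgreSQL only gained in a later release:

```sql
-- Step 1: the candidate table must already carry a constraint written in
-- exactly the partitioning scheme's form, verified once at ALTER time:
ALTER TABLE sales_2005_03
    ADD CONSTRAINT sales_2005_03_range
    CHECK (sale_date >= DATE '2005-03-01' AND sale_date < DATE '2005-04-01');

-- Step 2: with the constraint in place, "attach" can be a metadata-only
-- operation, analogous to a rename:
ALTER TABLE sales_2005_03 INHERIT sales;
```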

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Tom Lane
"Stacy White" <[EMAIL PROTECTED]> writes:
> FWIW, we see large benefits from partitioning other than the ability to
> easily drop data, for example:

> - We can vacuum only the active portions of a table
> - Postgres automatically keeps related records clustered together on disk,
> which makes it more likely that the blocks used by common queries can be
> found in cache
> - The query engine uses full table scans on the relevant sections of data,
> and quickly skips over the irrelevant sections
> - 'CLUSTER'ing a single partition is likely to be significantly more
> performant than clustering a large table

Global indexes would seriously reduce the performance of both vacuum and
cluster for a single partition, and if you want seq scans you don't need
an index for that at all.  So the above doesn't strike me as a strong
argument for global indexes ...

regards, tom lane



Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> You could argue for some sort of setup where you could take a partition
> "offline" during which you could safely do things like export or manipulate
> the data. But that's awfully limiting. What if I want to do things like add
> columns, or change data types, or any other manipulation that breaks the
> symmetry with the production partitioned table.

[ scrapes eyebrows off ceiling... ]  You don't really expect to be able
to do that kind of thing to just one partition do you?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Greg Stark
Josh Berkus  writes:

> Well, I would think that specifying an expression that defines a new 
> partition 
> at each change in value (like EXTRACT(day FROM timestamp) on a time-based 
> partitioning) would cover 90% of implementations and be a lot simpler to 
> administer.   The Oracle approach has the advantage of allowing "custom 
> partitioning" at the expense of greater complexity.

Hm. This is where I might be less helpful. Once you're submersed in one way of
doing things it can be hard to think outside the box like this.

But I fear this scheme might be harder to actually take advantage of. If I do
a query like 

 WHERE timestamp BETWEEN '2005-01-01 11:00' AND '2005-01-01 12:00'

How do you determine which partitions that range will cover?

Also, it seems like it would be inconvenient to try to construct expressions
to handle things like "start a new partition every 1 million values".

And worse, how would you handle changing schemes with this? Like, say we want
to switch from starting one partition per month to starting one partition per
week?



I think some actual use cases might be helpful for you. I can contribute an
interesting one, though I have to be intentionally vague even though I don't
work on that system any more.

We had a table with a layout like:

txnid serial,
groupid integer,
data...

Each day a cron job created 6 new groups (actually later that was changed to
some other number). It then added a new partition to handle the range of the
new day's groups. Later another cron job exchanged out the partition from a
week earlier and exported that table, transfered it to another machine and
loaded it there.

txnid was a unique identifier but we couldn't have a unique constraint because
that would have required a global index. That didn't cause any problems since
it was a sequence generated column anyways.

We did have a unique index on (groupid, txnid), which is a local index because
groupid was the partition key. In reality nothing in our system ever really
needed a txn without knowing which group it came from anyway, so it was easy
to change our queries to take advantage of this.

We had a lot of jobs, some *extremely* performance sensitive that depended on
being able to scan the entire list of txns for a given day or a given set of
groupids. The partitions meant it could do a full table scan which made these
extremely fast.

This was with Oracle 8i. All partition keys in 8i were ranges. In 9i Oracle
added the ability to make a partition reference specific id values, sort of like
how you're describing having a key expression. We might have considered using
that scheme with groupid but then it would have meant adding a bunch of new
partitions each day and having some queries that would involve scanning
multiple partitions.
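Translated into PostgreSQL-style inheritance, the daily cron job described above might look roughly like this; table names and the group id range are invented:

```sql
-- Day N: create the day's partition covering the 6 new groups' id range,
-- plus the local unique index on (groupid, txnid):
CREATE TABLE txns_20050320 (
    CHECK (groupid >= 600 AND groupid < 606)
) INHERITS (txns);

CREATE UNIQUE INDEX txns_20050320_key ON txns_20050320 (groupid, txnid);

-- A week later the partition is pulled out, dumped, and shipped to the
-- archive machine (in Oracle this was ALTER TABLE ... EXCHANGE PARTITION).
```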

-- 
Greg




Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Global indexes would seriously reduce the performance of both vacuum and
> cluster for a single partition, and if you want seq scans you don't need
> an index for that at all.  So the above doesn't strike me as a strong
> argument for global indexes ...

I think he means some sort of plan for queries like

  select * from invoices where customer_id = 1

where customer 1 only did business with us for two years. One could imagine
some kind of very coarse grained bitmap index that just knows which partitions
customer_id=1 appears in, and then does a sequential scan of those partitions.

But I think you can do nearly as well without using global indexes of any
type. Assuming you had local indexes on customer_id for each partition and
separate histograms for each partition the planner could conclude that it
needs sequential scans for some partitions and a quick index lookup expecting
0 records for other partitions.

Not as good as pruning partitions entirely but if you're doing a sequential
scan the performance hit of a few index lookups isn't a problem.

-- 
greg




Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > You could argue for some sort of setup where you could take a partition
> > "offline" during which you could safely do things like export or manipulate
> > the data. But that's awfully limiting. What if I want to do things like add
> > columns, or change data types, or any other manipulation that breaks the
> > symmetry with the production partitioned table.
> 
> [ scrapes eyebrows off ceiling... ]  You don't really expect to be able
> to do that kind of thing to just one partition do you?

Well no. That's exactly why I would want to pull the partition out of the
partitioned table so that I can then do whatever work I need to archive it
without affecting the partitioned table.

Take an analogous situation. I have a huge log file I want to rotate. The
quickest most efficient way to do this would be to move it aside, HUP the
daemon (or whatever else I have to do to get it to open a new file) then gzip
and archive the old log files.

-- 
greg




[PERFORM] View vs function

2005-03-20 Thread Keith Worthington
Hi All,
I have been reading about set returning functions.  What I would like to 
know is: is there a performance advantage in using SRFs versus querying a 
view?  Assuming the underlying SQL is the same for the view and the 
function except for the WHERE clause, which of these would you expect to 
be faster?  Or does the planner realize all this...

SELECT * FROM view_big_query WHERE column1 = 1234;
SELECT * FROM func_big_query(1234);
--
Kind Regards,
Keith


Re: [PERFORM] View vs function

2005-03-20 Thread Bruno Wolff III
On Sun, Mar 20, 2005 at 22:39:57 -0500,
  Keith Worthington <[EMAIL PROTECTED]> wrote:
> Hi All,
> 
> I have been reading about set returning functions.  What I would like to 
> know is: is there a performance advantage in using SRFs versus querying a 
> view?  Assuming the underlying SQL is the same for the view and the 
> function except for the WHERE clause, which of these would you expect to 
> be faster?  Or does the planner realize all this...

In general you are going to be better off with a view, since the planner
knows what the view is doing and there may be some optimizations it
can make. Functions are just black boxes to the planner.
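To make the comparison concrete, here is a hedged sketch of the two forms; the underlying tables and join are invented:

```sql
CREATE VIEW view_big_query AS
    SELECT o.column1, o.total, c.name
    FROM   orders o JOIN customers c ON c.id = o.customer_id;

-- The planner sees through the view, so "WHERE column1 = 1234" can be
-- pushed down into the underlying join:
--   SELECT * FROM view_big_query WHERE column1 = 1234;

-- The function body, by contrast, is planned without knowledge of the
-- calling query:
CREATE FUNCTION func_big_query(integer)
RETURNS SETOF view_big_query AS '
    SELECT * FROM view_big_query WHERE column1 = $1;
' LANGUAGE sql STABLE;
```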

> 
> SELECT * FROM view_big_query WHERE column1 = 1234;
> 
> SELECT * FROM func_big_query(1234);
> 
> -- 
> Kind Regards,
> Keith



Re: [PERFORM] Postgres on RAID5

2005-03-20 Thread David Dougall
In my experience, if you are concerned about filesystem performance, don't
use ext3.  It is one of the slowest filesystems I have ever used
especially for writes.  I would suggest either reiserfs or xfs.
--David Dougall


On Fri, 11 Mar 2005, Arshavir Grigorian wrote:

> Hi,
>
> I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has
> an Ext3 filesystem which is used by Postgres. Currently we are loading a
> 50G database on this server from a Postgres dump (copy, not insert) and
> are experiencing very slow write performance (35 records per second).
>
> Top shows that the Postgres process (postmaster) is being constantly put
> into D state for extended periods of time (2-3 seconds) which I assume
> is because it's waiting for disk io. I have just started gathering
> system statistics and here is what sar -b shows: (this is while the db
> is being loaded - pg_restore)
>
>              tps     rtps     wtps   bread/s   bwrtn/s
> 01:35:01 PM  275.77   76.12   199.66   709.59   2315.23
> 01:45:01 PM  287.25   75.56   211.69   706.52   2413.06
> 01:55:01 PM  281.73   76.35   205.37   711.84   2389.86
> 02:05:01 PM  282.83   76.14   206.69   720.85   2418.51
> 02:15:01 PM  284.07   76.15   207.92   707.38   2443.60
> 02:25:01 PM  265.46   75.91   189.55   708.87   2089.21
> 02:35:01 PM  285.21   76.02   209.19   709.58   2446.46
> Average:     280.33   76.04   204.30   710.66   2359.47
>
> This is a Sun e450 with dual TI UltraSparc II processors and 2G of RAM.
> It is currently running Debian Sarge with a 2.4.27-sparc64-smp custom
> compiled kernel. Postgres is installed from the Debian package and uses
> all the configuration defaults.
>
> I am also copying the pgsql-performance list.
>
> Thanks in advance for any advice/pointers.
>
>
> Arshavir
>
> Following is some other info that might be helpful.
>
> /proc/scsi# mdadm -D /dev/md1
> /dev/md1:
>  Version : 00.90.00
>Creation Time : Wed Feb 23 17:23:41 2005
>   Raid Level : raid5
>   Array Size : 123823616 (118.09 GiB 126.80 GB)
>  Device Size : 8844544 (8.43 GiB 9.06 GB)
> Raid Devices : 15
>Total Devices : 17
> Preferred Minor : 1
>  Persistence : Superblock is persistent
>
>  Update Time : Thu Feb 24 10:05:38 2005
>State : active
>   Active Devices : 15
> Working Devices : 16
>   Failed Devices : 1
>Spare Devices : 1
>
>   Layout : left-symmetric
>   Chunk Size : 64K
>
> UUID : 81ae2c97:06fa4f4d:87bfc6c9:2ee516df
>   Events : 0.8
>
>     Number   Major   Minor   RaidDevice   State
>        0       8      64         0        active sync   /dev/sde
>        1       8      80         1        active sync   /dev/sdf
>        2       8      96         2        active sync   /dev/sdg
>        3       8     112         3        active sync   /dev/sdh
>        4       8     128         4        active sync   /dev/sdi
>        5       8     144         5        active sync   /dev/sdj
>        6       8     160         6        active sync   /dev/sdk
>        7       8     176         7        active sync   /dev/sdl
>        8       8     192         8        active sync   /dev/sdm
>        9       8     208         9        active sync   /dev/sdn
>       10       8     224        10        active sync   /dev/sdo
>       11       8     240        11        active sync   /dev/sdp
>       12      65       0        12        active sync   /dev/sdq
>       13      65      16        13        active sync   /dev/sdr
>       14      65      32        14        active sync   /dev/sds
>
>       15      65      48        15        spare         /dev/sdt
>
> # dumpe2fs -h /dev/md1
> dumpe2fs 1.35 (28-Feb-2004)
> Filesystem volume name:   
> Last mounted on:  
> Filesystem UUID:  1bb95bd6-94c7-4344-adf2-8414cadae6fc
> Filesystem magic number:  0xEF53
> Filesystem revision #:1 (dynamic)
> Filesystem features:  has_journal dir_index needs_recovery large_file
> Default mount options:(none)
> Filesystem state: clean
> Errors behavior:  Continue
> Filesystem OS type:   Linux
> Inode count:  15482880
> Block count:  30955904
> Reserved block count: 1547795
> Free blocks:  28767226
> Free inodes:  15482502
> First block:  0
> Block size:   4096
> Fragment size:4096
> Blocks per group: 32768
> Fragments per group:  32768
> Inodes per group: 16384
> Inode blocks per group:   512
> Filesystem created:   Wed Feb 23 17:27:13 2005
> Last mount time:  Wed Feb 23 17:45:25 2005
> Last write time:  Wed Feb 23 17:45:25 2005
> Mount count:  2
> Maximum mount count:  28
> Last checked: Wed Feb 23 17:27:13 2005
> Check interval:   15552000 (6 months)
> Next check after: Mon Aug 22 18:27:13 2005
> Reserved blocks uid:  0 (user root)
> Reserved blo

Effect of Stripe Size (was [PERFORM] Postgres on RAID5)

2005-03-20 Thread Ruth Ivimey-Cook
Folks,

> You said:
> "If your write size is smaller than chunk_size*N (N = number 
> of data blocks in a stripe), in order to calculate correct 
> parity you have to read data from the remaining drives."
> 
> Neil explained it in this message:
> http://marc.theaimsgroup.com/?l=linux-raid&m=108682190730593&w=2

Having read that and the recent posts:

Has anyone done any performance checks on the md code to determine what, if
any, effect the stripe size has on performance? One might suppose the variables
would be stripe size, file size and read vs write. Possibly number of drives in
array, too.

Reason for asking:

When I set up my raid5 array, I chose a stripe of 256K, on the grounds that a
large number of the files on the drive are multi-megabytes (fairly evenly in
the 20MB - 100MB range) and I supposed that a large stripe would speed things
up for those files. Was I right?

Regards,

Ruth



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgres on RAID5

2005-03-20 Thread Michael Tokarev
David Dougall wrote:
In my experience, if you are concerned about filesystem performance, don't
use ext3.  It is one of the slowest filesystems I have ever used
especially for writes.  I would suggest either reiserfs or xfs.
I'm a bit afraid to start yet another filesystem flamewar, but:
please don't make such claims without providing actual numbers
and config details.  Pretty please.
ext3 performs well for databases; there's no reason for it to be
slow.  Ok, enable data=journal and use it with e.g. Oracle and you will
see it is slow.  But in that case it isn't the filesystem to blame,
it's operator error, simple as that.
And especially reiserfs, with its tail packing enabled by default,
is NOT suitable for databases...
/mjt


[PERFORM] Which one is faster: one way reading ="single pass reading"

2005-03-20 Thread Rosny
Hi,

Which approach is fastest for one-way ("single pass") reading?
Assumptions:
a. There are 3 million records
b. Need to fetch 10 or 20 records repeatedly
   (so for the database that means 10 connections, each returning one
record, or alternatively 1 fancier connection returning 10 sets of records)


1. Reading from a flat file
   Assume the file name is already given and we just need to read the file
   (since it is a flat file, each record represents a filename, with multiple
directory categories)

2. Reading from an XML file
   Assume the schema is already given and we just need to read the file
   (since it is an XML file, each record represents an XML filename, with
multiple directory categories)

3. Reading from PostgreSQL
   Assume the primary key is already indexed;
   just search for the number and grab the text content
   (assume 3 million records: search for the number, read the content file)

I am trying to recreate WebDBReader (from Nutch) using C#:
http://nutch.sourceforge.net/docs/api/net/nutch/db/WebDBReader.html

Thank you in advances,
Rosny







Re: [PERFORM] Questions about 2 databases.

2005-03-20 Thread Mirko Zeibig
jelle wrote:
> The insert heavy sessions average 175 page hits generating XML, 1000 
> insert/updates which comprise 90% of the insert/update load, of which
> 200 inserts need to be transferred to the master db. The other 
> sessions are read/cache bound. I'm hoping to get a speed-up from moving
>  the temporary stuff off the master db and using 1 transaction
> instead of 175 to the disk based master db.

Just a thought:
Wouldn't it be sufficient to have the "temporary", fast session-table
in a RAM-disk? I suspect you could do this rather easily using a TABLESPACE.
All the indices could be in this TABLESPACE as well (at least after
having a quick look at the short help for CREATE TABLE and assuming you are
using PostgreSQL >= 8.0).
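A hedged sketch of the RAM-disk tablespace idea; the mount point and names are assumptions, and anything placed there vanishes on reboot, so only truly disposable data belongs in it:

```sql
-- Point a tablespace at an already-mounted RAM disk (PostgreSQL >= 8.0):
CREATE TABLESPACE session_ram LOCATION '/mnt/ramdisk/pgdata';

-- Fast, volatile session table; its indexes live in the same tablespace.
CREATE TABLE session_scratch (
    session_id integer PRIMARY KEY,
    payload    text
) TABLESPACE session_ram;
```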

Regards
Mirko



[PERFORM] Hardware impact on performances

2005-03-20 Thread Camille Chafer




Hi,

I'm using PostgreSQL 8 for an MMORPG.
The mix of operations is: select 50%, update 40%, insert 10%.
I have no more than 4-5 concurrent connections to the database, but
each of them does A LOT of queries (several per second).
The database size is about 1GB, but it'll probably be around 2GB in a
few months.
The OS will be FreeBSD (probably production version 5.3, or 4.10).

At this time, I'm looking for a new server. Before buying it, I'm
gathering some information.
So, my question is: what would be the best hardware for this type of
need?
Of course, I'm not asking for a brand and/or for prices, but for
hints.

- What is the most important part of the system: CPU? RAM? Disks?
- Is a server with 2 or more CPUs much better than a server with a
single one, for a pgsql database?
- How much RAM do I need? The size of the data? Twice the size?
- I heard RAID 1+0 is better than RAID 5. Is that right? What would be
the best configuration, regarding performance and security?
- Does the CPU type (i386, PowerPC, ...) matter?
- A lot of queries probably generate a lot of network output. Does the
network controller matter?
- And finally, last question: is it possible to run a single
postgresql database on several servers? (hardware clustering)

Thanks in advance for your answers, and sorry for my crap english (i'm
french).

Camille Chafer





Re: [PERFORM] interesting benchmarks PG/Firebird Linux/Windows fsync/nofsync

2005-03-20 Thread Qingqing Zhou
"Bruce Momjian"  writes
>
> Yes, we now enable FILE_FLAG_WRITE_THROUGH on Win32 for open_sync and I
> am about to open a discussion whether this should be the default for
> Win32, and whether we should backpatch this to 8.0.X.

Just a short msg: Oracle/SQL Server enable it as default in win32 *no matter
what* ...

Regards,
Qingqing



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] cpu_tuple_cost

2005-03-20 Thread Daniel Schuchardt
Tom Lane wrote:
Reducing random_page_cost is usually the best way to get the planner to
favor indexscans more.
Ok, I tried a bit with random_page_cost and I have set it to 1 to get 
PG to use the index on mitpln:

CIMSOFT=# ANALYSE mitpln;
ANALYZE
CIMSOFT=# SET random_page_cost=2;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln WHERE 
date_to_yearmonth_dec(mpl_date)='20050';
  QUERY PLAN


 Seq Scan on mitpln  (cost=0.00..1173.78 rows=1431 width=69) (actual 
time=219.000..1125.000 rows=1266 loops=1)
   Filter: ((date_to_yearmonth_dec((mpl_date)::timestamp without time 
zone))::text = '20050'::text)
 Total runtime: 1125.000 ms
(3 rows)

CIMSOFT=# SET random_page_cost=1;
SET
CIMSOFT=# EXPLAIN ANALYSE SELECT * FROM mitpln WHERE 
date_to_yearmonth_dec(mpl_date)='20050';
  QUERY PLAN


 Index Scan using mitpln_yearmonth_dec on mitpln  (cost=0.00..699.01 
rows=1431 width=69) (actual time=0.000..16.000 rows=1266 loops=1)
   Index Cond: ((date_to_yearmonth_dec((mpl_date)::timestamp without 
time zone))::text = '20050'::text)
 Total runtime: 16.000 ms
(3 rows)
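The flip between the two plans above is easier to see with a toy version of the planner's trade-off. This is a deliberately simplified sketch with invented page counts -- not PostgreSQL's actual cost formulas from costsize.c -- but it captures the key point: the index scan's estimate is scaled by random_page_cost, while the seq scan's is not, so lowering random_page_cost eventually makes the index plan win:

```python
# Toy planner cost model (assumed/simplified; the real planner's
# formulas are considerably more involved). Page counts are invented
# to roughly match the table above.

SEQ_PAGE_COST = 1.0

def seq_scan_cost(table_pages, table_rows, cpu_tuple_cost=0.01):
    # Seq scan reads every heap page at seq_page_cost, plus CPU per row.
    return table_pages * SEQ_PAGE_COST + table_rows * cpu_tuple_cost

def index_scan_cost(pages_fetched, rows_fetched,
                    random_page_cost, cpu_index_tuple_cost=0.005):
    # Index scan pays random_page_cost per page it touches.
    return pages_fetched * random_page_cost + rows_fetched * cpu_index_tuple_cost

# Hypothetical: ~1000 heap pages, 26330 rows, ~700 pages fetched for 1431 rows.
for rpc in (4.0, 2.0, 1.0):
    idx = index_scan_cost(700, 1431, rpc)
    seq = seq_scan_cost(1000, 26330)
    # rpc=1.0 is the first setting where the index plan wins in this model,
    # mirroring the behavior seen in the session above.
    print(rpc, round(idx, 1), round(seq, 1), "index" if idx < seq else "seq")
```

The seq scan cost is fixed, so only the random_page_cost knob moves the comparison.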

CIMSOFT=# \d mitpln
   Table "public.mitpln"
    Column    |         Type          | Modifiers
--------------+-----------------------+-----------
 mpl_id   | integer   | not null default 
nextval('public.mitpln_mpl_id_seq'::text)
 mpl_date | date  |
 mpl_minr | integer   | not null
 mpl_tpl_name | character varying(20) |
 mpl_feiertag | character varying(50) |
 mpl_min  | real  |
 mpl_saldo| real  |
 mpl_buch | boolean   | not null default false
 mpl_absaldo  | real  |
 mpl_vhz  | real  |
 dbrid| character varying | default nextval('db_id_seq'::text)
Indexes:
"mitpln_pkey" PRIMARY KEY, btree (mpl_id)
"mitpln_idindex" UNIQUE, btree (dbrid)
"xtt5126" UNIQUE, btree (mpl_date, mpl_minr)
"mitpln_yearmonth_dec" btree
(date_to_yearmonth_dec(mpl_date::timestamp without time zone))
"mpl_minr" btree (mpl_minr)
"mpl_minr_nobuch" btree (mpl_minr) WHERE NOT mpl_buch

CIMSOFT=# SELECT count(*) FROM mitpln;
 count
---
 26330
(1 row)
CREATE OR REPLACE FUNCTION date_to_yearmonth_dec(TIMESTAMP) RETURNS
VARCHAR AS '
BEGIN
  RETURN extract(year FROM $1) || extract(month FROM $1) - 1;
END' LANGUAGE plpgsql IMMUTABLE;
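As a side note for anyone decoding the '20050' literal in the EXPLAIN runs above: the function concatenates the year with the month minus one, so '20050' is January 2005. A rough Python equivalent (this Python rendering is my illustration, not part of the original post):

```python
from datetime import date

def date_to_yearmonth_dec(d: date) -> str:
    # Mirrors the plpgsql function above: year concatenated with the
    # zero-based month (January yields "...0", December "...11").
    return f"{d.year}{d.month - 1}"

print(date_to_yearmonth_dec(date(2005, 1, 15)))  # -> 20050
print(date_to_yearmonth_dec(date(2005, 12, 3)))  # -> 200511
```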

Daniel
PS: that's a 2.4 GHz P4 server with 1 GB RAM and SCSI RAID
(Win2000, PG 8.0.1)
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgres on RAID5

2005-03-20 Thread Michael Tokarev
Arshavir Grigorian wrote:
Alex Turner wrote:
[]
Well, by putting the pg_xlog directory on a separate disk/partition, I 
was able to increase this rate to about 50 or so per second (still 
pretty far from your numbers). Next I am going to try putting the 
pg_xlog on a RAID1+0 array and see if that helps.
pg_xlog is written synchronously, right?  It should be, or else the reliability
of the database would be in serious question...
I posted a question on Feb 22 here on linux-raid, titled "*terrible*
direct-write performance with raid5".  There's a problem with the write
performance of a raid4/5/6 array, which is due to the design.
Consider a raid5 array (raid4 will be exactly the same, and for raid6,
just double the parity writes) with N data blocks and 1 parity block.
When a portion of data is written, the parity block must be updated
too, to stay consistent and recoverable.  And here, the size of
the write plays a very significant role.  If your write size is smaller
than chunk_size*N (N = number of data blocks in a stripe), then in order
to calculate the correct parity you have to read data from the remaining
drives.  The only case where you don't need to read data from the other
drives is when you're writing exactly chunk_size*N bytes, AND the
write is block-aligned.  By default, chunk_size is 64Kb (the minimum is 4Kb).
So the only reasonable direct-write size for N drives is 64Kb*N;
otherwise the raid code will have to read the "missing" data to calculate
the parity block.  Of course, in 99% of cases you're writing in much smaller
sizes, say 4Kb or so.  And here, the more drives you have, the
LESS write speed you will have.
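The read-modify-write penalty described above can be sketched numerically. This is a simplified model of my own (it ignores md's alternative reconstruct-write path and any caching), counting chunk-sized disk operations per write:

```python
# Simplified model of disk operations for a RAID-4/5 write, following
# the read-modify-write description above. Function name and model are
# my own illustration, not from the original posts.

def raid5_write_ops(write_size, chunk_size, n_data_disks):
    """Return (reads, writes) in chunk-sized disk ops for one
    block-aligned write of `write_size` bytes."""
    stripe_size = chunk_size * n_data_disks
    if write_size % stripe_size == 0:
        # Full-stripe write: parity is computed from the new data alone.
        stripes = write_size // stripe_size
        return 0, stripes * (n_data_disks + 1)  # data + parity, no reads
    # Partial-stripe write: read old data chunk + old parity chunk,
    # then write new data chunk + new parity chunk.
    chunks = max(1, write_size // chunk_size)
    return 2 * chunks, 2 * chunks

# A 4 KB write on a 64 KB-chunk array with 13 data disks still needs reads:
print(raid5_write_ops(4 * 1024, 64 * 1024, 13))    # -> (2, 2)
# A full, aligned 832 KB (64 KB * 13) stripe write needs none:
print(raid5_write_ops(832 * 1024, 64 * 1024, 13))  # -> (0, 14)
```

The small synchronous writes typical of a transaction log land squarely in the expensive partial-stripe case.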
When using the OS buffer and filesystem cache, the system has many
more chances to re-order requests and sometimes even to omit reads
entirely (when you perform many sequential writes without a sync in
between, for example), so buffered writes may be much faster.
But not direct or synchronous writes, again especially when you're
doing a lot of sequential writes...
So to me it looks like an inherent problem of the raid5 architecture
wrt database-like workloads -- databases tend to use synchronous
or direct writes to ensure good data consistency.
For pgsql, which (I don't know for sure, but reportedly) uses synchronous
writes only for the transaction log, it is a good idea to put that log
on a raid1 or raid10 array, but NOT on a raid5 array.
Just IMHO, of course.
/mjt
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Postgres on RAID5 (possible sync blocking read type

2005-03-20 Thread David Greaves
Greg Stark wrote:
Arshavir Grigorian <[EMAIL PROTECTED]> writes:
 

Hi,
I have a RAID5 array (mdadm) with 14 disks + 1 spare. This partition has an
Ext3 filesystem which is used by Postgres. 
   

People are going to suggest moving to RAID1+0. I'm unconvinced that RAID5
across 14 drives shouldn't be able to keep up with RAID1 across 7 drives,
though. It would be interesting to see empirical data.
One thing that does scare me is the Postgres transaction log and the ext3
journal both sharing these disks with the data. Ideally both of these things
should get (mirrored) disks of their own separate from the data files.
But 2-3s pauses seem disturbing. I wonder whether ext3 is issuing a cache
flush on every fsync to get the journal pushed out. This is a new linux
feature that's necessary with ide but shouldn't be necessary with scsi.
It would be interesting to know whether postgres performs differently with
fsync=off. This would even be a reasonable mode to run under for initial
database loads. It shouldn't make much of a difference with hardware like this
though. And you should be aware that running under this mode in production
would put your data at risk.
Hi
I'm coming in from the raid list so I didn't get the full story.
May I ask what kernel?
I only ask because I upgraded to 2.6.11.2 and happened to be watching 
xosview on my (probably) completely different setup (1Tb xfs/lvm2/raid5 
served by nfs to a remote sustained read/write app), when I saw all read 
activity cease for 2/3 seconds whilst the disk wrote, then disk read 
resumed. This occurred repeatedly during a read/edit/write of a 3Gb file.

Performance not critical here so on the "hmm, that's odd" todo list :)
David
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] column name is "LIMIT"

2005-03-20 Thread Qingqing Zhou
So the idea is to make the SQL parser context-sensitive - that is, the parser
would understand that in the statement "SELECT * FROM LIMIT", LIMIT is just a
table name instead of a keyword?

There might be some conflicts when using Yacc, but I am not sure how
difficult it would be ...

Cheers,
Qingqing

"Christopher Kings-Lynne" <[EMAIL PROTECTED]>
> You will still need to use double quotes in 8.0.1...
>
> Chris
>
> Gourish Singbal wrote:
> > Thanks a lot,
> >
> > we might be upgrading to 8.0.1 soon.. till than using double quotes
> > should be fine.
> >
> > regards
> > gourish
> >
> > On Mon, 14 Mar 2005 18:25:22 +1100, Russell Smith <[EMAIL PROTECTED]>
wrote:
> >
> >>On Mon, 14 Mar 2005 06:14 pm, Gourish Singbal wrote:
> >>
> >>>Guys,
> >>>
> >>>I am having a problem firing queries on one of the tables which is
> >>>having "limit" as the column name.
> >>>
> >>>If a run an insert/select/update command on that table i get the below
error.
> >>>
> >>>ERROR:  syntax error at or near "limit" at character 71
> >>
> >>select "limit" from limit_table WHERE "limit" < 50 LIMIT 2;
> >>
> >>You need to quote the field name, and make sure the case is correct.
> >>
> >>>Any Help would be realyl great to solve the problem.
> >>>
> >>>postgresql 7.4.5 and linux OS
> >>>
> >>
> >>You should probably upgrade to 7.4.7
> >>
> >>Regards
> >>
> >>Russell Smith.
> >>
> >
> >
> >
>
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
>



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Postgres on RAID5

2005-03-20 Thread Guy
You said:
"If your write size is smaller than chunk_size*N (N = number of data blocks
in a stripe), in order to calculate correct parity you have to read data
from the remaining drives."

Neil explained it in this message:
http://marc.theaimsgroup.com/?l=linux-raid&m=108682190730593&w=2

Guy



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] View vs function

2005-03-20 Thread Neil Conway
Bruno Wolff III wrote:
Functions are just black boxes to the planner.
... unless the function is a SQL function that is trivial enough for the 
planner to inline it into the plan of the invoking query. Currently, we 
won't inline set-returning SQL functions that are used in the query's 
rangetable, though. This would be worth doing, I think -- I'm just not sure 
how much work it would be.

-Neil
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Stacy White
From: "Tom Lane" <[EMAIL PROTECTED]>
> "Stacy White" <[EMAIL PROTECTED]> writes:
> > FWIW, we see large benefits from partitioning other than the ability to
> > easily drop data, for example:
>
> > - We can vacuum only the active portions of a table
> > - Postgres automatically keeps related records clustered together on
disk,
> > which makes it more likely that the blocks used by common queries can be
> > found in cache
> > - The query engine uses full table scans on the relevant sections of
data,
> > and quickly skips over the irrelevant sections
> > - 'CLUSTER'ing a single partition is likely to be significantly more
> > performant than clustering a large table
> Global indexes would seriously reduce the performance of both vacuum and
> cluster for a single partition, and if you want seq scans you don't need
> an index for that at all.  So the above doesn't strike me as a strong
> argument for global indexes ...

Tom, this list was in response to your question "If you have to do index
cleaning before you can drop a partition, what's the point of
partitioning?".  I was trying to make the point that partitioning isn't just
about being able to quickly drop data.  The argument for global indexes came
in the form of my war story and the description of the conditions under
which global indexes will perform better than local indexes (see my original
email for details).  But, like I said, this would definitely be a phase
II/III item.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] What needs to be done for real Partitioning?

2005-03-20 Thread Stacy White
From: "Greg Stark" <[EMAIL PROTECTED]>
> Tom Lane <[EMAIL PROTECTED]> writes:
> Not as good as pruning partitions entirely but if you're doing a
sequential
> scan the performance hit of a few index lookups isn't a problem.

Greg, I think you've got the right idea.  For large databases, though, it
won't be uncommon to have large numbers of partitions, in which case we're
not talking about a few index lookups.  The database I used in my example
wasn't huge, but the table in question had over 800 partitions.  A larger
database could have thousands.  I suppose the importance of global indexes
depends on the sizes of the databases your target audience is running.

Here's some more detail on our real-world experience:  The group made the
decision to partition some of the larger tables for better performance.  The
idea that global indexes aren't useful is pretty common in the database
world, and 2 or 3 good DBAs suggested that the 'product_no' index be local.
But with the local indexes, performance on some queries was bad enough that
the group actually made the decision to switch back to unpartitioned tables.
(The performance problems came about because of the overhead involved in
searching >800 indices to find the relevant rows).
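A back-of-the-envelope model (mine, not from the thread; the fanout and row counts are invented) shows why probing hundreds of local indexes hurts a point lookup, while a single global index only grows the tree by a level or so:

```python
import math

def btree_height(n_keys, fanout=256):
    # Approximate number of levels in a b-tree holding n_keys entries,
    # under an assumed per-page fanout of 256.
    return max(1, math.ceil(math.log(max(n_keys, 2), fanout)))

def local_index_probes(partitions, rows_per_partition):
    # A point lookup on a non-partition-key column must descend
    # every partition's local index.
    return partitions * btree_height(rows_per_partition)

def global_index_probes(partitions, rows_per_partition):
    # One descent of a single, slightly taller index over all rows.
    return btree_height(partitions * rows_per_partition)

# 800 partitions of 1M rows each:
print(local_index_probes(800, 1_000_000))   # -> 2400 page descents
print(global_index_probes(800, 1_000_000))  # -> 4
```

Three orders of magnitude is roughly the gap that made the group abandon the locally-indexed schema until the global indexes went in.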

Luckily they had the chance to work with a truly fantastic DBA (the
author of an Oracle Press performance tuning book even) before they could
switch back.  He convinced them to make some of their indexes global.
Performance dramatically improved (compared with both the unpartitioned
schema, and the partitioned-and-locally-indexed schema), and they've since
stayed with partitioned tables and a mix of local and global indexes.

But once again, I think that global indexes aren't as important as the Phase
I items in any of the Phase I/Phase II breakdowns that have been proposed in
this thread.


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] View vs function

2005-03-20 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> Bruno Wolff III wrote:
>> Functions are just black boxes to the planner.

> ... unless the function is a SQL function that is trivial enough for the 
> planner to inline it into the plan of the invoking query. Currently, we 
> won't inline set-returning SQL functions that are used in the query's 
> rangetable, though. This would be worth doing, I think -- I'm not sure 
> how much work it would be, though.

Yeah, I've been thinking the same.  It seems like it shouldn't be unduly
difficult --- not harder than inlining scalar-valued SQL functions, just
different validity conditions.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org