Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
Problem solved.. I set the fetchSize to a reasonable value instead of the default of unlimited in the PreparedStatement and now the query is . After some searching it seems this is a common problem, would it make sense to change the default value to something other than 0 in the JDBC driver? If
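A minimal JDBC sketch of the fix described above; the connection URL, table name, and fetch size of 1000 are illustrative, not taken from the original post:

    import java.sql.*;

    public class FetchSizeDemo {
        public static void main(String[] args) throws SQLException {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "password");
            // The PostgreSQL JDBC driver only streams rows through a cursor when
            // autocommit is off; otherwise the whole result set is buffered in memory.
            conn.setAutoCommit(false);
            PreparedStatement ps = conn.prepareStatement("SELECT * FROM bigtable");
            ps.setFetchSize(1000); // fetch 1000 rows per round trip instead of all at once
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // process one row; memory use stays bounded by the fetch size
            }
            rs.close();
            ps.close();
            conn.commit();
            conn.close();
        }
    }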

Re: [PERFORM] Four table join with million records - performance improvement?

2004-09-13 Thread Tom Lane
Vijay Moses <[EMAIL PROTECTED]> writes: > Hi I have four sample tables ename, esal, edoj and esum > All of them have 1000000 records. I'm running the following > query : select ename.eid, name, sal, doj, summary from > ename,esal,edoj,esum where ename.eid=esal.eid and ename.eid=edoj.eid > and ename.

Re: [PERFORM] tblspaces integrated in new postgresql (version 8.0)

2004-09-13 Thread Tom Lane
[EMAIL PROTECTED] writes: > I am interested in tablespaces--what exactly is this feature, some sort of > organizational addition (?) and how can I best take advantage of this? See http://developer.postgresql.org/docs/postgres/manage-ag-tablespaces.html It doesn't talk a lot yet about *why* yo

Re: [PERFORM] Help with extracting large volumes of records across related

2004-09-13 Thread Mischa Sandberg
Damien Dougan wrote: Basically we have a number of tables, which are exposed as 2 public views (say PvA and PvB). For each row in PvA, there are a number of related rows in PvB (this number is arbitrary, which is one of the reasons why it can't be expressed as additional columns in PvA - so we reall

[PERFORM] Four table join with million records - performance improvement?

2004-09-13 Thread Vijay Moses
Hi I have four sample tables ename, esal, edoj and esum. All of them have 1000000 records. I'm running the following query : select ename.eid, name, sal, doj, summary from ename,esal,edoj,esum where ename.eid=esal.eid and ename.eid=edoj.eid and ename.eid=esum.eid. It's a join of all four tables which
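One way to see how the planner handles a join like this is EXPLAIN ANALYZE, which runs the query and prints the actual plan with timings. A hedged JDBC sketch, assuming an open Connection conn; the query text is taken from the post:

    // Note: EXPLAIN ANALYZE actually executes the query to collect timings.
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
                 "EXPLAIN ANALYZE " +
                 "SELECT ename.eid, name, sal, doj, summary " +
                 "FROM ename, esal, edoj, esum " +
                 "WHERE ename.eid = esal.eid AND ename.eid = edoj.eid " +
                 "AND ename.eid = esum.eid")) {
        while (rs.next()) {
            System.out.println(rs.getString(1)); // one plan line per result row
        }
    }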

[PERFORM] tblspace

2004-09-13 Thread Bill Fefferman
Hi, I have downloaded the new postgresql (version 8.0 beta2) and I was wondering what performance features I can take advantage of before I start to dump my 3/4 terabyte database into the new database. More specifically I am interested in tablespaces--what exactly is this feature, some sort of o

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-13 Thread Mischa Sandberg
Mark Cotner wrote: Hi all, I had a difficult time deciding which list to post this to, so please forgive me if this list doesn't perfectly match my questions. My decision will not solely be based on performance, but it is the primary concern. I would be very appreciative if you all could comment

[PERFORM] tblspaces integrated in new postgresql (version 8.0)

2004-09-13 Thread bill
Hi, I have downloaded the new postgresql (version 8.0 beta2) and I was wondering what performance features I can take advantage of before I start to dump my 3/4 terabyte database into the new format. More specifically I am interested in tablespaces--what exactly is this feature, some sort of orga

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes: > If we had a majority of queries filling more than one block we would > be checkpointing like crazy and we don't normally get reports about > that. [ raised eyebrow... ] And of course the 30-second-checkpoint-warning stuff is a useless feature that no on

Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Tom Lane
Stephen Crowley <[EMAIL PROTECTED]> writes: > On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: >> Stephen Crowley <[EMAIL PROTECTED]> writes: >>> Does postgres cache the entire result set before it begins returning >>> data to the client? >> >> The backend doesn't, but libpq

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Gaetano Mendola
Tom Lane wrote: | Gaetano Mendola <[EMAIL PROTECTED]> writes: | |>Now that the argument is already open, why postgres choose |>on linux fdatasync? I'm understanding from other posts that |>on this platform open_sync is better than fdatasync. | | | AFAIR

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Bruce Momjian
Tom Lane wrote: > The tests that started this thread are pretty unconvincing in my eyes, > because they are comparing open_sync against code that fsyncs after each > one-block write. Under those circumstances, *of course* fsync will lose > (or at least do no better), because it's forcing the same

Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
On Mon, 13 Sep 2004 21:11:07 -0400, Tom Lane <[EMAIL PROTECTED]> wrote: > Stephen Crowley <[EMAIL PROTECTED]> writes: > > Does postgres cache the entire result set before it begins returning > > data to the client? > > The backend doesn't, but libpq does, and I think JDBC does too. > > I'd recomm

Re: [PERFORM] Large # of rows in query extremely slow, not using

2004-09-13 Thread Rod Taylor
On Mon, 2004-09-13 at 20:51, Stephen Crowley wrote: > Does postgres cache the entire result set before it begins returning > data to the client? Sometimes you need to be careful as to how the clients treat the data. For example psql will resize column widths based on the length (width) of the data ret

Re: [PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Tom Lane
Stephen Crowley <[EMAIL PROTECTED]> writes: > Does postgres cache the entire result set before it begins returning > data to the client? The backend doesn't, but libpq does, and I think JDBC does too. I'd recommend using a cursor so you can FETCH a reasonable number of rows at a time. > Also, wh
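A sketch of the cursor approach suggested here, using SQL-level DECLARE/FETCH from JDBC; the cursor name, table name, and batch size are illustrative, and an open Connection conn is assumed:

    conn.setAutoCommit(false); // cursors only live inside a transaction
    Statement st = conn.createStatement();
    st.execute("DECLARE bigcur CURSOR FOR SELECT * FROM bigtable");
    boolean more = true;
    while (more) {
        more = false;
        ResultSet rs = st.executeQuery("FETCH FORWARD 1000 FROM bigcur");
        while (rs.next()) {
            more = true;
            // process one row of the current 1000-row batch
        }
        rs.close();
    }
    st.execute("CLOSE bigcur");
    conn.commit();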

[PERFORM] Large # of rows in query extremely slow, not using index

2004-09-13 Thread Stephen Crowley
Does postgres cache the entire result set before it begins returning data to the client? I have a table with ~8 million rows and I am executing a query which should return about ~800,000 rows. The problem is that as soon as I execute the query it absolutely kills my machine and begins swapping for

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-13 Thread Simon Riggs
Mark, I thought some additional comments on top of Christopher's excellent notes might help you. > Christopher Browne > The world rejoiced as Mischa Sandberg > <[EMAIL PROTECTED]> wrote: > > Mark Cotner wrote: > >> Requirements: > >> Merge table definition equivalent. We use these > >> extensiv

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Tom Lane
Gaetano Mendola <[EMAIL PROTECTED]> writes: > Now that the argument is already open, why postgres choose > on linux fdatasync? I'm understanding from other posts that > on this platform open_sync is better than fdatasync. AFAIR, we've seen *one* test from *one* person alleging that. And it was def

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Josh Berkus
Gaetano, > Now that the argument is already open, why postgres choose > on linux fdatasync? I'm understanding from other posts that > on this platform open_sync is better than fdatasync. Not necessarily. For example, here's my test results, on Linux 2.6.7, writing to a ReiserFS mount on a Soft

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Gaetano Mendola
Bruce Momjian wrote: Have you seen /src/tools/fsync? Now that the argument is already open, why postgres choose on linux fdatasync? I'm understanding from other posts that on this platform open_sync is better than fdatasync. However I chose open_sync. During initdb, why not detect this parameter

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread mudfoot
Quoting Bruce Momjian <[EMAIL PROTECTED]>: > > Have you seen /src/tools/fsync? > I have now. Thanks.

Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread Bruce Momjian
Have you seen /src/tools/fsync? --- [EMAIL PROTECTED] wrote: > Hi, I'd like to help with the topic in the Subject: line. It seems to be a > TODO item. I've reviewed some threads discussing the matter, so I hope I've

Re: [PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Pierre-Frédéric Caillaud
Thanks for the thanks! Generally, when grouping stuff together, it is a good idea to have two sorted lists, and to scan them simultaneously. I have already used this solution several times outside of Postgres, and it worked very well (it was with Berkeley DB and there were 3 lists to sc
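A hedged sketch of that merge-scan idea in Java; the data layout is invented for illustration. Because both inputs arrive sorted on the same key, a single forward pass groups the children under each parent with no per-parent query:

    public class MergeScan {
        // Both arrays are sorted on the category id; items whose catId matches
        // the current category are consumed in one linear pass.
        static void mergeScan(int[] categoryIds, int[] itemCatIds, String[] itemNames) {
            int i = 0;
            for (int cat : categoryIds) {
                while (i < itemCatIds.length && itemCatIds[i] == cat) {
                    System.out.println(cat + " -> " + itemNames[i]);
                    i++;
                }
            }
        }

        public static void main(String[] args) {
            mergeScan(new int[]{1, 2, 3},
                      new int[]{1, 1, 3},
                      new String[]{"apple", "avocado", "cherry"});
        }
    }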

Re: [PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Damien Dougan
Pierre-Frederic, Paul, Thanks for your fast response (especially for the python code and performance figure) - I'll chase this up as a solution - looks most promising! Cheers, Damien

Re: [PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Pierre-Frédéric Caillaud
My simple python program dumps 1653992 items in 1654000 categories in: real 3m12.029s, user 1m36.720s, sys 0m2.220s. It was running on the same machine as postgresql (AthlonXP 2500). I Ctrl-C'd it before it dumped all the database but you get an idea. If you don't know Python and Generators

Re: [PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Pierre-Frédéric Caillaud
There's a very simple solution using cursors. As an example:
create table categories ( id serial primary key, name text );
create table items ( id serial primary key, cat_id integer references categories(id), name text );
create index items_cat_idx on items( cat_id );
insert st

Re: [PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Paul Thomas
On 13/09/2004 12:38 Damien Dougan wrote: [snip] Are there any tools/tricks/tips with regards to extracting large volumes of data across related tables from Postgres? It doesn't have to export into XML, we can do post-processing on the extracted data as needed - the important thing is to keep the rel

[PERFORM] Help with extracting large volumes of records across related tables

2004-09-13 Thread Damien Dougan
Hi All, I am having a performance problem extracting a large volume of data from Postgres 7.4.2, and was wondering if there was a more cunning way to get the data out of the DB... This isn't a performance problem with any particular PgSQL operation, it's more a strategy for getting large volumes

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-13 Thread Christopher Browne
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Mark Cotner) wrote: > Agreed, I did some preliminary testing today and am very impressed. > I wasn't used to running analyze after a data load, but once I did > that everything was snappy. Something worth observing is that this is true
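For reference, a minimal sketch of the step Mark describes, issued from JDBC after a bulk load (an open Connection conn with autocommit on is assumed; plain ANALYZE with no table name refreshes statistics for every table):

    // After a bulk load, refresh planner statistics so plans stop assuming
    // the old (often empty-table) row counts.
    try (Statement st = conn.createStatement()) {
        st.execute("ANALYZE");
        // VACUUM ANALYZE would also reclaim dead space, but VACUUM cannot
        // run inside a transaction block.
    }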

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-13 Thread Mark Cotner
See comments . . . thanks for the feedback. 'njoy, Mark --- Christopher Browne <[EMAIL PROTECTED]> wrote: > The world rejoiced as Mischa Sandberg > <[EMAIL PROTECTED]> wrote: > > Mark Cotner wrote: > >> Requirements: > >> Merge table definition equivalent. We use these > >> extensively. > > >