Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Greg Smith
On Wed, 6 Feb 2008, Dimitri Fontaine wrote: In fact, the -F option works by having pgloader read the given number of lines but skip processing them, which is not at all what Greg is talking about here I think. Yeah, that's not useful. Greg, what would you think of a pgloader which will separ

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le Wednesday 06 February 2008 18:49:56 Luke Lonergan, vous avez écrit : > Improvements are welcome, but to compete in the industry, loading will need > to speed up by a factor of 100. Oh, I meant to compete with internal COPY command instead of \copy one, not with the competition. AIUI competing

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le Wednesday 06 February 2008 18:37:41 Dimitri Fontaine, vous avez écrit : > Le mercredi 06 février 2008, Greg Smith a écrit : > > If I'm loading a TB file, odds are good I can split that into 4 or more > > vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 > > loaders at once, and

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Luke Lonergan
Improvements are welcome, but to compete in the industry, loading will need to speed up by a factor of 100. Note that Bizgres loader already does many of these ideas and it sounds like pgloader does too. - Luke Msg is shrt cuz m on ma treo -Original Message- From: Dimitri Fontaine

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Greg Smith a écrit : > If I'm loading a TB file, odds are good I can split that into 4 or more > vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders > at once, and get way more than 1 disk worth of throughput reading. pgloader already supports st

Re: [PERFORM] Benchmark Data requested

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Greg Smith a écrit : > COPY. If you're loading a TB, if you're smart it's going onto the server > itself if it all possible and loading directly from there. Would probably > get a closer comparision against psql \copy, but recognize you're always > going to be compare

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Jignesh K. Shah
Greg Smith wrote: On Wed, 6 Feb 2008, Simon Riggs wrote: For me, it would be good to see a --parallel=n parameter that would allow pg_loader to distribute rows in "round-robin" manner to "n" different concurrent COPY statements. i.e. a non-routing version. Let me expand on this. In many of

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Luke Lonergan
Hi Greg, On 2/6/08 7:56 AM, "Greg Smith" <[EMAIL PROTECTED]> wrote: > If I'm loading a TB file, odds are good I can split that into 4 or more > vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders > at once, and get way more than 1 disk worth of throughput reading. You > ha

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Tom Lane
Theo Kramer <[EMAIL PROTECTED]> writes: > On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote: >> Since the SQL is not your fault and difficult to control, it is an >> argument in favour of an optional planner mode that would perform >> additional checks for redundant clauses of various kinds. The

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Greg Smith
On Wed, 6 Feb 2008, Simon Riggs wrote: For me, it would be good to see a --parallel=n parameter that would allow pg_loader to distribute rows in "round-robin" manner to "n" different concurrent COPY statements. i.e. a non-routing version. Let me expand on this. In many of these giant COPY sit

Re: [PERFORM] Benchmark Data requested

2008-02-06 Thread Greg Smith
On Wed, 6 Feb 2008, Dimitri Fontaine wrote: Did you compare to COPY or \copy? COPY. If you're loading a TB, if you're smart it's going onto the server itself if it all possible and loading directly from there. Would probably get a closer comparision against psql \copy, but recognize you're

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Roberts, Jon
> > > >>> Since the SQL is not your fault and difficult to control, it is an > >>> argument in favour of an optional planner mode that would perform > >>> additional checks for redundant clauses of various kinds. The > > default > >>> for that would be "off" since most people don't suffer from this

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Erik Jones
On Feb 6, 2008, at 7:35 AM, Roberts, Jon wrote: Since the SQL is not your fault and difficult to control, it is an argument in favour of an optional planner mode that would perform additional checks for redundant clauses of various kinds. The default for that would be "off" since most peopl

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Roberts, Jon
> > Since the SQL is not your fault and difficult to control, it is an > > argument in favour of an optional planner mode that would perform > > additional checks for redundant clauses of various kinds. The default > > for that would be "off" since most people don't suffer from this > > problem. B

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Simon Riggs a écrit : > For me, it would be good to see a --parallel=n parameter that would > allow pg_loader to distribute rows in "round-robin" manner to "n" > different concurrent COPY statements. i.e. a non-routing version. What happen when you want at most N paral

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Theo Kramer
On Wed, 2008-02-06 at 11:53 +, Simon Riggs wrote: > On Wed, 2008-02-06 at 09:42 +0100, SURANTYN Jean François wrote: > > > That issue is very annoying because with generated SQL queries (from > > Business Objects for example) on big tables, it is possible that some > > queries have several tim

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 09:42 +0100, SURANTYN Jean François wrote: > That issue is very annoying because with generated SQL queries (from > Business Objects for example) on big tables, it is possible that some > queries have several times the same "where" condition ("where n=1 and > n=1" for example

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 12:27 +0100, Dimitri Fontaine wrote: > Multi-Threading behavior and CE support > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- > > Now, pgloader will be able to run N threads, each one loading some > data to a > partitionned child-table target. N will certainly be configured > depending

Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas

2008-02-06 Thread Dimitri Fontaine
Hi, I've been thinking about this topic some more, and as I don't know when I'll be able to go and implement it I'd want to publish the ideas here. This way I'll be able to find them again :) Le mardi 05 février 2008, Dimitri Fontaine a écrit : > Le mardi 05 février 2008, Simon Riggs a écrit :

Re: [PERFORM] Benchmark Data requested

2008-02-06 Thread Dimitri Fontaine
Le mercredi 06 février 2008, Greg Smith a écrit : > pgloader is a great tool for a lot of things, particularly if there's any > chance that some of your rows will get rejected. But the way things pass > through the Python/psycopg layer made it uncompetative (more than 50% > slowdown) against the s

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Richard Huxton
SURANTYN Jean François wrote: > Many thanks for your quick reply > > In fact, that issue comes from a recent migration from Oracle to > Postgresql, and even if some queries were not optimized by the past > (example: where n=1 and n=1), Oracle was able to rewrite them and to > "hide" the bad querie

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread SURANTYN Jean François
Many thanks for your quick reply In fact, that issue comes from a recent migration from Oracle to Postgresql, and even if some queries were not optimized by the past (example: where n=1 and n=1), Oracle was able to rewrite them and to "hide" the bad queries". But now that we have migrated to P

Re: [PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread Richard Huxton
SURANTYN Jean François wrote: > my_db=# explain select * from test where n = 1; > my_db=# explain select * from test where n = 1 and n = 1; > In the first SELECT query (with "where n=1"), the estimated number of > returned rows is correct (10), whereas in the second SELECT query > (with "where n=

[PERFORM] Optimizer : query rewrite and execution plan ?

2008-02-06 Thread SURANTYN Jean François
Hi I have discovered an issue on my Postgresql database recently installed : it seems that the optimizer can not, when possible, simplify and rewrite a simple query before running it. Here is a simple and reproducible example : my_db=# create table test (n numeric); CREATE my_db=# insert into