[PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Jim C. Nasby
Attached is a plpython_error_1.out file that will fix cuckoo.
Unfortunately, I couldn't figure out how to submit it as an actual
patch, so it's just a file attachment. It should go in
src/pl/plpython/expected.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
-- test error handling, i forgot to restore Warn_restart in
-- the trigger handler once. the errors and subsequent core dump were
-- interesting.
SELECT invalid_type_uncaught('rick');
WARNING:  plpython: in function invalid_type_uncaught:
DETAIL:  plpy.SPIError: Unknown error in PLy_spi_prepare
ERROR:  type test does not exist
SELECT invalid_type_caught('rick');
WARNING:  plpython: in function invalid_type_caught:
DETAIL:  plpy.SPIError: Unknown error in PLy_spi_prepare
ERROR:  type test does not exist
SELECT invalid_type_reraised('rick');
WARNING:  plpython: in function invalid_type_reraised:
DETAIL:  plpy.SPIError: Unknown error in PLy_spi_prepare
ERROR:  type test does not exist
SELECT valid_type('rick');
 valid_type 
------------
 
(1 row)

--
-- Test Unicode error handling.
--
SELECT unicode_return_error();
ERROR:  plpython: function unicode_return_error could not create return value
DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)
INSERT INTO unicode_test (testvalue) VALUES ('test');
ERROR:  plpython: function unicode_trigger_error could not modify tuple
DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)
SELECT unicode_plan_error1();
WARNING:  plpython: in function unicode_plan_error1:
DETAIL:  plpy.Error: Unknown error in PLy_spi_execute_plan
ERROR:  plpython: function unicode_plan_error1 could not execute plan
DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)
SELECT unicode_plan_error2();
ERROR:  plpython: function unicode_plan_error2 could not execute plan
DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan



Alon Goldshuv wrote:


I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Patch attached.

 



I do not have time to review this > 2900 line patch analytically, nor to 
benchmark it. I have done some functional testing of it on Windows, and 
tried to break it in text and CSV modes, and with both Unix and Windows 
type line endings - I have not observed any breakage.
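
A minimal smoke test of that kind might look like the following (the table and
file names are invented for illustration; the same data files can be fed once
with Unix and once with Windows line endings):

  CREATE TABLE copytest (id integer, val text);
  -- text format, default delimiters
  COPY copytest FROM '/tmp/copytest.txt';
  -- CSV format
  COPY copytest FROM '/tmp/copytest.csv' WITH CSV;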


This does need lots of eyeballs, though.

cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Jim C. Nasby
On Tue, Jul 19, 2005 at 10:03:39AM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  Attached is a plpython_error_1.out file that will fix cuckoo.
 
 What is the reason for the difference in the error message spelling
 in the first place?  Is this a Python version thing (and if so,
 which version is newer --- that should have pride of place as
 plpython_error.out I should think)?  Or is there some other reason
 that we need to understand more closely instead of just slapping on
 a band-aid?

I don't think it's a version issue; cuckoo is at 2.4, platypus used to
be at 2.3 but I upgraded it to 2.4 to see if that was the issue, but
platypus kept working.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Tom Lane
Jim C. Nasby [EMAIL PROTECTED] writes:
 On Tue, Jul 19, 2005 at 10:03:39AM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
 Attached is a plpython_error_1.out file that will fix cuckoo.
 
 What is the reason for the difference in the error message spelling
 in the first place?  Is this a Python version thing (and if so,
 which version is newer --- that should have pride of place as
 plpython_error.out I should think)?  Or is there some other reason
 that we need to understand more closely instead of just slapping on
 a band-aid?

 I don't think it's a version issue; cuckoo is at 2.4, platypus used to
 be at 2.3 but I upgraded it to 2.4 to see if that was the issue, but
 platypus kept working.

Hmm ... if it's *not* a version thing then I really do want to know
what's causing it.  Anyone have an idea why this machine is saying
'\u80' where everyone else's python says u'\x80' ?


*** ./expected/plpython_error.out   Mon Jul 18 22:06:49 2005
--- ./results/plpython_error.outMon Jul 18 23:53:30 2005
***
*** 24,38 
  --
  SELECT unicode_return_error();
  ERROR:  plpython: function unicode_return_error could not create return 
value
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
u'\x80' in position 0: ordinal not in range(128)
  INSERT INTO unicode_test (testvalue) VALUES ('test');
  ERROR:  plpython: function unicode_trigger_error could not modify tuple
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
u'\x80' in position 0: ordinal not in range(128)
  SELECT unicode_plan_error1();
  WARNING:  plpython: in function unicode_plan_error1:
  DETAIL:  plpy.Error: Unknown error in PLy_spi_execute_plan
  ERROR:  plpython: function unicode_plan_error1 could not execute plan
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
u'\x80' in position 0: ordinal not in range(128)
  SELECT unicode_plan_error2();
  ERROR:  plpython: function unicode_plan_error2 could not execute plan
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
u'\x80' in position 0: ordinal not in range(128)
--- 24,38 
  --
  SELECT unicode_return_error();
  ERROR:  plpython: function unicode_return_error could not create return 
value
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)
  INSERT INTO unicode_test (testvalue) VALUES ('test');
  ERROR:  plpython: function unicode_trigger_error could not modify tuple
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)
  SELECT unicode_plan_error1();
  WARNING:  plpython: in function unicode_plan_error1:
  DETAIL:  plpy.Error: Unknown error in PLy_spi_execute_plan
  ERROR:  plpython: function unicode_plan_error1 could not execute plan
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)
  SELECT unicode_plan_error2();
  ERROR:  plpython: function unicode_plan_error2 could not execute plan
! DETAIL:  exceptions.UnicodeEncodeError: 'ascii' codec can't encode character 
'\u80' in position 0: ordinal not in range(128)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
On Thu, 14 Jul 2005 17:22:18 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Michael Fuhr
On Tue, Jul 19, 2005 at 02:48:52PM -0400, Tom Lane wrote:
 Jim C. Nasby [EMAIL PROTECTED] writes:
  I don't think it's a version issue; cuckoo is at 2.4, platypus used to
  be at 2.3 but I upgraded it to 2.4 to see if that was the issue, but
  platypus kept working.
 
 Hmm ... if it's *not* a version thing then I really do want to know
 what's causing it.  Anyone have an idea why this machine is saying
 '\u80' where everyone else's python says u'\x80' ?

Is it possible that plpython.so is linked against an old version
of libpython?  I see that the error message changed a few years ago:

http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Python/exceptions.c?r1=1.44&r2=1.45
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Python/exceptions.c?r1=1.45&r2=1.46

As I recall, Python must be configured with --enable-shared or you
don't get a shared version of libpython, so if you installed a new
Python but not a new version of libpython.*.so, then plpython.so
might be linked against an old version.

Does this machine have ldd or the equivalent?  If so, can you compare
ldd /path/to/python and ldd /path/to/plpython.so?

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Jim C. Nasby
On Tue, Jul 19, 2005 at 01:54:00PM -0600, Michael Fuhr wrote:
 On Tue, Jul 19, 2005 at 02:48:52PM -0400, Tom Lane wrote:
  Jim C. Nasby [EMAIL PROTECTED] writes:
   I don't think it's a version issue; cuckoo is at 2.4, platypus used to
   be at 2.3 but I upgraded it to 2.4 to see if that was the issue, but
   platypus kept working.
  
  Hmm ... if it's *not* a version thing then I really do want to know
  what's causing it.  Anyone have an idea why this machine is saying
  '\u80' where everyone else's python says u'\x80' ?
 
 Is it possible that plpython.so is linked against an old version
 of libpython?  I see that the error message changed a few years ago:
 
 http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Python/exceptions.c?r1=1.44&r2=1.45
 http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Python/exceptions.c?r1=1.45&r2=1.46
 
 As I recall, Python must be configured with --enable-shared or you
 don't get a shared version of libpython, so if you installed a new
 Python but not a new version of libpython.*.so, then plpython.so
 might be linked against an old version.
 
 Does this machine have ldd or the equivalent?  If so, can you compare
 ldd /path/to/python and ldd /path/to/plpython.so?

Oddly, no, it doesn't seem to have ldd. And the buildfarm script seems
to clean everything up even in the pgsqlkeep directories; or at least I
couldn't find a plpython.so laying around.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Michael Fuhr
On Tue, Jul 19, 2005 at 03:11:35PM -0500, Jim C. Nasby wrote:
 On Tue, Jul 19, 2005 at 01:54:00PM -0600, Michael Fuhr wrote:
  Does this machine have ldd or the equivalent?  If so, can you compare
  ldd /path/to/python and ldd /path/to/plpython.so?
 
 Oddly, no, it doesn't seem to have ldd. And the buildfarm script seems
 to clean everything up even in the pgsqlkeep directories; or at least I
 couldn't find a plpython.so laying around.

[googles]

otool -L appears to be the Darwin equivalent of ldd.  If you can
manage to find a plpython.so then it would be interesting to see
which libpython it's linked against.

Can you search the system for all files named libpython* and post
what you find?

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary largely depending on the ratio of parsing -to- converting and
inserting. 

Therefore, the speed increase really depends on the nature of your data:

100GB file with
long data rows (lots of parsing)
Small number of columns (small number of attr conversions per row)
less rows (less tuple insertions)

Will show the best performance improvements.

However, same file size 100GB with
Short data rows (minimal parsing)
large number of columns (large number of attr conversions per row)
AND/OR
more rows (more tuple insertions)

Will show improvements but not as significant.
In general I'll estimate 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
etc... This is for TEXT format. As for CSV, it may be faster but not as much
as I specified here. BINARY will stay the same as before.

HTH
Alon.






On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:

 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Andrew Dunstan



Jim C. Nasby wrote:


And the buildfarm script seems
to clean everything up even in the pgsqlkeep directories; or at least I
couldn't find a plpython.so laying around.
 



Nothing should be removed. If you are using the experimental code I 
recently gave you, all bets are off, but under normal circumstances, if 
you run with --keepall then your plpython.so should be there.


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create
tables:
http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl

Thanks,
Mark

On Tue, 19 Jul 2005 14:05:56 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
  On Thu, 14 Jul 2005 17:22:18 -0700
  Alon Goldshuv [EMAIL PROTECTED] wrote:
  
  I revisited my patch and removed the code duplications that were there, and
  added support for CSV with buffered input, so CSV now runs faster too
  (although it is not as optimized as the TEXT format parsing). So now
  TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original 
  file.
  
  Hi Alon,
  
  I'm curious, what kind of system are you testing this on?  I'm trying to
  load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
  interested in the results you would expect.
  
  Mark
  
 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Alon Goldshuv
Mark,

Thanks for the info.

Yes, isolating indexes out of the picture is a good idea for this purpose.

I can't really give a guess as to how fast the load rate should be. I don't
know how your system is configured or all the hardware characteristics
(and even if I knew that info I might not be able to guess...). I am pretty
confident that the load will be faster than before; I'll risk that ;-)
Looking at your TPC-H sizes and metadata I'd estimate that
partsupp, customer and orders will have the most significant increase in load
rate. You could start with those.

I guess the only way to really know is to try... Load several times with the
existing PG-COPY and then load several times with the patched COPY and
compare. I'll be curious to hear your results.

Thx,
Alon.

 


On 7/19/05 2:37 PM, Mark Wong [EMAIL PROTECTED] wrote:

 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
 http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f
 7f23437e432&path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
 I revisited my patch and removed the code duplications that were there, and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original
 file.
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 
 
 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Jim C. Nasby
On Tue, Jul 19, 2005 at 02:42:07PM -0600, Michael Fuhr wrote:
 On Tue, Jul 19, 2005 at 03:11:35PM -0500, Jim C. Nasby wrote:
  On Tue, Jul 19, 2005 at 01:54:00PM -0600, Michael Fuhr wrote:
   Does this machine have ldd or the equivalent?  If so, can you compare
   ldd /path/to/python and ldd /path/to/plpython.so?
  
  Oddly, no, it doesn't seem to have ldd. And the buildfarm script seems
  to clean everything up even in the pgsqlkeep directories; or at least I
  couldn't find a plpython.so laying around.
 
 [googles]
 
 otool -L appears to be the Darwin equivalent of ldd.  If you can
 manage to find a plpython.so then it would be interesting to see
 which libpython it's linked against.

I'm going to run a build with the non-experimental version of run_build.pl and
see if I'll have some files left then.
 
 Can you search the system for all files named libpython* and post
 what you find?

[EMAIL PROTECTED]:42]~:11%locate libpython
/Applications/NeoOfficeJ.app/Contents/MacOS/libpython.dylib
/Applications/NeoOfficeJ.app/Contents/MacOS/libpython2.2.dylib
/Applications/NeoOfficeJ.app/Contents/MacOS/libpython2.dylib
/Applications/OpenOffice.org1.1.2/program/libpython.dylib
/Applications/OpenOffice.org1.1.2/program/libpython2.2.dylib
/Applications/OpenOffice.org1.1.2/program/libpython2.dylib
/opt/local/lib/libpython2.4.dylib
/opt/local/lib/python2.4/config/libpython2.4.a
/opt/local/var/db/dports/software/python24/2.4_3/opt/local/lib/libpython2.4.dylib
/opt/local/var/db/dports/software/python24/2.4_3/opt/local/lib/python2.4/config/libpython2.4.a
[EMAIL PROTECTED]:42]~:12%

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Andrew Dunstan

Mark,

You should definitely not be doing this sort of thing, I believe:

CREATE TABLE orders (
o_orderkey INTEGER,
o_custkey INTEGER,
o_orderstatus CHAR(1),
o_totalprice REAL,
o_orderDATE DATE,
o_orderpriority CHAR(15),
o_clerk CHAR(15),
o_shippriority INTEGER,
o_comment VARCHAR(79),
PRIMARY KEY (o_orderkey))

Create the table with no constraints, load the data, then set up primary keys 
and whatever other constraints you want using ALTER TABLE. Last time I did a 
load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup from 
deferring constraint creation.
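
Concretely, for the orders table above: drop the PRIMARY KEY clause from the
CREATE TABLE, then (a sketch only; the file path and delimiter are placeholders
for however dbt3 stages its .tbl files):

  COPY orders FROM '/tmp/orders.tbl' WITH DELIMITER '|';
  -- add the constraint only once the bulk load is done
  ALTER TABLE orders ADD PRIMARY KEY (o_orderkey);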


cheers

andrew



Mark Wong wrote:


Hi Alon,

Yeah, that helps.  I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;)  Here's a link to my script to create
tables:
http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
-rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
-rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
-rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
-rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
-rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
-rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
-rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl

Thanks,
Mark

On Tue, 19 Jul 2005 14:05:56 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 


Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary largely depending on the ratio of parsing -to- converting and
inserting. 


Therefore, the speed increase really depends on the nature of your data:

100GB file with
long data rows (lots of parsing)
Small number of columns (small number of attr conversions per row)
less rows (less tuple insertions)

Will show the best performance improvements.

However, same file size 100GB with
Short data rows (minimal parsing)
large number of columns (large number of attr conversions per row)
AND/OR
more rows (more tuple insertions)

Will show improvements but not as significant.
In general I'll estimate 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
etc... This is for TEXT format. As for CSV, it may be faster but not as much
as I specified here. BINARY will stay the same as before.

HTH
Alon.






On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:

   


On Thu, 14 Jul 2005 17:22:18 -0700
Alon Goldshuv [EMAIL PROTECTED] wrote:

 


I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original file.
   


Hi Alon,

I'm curious, what kind of system are you testing this on?  I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
interested in the results you would expect.

Mark

 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Mark Wong
Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.

Mark

On Tue, 19 Jul 2005 18:17:52 -0400
Andrew Dunstan [EMAIL PROTECTED] wrote:

 Mark,
 
 You should definitely not be doing this sort of thing, I believe:
 
 CREATE TABLE orders (
   o_orderkey INTEGER,
   o_custkey INTEGER,
   o_orderstatus CHAR(1),
   o_totalprice REAL,
   o_orderDATE DATE,
   o_orderpriority CHAR(15),
   o_clerk CHAR(15),
   o_shippriority INTEGER,
   o_comment VARCHAR(79),
   PRIMARY KEY (o_orderkey))
 
 Create the table with no constraints, load the data, then set up primary keys 
 and whatever other constraints you want using ALTER TABLE. Last time I did a 
 load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup 
 from deferring constraint creation.
 
 
 cheers
 
 andrew
 
 
 
 Mark Wong wrote:
 
 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
 http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
   
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 
 
 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
   
 
 I revisited my patch and removed the code duplications that were there, 
 and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original 
 file.
 
 
 Hi Alon,
 
 I'm curious, what kind of system are you testing this on?  I'm trying to
 load 100GB of data in our dbt3 workload on a 4-way itanium2.  I'm
 interested in the results you would expect.
 
 Mark
 
   
 
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PATCHES] Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)

2005-07-19 Thread Simon Riggs
On Sun, 2005-04-24 at 02:28 -0400, Tom Lane wrote:
 In the current code there is no such thing as a hard read-only behavior
 --- for example we will try to update commit-status hint bits no matter
 what.  Allowing that to be turned off would be interesting for a number
 of purposes, such as burning a database onto CD.

Short patch enclosed to turn off writing of commit-status hint bits.

I've called this cache_txn_status_with_data but I'm open to other
suggestions...

This should allow migration of older child tables to hierarchical
storage when using a large historical table design.
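
As a usage sketch (the child table name is hypothetical), a session scanning
archived partitions could switch the writes off like so:

  SET cache_txn_status_with_data = off;
  SELECT count(*) FROM history_2004;   -- read-only access, no pages dirtied
  RESET cache_txn_status_with_data;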

Best Regards, Simon Riggs
Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.189
diff -c -c -r1.189 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	19 May 2005 21:35:46 -	1.189
--- src/backend/storage/buffer/bufmgr.c	19 Jul 2005 22:50:25 -
***
*** 63,68 
--- 63,69 
  
  /* GUC variables */
  bool		zero_damaged_pages = false;
+ bool		cache_txn_status_with_data = true;
  double		bgwriter_lru_percent = 1.0;
  double		bgwriter_all_percent = 0.333;
  int			bgwriter_lru_maxpages = 5;
***
*** 1638,1643 
--- 1639,1647 
  		return;
  	}
  
+ 	if (!cache_txn_status_with_data)
+ 		return;
+ 
  	bufHdr = BufferDescriptors[buffer - 1];
  
  	Assert(PrivateRefCount[buffer - 1] > 0);
Index: src/backend/utils/misc/guc.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.274
diff -c -c -r1.274 guc.c
*** src/backend/utils/misc/guc.c	14 Jul 2005 05:13:42 -	1.274
--- src/backend/utils/misc/guc.c	19 Jul 2005 22:50:34 -
***
*** 487,492 
--- 487,502 
  		false, NULL, NULL
  	},
  	{
+ 		{"cache_txn_status_with_data", PGC_USERSET, DEVELOPER_OPTIONS,
+ 			gettext_noop("Writes transaction status to each row."),
+ 			gettext_noop("PostgreSQL normally avoids commit log accesses by caching "
+ 						 "transaction status, though this may not be efficient with write-once, "
+ 						 "then read-only access.")
+ 		},
+ 		&cache_txn_status_with_data,
+ 		true, NULL, NULL
+ 	},
+ 	{
  		{"full_page_writes", PGC_SIGHUP, WAL_SETTINGS,
  			gettext_noop("Writes full pages to WAL when first modified after a checkpoint."),
  			gettext_noop("A page write in process during an operating system crash might be "
Index: src/include/storage/bufmgr.h
===
RCS file: /projects/cvsroot/pgsql/src/include/storage/bufmgr.h,v
retrieving revision 1.93
diff -c -c -r1.93 bufmgr.h
*** src/include/storage/bufmgr.h	20 Mar 2005 22:00:54 -	1.93
--- src/include/storage/bufmgr.h	19 Jul 2005 22:50:46 -
***
*** 27,32 
--- 27,33 
  
  /* in bufmgr.c */
  extern bool zero_damaged_pages;
+ extern bool cache_txn_status_with_data;
  extern double bgwriter_lru_percent;
  extern double bgwriter_all_percent;
  extern int	bgwriter_lru_maxpages;

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Jim C. Nasby
On Tue, Jul 19, 2005 at 04:51:03PM -0500, Jim C. Nasby wrote:
  Can you search the system for all files named libpython* and post
  what you find?
 
 [EMAIL PROTECTED]:42]~:11%locate libpython
 /Applications/NeoOfficeJ.app/Contents/MacOS/libpython.dylib
 /Applications/NeoOfficeJ.app/Contents/MacOS/libpython2.2.dylib
 /Applications/NeoOfficeJ.app/Contents/MacOS/libpython2.dylib
 /Applications/OpenOffice.org1.1.2/program/libpython.dylib
 /Applications/OpenOffice.org1.1.2/program/libpython2.2.dylib
 /Applications/OpenOffice.org1.1.2/program/libpython2.dylib
 /opt/local/lib/libpython2.4.dylib
 /opt/local/lib/python2.4/config/libpython2.4.a
 /opt/local/var/db/dports/software/python24/2.4_3/opt/local/lib/libpython2.4.dylib
 /opt/local/var/db/dports/software/python24/2.4_3/opt/local/lib/python2.4/config/libpython2.4.a
 [EMAIL PROTECTED]:42]~:12%

[EMAIL 
PROTECTED]:00]~/buildfarm/HEAD/pgsqlkeep.1121809875/src/pl/plpython:41%otool -L 
libplpython.0.0.so 
libplpython.0.0.so:
/System/Library/Frameworks/Python.framework/Versions/2.3/Python 
(compatibility version 2.3.0, current version 2.3.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 71.1.3)
[EMAIL PROTECTED]:00]~/buildfarm/HEAD/pgsqlkeep.1121809875/src/pl/plpython:42%

[EMAIL PROTECTED]:01]~/buildfarm/HEAD/lastrun-logs:12%grep -i python 
configure.log make.log 
configure.log:checking whether to build Python modules... yes
configure.log:checking for python... /opt/local/bin/python
configure.log:checking for Python distutils module... yes
configure.log:checking Python configuration directory... 
/opt/local/lib/python2.4/config
configure.log:checking how to link an embedded Python application... 
-L/opt/local/lib/python2.4/config -lpython2.4 -ldl
make.log:ccache gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes 
-Wpointer-arith -Wendif-labels -fno-strict-aliasing -g -DFRONTEND 
-I../../../src/interfaces/libpq 
-DVAL_CONFIGURE=\'--with-includes=/opt/local/include' 
'--with-libraries=/opt/local/lib' '--enable-cassert' '--enable-debug' 
'--enable-integer-datetimes' '--with-perl' '--with-python' '--with-tcl' 
'--with-openssl' '--enable-depend' '--enable-nls' 
'--prefix=/Users/buildfarm/buildfarm/HEAD/inst' '--with-pgport=5678' 'CC=ccache 
gcc'\ -I../../../src/include -I/opt/local/include  -c -o pg_config.o 
pg_config.c -MMD
make.log:ccache gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes 
-Wpointer-arith -Wendif-labels -fno-strict-aliasing -g  -I. 
-I/opt/local/include/python2.4 -I../../../src/include -I/opt/local/include  -c 
-o plpython.o plpython.c -MMD
make.log:In file included from /opt/local/include/python2.4/Python.h:77,
make.log: from plpython.c:37:
make.log:/opt/local/include/python2.4/objimpl.h:255: warning: use of `long 
double' type; its size may change in a future release
make.log:/opt/local/include/python2.4/objimpl.h:255: warning: (Long double 
usage is reported only once for each file.
make.log:/opt/local/include/python2.4/objimpl.h:255: warning: To disable this 
warning, use -Wno-long-double.)
make.log:ar crs libplpython.a `lorder plpython.o | tsort`
make.log:ranlib libplpython.a
make.log:ccache gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes 
-Wpointer-arith -Wendif-labels -fno-strict-aliasing -g  -bundle  plpython.o  
-L../../../src/port -L/opt/local/lib -bundle_loader 
../../../src/backend/postgres -framework Python  -o libplpython.0.0.so
make.log:rm -f libplpython.0.so
make.log:ln -s libplpython.0.0.so libplpython.0.so
make.log:rm -f libplpython.so
make.log:ln -s libplpython.0.0.so libplpython.so
[EMAIL PROTECTED]:02]~/buildfarm/HEAD/lastrun-logs:13%

Neither /opt/local/lib/libpython2.4.dylib nor
/opt/local/lib/python2.4/config/libpython2.4.a references the /System/Library
Python, so I have no idea how it's finding its way to that...

[EMAIL PROTECTED]:03]~/buildfarm/HEAD/lastrun-logs:13%otool -L 
/opt/local/lib/libpython2.4.dylib
/opt/local/lib/libpython2.4.dylib:
/opt/local/lib/libpython2.4.dylib (compatibility version 2.4.0, current 
version 2.4.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 71.1.1)
[EMAIL PROTECTED]:03]~/buildfarm/HEAD/lastrun-logs:14%otool -L 
/opt/local/lib/python2.4/config/libpython2.4.a 
Archive : /opt/local/lib/python2.4/config/libpython2.4.a
/opt/local/lib/python2.4/config/libpython2.4.a(getbuildinfo.o):
/opt/local/lib/python2.4/config/libpython2.4.a(acceler.o):
/opt/local/lib/python2.4/config/libpython2.4.a(grammar1.o):
/opt/local/lib/python2.4/config/libpython2.4.a(listnode.o):
/opt/local/lib/python2.4/config/libpython2.4.a(node.o):
/opt/local/lib/python2.4/config/libpython2.4.a(parser.o):
/opt/local/lib/python2.4/config/libpython2.4.a(parsetok.o):
/opt/local/lib/python2.4/config/libpython2.4.a(bitset.o):
/opt/local/lib/python2.4/config/libpython2.4.a(metagrammar.o):
/opt/local/lib/python2.4/config/libpython2.4.a(firstsets.o):
/opt/local/lib/python2.4/config/libpython2.4.a(grammar.o):

Re: [PATCHES] WIP XLog Switch

2005-07-19 Thread Simon Riggs
On Sat, 2005-05-21 at 18:38 +0100, Simon Riggs wrote:
 On Fri, 2005-05-20 at 23:16 -0400, Bruce Momjian wrote:
  Any farther on this?
 
 No, but it will be in by deadline.
 
 I need to make first base with partitioning before I can do this. Hence
 my (otherwise rude) ignoring of your sterling work on the exec logging
 patch.

If I were to complete this patch now over next few days, would anybody
object to its inclusion in 8.1 beta?

I don't like to leave things incomplete...

Best Regards, Simon Riggs




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Patch to fix plpython on OS X

2005-07-19 Thread Michael Fuhr
On Tue, Jul 19, 2005 at 06:06:00PM -0500, Jim C. Nasby wrote:
 [EMAIL 
 PROTECTED]:00]~/buildfarm/HEAD/pgsqlkeep.1121809875/src/pl/plpython:41%otool 
 -L libplpython.0.0.so 
 libplpython.0.0.so:
 /System/Library/Frameworks/Python.framework/Versions/2.3/Python 
 (compatibility version 2.3.0, current version 2.3.0)
 /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
 version 71.1.3)

If that first object has something to do with Python 2.3 then we
might have found the culprit.  But how'd you get that?

 configure.log:checking Python configuration directory... 
 /opt/local/lib/python2.4/config
 configure.log:checking how to link an embedded Python application... 
 -L/opt/local/lib/python2.4/config -lpython2.4 -ldl

The above looks reasonable.

 make.log:ccache gcc -no-cpp-precomp -O2 -Wall -Wmissing-prototypes 
 -Wpointer-arith -Wendif-labels -fno-strict-aliasing -g  -bundle  plpython.o  
 -L../../../src/port -L/opt/local/lib -bundle_loader 
 ../../../src/backend/postgres -framework Python  -o libplpython.0.0.so

Hmmm...what's that -framework Python business?  Looks mighty
suspicious in light of the otool output.

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] COPY FROM performance improvements

2005-07-19 Thread Luke Lonergan
Good points all around. Another element in the performance expectations is the
ratio of CPU speed to I/O subsystem speed, as Alon hinted earlier.

This patch substantially (500%) improves the efficiency of parsing in the
COPY path, which, on a 3GHz P4 desktop with a commodity disk drive
represents 8 of a total of 30 seconds of processing time.  So, by reducing
the parsing time from 8 seconds to 1.5 seconds, the overall COPY time is
reduced from 30 seconds to 23.5 seconds, or a speedup of about 20%.

On a dual 2.2GHz Opteron machine with a 6-disk SCSI RAID subsystem capable
of 240MB/s sequential read and writes, the ratios change and we see between
35% and 95% increase in COPY performance, with the bottleneck being CPU.
The disk is only running at about 90MB/s during this period.

I'd expect that as your CPUs slow down relative to your I/O speed (and
Itaniums or IT2s are quite slow), you should see an increased effect of the
parsing improvements.

One good way to validate the effect is to watch the I/O bandwidth using
vmstat 1 (on Linux) while the load is progressing.  When you watch that with
the unpatched source and with the patched source, if they are the same, you
should see no benefit from the patch (you are I/O limited).

If you check your underlying sequential write speed, you will find that COPY
is bottlenecked at roughly half that, because of the write-through to the
WAL.
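
To put numbers on that (an illustrative calculation, not a measurement): the
240MB/s array above should top out at roughly 120MB/s of COPY input once the
WAL copy is accounted for, so the ~90MB/s observed during the load is
consistent with the bottleneck being CPU rather than disk.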

- Luke

On 7/19/05 3:51 PM, Mark Wong [EMAIL PROTECTED] wrote:

 Whoopsies, yeah good point about the PRIMARY KEY.  I'll fix that.
 
 Mark
 
 On Tue, 19 Jul 2005 18:17:52 -0400
 Andrew Dunstan [EMAIL PROTECTED] wrote:
 
 Mark,
 
 You should definitely not be doing this sort of thing, I believe:
 
 CREATE TABLE orders (
 o_orderkey INTEGER,
 o_custkey INTEGER,
 o_orderstatus CHAR(1),
 o_totalprice REAL,
 o_orderDATE DATE,
 o_orderpriority CHAR(15),
 o_clerk CHAR(15),
 o_shippriority INTEGER,
 o_comment VARCHAR(79),
 PRIMARY KEY (o_orderkey))
 
 Create the table with no constraints, load the data, then set up primary keys
 and whatever other constraints you want using ALTER TABLE. Last time I did a
 load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup
 from deferring constraint creation.
 
 
 cheers
 
 andrew
 
 
 
 Mark Wong wrote:
 
 Hi Alon,
 
 Yeah, that helps.  I just need to break up my scripts a little to just
 load the data and not build indexes.
 
 Is the following information good enough to give a guess about the data
 I'm loading, if you don't mind? ;)  Here's a link to my script to create
 tables:
 http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb4
 4f7f23437e432&path=scripts/pgsql/create_tables.sh.in
 
 File sizes:
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 customer.tbl
 -rw-r--r--  1 markw 50  74G Jul  8 15:03 lineitem.tbl
 -rw-r--r--  1 markw 50 2.1K Jul  8 15:03 nation.tbl
 -rw-r--r--  1 markw 50  17G Jul  8 15:03 orders.tbl
 -rw-r--r--  1 markw 50 2.3G Jul  8 15:03 part.tbl
 -rw-r--r--  1 markw 50  12G Jul  8 15:03 partsupp.tbl
 -rw-r--r--  1 markw 50  391 Jul  8 15:03 region.tbl
 -rw-r--r--  1 markw 50 136M Jul  8 15:03 supplier.tbl
 
 Number of rows:
 # wc -l *.tbl
  15000000 customer.tbl
 600037902 lineitem.tbl
        25 nation.tbl
 150000000 orders.tbl
  20000000 part.tbl
  80000000 partsupp.tbl
         5 region.tbl
   1000000 supplier.tbl
 
 Thanks,
 Mark
 
 On Tue, 19 Jul 2005 14:05:56 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
  
 
 Hi Mark,
 
 I improved the data *parsing* capabilities of COPY, and didn't touch the
 data conversion or data insertion parts of the code. The parsing
 improvement
 will vary largely depending on the ratio of parsing -to- converting and
 inserting. 
 
 Therefore, the speed increase really depends on the nature of your data:
 
 100GB file with
 long data rows (lots of parsing)
 Small number of columns (small number of attr conversions per row)
 less rows (less tuple insertions)
 
 Will show the best performance improvements.
 
 However, same file size 100GB with
 Short data rows (minimal parsing)
 large number of columns (large number of attr conversions per row)
 AND/OR
 more rows (more tuple insertions)
 
 Will show improvements but not as significant.
 In general I'll estimate 40%-95% improvement in load speed for the 1st case
 and 10%-40% for the 2nd. But that also depends on the hardware, disk speed
 etc... This is for TEXT format. As for CSV, it may be faster but not as
 much
 as I specified here. BINARY will stay the same as before.
 
 HTH
 Alon.
 
 
 
 
 
 
 On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
 

 
 On Thu, 14 Jul 2005 17:22:18 -0700
 Alon Goldshuv [EMAIL PROTECTED] wrote:
 
  
 
 I revisited my patch and removed the code duplications that were there,
 and
 added support for CSV with buffered input, so CSV now runs faster too
 (although it is not as optimized as the TEXT format parsing). So now
 TEXT,CSV and BINARY are all parsed in CopyFrom(), like in the original
 file.
   

Re: [PATCHES] A couple of patches for PostgreSQL 64bit support

2005-07-19 Thread Koichi Suzuki

Mark,

I've not looked at CVS in detail.  I began this work against 8.0.1 and 
continued through 8.0.2 to 8.0.3.  It was not a great deal of work.  The patch is 
rather straightforward, and I'd appreciate it if you try to port it against CVS.


Mark Wong wrote:

Hi,

I grabbed the patches to try, but I was wondering if it would be more
interesting to try them against CVS rather than 8.0.3 (and if it would
be easy to port :)?

Mark




--
---
Koichi Suzuki
Open Source Engineering Department,
NTT DATA Intellilink Corporation
Phone: +81-3-5566-9628  WWW: http://www.intellilink.co.jp
--

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)

2005-07-19 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Short patch enclosed to turn off writing of commit-status hint bits.

Doesn't this entirely destroy the ability to truncate clog, and
therefore the ability to survive XID wraparound?

It probably also breaks subxact and multixact logging, but I haven't
looked closely...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PATCHES] Fix for interval division/multiplication

2005-07-19 Thread Bruce Momjian
While working on the 'date' interval patch, I found a bug in the handling of
interval division and multiplication.  Look at this:

test=> select '4 months'::interval / 5;
   ?column?
---------------
 1 mon -6 days
(1 row)

With the attached fix, it returns the right value:

test=> select '4 months'::interval / 5;
 ?column?
----------
 24 days
(1 row)

The bug is the use of rint() to round the month component of the division, and
then using that rounded value to compute the remainder passed down to the time
field.  For '4 months' / 5, months is 0.8: rint() rounds it to 1 month and
leaves a remainder of -0.2 month (-6 days), whereas truncating to 0 months
leaves the full 0.8 month, i.e. 24 days.
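
For completeness, the multiplication path uses the same rounding logic; a
quick sanity check (illustrative queries, with the values the truncation fix
should produce):

  SELECT '4 months'::interval / 5;           -- was: 1 mon -6 days; fixed: 24 days
  SELECT '1 month'::interval * 0.8::float8;  -- same rounding path;  fixed: 24 days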

The attached patch has been applied to current CVS.  Should I backpatch
this fix?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/adt/timestamp.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.132
diff -c -c -r1.132 timestamp.c
*** src/backend/utils/adt/timestamp.c   12 Jul 2005 16:04:58 -  1.132
--- src/backend/utils/adt/timestamp.c   20 Jul 2005 03:38:52 -
***
*** 2201,2207 
result->time += (months - result->month) * INT64CONST(30) *
USECS_PER_DAY;
  #else
!   result->month = rint(months);
result->time = JROUND(span1->time * factor);
/* evaluate fractional months as 30 days */
result->time += JROUND((months - result->month) * 30 * SECS_PER_DAY);
--- 2201,2207 
result->time += (months - result->month) * INT64CONST(30) *
USECS_PER_DAY;
  #else
!   result->month = (int)months;
result->time = JROUND(span1->time * factor);
/* evaluate fractional months as 30 days */
result->time += JROUND((months - result->month) * 30 * SECS_PER_DAY);
***
*** 2246,2252 
INT64CONST(30) * USECS_PER_DAY) / 
factor;
  #else
months = span->month / factor;
!   result->month = rint(months);
result->time = JROUND(span->time / factor);
/* evaluate fractional months as 30 days */
result->time += JROUND((months - result->month) * 30 * SECS_PER_DAY);
--- 2246,2252 
INT64CONST(30) * USECS_PER_DAY) / 
factor;
  #else
months = span->month / factor;
!   result->month = (int)months;
result->time = JROUND(span->time / factor);
/* evaluate fractional months as 30 days */
result->time += JROUND((months - result->month) * 30 * SECS_PER_DAY);

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match