Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Andrew Dunstan
Gregory Stark wrote: The only file formats I ever wanted when I was doing this kind of stuff is tab separated, all the others (comma separated and (egads) *pipe* separated?!) are just disasters. Others of us have to operate in a world where we don't get to choose the format of data we

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes: > "Tom Lane" <[EMAIL PROTECTED]> writes: >> Of what use is the above comment? You have to parse the input into >> fields somehow. > Well those two requirements aren't inconsistent if you're using fixed-width > input text files. Not that I'm enamoured of s

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Simon Riggs" <[EMAIL PROTECTED]> writes: >> Feedback from someone else looking to the problem last year. IIRC there >> was a feeling that if we didn't have to search for delimiters the COPY >> FROM input parsing could be easier. > > Of what use is the abo

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 13:24 -0500, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > Feedback from someone else looking to the problem last year. IIRC there > > was a feeling that if we didn't have to search for delimiters the COPY > > FROM input parsing could be easier. > > Of what

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes: > Feedback from someone else looking to the problem last year. IIRC there > was a feeling that if we didn't have to search for delimiters the COPY > FROM input parsing could be easier. Of what use is the above comment? You have to parse the input into fie

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 12:09 -0500, Andrew Dunstan wrote: > OK. I'm still curious to know what the issues are with delimiter handling. Rumours only. Feedback from someone else looking to the problem last year. IIRC there was a feeling that if we didn't have to search for delimiters the COPY FROM

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Andrew Dunstan
Simon Riggs wrote: On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote: Simon Riggs wrote: IIRC there are issues with delimiter handling when we have lots of columns in the input on COPY FROM, and num of cols on COPY TO. I've not looked at those recently though. What sor

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote: > Simon Riggs wrote: > > > > IIRC there are issues with delimiter handling when we have lots of > > columns in the input on COPY FROM, and num of cols on COPY TO. I've not > > looked at those recently though. > > > > > > What sort of iss

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Andrew Dunstan
Simon Riggs wrote: IIRC there are issues with delimiter handling when we have lots of columns in the input on COPY FROM, and num of cols on COPY TO. I've not looked at those recently though. What sort of issues? Anything that breaks on this has catastrophic consequences. cheers andrew

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 16:25 +, Gregory Stark wrote: > "Tom Lane" <[EMAIL PROTECTED]> writes: > > > "Simon Riggs" <[EMAIL PROTECTED]> writes: > >> I'm slightly worried though since that seems to have changed from 8.2, > >> which I oprofiled over Christmas. > > > > If you were testing a case wit

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Simon Riggs" <[EMAIL PROTECTED]> writes: >> I'm slightly worried though since that seems to have changed from 8.2, >> which I oprofiled over Christmas. > > If you were testing a case with wider rows than Heikki tested, you'd see > less impact --- the cost

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes: > I'm slightly worried though since that seems to have changed from 8.2, > which I oprofiled over Christmas. If you were testing a case with wider rows than Heikki tested, you'd see less impact --- the cost of the old way was O(N^2) in the number of tuples

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 10:09 +, Heikki Linnakangas wrote: > Well, there's one big change: your patch to suppress WAL logging on > tables created in the same transaction. OK, just checking that's what you meant. > All the page locking related functions account for ~10% in total, > including

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Heikki Linnakangas
Simon Riggs wrote: On Thu, 2007-03-01 at 17:01 +, Heikki Linnakangas wrote: I ran oprofile on a COPY FROM to get an overview of where the CPU time is spent. To my amazement, the function at the top of the list was PageAddItem with 16% of samples. Excellent. I'm slightly worried though s

Re: [PATCHES] A little COPY speedup

2007-03-02 Thread Simon Riggs
On Thu, 2007-03-01 at 17:01 +, Heikki Linnakangas wrote: > I ran oprofile on a COPY FROM to get an overview of where the CPU time > is spent. To my amazement, the function at the top of the list was > PageAddItem with 16% of samples. Excellent. I'm slightly worried though since that seems

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Tom Lane
I wrote: > Barring objections, I'll tweak this as above and apply. I've applied the attached modified version of this patch. It seemed better to me to centralize the handling of this flag bit in PageAddItem and PageRepairFragmentation, instead of having it in the callers as you did. This means t

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: >> I'll post a patch along those lines. > Here it is. > I'm not fond of the macro names for the flag, but couldn't think of > anything shorter yet descriptive. Let's reverse the sense of the flag bit; this seems a good idea since the initial state

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Heikki Linnakangas
Heikki Linnakangas wrote: Tom Lane wrote: I'm not sure whether I like your flag approach better than the last-used-offset one. The previous patch probably buys some teeny amount more performance, but the flag seems more robust (noting in passing that neither patch attempts to WAL-log its change

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Heikki Linnakangas
Tom Lane wrote: As you say, pd_tli is not really pulling its weight, but I'm also loath to remove it, as in a multi-timeline situation the page LSN is really not well defined if you don't know which timeline it refers to. Now we'd only need 16 bits to store the last-used offset, or a flags field

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > At the end of the thread, Bruce added the patch to his hold-queue, but I > couldn't find a trace of it after that so I'm not clear why it was > rejected in the end. This comment (by you) seems most relevant: I believe we concluded that the distrib

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: On every row, PageAddItem will scan all the line pointers on the target page, just to see that they're all in use, and create a new line pointer. That adds up, especially with narrow tuples like what I used in the test. Attached i

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > On every row, PageAddItem will scan all the line pointers on the target > page, just to see that they're all in use, and create a new line > pointer. That adds up, especially with narrow tuples like what I used in > the test. > Attached is a fix f

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Andrew Dunstan
Heikki Linnakangas wrote: One complaint we've heard from clients trying out EDB or PostgreSQL is that loading data is slower than on other DBMSs. I ran oprofile on a COPY FROM to get an overview of where the CPU time is spent. To my amazement, the function at the top of the list was PageAddIt

Re: [PATCHES] A little COPY speedup

2007-03-01 Thread Pavan Deolasee
Heikki Linnakangas wrote: Attached is a fix for that. It adds a flag to each heap page that indicates that "there isn't any free line pointers on this page, so don't bother trying". Heap pages haven't had any heap-specific per-page data before, so this patch adds a HeapPageOpaqueData-struct
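
The idea quoted above (a per-page flag meaning "no free line pointers, so don't bother scanning") can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch from the thread and not PostgreSQL's page code: ToyPage, toy_page_add_item, toy_page_delete_item, TOY_MAX_LINES and the slots_scanned counter are all hypothetical names invented for the illustration. The flag here uses the sense described in the quoted message; elsewhere in the thread Tom Lane suggests reversing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LINES 64            /* hypothetical capacity of the toy page */

/* Toy stand-in for a heap page; not PostgreSQL's real page layout. */
typedef struct ToyPage
{
    bool   no_free_lines;           /* hint: every allocated line pointer is in use */
    size_t nlines;                  /* line pointers allocated so far */
    bool   in_use[TOY_MAX_LINES];   /* true if the line pointer holds a tuple */
    size_t slots_scanned;           /* instrumentation: pointers examined in scans */
} ToyPage;

/*
 * Add an item and return its slot index, or -1 if the page is full.
 * The scan for a reusable line pointer is skipped whenever the hint
 * says there is nothing to find.
 */
static int
toy_page_add_item(ToyPage *page)
{
    assert(page != NULL);

    if (!page->no_free_lines)
    {
        for (size_t i = 0; i < page->nlines; i++)
        {
            page->slots_scanned++;
            if (!page->in_use[i])
            {
                page->in_use[i] = true;     /* reuse the freed slot */
                return (int) i;
            }
        }
        /* Scanned everything and found nothing: remember that. */
        page->no_free_lines = true;
    }

    if (page->nlines >= TOY_MAX_LINES)
        return -1;

    /* Append a brand-new line pointer at the end. */
    page->in_use[page->nlines] = true;
    return (int) page->nlines++;
}

/* Free a slot; the hint must be cleared so the next add scans again. */
static void
toy_page_delete_item(ToyPage *page, size_t slot)
{
    assert(page != NULL && slot < page->nlines);
    page->in_use[slot] = false;
    page->no_free_lines = false;
}
```

In this model a bulk load into a fresh page never scans: the first insert finds zero allocated pointers, sets the hint, and every later insert appends directly. Without the hint, each insert would walk all existing line pointers first, which is the per-row cost described in the original post.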

[PATCHES] A little COPY speedup

2007-03-01 Thread Heikki Linnakangas
One complaint we've heard from clients trying out EDB or PostgreSQL is that loading data is slower than on other DBMSs. I ran oprofile on a COPY FROM to get an overview of where the CPU time is spent. To my amazement, the function at the top of the list was PageAddItem with 16% of samples. O