Re: [PATCHES] COPY FROM performance improvements
Tom,

On 8/6/05 9:08 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> "Luke Lonergan" <[EMAIL PROTECTED]> writes:
>>> I had some difficulty in generating test cases that weren't largely
>>> I/O-bound, but AFAICT the patch as applied is about the same speed
>>> as what you submitted.
>
>> You achieve the important objective of knocking the parsing stage down a
>> lot, but your parsing code is actually about 20% slower than Alon's.
>
> I would like to see the exact test case you are using to make this
> claim; the tests I did suggested my code is the same speed or faster.

I showed mine - you show yours :-)  Apparently our e-mail crossed.

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower.  I did the latter first, and would
> now be tempted to rewrite it in the same style as CopyReadAttributes,
> ie one pass of memory-to-memory copy using pointers rather than buffer
> indexes.

See previous timings - it looks like Alon's parsing is substantially
faster.  However, I'd like him to confirm by running with the "shunt"
placed at different stages, in this case between parse and attribute
conversion (not attribute parse).

> BTW, late today I figured out a way to get fairly reproducible
> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
> the actual inserts, thus:
>
> 	create table foo ...
> 	create function noway() returns trigger as
> 	'begin return null; end' language plpgsql;
> 	create trigger noway before insert on foo
> 	  for each row execute procedure noway();
>
> then repeat:
>
> 	copy foo from '/tmp/foo.data';

Cool!  That's a better way than hacking code and inserting shunts.  Alon
will likely hit this tomorrow.

- Luke

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] COPY FROM performance improvements
"Luke Lonergan" <[EMAIL PROTECTED]> writes:
>> I had some difficulty in generating test cases that weren't largely
>> I/O-bound, but AFAICT the patch as applied is about the same speed
>> as what you submitted.

> You achieve the important objective of knocking the parsing stage down a
> lot, but your parsing code is actually about 20% slower than Alon's.

I would like to see the exact test case you are using to make this
claim; the tests I did suggested my code is the same speed or faster.
The particular test case I was using was the "tenk1" data from the
regression database, duplicated out to about 600K rows so as to run
long enough to measure with some degree of repeatability.

As best I can tell, my version of CopyReadAttributes is significantly
quicker than Alon's, approximately balancing out the fact that my
version of CopyReadLine is slower.  I did the latter first, and would
now be tempted to rewrite it in the same style as CopyReadAttributes,
ie one pass of memory-to-memory copy using pointers rather than buffer
indexes.

BTW, late today I figured out a way to get fairly reproducible
non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
the actual inserts, thus:

	create table foo ...
	create function noway() returns trigger as
	'begin return null; end' language plpgsql;
	create trigger noway before insert on foo
	  for each row execute procedure noway();

then repeat:

	copy foo from '/tmp/foo.data';

If the source file is not too large to fit in kernel disk cache, then
after the first iteration there is no I/O at all.  I got numbers that
were reproducible within less than 1%, as opposed to 5% or more
variation when the thing was partially I/O bound.  Pretty useless in the
real world, of course, but great for timing COPY's data-pushing.

			regards, tom lane
Re: [PATCHES] COPY FROM performance improvements
Tom,

My direct e-mails to you are apparently blocked, so I'll send this to the
list.

I've attached the case we use for load performance testing, with the data
generator modified to produce a single-row version of the dataset.

I do believe that you/we will need to invert the processing loop to get
the maximum parsing speed.  We will be implementing much higher loading
speeds, which require it, to compete with Oracle, Netezza, and Teradata,
so we'll have to work this out in the best interests of our users.

- Luke

IVP.tgz
Description: Binary data
Re: [PATCHES] COPY FROM performance improvements
Tom,

The previous timings were for a table with 15 columns of mixed type.  We
also test with 1 column to make the parsing overhead more apparent.  In
the case of 1 text column with 145MB of input data:

	Your patch:	Time: 6612.599 ms
	Alon's patch:	Time: 6119.244 ms

Alon's patch is 7.5% faster here, where it was only 3% faster on the 15
column case.  This is consistent with a large difference in parsing speed
between your approach and Alon's.

I'm pretty sure that the "mistake" you refer to is responsible for the
speed improvement, and was deliberately chosen to minimize memory copies,
etc.  Given that we're looking ahead to getting much higher speeds,
approaching current high-performance disk speeds, we've been looking more
closely at the parsing speed.  It comes down to a tradeoff between
elegant code and speed.  We'll prove it in lab tests soon, where we
measure the parsing rate directly, but these experiments show it clearly,
though indirectly.

- Luke

On 8/6/05 2:04 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> "Alon Goldshuv" <[EMAIL PROTECTED]> writes:
>> New patch attached. It includes very minor changes. These are changes
>> that were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in
>> the previous patch.
>
> I've applied this with (rather extensive) revisions.  I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO.  You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left.  Some of the bugs I noticed:
>
> [...]
Re: [PATCHES] COPY FROM performance improvements
Tom,

Thanks for finding the bugs and reworking things.

> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.

You achieve the important objective of knocking the parsing stage down a
lot, but your parsing code is actually about 20% slower than Alon's.

	Before your patch:	Time: 14205.606 ms
	With your patch:	Time: 10565.374 ms
	With Alon's patch:	Time: 10289.845 ms

The parsing part of the code in your version is slower, but as a
percentage of the total it's hidden.  The loss of 0.3 seconds on 143MB
means:

- If parsing takes a total of 0.9 seconds, the parsing rate is 160MB/s
  (143/0.9)
- If we add another 0.3 seconds to parsing to bring it to 1.2, then the
  parsing rate becomes 120MB/s

When we improve the next stages of the processing (attribute conversion,
write-to-disk), this will stand out a lot more.  Our objective is to get
the COPY rate *much* faster than the current poky rate of 14MB/s (after
this patch).

- Luke

On 8/6/05 2:04 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> "Alon Goldshuv" <[EMAIL PROTECTED]> writes:
>> New patch attached. It includes very minor changes. These are changes
>> that were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in
>> the previous patch.
>
> I've applied this with (rather extensive) revisions.  I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO.  You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left.  Some of the bugs I noticed:
>
> [...]
Re: [PATCHES] COPY FROM performance improvements
"Alon Goldshuv" <[EMAIL PROTECTED]> writes:
> New patch attached. It includes very minor changes. These are changes
> that were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in
> the previous patch.

I've applied this with (rather extensive) revisions.  I didn't like what
you had done with the control structure --- loading the input buffer
only at the outermost loop level was a bad design choice IMHO.  You had
sprinkled the code with an unreasonable number of special cases in order
to try to cope with the effects of that mistake, but there were lots
of problems still left.  Some of the bugs I noticed:

* Broke old-protocol COPY, since that has no provision for stopping at
the EOF marker except by parsing the data carefully to start with.  The
backend would just hang up unless the total data size chanced to be a
multiple of 64K.

* Subtle change in interpretation of \. EOF marker (the existing code
will recognize it even when not at start of line).

* Seems to have thrown away detection of newline format discrepancies.

* Fails for zero-column tables.

* Broke display of column values during error context callback (would
always show the first column contents no matter which one is being
complained of).

* DetectLineEnd mistakenly assumes CR mode if very last character of
first bufferload is CR; need to reserve judgment until next char is
available.

* DetectLineEnd fails to account for backslashed control characters,
so it will e.g. accept \ followed by \n as determining the newline
style.

* Fails to apply encoding conversion if first line exceeds copy buf
size, because when DetectLineEnd fails the quick-exit path doesn't do
it.

* There seem to be several bugs associated with the fact that
input_buf[] always has 64K of data in it even when EOF has been reached
on the input.  One example:

	echo -n 123 >zzz1
	psql> create temp table t1(f1 text);
	psql> copy t1 from '/home/tgl/zzz1';
	psql> select * from t1;

hmm ... where'd that 64K of whitespace come from?

I rewrote the patch in a way that retained most of the speedups without
changing the basic control structure (except for replacing multiple
CopyReadAttribute calls with one CopyReadAttributes call per line).

I had some difficulty in generating test cases that weren't largely
I/O-bound, but AFAICT the patch as applied is about the same speed
as what you submitted.

			regards, tom lane
Re: [HACKERS] [PATCHES] O_DIRECT for WAL writes
Here are comments that Daniel McNeil made earlier, which I neglected to
forward.  I've cc'ed him and Mark Havercamp, whom some of you got to meet
the other day.

Mark

- - -

With O_DIRECT on Linux, when the write() returns, the i/o has been
transferred to the disk.  Normally, this i/o will be DMAed directly from
user space to the device.

The current exception is when doing an O_DIRECT write to a hole in a
file.  (If a program does a truncate() or lseek()/write() that makes a
file larger, the file system does not allocate space between the old end
of file and the new end of file.)  An O_DIRECT write to a hole like this
requires the file system to allocate space, and there is a race
condition between the O_DIRECT write doing the allocation and then
writing to initialize the newly allocated data, and any other process
that attempts a buffered (page cache) read of the same area in the file
-- it was possible for the read to see data from the allocated region
before the O_DIRECT write().  The fix in Linux is for the O_DIRECT
write() to fall back to buffered i/o to do the write() and flush the
data from the page cache to the disk.

A write() with O_DIRECT only means the data has been transferred to the
disk.  Depending on the file system and mount options, it does not mean
the metadata for the file has been written to disk (see the fsync man
page).  Fsync() will guarantee the data and metadata have been written
to disk.

Lastly, if a disk has a write-back cache, an O_DIRECT write() does not
guarantee that the disk has put the data on the physical media.  I think
some of the journaling file systems now support i/o barriers on commit,
which will flush the disk write-back cache.  (I'm still looking at the
kernel code to see how this is done.)

Conclusion:

O_DIRECT + fsync() can make sense.  It avoids the copying of data to the
page cache before being written, and will also guarantee that the file's
metadata is written to disk.  It also prevents the page cache from
filling up with write data that will never be read (I assume it is only
read if a recovery is necessary - which should be rare).  It can also
help disks with write-back cache when using journaling file systems that
use i/o barriers.  You would want to use large writes, since the kernel
page cache won't be writing multiple pages for you.

I need to look at the kernel code more to comment on O_DIRECT with
O_SYNC.

Questions:

Does the database transaction logger preallocate the log file?

Does the logger care about the order in which each write hits the disk?

Now someone else can comment on my comments.

Daniel
Re: [PATCHES] default tablespace for roles
Laszlo Hornyak <[EMAIL PROTECTED]> writes:
> It would be nice to have tablespaces for each user. This is a small
> patch that does the job.

Isn't this entirely redundant with the existing default_tablespace GUC
variable, ie,

	ALTER USER foo SET default_tablespace = whatever;

The patch's behavior of overriding session-local settings of
default_tablespace seems quite undesirable in any case.

			regards, tom lane
[PATCHES] default tablespace for roles
Hi!

It would be nice to have tablespaces for each user.  This is a small
patch that does the job.

- grammar file: "alter|create user ... with default tablespace 'tblspc'"
  added;
- new column in pg_authid: roltblspc, an Oid referring to pg_tablespace;
- at alter/add role, the necessary default tablespace modifications +
  checking that it exists;
- pg_dumpall outputs a script that alters users with default tablespaces
  after dumping users and tablespaces (for 8.2, if you like it)

Regards,
Laszlo

Index: src/include/catalog/pg_authid.h
===
RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_authid.h,v
retrieving revision 1.3
diff -r1.3 pg_authid.h
51a52
> 	Oid		roltblspc;	/* default tablespace oid */
74c75
< #define Natts_pg_authid			11
---
> #define Natts_pg_authid			12
82,85c83,87
< #define Anum_pg_authid_rolconnlimit	8
< #define Anum_pg_authid_rolpassword	9
< #define Anum_pg_authid_rolvaliduntil	10
< #define Anum_pg_authid_rolconfig	11
---
> #define Anum_pg_authid_roltblspc	8
> #define Anum_pg_authid_rolconnlimit	9
> #define Anum_pg_authid_rolpassword	10
> #define Anum_pg_authid_rolvaliduntil	11
> #define Anum_pg_authid_rolconfig	12
94c96
< DATA(insert OID = 10 ( "POSTGRES" t t t t t t -1 _null_ _null_ _null_ ));
---
> DATA(insert OID = 10 ( "POSTGRES" t t t t t t 1663 -1 _null_ _null_ _null_ ));
Index: src/bin/pg_dump/pg_dumpall.c
===
RCS file: /projects/cvsroot/pgsql/src/bin/pg_dump/pg_dumpall.c,v
retrieving revision 1.66
diff -r1.66 pg_dumpall.c
718a719,735
> 
> 	res = executeQuery(conn, "SELECT rolname, spcname from pg_catalog.pg_authid inner join pg_catalog.pg_tablespace on pg_tablespace.oid = pg_authid.roltblspc");
> 	if (PQntuples(res) > 0)
> 		printf("--\n-- User default tablespaces\n--\n\n");
> 
> 	for (i = 0; i < PQntuples(res); i++)
> 	{
> 		char	   *rolname = PQgetvalue(res, i, 0);
> 		char	   *spcname = PQgetvalue(res, i, 1);
> 
> 		printf("ALTER ROLE %s WITH DEFAULT TABLESPACE '%s';\n", rolname, spcname);
> 
> 	}
> 
> 	PQclear(res);
> 	printf("\n\n");
> 
Index: src/backend/parser/gram.y
===
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.507
diff -r2.507 gram.y
690a691,694
> 			| DEFAULT TABLESPACE Sconst
> 				{
> 					$$ = makeDefElem("roltblspc", (Node *)makeString($3));
> 				}
Index: src/backend/commands/user.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/user.c,v
retrieving revision 1.160
diff -r1.160 user.c
93a94
> 	char	   *tblspc = NULL;
104a106
> 	DefElem    *dtblspc = NULL;
123a126,133
> 		if(strcmp(defel->defname, "roltblspc") == 0){
> 			if(dtblspc)
> 				ereport(ERROR,
> 					(errcode(ERRCODE_SYNTAX_ERROR),
> 					errmsg("conflicting or redundant options")));
> 			dtblspc = defel;
> 		} else
> 
227a238,239
> 	if (dtblspc)
> 		tblspc = strVal(dtblspc->arg);
307a320,328
> 	if(tblspc) {
> 		Oid userTblSpc;
> 		userTblSpc = get_tablespace_oid(tblspc);
> 		if(!OidIsValid(userTblSpc))
> 			elog(ERROR, "Tablespace %s does not exist", tblspc);
> 		new_record[Anum_pg_authid_roltblspc -1] = DatumGetObjectId(userTblSpc);
> 	} else {
> 		new_record_nulls[Anum_pg_authid_roltblspc -1] = 'n';
> 	}
419a441
> 	char	   *tblspc = NULL;
429a452
> 	DefElem    *dtblspc = NULL;
435a459,466
> 		if(strcmp(defel->defname, "roltblspc") == 0){
> 			if(dtblspc)
> 				ereport(ERROR,
> 					(errcode(ERRCODE_SYNTAX_ERROR),
> 					errmsg("conflicting or redundant options")));
> 			dtblspc = defel;
> 		} else
> 
538c569,571
< 
---
> 	if(dtblspc) {
> 		tblspc = strVal(dtblspc->arg);
> 	}
653a687,692
> 	if(tblspc)
> 	{
> 		new_record[Anum_pg_authid_roltblspc -1] = get_tablespace_oid(tblspc);
> 		new_record_repl[Anum_pg_authid_roltblspc -1] = 'r';
> 	}
> 
Index: src/backend/commands/tablespace.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.26
diff -r1.26 tablespace.c
56a57
> #include "catalog/pg_authid.h"
888a890,902
> 	Oid		userDefTblSpc;
> 	HeapTuple	authId;
> 	Oid		sessionUser;
> 
> 	/* Get the session users default tablespace */
> 	sessionUser = GetSessionUserId();
> 	authId = SearchSysCache(AUTHOID, ObjectIdGetDatum(sessionUser), 0, 0, 0);
> 	userDefTblSpc = ((Form_pg_authid) GETSTRUCT(