Re: [PATCHES] Auto Partitioning Patch - WIP version 1
Hi all,

I think the intent of the syntax / parser patch from Gavin and Jeff was to get consensus from PG on the syntax prior to proceeding with the next chunk of work. The next chunk of work is now well underway, with support for ALTER TABLE and partitioning, along with fast inserts into the parent table. This involves changes to the catalog, so we'll also need to discuss this as part of a submission. GP is in the middle of merging 8.3 into our product, so it will be a few weeks at least before we can push any more info to the list.

Was there consensus on the syntax? IIRC, there was a cessation of contrary comments on the matter. If so, the parser patch was provided earlier - we could possibly refresh it. The way it works in our dev branch now is that the partition syntax is turned off by default using a GUC, but is fully functional wrt creating rules, etc. This allows for experimentation.

----- Original Message -----
From: [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
Cc: Bruce Momjian [EMAIL PROTECTED]; Simon Riggs [EMAIL PROTECTED]; pgsql-patches@postgresql.org
Sent: Sat Mar 22 01:19:01 2008
Subject: Re: [PATCHES] Auto Partitioning Patch - WIP version 1

Hi,

On Fri, Mar 21, 2008 at 9:23 PM, Tom Lane [EMAIL PROTECTED] wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
>> NikhilS wrote:
>>> Thanks for taking a look. But if I am not mistaken, Gavin and co. are
>>> working on a much more exhaustive proposal. In light of that, maybe this
>>> patch is not needed in the first place? I will wait for discussion and a
>>> subsequent collective consensus here before deciding the further course
>>> of action.
>>
>> I think it is unwise to wait on Gavin for a more complex implementation
>> --- we might end up with nothing for 8.4. As long as your syntax is
>> compatible with whatever Gavin proposed, Gavin can add on to your patch
>> once it is applied.
>
> It would be equally unwise to apply a stopgap patch if we're not certain
> it will be upward compatible with what we want to do later. I haven't
> been through the partitioning threads at all yet, but I think what we
> probably want to have when we emerge from commit fest is some consensus
> on what the road map is for partitioning.

+2

Regards,
Nikhils
--
EnterpriseDB  http://www.enterprisedb.com
Re: [PATCHES] CopyReadLineText optimization
Cool! It's been a while since we've done the same kind of thing :-)

- Luke

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Heikki Linnakangas
Sent: Saturday, February 23, 2008 5:30 PM
To: pgsql-patches@postgresql.org
Subject: [PATCHES] CopyReadLineText optimization

The purpose of CopyReadLineText is to scan the input buffer and find the next newline, taking into account any escape characters. It currently operates in a loop, one byte at a time, searching for LF, CR, or a backslash. That's a bit slow: I've been running oprofile on COPY, and I've seen CopyReadLine take around ~10% of the CPU time, and Joshua Drake just posted a very similar profile to hackers.

Attached is a patch that modifies CopyReadLineText so that it uses memchr to speed up the scan. The nice thing about memchr is that we can take advantage of any clever optimizations that might be in libc or the compiler. In the tests I've been running, it roughly halves the time spent in CopyReadLine (including the new memchr calls), thus reducing the total CPU overhead by ~5%. I'm planning to run more tests with data that has backslashes and with different-width tables to see what the worst-case and best-case performance is like. Also, it doesn't work for CSV format at the moment; that needs to be fixed. 5% isn't exactly breathtaking, but it's a start.

I tried the same trick in CopyReadAttributesText, but unfortunately it doesn't seem to help there, because you need to stop the efficient word-at-a-time scan that memchr does (at least with glibc, YMMV) whenever there's a column separator, while in CopyReadLineText you get to process the whole line in one call, assuming there are no backslashes.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
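For reference, the memchr idea can be sketched as follows. This is an illustrative sketch only, not Heikki's patch; the helper name is invented:

    #include <stddef.h>
    #include <string.h>

    /*
     * Find the next byte the COPY line scanner must stop at: LF, CR, or
     * backslash.  Each memchr call lets libc scan a word at a time
     * instead of the old one-byte-per-iteration loop.  Returns NULL if
     * no special byte occurs in the chunk.
     */
    static const char *
    next_copy_special(const char *buf, size_t len)
    {
        const char *nl = memchr(buf, '\n', len);
        const char *cr = memchr(buf, '\r', len);
        const char *bs = memchr(buf, '\\', len);
        const char *stop = nl;

        if (cr != NULL && (stop == NULL || cr < stop))
            stop = cr;
        if (bs != NULL && (stop == NULL || bs < stop))
            stop = bs;
        return stop;
    }

The backslash case is what keeps this from being a pure memchr-for-newline scan: an escaped byte must be skipped before the search resumes, which is also why the trick pays off less in CopyReadAttributesText, where delimiters interrupt the scan far more often.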
Re: [PATCHES] Bitmap index scan preread using posix_fadvise (Was: There's random access and then there's random access)
Nice!

- Luke

On 1/30/08 9:22 AM, Gregory Stark [EMAIL PROTECTED] wrote:

> Here's the WIP patch for doing prereading when doing bitmap index scans.
> I was performance testing it as I was developing it here:
>
> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00395.php
>
> Note that this only kicks in for bitmap index scans, which are kind of
> tricky to generate. I used the attached function to generate them in the
> post above.
>
> Also note I wouldn't expect to see much benefit unless you're on a raid
> array, even a small one. But if you are on a raid array then the benefit
> should be immediately obvious, or else posix_fadvise just isn't working
> for you. I would be interested in hearing on which OSes it does or
> doesn't work.
>
> *If* this is the approach we want to take, rather than restructuring the
> buffer manager to avoid taking two trips by marking the buffer
> i/o-in-progress and saving the pinned buffer in the bitmap heap scan,
> then this is more or less in final form. Aside from some autoconf tests
> and the documentation for the GUC, I think it's all in there.
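The kernel-hint mechanism itself is simple; here is a rough sketch of the preread idea, with invented names and signature rather than Greg's actual code:

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>

    /*
     * Hint each heap block the bitmap scan is about to visit.  On a RAID
     * array this lets several spindles seek concurrently before the scan
     * issues its synchronous reads; on a single spindle there is little
     * to gain, which matches the observation above.
     */
    static void
    preread_heap_blocks(int fd, const unsigned long *blocks, int nblocks,
                        unsigned long blcksz)
    {
        int     i;

        for (i = 0; i < nblocks; i++)
            (void) posix_fadvise(fd, (off_t) (blocks[i] * blcksz),
                                 (off_t) blcksz, POSIX_FADV_WILLNEED);
    }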
Re: [PATCHES] updated hash functions for postgresql v1
We just applied this and saw a 5 percent speedup on a hash aggregation query with four columns in a 'group by' clause, run against a single TPC-H table (lineitem).

CK - can you post the query?

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From: Simon Riggs [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 28, 2007 04:11 PM Eastern Standard Time
To: Kenneth Marshall
Cc: pgsql-patches@postgresql.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [PATCHES] updated hash functions for postgresql v1

On Sun, 2007-10-28 at 13:05 -0500, Kenneth Marshall wrote:
> On Sun, Oct 28, 2007 at 05:27:38PM +0000, Simon Riggs wrote:
>> On Sat, 2007-10-27 at 15:15 -0500, Kenneth Marshall wrote:
>>> Its features include a better and faster hash function.
>>
>> Looks very promising. Do you have any performance test results to show
>> it really is faster, when compiled into Postgres? Better probably needs
>> some definition also; in what way are the hash functions better?
>
> The new hash function is roughly twice as fast as the old function in
> terms of straight CPU time. It uses the same design as the current hash
> but provides code paths for aligned and unaligned access, as well as
> separate mixing functions for different blocks in the hash run instead
> of having one general-purpose block. I think the speed will not be an
> obvious win with smaller items, but will be very important when hashing
> larger items (up to 32kb).
>
> Better in this case means that the new hash mixes more thoroughly, which
> results in fewer collisions and more even bucket distribution. There is
> also a 64-bit variant which is still faster, since it can take advantage
> of the 64-bit processor instruction set.

Ken,

I was really looking for some tests that show both of the above were true. We've had some trouble proving the claims of other algorithms before, so I'm less inclined to take those things at face value. I'd suggest tests with Integers, BigInts, UUID, CHAR(20) and CHAR(100). Others may have different concerns.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
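For context, the design Ken describes (separate mixing rounds, aligned and unaligned code paths) is in the style of Bob Jenkins' lookup3 hash. A sketch of that style of mixing round, taken from the public lookup3 reference rather than from the patch itself; a, b, c are expected to be uint32_t:

    #include <stdint.h>

    #define rot(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

    /*
     * One mixing round over three 32-bit lanes.  The round is
     * reversible, so no input entropy is lost, while the rotates and
     * cross-lane adds spread each input bit across all three words,
     * which is what produces the more even bucket distribution.
     */
    #define mix(a, b, c) \
    do { \
        a -= c;  a ^= rot(c,  4);  c += b; \
        b -= a;  b ^= rot(a,  6);  a += c; \
        c -= b;  c ^= rot(b,  8);  b += a; \
        a -= c;  a ^= rot(c, 16);  c += b; \
        b -= a;  b ^= rot(a, 19);  a += c; \
        c -= b;  c ^= rot(b,  4);  b += a; \
    } while (0)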
Re: [PATCHES] updated hash functions for postgresql v1
That's on Greenplum latest. We used this query to expose CPU-heavy aggregation. The 1GB overall TPC-H size is chosen to fit into the RAM of a typical workstation/laptop with 2GB of RAM. That ensures the time is spent in the CPU processing of the hashagg, which is what we'd like to measure here.

The PG performance will be different, but the measurement approach should be the same IMO. The only suggestion to make it easier is to use a 250MB scale factor, as we use four cores against 1GB. The principle is the same.

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From: Simon Riggs [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 28, 2007 04:48 PM Eastern Standard Time
To: CK.Tan
Cc: Luke Lonergan; Kenneth Marshall; pgsql-patches@postgresql.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [PATCHES] updated hash functions for postgresql v1

On Sun, 2007-10-28 at 13:19 -0700, CK Tan wrote:
> Hi, this query on TPCH 1G data gets about 5% improvement.
>
>     select count(*)
>     from (select l_orderkey, l_partkey, l_comment, count(l_tax)
>           from lineitem
>           group by 1, 2, 3) tmpt;
>
> On Oct 28, 2007, at 1:17 PM, Luke Lonergan wrote:
>> We just applied this and saw a 5 percent speedup on a hash aggregation
>> query with four columns in a 'group by' clause, run against a single
>> TPC-H table (lineitem). CK - can you post the query?

Is this on Postgres or Greenplum? That looks like quite a wide set of columns. Sounds good though. Can we get any more measurements in?

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Re: [PATCHES] Seq scans status update
Hi All,

On 5/31/07 12:40 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

> BTW, we've been talking about the L2 cache effect but we don't really
> know for sure if the effect has anything to do with the L2 cache. But
> whatever it is, it's real.

The mailing list archives contain ample evidence that:
- it's definitely an L2 cache effect
- on fast I/O hardware, tests show large benefits of keeping the ring in L2

I see no reason to re-open the discussion about these; can we accept these as fact and continue?

- Luke
Re: [PATCHES] Seq scans status update
Hi Heikki,

On 5/17/07 10:28 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:

> [The effect] is also visible on larger scans that don't fit in cache
> with bigger I/O hardware, and this patch would increase the max. I/O
> throughput that we can handle on such hardware. I don't have such
> hardware available; I hope someone else will try that.

Yes, this is absolutely the case, in addition to the benefits of not polluting the bufcache with seq scans (as discussed in detail previously). We've adopted this (see CK's patch) with excellent benefits. We can try your version on a machine with fast I/O and get back to you with a comparison of this and CK's version.

- Luke
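To make the "ring" concrete: the idea is a small, fixed set of buffers recycled by the scan, so scanned pages cycle within a cache-sized footprint instead of evicting the rest of shared_buffers. A toy sketch with invented names, not CK's or Heikki's code:

    #define RING_SIZE 32            /* e.g. 32 x 8kB = 256kB, sized for L2 */

    typedef struct ScanRing
    {
        int     bufids[RING_SIZE];  /* buffer slots owned by this scan */
        int     next;               /* slot to recycle on the next page */
    } ScanRing;

    /* Hand back the same few buffers round-robin for each page read. */
    static int
    ring_next_buffer(ScanRing *ring)
    {
        int     buf = ring->bufids[ring->next];

        ring->next = (ring->next + 1) % RING_SIZE;
        return buf;
    }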
Re: [PATCHES] Avg performance for int8/numeric
So, if I understand this correctly, we're calling AllocSetAlloc and MemoryContextAlloc 10 times for every row being summed? There are approx 10M rows, and the profile snippet below shows over 100M calls to each of those.

- Luke

On 11/24/06 4:46 PM, Mark Kirkwood [EMAIL PROTECTED] wrote:

>   %   cumulative   self              self     total
>  time   seconds   seconds      calls  s/call  s/call  name
> 14.42      2.16      2.16  129770000    0.00    0.00  AllocSetAlloc
>  9.08      3.52      1.36   20000000    0.00    0.00  add_abs
>  5.54      4.35      0.83   10000000    0.00    0.00  slot_deform_tuple
>  5.41      5.16      0.81   60001673    0.00    0.00  AllocSetFree
>  4.34      5.81      0.65   10000000    0.00    0.00  construct_md_array
>  4.21      6.44      0.63   20030000    0.00    0.00  make_result
>  3.54      6.97      0.53   10000000    0.00    0.00  numeric_add
>  3.27      7.46      0.49   30030000    0.00    0.00  set_var_from_num
>  3.00      7.91      0.45  126520000    0.00    0.00  MemoryContextAlloc
Re: [PATCHES] Bitmap index AM
Bruce,

We have a bitmap index AM in Bizgres (on PG 8.1.3) that is functional and achieves very worthwhile (10x+) performance and space objectives. It is a large patch that introduces the access method and also modifies the appropriate executor nodes; the latter was necessary to bypass the in-memory bitmap index when an on-disk bitmap is available.

Because this patch is large, how do you suggest we go through review?

Also, there is some further work that Jie is doing to support efficient multi-column indexes that will simplify the code, so we're not quite ready for patch submission.

- Luke

On 6/12/06 9:13 AM, Bruce Momjian pgman@candle.pha.pa.us wrote:

> Victor Yegorov wrote:
>> Hi again. Here's an updated patch that fixes several bugs and is synced
>> with HEAD.
>
> Are you closer to submitting this patch for application?
Re: [PATCHES] [PATCH] Improve EXPLAIN ANALYZE overhead by
Martijn,

On 5/11/06 12:17 AM, Martijn van Oosterhout kleptog@svana.org wrote:

> Did you test it? There are some cases where this might still leave some
> noticeable overhead (high loop count). I'm just not sure if they occur
> all that often in practice...

I've sent it to our team for testing; let's see if we get some info to forward. We're running the 10TB TPC-H case, and I'm asking for EXPLAIN ANALYZE on queries that might take days to complete, so we certainly have some test cases for this ;-)

- Luke
Re: [PATCHES] [PATCH] Improve EXPLAIN ANALYZE overhead by
Nice one Martijn - we have immediate need for this, as one of our sizeable queries under experimentation took 3 hours without EXPLAIN ANALYZE, then over 20 hours with it...

- Luke

On 5/9/06 2:38 PM, Martijn van Oosterhout kleptog@svana.org wrote:

> On Tue, May 09, 2006 at 05:16:57PM -0400, Rocco Altier wrote:
>>> - To get this close it needs to get an estimate of the sampling
>>>   overhead. It does this by a little calibration loop that is run once
>>>   per backend. If you don't do this, you end up assuming all tuples
>>>   take the same time as tuples with the overhead, resulting in nodes
>>>   apparently taking longer than their parent nodes. Incidentally, I
>>>   measured the overhead to be about 3.6us per tuple per node on my
>>>   (admittedly slightly old) machine.
>>
>> Could this be deferred until the first explain analyze? So that we
>> aren't paying the overhead of the calibration in all backends, even the
>> ones that won't be explaining?
>
> If you look, it's only done on the first call to InstrAlloc(), which
> should be when you run EXPLAIN ANALYZE for the first time. In any case,
> the calibration is limited to half a millisecond (that's 500
> microseconds), and it'll be less on fast machines.
>
> Have a nice day,
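The calibration Martijn describes can be sketched roughly as follows; this is a guess at the shape under stated assumptions (iters > 0), not the patch itself:

    #include <sys/time.h>

    /*
     * Estimate the cost of one instrumentation sample by timing a run of
     * back-to-back clock reads.  Done once per backend, on the first
     * EXPLAIN ANALYZE; the per-sample overhead is later subtracted so
     * that child nodes don't appear to run longer than their parents.
     */
    static double
    sampling_overhead_usec(int iters)
    {
        struct timeval start, now;
        int     i;

        gettimeofday(&start, NULL);
        for (i = 0; i < iters; i++)
            gettimeofday(&now, NULL);

        return ((now.tv_sec - start.tv_sec) * 1000000.0 +
                (now.tv_usec - start.tv_usec)) / iters;
    }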
Re: [PATCHES] Page at a time index scan
Tom,

On 5/8/06 11:46 AM, Tom Lane [EMAIL PROTECTED] wrote:

> I made a table of 16M rows with an index over a random-data integer
> column. With a thoroughly disordered index (built on-the-fly as the
> random data was inserted), the time to VACUUM after deleting a small
> number of rows was 615 seconds with yesterday's code, 31 seconds today.
> With a perfectly-ordered index (identical table, but CREATE INDEX after
> all the data is in place), the times were about 28 and 26 seconds
> respectively.

Very impressive! This corroborates findings we've had with index maintenance in the field - thanks for finding/fixing this.

- Luke
Re: [PATCHES] WIP: further sorting speedup
Cool! We'll test this sometime soon and get back to you. We're kind of jammed this week; hopefully we'll get some time.

So you know, we've done some more work on the external sort to remove the tape abstraction from the code, which makes a significant improvement. This involved removing both the Knuth tapes and the logtape.c codepath. The result is a reasonable improvement in performance (tens of percent) and a dramatic reduction in the amount of code.

Since we're looking for a 4-fold improvement based on comparisons to other commercial databases, we feel we're not done yet. Our next steps (before we got jammed getting our latest MPP release out) were to:
- Locate the cause of the excessive time in heap_getattr (you just did it)
- Implement something other than replacement selection for creating runs, to optimize cache use

- Luke

On 2/19/06 6:40 PM, Tom Lane [EMAIL PROTECTED] wrote:

> After applying Simon's recent sort patch, I was doing some profiling and
> noticed that sorting spends an unreasonably large fraction of its time
> extracting datums from tuples (heap_getattr or index_getattr). The
> attached patch does something about this by pulling out the leading sort
> column of a tuple when it is received by the sort code or re-read from a
> tape. This increases the space needed by 8 or 12 bytes (depending on
> sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as
> the on-disk representation goes. The effort needed to extract the datum
> at this point is well repaid because the tuple will normally undergo
> multiple comparisons while it remains in memory. In some quick tests the
> patch seemed to make for a significant speedup, on the order of 30%,
> despite increasing the number of runs emitted because of the smaller
> available memory.
>
> The choice to pull out just the leading column, rather than all columns,
> is driven by concerns of (a) code complexity and (b) memory space.
> Having the extra columns pre-extracted wouldn't buy anything anyway in
> the common case where the leading key determines the result of a
> comparison.
>
> This is still WIP because it leaks memory intra-query (I need to fix it
> to clean up palloc'd space better). I thought I'd post it now in case
> anyone wants to try some measurements for their own favorite test cases.
> In particular it would be interesting to see what happens for a
> multi-column sort with lots of duplicated keys in the first column,
> which is the case where the least advantage would be gained.
>
> Comments?
>
> regards, tom lane
Re: [PATCHES] WIP: further sorting speedup
The improvement was pre-Simon's patch, and it came from implementing a single-pass merge instead of a variable number of passes based on the number of tapes, as in Knuth's tape algorithm. Also, the additional tricks in logtape.c were higher in the profile than what I see here. Simon's patch had the effect of reducing the number of passes by increasing the number of tapes depending on the memory available, but that's a long-tail effect, as seen in figure (70?) in Knuth.

Where I'd like this to go is the implementation of a two-pass "create runs, merge", where the second merge can be avoided unless random access is needed (as discussed previously on list). In the run-creation phase, the idea would be to implement something like quicksort or another L2-cache-friendly algorithm (ideas?)

- Luke

On 2/19/06 8:19 PM, Tom Lane [EMAIL PROTECTED] wrote:

> Luke Lonergan [EMAIL PROTECTED] writes:
>> So you know, we've done some more work on the external sort to remove
>> the "tape" abstraction from the code, which makes a significant
>> improvement.
>
> Improvement where? That code's down in the noise so far as I can tell.
> I see results like this (with the patched code):
>
> CPU: P4 / Xeon with 2 hyper-threads, speed 2793.08 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not
> stopped) with a unit mask of 0x01 (mandatory) count 240000
>
> samples   %        symbol name
> 147310    31.9110  tuplesort_heap_siftup
> 68381     14.8130  comparetup_index
> 34063      7.3789  btint4cmp
> 22573      4.8899  AllocSetAlloc
> 19317      4.1845  writetup_index
> 18953      4.1057  tuplesort_gettuple_common
> 18100      3.9209  mergepreread
> 17083      3.7006  GetMemoryChunkSpace
> 12527      2.7137  LWLockAcquire
> 11686      2.5315  LWLockRelease
> 6172       1.3370  tuplesort_heap_insert
> 5392       1.1680  index_form_tuple
> 5323       1.1531  PageAddItem
> 4943       1.0708  LogicalTapeWrite
> 4525       0.9802  LogicalTapeRead
> 4487       0.9720  LockBuffer
> 4217       0.9135  heapgettup
> 3891       0.8429  IndexBuildHeapScan
> 3862       0.8366  ltsReleaseBlock
>
> It appears that a lot of the cycles blamed on tuplesort_heap_siftup are
> due to cache misses associated with referencing memtuples[] entries
> that have fallen out of L2 cache. Not sure how to improve that though.
>
> regards, tom lane
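The "create runs, then a single merge pass" shape can be illustrated in miniature. Below is a self-contained sketch with just two in-memory runs; a real external sort would spill each run to disk, and the run-formation sort would be chosen for cache friendliness rather than plain qsort:

    #include <stdlib.h>

    static int
    cmp_long(const void *a, const void *b)
    {
        long    x = *(const long *) a;
        long    y = *(const long *) b;

        return (x > y) - (x < y);
    }

    /* Phase 1: quicksort each half into a sorted run.
     * Phase 2: merge the two runs in one pass into out[]. */
    static void
    two_run_sort(long *data, size_t n, long *out)
    {
        size_t  half = n / 2, i = 0, j = half, k = 0;

        qsort(data, half, sizeof(long), cmp_long);
        qsort(data + half, n - half, sizeof(long), cmp_long);

        while (i < half && j < n)
            out[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
        while (i < half)
            out[k++] = data[i++];
        while (j < n)
            out[k++] = data[j++];
    }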
Re: [PATCHES] OS X shared memory documentation
Thanks for this Jim! We'll test it and report results.

- Luke

From: [EMAIL PROTECTED] on behalf of Jim C. Nasby
Sent: Sat 2/11/2006 9:26 PM
To: pgsql-patches@postgresql.org
Subject: [PATCHES] OS X shared memory documentation

I finally figured out what the issue was with OS X blowing away shared memory settings from /etc/sysctl.conf. I added a note to the 8.1 docs already, and here's a patch against -HEAD.

--
Jim C. Nasby, Sr. Engineering Consultant    [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461
Re: [PATCHES] [BUGS] BUG #2114: (patch) COPY FROM ... end of
Bruce,

On 12/27/05 10:20 AM, Bruce Momjian pgman@candle.pha.pa.us wrote:

> OK, original patch applied to HEAD and smaller version to 8.1.X, and
> regression test added, now attached.

Great, good catch.

Have you tested performance, before and after? The only good way to test performance is using a fast enough I/O subsystem that you are CPU-bound, which means 60MB/s of write speed. I'd be happy to get you an account on one.

- Luke
Re: [PATCHES] COPY FROM performance improvements
Simon,

> That part of the code was specifically written to take advantage of
> processing pipelines in the hardware, not because the actual theoretical
> algorithm for that approach was itself faster.

Yup, good point.

> Nobody's said what compiler/hardware they have been using, so since both
> Alon and Tom say their character-finding logic is faster, it is likely
> to be down to that? Name your platforms gentlemen, please.

In this case, we've been using gcc (3.2.3 on RHEL3 Linux, 3.4.3 on Solaris 10) on Opteron and Intel Xeon and Pentium 4. Alon's performance comparisons for the parse only were done on an HT-enabled P4 3.0GHz on RHEL3 with gcc 3.2.3, probably with optimization -O2, but possibly -O3.

Note that the level of microparallelism on upcoming CPUs is increasing with increasing pipeline depth. Though there will be a step back on the Intel line with the introduction of the Centrino-based Xeon cores in 2006/7, other CPUs continue the trend, and I expect the next generation of multi-core CPUs to possibly introduce threaded micro-architectures which can also be scheduled as pipelines. The gcc 4 compiler introduces auto-vectorization, which may enhance the optimality of some loops. I think the key thing is to make as much parallelism apparent to the compiler as possible, which will generally mean loops. This means faster code on all modern CPUs, and it won't hurt older CPU speeds.

> My feeling is that we may learn something here that applies more widely
> across many parts of the code.

Yes, I think one thing we've learned is that there are important parts of the code, those that are in the data path (COPY, sort, spill to disk, etc.), that are in dire need of optimization. For instance, the fgetc() pattern should be banned everywhere in the data path.

BTW - we are tracking down (in our spare time :-() the extremely slow sort performance. We're seeing sort times of 1.7MB/s on our fastest machines, even when the work_mem is equal to the square root of the sort set. This is a *serious* problem for us and we aren't getting to it - ideas are welcome.

Optimization here means both the use of good fundamental algorithms and micro-optimization (minimize memory copies, expose long runs of operations to the compiler, maximize computational intensity by working in cache-resident blocks, etc.).

- Luke
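To make the fgetc() point concrete, here is an illustrative comparison of the call pattern being criticized with a block-oriented loop the compiler can pipeline. This is a sketch, not code from any of the patches discussed:

    #include <stdio.h>

    /*
     * Byte-at-a-time: one opaque library call per input byte, so the
     * compiler can neither unroll nor vectorize anything.
     */
    static long
    count_lines_fgetc(FILE *fp)
    {
        int     c;
        long    lines = 0;

        while ((c = fgetc(fp)) != EOF)
            if (c == '\n')
                lines++;
        return lines;
    }

    /*
     * Block-at-a-time: one fread per 64kB, then a tight scanning loop
     * with no per-byte call overhead and plenty of exposed parallelism.
     */
    static long
    count_lines_blocked(FILE *fp)
    {
        char    buf[65536];
        size_t  n, i;
        long    lines = 0;

        while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
            for (i = 0; i < n; i++)
                if (buf[i] == '\n')
                    lines++;
        return lines;
    }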
Re: [PATCHES] COPY FROM performance improvements
Alvaro,

On 8/10/05 9:46 AM, Alvaro Herrera [EMAIL PROTECTED] wrote:

> AFAIR he never claimed otherwise ... his point was that to gain that
> additional speedup, the code has to be made considerably worse (in
> maintainability terms.) Have you (or Alon) tried to port the rest of the
> speed improvement to the new code? Maybe it's possible to have at least
> some of it without worsening the maintainability too badly.

As I suggested previously, there is another, more maintainable way to get more performance from the parsing logic. It involves replacing something like this:

    char c = input_routine();

    if (c == '\n')
    {
        ...
    }
    else if (...)
    {
        ...
    }

with something like this:

    char carr[32];

    nread = input_routine_new(carr, 32);
    for (i = 0; i < nread; i++)
    {
        if (carr[i] == '\n')
        {
            ...
        }
        ...
    }

This section would run much faster (3x?), and it is what I think could make the overall patch 50% faster than it is now on the parsing part.

The issue I expect we'll hear about is that since the parsing is already 500% faster, it has vanished in the profile. That's why Tom's testing is not showing much difference between his and Alon's code; we actually drop the other sections to bring it forward, where we see the bigger difference. However, what I'm arguing here and elsewhere is that there's still a lot more of this kind of optimization to be done. 12 MB/s COPY speed is not enough. There's 40% of the time in processing left to smack down.

> Another question that comes to mind is: have you tried another compiler?
> I see you are all using GCC at most 3.4; maybe the new optimizing
> infrastructure in GCC 4.1 means you can have most of the speedup without
> uglifying the code. What about Intel's compiler?

We have routinely distributed PostgreSQL with the Intel compiler, up until recently. Interestingly, GCC now beats it handily in our tests on Opteron and matches it on Xeon, which is too bad - it's my fav compiler. The problem with this code is that it doesn't have enough micro-parallelism without loops in the character-parsing core. The compiler can only do register optimizations and branch prediction (poorly) unless it is given more to work with.

>> PostgreSQL needs major improvement to compete with Oracle and even
>> MySQL on speed. No whacking on the head is going to change that.
>
> Certainly. I think the point is what cost do we want to pay for the
> speedup. I think we all agree that even if we gain a 200% speedup by
> rewriting COPY in assembly, it's simply not acceptable.

Understood, and I totally agree.

> Another point may be that Bizgres can have a custom patch for the extra
> speedup, without inflicting the maintenance cost on the community.

We are committed to making Postgres the best DBMS for Business Intelligence. Bizgres makes it safe for businesses to rely on open source for their production uses. As far as features go, I think the best way for our customers is to make sure that Bizgres features support the PostgreSQL core and vice versa.

- Luke
Re: [PATCHES] COPY FROM performance improvements
Tom,

Thanks for finding the bugs and reworking things.

> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed as
> what you submitted.

You achieve the important objective of knocking the parsing stage down a lot, but your parsing code is actually about 20% slower than Alon's:

    Before your patch:  Time: 14205.606 ms
    With your patch:    Time: 10565.374 ms
    With Alon's patch:  Time: 10289.845 ms

The parsing part of the code in your version is slower, but as a percentage of the total it's hidden. The loss of 0.3 seconds on 143MB means:

- If parsing takes a total of 0.9 seconds, the parsing rate is 160MB/s (143/0.9)
- If we add another 0.3 seconds to parsing to bring it to 1.2, the parsing rate becomes 120MB/s

When we improve the next stages of the processing (attribute conversion, write to disk), this will stand out a lot more. Our objective is to get the COPY rate *much* faster than the current poky rate of 14MB/s (after this patch).

- Luke

On 8/6/05 2:04 PM, Tom Lane [EMAIL PROTECTED] wrote:

> Alon Goldshuv [EMAIL PROTECTED] writes:
>> New patch attached. It includes very minor changes. These are changes
>> that were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in
>> the previous patch.
>
> I've applied this with (rather extensive) revisions. I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO. You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots of
> problems still left. Some of the bugs I noticed:
>
> * Broke old-protocol COPY, since that has no provision for stopping at
>   the EOF marker except by parsing the data carefully to start with. The
>   backend would just hang up unless the total data size chanced to be a
>   multiple of 64K.
> * Subtle change in interpretation of \. EOF marker (the existing code
>   will recognize it even when not at start of line).
> * Seems to have thrown away detection of newline format discrepancies.
> * Fails for zero-column tables.
> * Broke display of column values during error context callback (would
>   always show the first column contents no matter which one is being
>   complained of).
> * DetectLineEnd mistakenly assumes CR mode if the very last character of
>   the first bufferload is CR; it needs to reserve judgment until the
>   next char is available.
> * DetectLineEnd fails to account for backslashed control characters, so
>   it will e.g. accept \ followed by \n as determining the newline style.
> * Fails to apply encoding conversion if the first line exceeds the copy
>   buffer size, because when DetectLineEnd fails the quick-exit path
>   doesn't do it.
> * There seem to be several bugs associated with the fact that
>   input_buf[] always has 64K of data in it even when EOF has been
>   reached on the input. One example:
>
>       $ echo -n 123 > zzz1
>       psql> create temp table t1(f1 text);
>       psql> copy t1 from '/home/tgl/zzz1';
>       psql> select * from t1;
>
>   hmm ... where'd that 64K of whitespace come from?
>
> I rewrote the patch in a way that retained most of the speedups without
> changing the basic control structure (except for replacing multiple
> CopyReadAttribute calls with one CopyReadAttributes call per line).
>
> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed as
> what you submitted.
> regards, tom lane
Re: [PATCHES] COPY FROM performance improvements
Tom,

The previous timings were for a table with 15 columns of mixed type. We also test with 1 column to make the parsing overhead more apparent. In the case of 1 text column with 145MB of input data:

    Your patch:   Time: 6612.599 ms
    Alon's patch: Time: 6119.244 ms

Alon's patch is 7.5% faster here, where it was only 3% faster on the 15-column case. This is consistent with a large difference in parsing speed between your approach and Alon's.

I'm pretty sure that the "mistake" you refer to is responsible for the speed improvement, and was deliberately chosen to minimize memory copies, etc. Given that we're looking ahead to getting much higher speeds, approaching current high-performance disk speeds, we've been looking more closely at the parsing speed. It comes down to a tradeoff between elegant code and speed. We'll prove it in lab tests soon, where we measure the parsing rate directly, but these experiments show it clearly, though indirectly.

- Luke

On 8/6/05 2:04 PM, Tom Lane [EMAIL PROTECTED] wrote:

> I've applied this with (rather extensive) revisions. [same message as
> quoted in full above - snipped]
Re: [PATCHES] COPY FROM performance improvements
Tom,

My direct e-mails to you are apparently blocked, so I'll send this to the list.

I've attached the case we use for load performance testing, with the data generator modified to produce a single-row version of the dataset. I do believe that you/we will need to invert the processing loop to get the maximum parsing speed. We will be implementing much higher loading speeds, which require it, to compete with Oracle, Netezza, and Teradata, so we'll have to work this out for the best interests of our users.

- Luke

[Attachment: IVP.tgz]
Re: [PATCHES] COPY FROM performance improvements
Tom,

On 8/6/05 9:08 PM, Tom Lane [EMAIL PROTECTED] wrote:

> Luke Lonergan [EMAIL PROTECTED] writes:
>>> I had some difficulty in generating test cases that weren't largely
>>> I/O-bound, but AFAICT the patch as applied is about the same speed as
>>> what you submitted.
>>
>> You achieve the important objective of knocking the parsing stage down
>> a lot, but your parsing code is actually about 20% slower than Alon's.
>
> I would like to see the exact test case you are using to make this
> claim; the tests I did suggested my code is the same speed or faster.

I showed mine - you show yours :-) Apparently our e-mails crossed.

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower. I did the latter first, and would now
> be tempted to rewrite it in the same style as CopyReadAttributes, ie one
> pass of memory-to-memory copy using pointers rather than buffer indexes.

See previous timings - it looks like Alon's parsing is substantially faster. However, I'd like him to confirm by running with the shunt placed at different stages, in this case between parse and attribute conversion (not attribute parse).

> BTW, late today I figured out a way to get fairly reproducible
> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses the
> actual inserts, thus:
>
>     create table foo ...
>     create function noway() returns trigger as
>       'begin return null; end' language plpgsql;
>     create trigger noway before insert on foo
>       for each row execute procedure noway();
>
> then repeat:
>
>     copy foo from '/tmp/foo.data';

Cool! That's a better way than hacking code and inserting shunts. Alon will likely hit this tomorrow.

- Luke
Re: [PATCHES] COPY FROM performance improvements
Cool! At what rate does your disk setup write sequential data? E.g.:

    time dd if=/dev/zero of=bigfile bs=8k count=500000

(sized for 2x RAM on a system with 2GB)

BTW - the Compaq SmartArray controllers are pretty broken on Linux from a performance standpoint in our experience. We've had disastrously bad results from the SmartArray 5i and 6 controllers on kernels from 2.4 - 2.6.10, on the order of 20MB/s.

For comparison, the results on our dual Opteron with a single LSI SCSI controller with software RAID0 on a 2.6.10 kernel:

    [EMAIL PROTECTED] dbfast]$ time dd if=/dev/zero of=bigfile bs=8k count=500000
    500000+0 records in
    500000+0 records out

    real    0m24.702s
    user    0m0.077s
    sys     0m8.794s

Which calculates out to about 161MB/s.

- Luke

On 7/21/05 2:55 PM, Mark Wong [EMAIL PROTECTED] wrote:

> I just ran through a few tests with the v14 patch against 100GB of data
> from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
> give a few details, I only loaded data and started a COPY in parallel
> for each of the data files:
>
> http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/
>
> Here's a visual of my disk layout, for those familiar with the database
> schema:
>
> http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4-010-dbt3.html
>
> I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
> attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
>
> Let me know if you have any questions.
>
> Mark
Re: [PATCHES] COPY FROM performance improvements
Joshua,

On 7/21/05 5:08 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

> O.k. this strikes me as interesting: now we know that Compaq and Dell
> are borked for Linux. Is there a name-brand server (read: Enterprise)
> that actually does provide reasonable performance?

I think late-model Dell (post the bad chipset problem, circa 2001-2?) and IBM and Sun servers are fine, because they all use simple SCSI adapters from LSI or Adaptec. The HP SmartArray is an aberration; they don't have good driver support for Linux and as a consequence have some pretty bad problems with both performance and stability. On Windows they perform quite well.

Also - there are very big issues with some SATA controllers and Linux we've seen, particularly the Silicon Image, Highpoint, and other non-Intel controllers. Not sure about Nvidia, but the only ones I trust now are 3Ware and the others mentioned in earlier posts.

- Luke
Re: [PATCHES] COPY FROM performance improvements
Joshua,

On 7/21/05 7:53 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

> Well I know that isn't true, at least not with ANY of the Dells my
> customers have purchased in the last 18 months. They are still really,
> really slow.

That's too bad; can you cite some model numbers? SCSI?

> I have great success with Silicon Image as long as I am running them
> with Linux software RAID. The LSI controllers are also really nice.

That's good to hear. I gave up on Silicon Image controllers on Linux about a year ago; which kernel are you using with success? Silicon Image controllers are the most popular, so it's important to see them supported well, though I'd rather see more SATA headers than the 2 off of the built-in chipsets.

- Luke
Re: [PATCHES] COPY FROM performance improvements
Good points on all. Another element in the performance expectations is the ratio of CPU speed to I/O subsystem speed, as Alon had hinted earlier.

This patch substantially (500%) improves the efficiency of parsing in the COPY path, which, on a 3GHz P4 desktop with a commodity disk drive, represents 8 of a total of 30 seconds of processing time. So, by reducing the parsing time from 8 seconds to 1.5 seconds, the overall COPY time is reduced from 30 seconds to 23.5 seconds, a speedup of about 20%.

On a dual 2.2GHz Opteron machine with a 6-disk SCSI RAID subsystem capable of 240MB/s sequential reads and writes, the ratios change and we see between a 35% and 95% increase in COPY performance, with the bottleneck being CPU. The disk is only running at about 90MB/s during this period.

I'd expect that as your CPUs slow down relative to your I/O speed (and Itaniums or IT2s are quite slow), you should see an increased effect from the parsing improvements.

One good way to validate the effect is to watch the I/O bandwidth using "vmstat 1" (on Linux) while the load is progressing. If you watch that with the unpatched source and with the patched source and the numbers are the same, you will see no benefit from the patch (you are I/O limited). Note that if you check your underlying sequential write speed, COPY will bottleneck at roughly half that rate because of the write-through of the WAL.

- Luke

On 7/19/05 3:51 PM, Mark Wong [EMAIL PROTECTED] wrote:

> Whoopsies, yeah good point about the PRIMARY KEY. I'll fix that.
>
> Mark
>
> On Tue, 19 Jul 2005 18:17:52 -0400, Andrew Dunstan [EMAIL PROTECTED] wrote:
>
> Mark,
>
> You should definitely not be doing this sort of thing, I believe:
>
>     CREATE TABLE orders (
>         o_orderkey      INTEGER,
>         o_custkey       INTEGER,
>         o_orderstatus   CHAR(1),
>         o_totalprice    REAL,
>         o_orderDATE     DATE,
>         o_orderpriority CHAR(15),
>         o_clerk         CHAR(15),
>         o_shippriority  INTEGER,
>         o_comment       VARCHAR(79),
>         PRIMARY KEY (o_orderkey))
>
> Create the table with no constraints, load the data, then set up primary
> keys and whatever other constraints you want using ALTER TABLE. Last
> time I did a load like this (albeit 2 orders of magnitude smaller) I saw
> a 50% speedup from deferring constraint creation.
>
> cheers
>
> andrew
>
> Mark Wong wrote:
>
> Hi Alon,
>
> Yeah, that helps. I just need to break up my scripts a little to just
> load the data and not build indexes.
>
> Is the following information good enough to give a guess about the data
> I'm loading, if you don't mind? ;) Here's a link to my script to create
> tables:
>
> http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in
>
> File sizes:
>
>     -rw-r--r-- 1 markw 50 2.3G Jul 8 15:03 customer.tbl
>     -rw-r--r-- 1 markw 50  74G Jul 8 15:03 lineitem.tbl
>     -rw-r--r-- 1 markw 50 2.1K Jul 8 15:03 nation.tbl
>     -rw-r--r-- 1 markw 50  17G Jul 8 15:03 orders.tbl
>     -rw-r--r-- 1 markw 50 2.3G Jul 8 15:03 part.tbl
>     -rw-r--r-- 1 markw 50  12G Jul 8 15:03 partsupp.tbl
>     -rw-r--r-- 1 markw 50  391 Jul 8 15:03 region.tbl
>     -rw-r--r-- 1 markw 50 136M Jul 8 15:03 supplier.tbl
>
> Number of rows:
>
>     # wc -l *.tbl
>      15000000 customer.tbl
>     600037902 lineitem.tbl
>            25 nation.tbl
>     150000000 orders.tbl
>      20000000 part.tbl
>      80000000 partsupp.tbl
>             5 region.tbl
>       1000000 supplier.tbl
>
> Thanks,
> Mark
>
> On Tue, 19 Jul 2005 14:05:56 -0700, Alon Goldshuv [EMAIL PROTECTED] wrote:
>
> Hi Mark,
>
> I improved the data *parsing* capabilities of COPY, and didn't touch the
> data conversion or data insertion parts of the code. The parsing
> improvement will vary largely depending on the ratio of parsing -to-
> converting and inserting.
> Therefore, the speed increase really depends on the nature of your data.
> A 100GB file with:
>
> - long data rows (lots of parsing)
> - a small number of columns (few attr conversions per row)
> - fewer rows (fewer tuple insertions)
>
> will show the best performance improvements. However, the same 100GB
> file with:
>
> - short data rows (minimal parsing)
> - a large number of columns (many attr conversions per row)
> - and/or more rows (more tuple insertions)
>
> will show improvements, but not as significant. In general I'll estimate
> a 40%-95% improvement in load speed for the 1st case and 10%-40% for the
> 2nd. But that also depends on the hardware, disk speed, etc... This is
> for TEXT format. As for CSV, it may be faster, but not as much as I
> specified here. BINARY will stay the same as before.
>
> HTH,
> Alon.
>
> On 7/19/05 12:54 PM, Mark Wong [EMAIL PROTECTED] wrote:
>
> On Thu, 14 Jul 2005 17:22:18 -0700, Alon Goldshuv [EMAIL PROTECTED]
> wrote:
>
> I revisited my patch and removed the code duplications that were there,
> and added support for CSV with buffered input, so CSV now runs faster
> too (although it is not as optimized as the TEXT format parsing). So now
> TEXT, CSV and BINARY are all parsed in CopyFrom(), like in the original
> file.
Re: [PATCHES] backslashes in pgindent
Bruce,

> I found that parentheses in gawk regular expressions require backslashes
> so they are not treated as regex groupings:
>
>     $ echo '('|awk '$0 ~ /(/ {print $0}'
>     awk: cmd. line:1: fatal: Unmatched ( or \(: /(/
>     $ echo '('|awk '$0 ~ /\(/ {print $0}'
>     (
>
> Now, it seems closing parentheses are OK because there is no open group,
> but I think I should use backslashes there too:
>
>     $ echo ')'|awk '$0 ~ /)/ {print $0}'
>     )
>     $ echo ')'|awk '$0 ~ /\)/ {print $0}'
>
> Does your awk produce different results? What version is it? Mine is
> GNU Awk 3.0.6.

Yes - on the last test, mine emits the ) and yours apparently does not. The version I ran with is 3.1.4.

The escaped parenthesis in the unpatched pgindent causes the following warning:

    $ pgindent test.c
    Hope you installed /src/tools/pgindent/indent.bsd.patch.
    awk: cmd. line:12: warning: escape sequence `\)' treated as plain `)'

which implies the escape is unnecessary; the script appears to function correctly without it.

Cheers,
- Luke
Re: [PATCHES] backslashes in pgindent
Bruce,

On 7/15/05 9:59 PM, Bruce Momjian pgman@candle.pha.pa.us wrote:

> Actually, mine returns ')' too for the last command. I didn't copy that
> into the email. How about the top tests? Notice I get an error on the
> first one without the backslash. Are you OK escaping '(' but not ')'?
> That might be a solution.

You know, I'm not sure - I don't know the intended meaning of this line:

    awk '	BEGIN	{line1 = ""; line2 = ""}
    	{
    		line2 = $0;
    		if (NR >= 2)
    			print line1;
    		if (NR >= 2 &&
    		    line2 ~ "^{[ ]*$" &&
    		    line1 !~ "^struct" &&
    		    line1 !~ "^enum" &&
    		    line1 !~ "^typedef" &&
    		    line1 !~ "^extern[ ][ ]*\"C\"" &&
    		    line1 !~ "=" &&
    		    line1 ~ "\)")
    			print "int	pgindent_func_no_var_fix;";
    		line1 = line2;
    	}
    	END

Is the escaped paren within meant to be a literal?

- Luke
Re: [PATCHES] COPY fast parse patch
Andrew,

> OK ... that seems fair enough. The next question is where the data being
> loaded comes from? pg_dump? How does load speed compare with using
> COPY's binary mode?

Oddly, our tests in the past have shown that binary is actually slower.

Luke
Re: [PATCHES] COPY fast parse patch
Andrew,

> I will be the first to admit that there are probably some very good
> possibilities for optimisation of this code. My impression though has
> been that in almost all cases it's fast enough anyway. I know that on
> some very modest hardware I have managed to load a 6m row TPC line-items
> table in just a few minutes. Before we start getting too hung up, I'd be
> interested to know just how much data people want to load and how fast
> they want it to be. If people have massive data loads that take hours,
> days or weeks then it's obviously worth improving if we can. I'm curious
> to know what size datasets people are really handling this way.

Files of tens of gigabytes and up are common in data warehousing. The issue is often: can we load our data within the time allotted for the batch window, usually a matter of an hour or two?

Assuming that a TPC lineitem row is about 140 bytes, 6M rows in 3 minutes is 4.7MB/s. To load a 10GB file at that rate takes about 2/3 of an hour. Restoring a 300GB database would take 18 hours. Maintenance operations are impractical after a few hours; 18 is a non-starter.

In practice, we're usually replacing an Oracle system with PostgreSQL, and the load-speed difference between the two is currently embarrassing and makes the work impractical.

- Luke