Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic
Christopher, On 2/15/06 11:14 PM, Christopher Kings-Lynne [EMAIL PROTECTED] wrote: Any comments on this? Is he referring to EnterpriseDB extensions that they don't make public? I've noticed a lot of press lately mentioning their name next to Ingres as an alternative to MySQL, so the MySQL folks might be feeling some Postgres heat from their direction. I also wonder where their project stands - they seem publicly opaque about progress, etc. From the web site's statements it looks like they've written a tool to tune the postgresql.conf file, for which they claim a 50% speed-up, but that's not new or unique fork-level functionality. - Luke ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic
Any comments on this? Is he referring to EnterpriseDB extensions that they don't make public? I've noticed a lot of press lately mentioning their name next to Ingres as an alternative to MySQL, so the MySQL folks might be feeling some Postgres heat from their direction. I also wonder where their project stands - they seem publicly opaque about progress, etc. From the web site's statements it looks like they've written a tool to tune the postgresql.conf file, for which they claim a 50% speed-up, but that's not new or unique fork-level functionality. What they don't say is whether that is a 50% speed-up from the default settings or a 50% increase over a carefully hand-tuned file. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Generating config stuff from single source
On Thursday, 16 February 2006 02:50, Tom Lane wrote: That's fine for users, but what new demands are you about to place on developers? Does this require tools not already needed in order to build from a CVS pull? (There's sure no xsltproc on this machine...) It is to be expected that sooner or later we'll move from SGML to XML documentation builds, at which point xsltproc will become a semi-requirement anyway. I don't think this requirement is too onerous; libxslt is portable and easy to install. The m4 idea seems attractive to me because that's already effectively required as part of the autoconf infrastructure (and I think bison uses it too these days). That is true, but I'm afraid that this will lead to code that only a few people will be able to maintain. (Try programming a loop in m4 to start.) A similar issue that's been bothering me for awhile is that it'd be a lot less error prone if keywords.c and the keyword list productions in gram.y were generated off a common declarative source (which might as well produce the keyword documentation appendix too). That could be a job for m4, I think. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic
Christopher Kings-Lynne wrote: http://www.flamingspork.com/blog/2006/02/16/enterprisedb-where-is-the-source/ Any comments on this? Is he referring to EnterpriseDB extensions that they don't make public? I think so. He's trying to battle the perception that EnterpriseDB is an open source database. It seems, though, that little effort was made to understand the actual relationship between EnterpriseDB and PostgreSQL. Looks like an attempt at pitting dual-license GPL/closed source vs. proprietary BSD-based. regards, Lukas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Generating config stuff from single source
On Thu, Feb 16, 2006 at 02:36:01AM +0100, Peter Eisentraut wrote: We are currently maintaining information about configuration parameters in at least three places: the documentation, guc.c, and postgresql.conf.sample. I would like to generate these from a single source. Computationally, this is not very challenging, it's just a bit of work. I imagine as the source an XML file with a custom schema; see below for an example. I think this is the best source format because it allows integrating the DocBook-formatted descriptions without too much trouble and it allows for file format validation. An alternative might be m4 but that would not offer these features. To process this we'd use XSLT stylesheets run through xsltproc. We'd run this part during the tarball building phase, so users would not need it. Obviously, all of this will need some fine-tuning, but can we agree on this general direction? Is there any reason why we can't just use something like awk? It's installed almost everywhere (it may be required, not sure) and a lot more people know how to use it. I haven't managed to wrap my brain around XSLT yet. The input file could be simply line based. Attached is a simple script that takes the input below and produces something resembling what you suggested. It wouldn't be too hard to get it to produce multiple output formats and dump the output to different files... Group: Query Tuning Subgroup: Planner Method Configuration Name: enable_hashagg Context: userset ... etc ... -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. 
BEGIN {
    havedata = 0;
    group = subgroup = "";
    print "<parameters>\n";
}
/^[A-Za-z]+: *(.*)/ {
    keyword = tolower(substr($0, 1, index($0, ":") - 1));
    data = substr($0, index($0, ":") + 1);
    sub(/^ +/, "", data);
    sub(/ +$/, "", data);
    if (keyword == "group") {
        if (group != "") { print "</group>\n" }
        printf "<group><title>%s</title>\n", data;
        group = data;
        subgroup = "";
        next;
    }
    if (keyword == "subgroup") {
        if (subgroup != "") { print "</subgroup>\n" }
        printf "<subgroup><title>%s</title>\n", data;
        subgroup = data;
        next;
    }
    havedata = 1;
    fields[keyword] = data;
    next;
}
/^ *$/ {
    if (havedata == 0) { next }
    print "<parameter>\n";
    for (keyword in fields) {
        printf "<%s>%s</%s>\n", keyword, fields[keyword], keyword
    }
    print "</parameter>\n";
    havedata = 0;
    delete fields;
    next;
}
{
    printf "Invalid input: %s\n", $0;
    exit;
}
END {
    if (havedata == 1) {
        print "<parameter>\n";
        for (keyword in fields) {
            printf "<%s>%s</%s>\n", keyword, fields[keyword], keyword
        }
        print "</parameter>\n";
    }
    if (subgroup != "") { print "</subgroup>\n" }
    if (group != "") { print "</group>\n" }
    print "</parameters>\n";
}
signature.asc Description: Digital signature
Re: [HACKERS] qsort again
* Neil Conway: On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote: It seems clear that our qsort.c is doing a pretty awful job of picking qsort pivots, while glibc is mostly managing not to make that mistake. I haven't looked at the glibc code yet to see what they are doing differently. glibc qsort is actually merge sort, so I'm not surprised it avoids this problem. qsort also performs twice as many key comparisons as the theoretical minimum. If key comparison is not very cheap, other schemes (like heapsort, for example) are more attractive. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Doubt in parser
hi Currently I am looking at the Postgres source code. I saw the scanner and parser implementations in two different places (src/backend/parser/ and src/backend/bootstrap/). Can anybody tell me the purpose of having two parsers? Or does this help to parse the queries at different levels? Thanks Dhanaraj ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Doubt in parser
On Thu, Feb 16, 2006 at 06:07:25PM +0530, Dhanaraj wrote: hi currently i looking at the postgres src code. I saw the scanner and parser implemetations at two different places (src/backend/parser/ and /src/bakend/bootstrp). Can anybody tell me the purpose of having two phases?? or will this help to parse the queries at different levels? The first one is the actual parser for the queries you send. The latter is the bootstrap parser, which is used only during the initial bootstrap of a database. It needs to be separate because of chicken-and-egg problems: for example, the names of columns are stored in pg_attribute, yet how can you fill that table if you don't yet know what its columns are called? The bootstrap parser is basically a glorified data loader to handle this special case. It can't do queries or anything like that. You can basically ignore it for normal development. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. signature.asc Description: Digital signature
Re: [HACKERS] qsort again
On Thu, Feb 16, 2006 at 01:10:48PM +0100, Florian Weimer wrote: * Neil Conway: On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote: It seems clear that our qsort.c is doing a pretty awful job of picking qsort pivots, while glibc is mostly managing not to make that mistake. I haven't looked at the glibc code yet to see what they are doing differently. glibc qsort is actually merge sort, so I'm not surprised it avoids this problem. qsort also performs twice as many key comparisons as the theoretical minimum. If key comparison is not very cheap, other schemes (like heapsort, for example) are more attractive. Last time around there were a number of different algorithms tested. Did anyone run those tests while getting it to count the number of actual comparisons (which could easily swamp the time taken to do the actual sort in some cases)? Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. signature.asc Description: Digital signature
Re: [PERFORM] [HACKERS] qsort again
Martijn van Oosterhout wrote: Last time around there were a number of different algorithms tested. Did anyone run those tests while getting it to count the number of actual comparisons (which could easily swamp the time taken to do the actual sort in some cases)? The last time I did such tests was almost 10 years ago. I used Metrowerks CodeWarrior C/C++, which had quicksort as the algorithm in its libc. Anyhow, I tested a few algorithms, including merge sort and heapsort. I ended up with heapsort because it was the fastest algorithm for our case. We joined two arrays, each already sorted, and sorted the new array. Sven. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
At 06:35 AM 2/16/2006, Steinar H. Gunderson wrote: On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote: Even better (and more easily scaled as the number of GPRs in the CPU changes) is to use the set {L; L+1; L+2; t1; R-2; R-1; R} This means that instead of 7 random memory accesses, we have 3; two of which result in a burst access for three elements each. Isn't that improvement going to disappear completely if you choose a bad pivot? Only if you _consistently_ (read: the vast majority of the time; quicksort is actually darn robust) choose a _pessimal_, not just bad, pivot will quicksort degenerate to the O(N^2) behavior everyone worries about. See Cormen, Leiserson and Rivest for a proof of this. Even then, doing things as above has benefits: 1= The worst case is less bad, since the guaranteed O(lg s!) pivot-choosing algorithm puts s elements into final position. The worst case becomes better than O(N^2/(s-1)). 2= With more traditional methods, the overhead of pivot choosing can overshadow the benefits for even moderate values of s. See discussions of the quicksort variant known as samplesort and Sedgewick's PhD thesis for details. Using a pivot-choosing algorithm that actually does some of the partitioning (and does it more efficiently than the usual partitioning algorithm does) plus using partition-in-place (rather than Lomuto's method) reduces overhead very effectively (at the cost of more complicated / delicate-to-get-right partitioning code). The above reduces the number of moves used in a quicksort pass considerably, regardless of the number of compares used. 3= Especially in modern systems, where the gap between internal CPU bandwidth and memory bandwidth is so great, the overhead of memory accesses for comparisons and moves is the majority of the overhead for both the pivot-choosing and the partitioning algorithms within quicksort. Particularly random memory accesses. 
The reason (#GPRs - 1) is a magic constant is that it's the most you can compare and move using only register-to-register operations. In addition, replacing as many of the memory accesses you must do with sequential rather than random memory accesses is a big deal: sequential memory access is measured in 10's of CPU cycles, while random memory access is measured in hundreds of CPU cycles. It's no accident that the advances in Gray's sorting contest have involved algorithms that are both register and cache friendly, minimizing overall memory access and using sequential memory access as much as possible when said access cannot be avoided. As caches grow larger and memory accesses more expensive, it's often worth it to use a BucketSort+QuickSort hybrid rather than just QuickSort. ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non-comparison-based O(N) sorting algorithm rather than any of the general comparison-based O(NlgN) methods. SIDE NOTE: IIRC glibc's qsort is actually merge sort. Merge sort performance is insensitive to all inputs, and there are ways to optimize it as well. glibc-2.3.5/stdlib/qsort.c: /* Order size using quicksort. This implementation incorporates four optimizations discussed in Sedgewick: I can't see any references to merge sort in there at all. Well, then I'm not the only person on the lists whose memory is faulty ;-) The up side of MergeSort is that its performance is always O(NlgN). The down sides are that it is far more memory hungry than QuickSort, and slower. Ron ---(end of broadcast)--- TIP 6: explain analyze is your friend
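To make the pivot-choosing idea concrete: pull a small sample from both ends and the middle of the range, sort it, and take its median as the pivot. The sketch below is a generic illustration, not Ron's register-resident scheme; median_of_sample and the 7-element sample (echoing the {L; L+1; L+2; t1; R-2; R-1; R} set above) are my own choices.

```c
#include <assert.h>
#include <stddef.h>

/* Median-of-a-small-sample pivot selection for quicksort over a[lo..hi].
 * Requires hi - lo >= 6 so the seven sample indices are meaningful. */
static int median_of_sample(const int *a, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    size_t idx[7] = { lo, lo + 1, lo + 2, mid, hi - 2, hi - 1, hi };
    int s[7];

    for (int i = 0; i < 7; i++)
        s[i] = a[idx[i]];

    /* insertion-sort the 7-element sample; its cost is a small constant */
    for (int i = 1; i < 7; i++) {
        int v = s[i];
        int j = i - 1;
        while (j >= 0 && s[j] > v) {
            s[j + 1] = s[j];
            j--;
        }
        s[j + 1] = v;
    }
    return s[3];                /* median of the sample becomes the pivot */
}
```

A real implementation would also move the sampled elements into place while choosing the pivot, as the post describes, so the sample sorting does double duty as partial partitioning.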
Re: [PERFORM] [HACKERS] qsort again
At 07:10 AM 2/16/2006, Florian Weimer wrote: * Neil Conway: On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote: It seems clear that our qsort.c is doing a pretty awful job of picking qsort pivots, while glibc is mostly managing not to make that mistake. I haven't looked at the glibc code yet to see what they are doing differently. glibc qsort is actually merge sort, so I'm not surprised it avoids this problem. qsort also performs twice as many key comparisons as the theoretical minimum. The theoretical minimum number of comparisons for a general-purpose comparison-based sort is O(lg N!). QuickSort uses 2NlnN ~= 1.38NlgN, or ~1.38x the optimum, without tuning (see Knuth, Sedgewick, Cormen, etc.). OTOH, QuickSort uses ~2x as many =moves= as the theoretical minimum unless tuned, and moves are more expensive than compares in modern systems. See my other posts for QuickSort tuning methods that attempt to directly address both issues. Ron ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Doubt in parser
Dhanaraj wrote: hi currently i looking at the postgres src code. I saw the scanner and parser implemetations at two different places (src/backend/parser/ and /src/bakend/bootstrp). Can anybody tell me the purpose of having two phases?? or will this help to parse the queries at different levels? The bootstrap parser is used only in bootstrap mode, which is when the template1 database is initially created. It has a completely different syntax from the main parser. If what you are looking for is to implement a new command or modify an existing one, ignore the bootstrap parser. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Hi, Ron, Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non comparison based O(N) sorting algorithm rather than any of the general comparison based O(NlgN) methods. Sounds interesting, could you give us some pointers (names, URLs, papers) to such algorithms? Thanks a lot, Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Last night I implemented a non-recursive introsort in C... let me test it a bit more and then I'll post it here for everyone else to try out. On 2/16/06, Markus Schaber [EMAIL PROTECTED] wrote: Hi, Ron, Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non comparison based O(N) sorting algorithm rather than any of the general comparison based O(NlgN) methods. Sounds interesting, could you give us some pointers (names, URLs, papers) to such algorithms? Thanks a lot, Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- Jonah H. Harris, Database Internals Architect, EnterpriseDB Corporation, 732.331.1324
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote: 3= Especially in modern systems where the gap between internal CPU bandwidth and memory bandwidth is so great, the overhead of memory accesses for comparisons and moves is the majority of the overhead for both the pivot choosing and the partitioning algorithms within quicksort. Particularly random memory accesses. The reason (#GPRs - 1) is a magic constant is that it's the most you can compare and move using only register-to-register operations. But how much of this applies to us? We're not sorting arrays of integers, we're sorting pointers to tuples. So while moves cost very little, a comparison costs hundreds, maybe thousands of cycles. A tuple can easily be two or three cachelines and you're probably going to access all of it, not to mention the Fmgr structures and the Datums themselves. None of this is cache friendly. The actual tuples themselves could be spread all over memory (I don't think any particular effort is expended trying to minimize fragmentation). Do these algorithms discuss the case where a comparison is more than 1000 times the cost of a move? Where this does become interesting is where we can convert a datum to an integer such that if f(A) < f(B) then A < B. Then we can sort on f(X) first with just integer comparisons and then do a full tuple comparison only if f(A) = f(B). This would be much more cache-coherent and make these algorithms much more applicable in my mind. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. signature.asc Description: Digital signature
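A sketch of the f(X) idea, with hypothetical names (SortEntry, prefix_key - this is not PostgreSQL's actual comparetup machinery): precompute an order-preserving integer from each tuple, compare the cheap integers first, and fall back to the expensive full comparison only on ties.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SortEntry {
    unsigned int key;           /* f(X): order-preserving integer prefix    */
    const char  *tuple;         /* stand-in for a pointer to the real tuple */
} SortEntry;

/* f(X): pack the first 4 bytes of a NUL-terminated string into an integer
 * so that key(A) < key(B) implies A < B under byte-wise string ordering.
 * (Short strings are zero-padded, which preserves the ordering.) */
static unsigned int prefix_key(const char *s)
{
    unsigned int k = 0;

    for (int i = 0; i < 4; i++) {
        k <<= 8;
        if (*s)
            k |= (unsigned char) *s++;
    }
    return k;
}

static int entry_cmp(const void *a, const void *b)
{
    const SortEntry *ea = a, *eb = b;

    if (ea->key != eb->key)                 /* cheap integer comparison */
        return ea->key < eb->key ? -1 : 1;
    return strcmp(ea->tuple, eb->tuple);    /* expensive tie-break only */
}
```

With well-distributed data, the vast majority of comparisons are resolved by the integer test, and the 8-byte (key, pointer) entries stay cache-resident while the wide tuples are rarely touched.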
Re: [HACKERS] Generating config stuff from single source
Peter Eisentraut [EMAIL PROTECTED] writes: Am Donnerstag, 16. Februar 2006 02:50 schrieb Tom Lane: That's fine for users, but what new demands are you about to place on developers? Does this require tools not already needed in order to build from a CVS pull? (There's sure no xsltproc on this machine...) It is to be expected that sooner or later we'll move from SGML to XML documentation builds, at which point xsltproc will become a semi-requirement anyway. I don't think this requirement is too onerous; libxslt is portable and easy to install. Forgot to mention, but: I don't find the above argument very convincing. The buildfarm machines are not expected to build documentation, and many developers seem not to have installed doc tools either. So I think this would be raising the bar another notch in terms of what's required to do development or testing, even if it does overlap with docs-build needs. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote: On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote: 3= Especially in modern systems where the gap between internal CPU bandwidth and memory bandwidth is so great, the overhead of memory accesses for comparisons and moves is the majority of the overhead for both the pivot choosing and the partitioning algorithms within quicksort. Particularly random memory accesses. The reason (#GPRs - 1) is a magic constant is that it's the most you can compare and move using only register-to-register operations. But how much of this applies to us? We're not sorting arrays of integers, we're sorting pointers to tuples. So while moves cost very little, a comparison costs hundreds, maybe thousands of cycles. A tuple can easily be two or three cachelines and you're probably going to access all of it, not to mention the Fmgr structures and the Datums themselves. Pointers are simply fixed size 32b or 64b quantities. They are essentially integers. Comparing and moving pointers or fixed size keys to those pointers is exactly the same problem as comparing and moving integers. Comparing =or= moving the actual data structures is a much more expensive and variable cost proposition. I'm sure that pg's sort functionality is written intelligently enough that the only real data moves are done in a final pass after the exact desired order has been found using pointer compares and (re)assignments during the sorting process. That's a standard technique for sorting data whose key or pointer is much smaller than a datum. Your cost comment basically agrees with mine regarding the cost of random memory accesses. The good news is that the number of datums to be examined during the pivot choosing process is small enough that the datums can fit into CPU cache while the pointers to them can be assigned to registers: making pivot choosing +very+ fast when done correctly. 
As you've noted, actual partitioning is going to be more expensive since it involves accessing enough actual datums that they can't all fit into CPU cache. The good news is that QuickSort has a very sequential access pattern within its inner loop. So while we must go to memory for compares, we are at least keeping the cost down to a minimum. In addition, said access is nice enough to be very prefetch and CPU cache hierarchy friendly. None of this is cache friendly. The actual tuples themselves could be spread all over memory (I don't think any particular effort is expended trying to minimize fragmentation). It probably would be worth it to spend some effort on memory layout, just as we do for HD layout. Do these algorithms discuss the case where a comparison is more than 1000 times the cost of a move? A move is always more expensive than a compare when the datum is larger than its pointer or key. A move is always more expensive than a compare when it involves memory-to-memory movement rather than CPU-location-to-CPU-location movement. A move is especially more expensive than a compare when it involves both factors. Most moves do involve both. What I suspect you meant is that a key comparison that involves accessing the data in memory is more expensive than reassigning the pointers associated with those keys. That is certainly true. Yes. The problem has been extensively studied. ;-) Where this does become interesting is where we can convert a datum to an integer such that if f(A) < f(B) then A < B. Then we can sort on f(X) first with just integer comparisons and then do a full tuple comparison only if f(A) = f(B). This would be much more cache-coherent and make these algorithms much more applicable in my mind. In fact we can do better. 
Using hash codes or what-not to map datums to keys and then sorting just the keys and the pointers to those datums followed by an optional final pass where we do the actual data movement is also a standard technique for handling large data structures. Regardless of what tweaks beyond the basic algorithms we use, the algorithms themselves have been well studied and their performance well established. QuickSort is the best performing of the O(nlgn) comparison based sorts and it uses less resources than HeapSort or MergeSort. Ron ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
Ron [EMAIL PROTECTED] writes: Your cost comment basically agrees with mine regarding the cost of random memory accesses. The good news is that the number of datums to be examined during the pivot choosing process is small enough that the datums can fit into CPU cache while the pointers to them can be assigned to registers: making pivot choosing +very+ fast when done correctly. This is more or less irrelevant given that comparing the pointers is not the operation we need to do. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Markus Schaber wrote: Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non comparison based O(N) sorting algorithm rather than any of the general comparison based O(NlgN) methods. Sounds interesting, could you give us some pointers (names, URLs, papers) to such algorithms? Most of these techniques boil down to good ol' bucket sort. A simple example: suppose you have a column of integer percentages, range zero to 100. You know there are only 101 distinct values. So create 101 buckets (e.g. linked lists), make a single pass through your data and drop each row's ID into the right bucket, then make a second pass through the buckets, and write the row ID's out in bucket order. This is an O(N) sort technique. Any time you have a restricted data range, you can do this. Say you have 100 million rows of scientific results known to be good to only three digits -- it can have at most 1,000 distinct values (regardless of the magnitude of the values), so you can do this with 1,000 buckets and just two passes through the data. You can also use this trick when the optimizer is asked for fastest first result. Say you have a cursor on a column of numbers with good distribution. If you do a bucket sort on the first two or three digits only, you know the first page of results will be in the first bucket. So you only need to apply qsort to that first bucket (which is very fast, since it's small), and you can deliver the first page of data to the application. This can be particularly effective in interactive situations, where the user typically looks at a few pages of data and then abandons the search. I doubt this is very relevant to Postgres. A relational database has to be general purpose, and it's hard to give it hints that would tell it when to use this particular optimization. 
Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
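To make the percentage example above concrete, here is a minimal counting-sort sketch - a degenerate bucket sort where each bucket only needs a tally, since the values themselves are the keys. The function name percent_sort and the fixed 0..100 range are illustrative assumptions; a real bucket sort over row IDs would keep a list per bucket instead of a count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RANGE 101               /* distinct values: 0..100 inclusive */

/* Sort an array of ints, each known to lie in 0..100, in O(N) time:
 * pass 1 tallies each value's bucket, pass 2 writes values back in order.
 * No element-to-element comparisons are performed at all. */
static void percent_sort(int *a, size_t n)
{
    size_t counts[RANGE];
    size_t out = 0;

    memset(counts, 0, sizeof(counts));

    for (size_t i = 0; i < n; i++)          /* pass 1: tally */
        counts[a[i]]++;

    for (int v = 0; v < RANGE; v++)         /* pass 2: emit in order */
        for (size_t c = 0; c < counts[v]; c++)
            a[out++] = v;
}
```

The two passes touch the data strictly sequentially, which is exactly the cache-friendly access pattern discussed earlier in the thread.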
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
At 10:52 AM 2/16/2006, Ron wrote: At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote: Where this does become interesting is where we can convert a datum to an integer such that if f(A) < f(B) then A < B. Then we can sort on f(X) first with just integer comparisons and then do a full tuple comparison only if f(A) = f(B). This would be much more cache-coherent and make these algorithms much more applicable in my mind. In fact we can do better. Using hash codes or what-not to map datums to keys and then sorting just the keys and the pointers to those datums, followed by an optional final pass where we do the actual data movement, is also a standard technique for handling large data structures. I thought some follow-up might be in order here. Let's pretend that we have the typical DB table where rows are ~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in such a table. A 32b hash code can be assigned to each row value such that only exactly equal rows will have the same hash code. A 32b pointer can locate any of the 256M-512M rows. Now instead of sorting 1TB of data we can sort 2^28 to 2^29 key+pointer pairs (32b + 32b = 64b each), i.e. 2-4GB of pointers+keys, followed by an optional pass to rearrange the actual rows if we so wish. We get the same result while only examining and manipulating 1/50 to 1/25 as much data during the sort. If we want to spend more CPU time in order to save more space, we can compress the key+pointer representation. That usually reduces the amount of data to be manipulated to ~1/4 the original key+pointer representation, reducing things to ~512M-1GB worth of compressed pointers+keys. Or ~1/200 - ~1/100 the original amount of data we were discussing. Either representation is small enough to fit within RAM rather than requiring HD IO, so we solve the HD IO bottleneck in the best possible way: we avoid ever doing it. Ron ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
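A toy version of the key+pointer scheme above, under stated assumptions that are mine, not Ron's: rows are fixed-width, the 32b sort key is stored in each row's first 4 bytes, and sort_rows/KeyPtr are made-up names. We sort only the narrow (key, row-number) pairs, then do the optional gather pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ROW_WIDTH 256           /* stand-in for a ~2-4KB row */

typedef struct {
    unsigned int key;           /* 32-bit sort key derived from the row */
    unsigned int idx;           /* 32-bit "pointer": the row number     */
} KeyPtr;

static int keyptr_cmp(const void *a, const void *b)
{
    const KeyPtr *ka = a, *kb = b;

    return (ka->key > kb->key) - (ka->key < kb->key);
}

/* Sort n fixed-width rows whose first 4 bytes hold the key.  Only the
 * 8-byte (key, idx) pairs are sorted; one final gather pass produces a
 * freshly allocated copy of the rows in key order (caller frees it). */
static char *sort_rows(const char *rows, size_t n)
{
    KeyPtr *kp = malloc(n * sizeof(KeyPtr));
    char *out = malloc(n * ROW_WIDTH);

    if (kp == NULL || out == NULL) {
        free(kp);
        free(out);
        return NULL;
    }

    for (size_t i = 0; i < n; i++) {
        memcpy(&kp[i].key, rows + i * ROW_WIDTH, sizeof(unsigned int));
        kp[i].idx = (unsigned int) i;
    }

    qsort(kp, n, sizeof(KeyPtr), keyptr_cmp);   /* sort 8-byte pairs only */

    for (size_t i = 0; i < n; i++)              /* optional final gather  */
        memcpy(out + i * ROW_WIDTH,
               rows + (size_t) kp[i].idx * ROW_WIDTH, ROW_WIDTH);

    free(kp);
    return out;
}
```

The sort itself touches 8 bytes per row instead of ROW_WIDTH bytes, which is the 1/50-1/25 reduction the post describes; the wide rows are only read once, sequentially per output slot, in the gather pass.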
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
On Thu, Feb 16, 2006 at 11:32:55AM -0500, Ron wrote: At 10:52 AM 2/16/2006, Ron wrote: In fact we can do better. Using hash codes or what-not to map datums to keys and then sorting just the keys and the pointers to those datums followed by an optional final pass where we do the actual data movement is also a standard technique for handling large data structures. Or in fact required if the Datums are not all the same size, which is the case in PostgreSQL. I thought some follow up might be in order here. Let's pretend that we have the typical DB table where rows are ~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in such a table. A 32b hash code can be assigned to each row value such that only exactly equal rows will have the same hash code. A 32b pointer can locate any of the 256M-512M rows. That hash code is impossible the way you state it, since the set of strings is not mappable to a 32-bit integer. You probably meant that a hash code can be assigned such that equal rows have equal hashes (drop the only). Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b= 64b*(2^28 to 2^29)= 2-4GB of pointers+keys followed by an optional pass to rearrange the actual rows if we so wish. We get the same result while only examining and manipulating 1/50 to 1/25 as much data during the sort. But this is what we do now. The tuples are loaded, we sort an array of pointers, then we write the output. Except we don't have the hash, so we require access to the 1TB of data to do the actual comparisons. Even if we did have the hash, we'd *still* need access to the data to handle tie-breaks. That's why your comment about moves always being more expensive than compares makes no sense. A move can be achieved simply by swapping two pointers in the array. A compare actually needs to call all sorts of functions. If and only if we have functions for every data type to produce an ordered hash, we can optimise sorts based on single integers. 
For reference, look at comparetup_heap(). It's just 20 lines, but each function call there expands to maybe a dozen lines of code. And it has a loop. I don't think we're anywhere near the stage where locality of reference makes much difference. We very rarely need the tuples actualised in memory in the required order; just the pointers are enough.

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
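For readers without the source at hand, the idea in the exchange above can be sketched in a few lines. This is a Python illustration only (the real code is C, in tuplesort.c): the sort moves small (key, index) pairs rather than whole tuples, and a precomputed order-preserving integer key turns most compares into plain integer compares, with the full comparison reserved for key ties. The one-byte key used here is an assumption purely for the demo, and stands in for the "ordered hash" the message says would be needed per data type.

```python
# Sort (key, index) pairs instead of moving full tuples. A "move" during
# the sort is just a swap of two small pairs; a compare without a
# precomputed key would call back into per-datatype comparison functions.
tuples = ["row-" + s for s in ("delta", "alpha", "charlie", "bravo")]

# Demo key: the first byte after the "row-" prefix (an assumption for this
# sketch). Equal keys fall back to comparing the full value -- the caveat
# about needing an *ordered* hash per data type.
pairs = [(ord(tuples[i][4]), i) for i in range(len(tuples))]
pairs.sort(key=lambda kp: (kp[0], tuples[kp[1]]))  # tie-break on full value

ordered = [tuples[i] for _, i in pairs]  # optional final data-movement pass
```

Note that only the last line touches the tuples themselves; everything before it manipulates the small pairs, which is the locality argument being debated.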
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Craig A. James [EMAIL PROTECTED] writes: You can also use this trick when the optimizer is asked for fastest first result. Say you have a cursor on a column of numbers with good distribution. If you do a bucket sort on the first two or three digits only, you know the first page of results will be in the first bucket. So you only need to apply qsort to that first bucket (which is very fast, since it's small), and you can deliver the first page of data to the application. This can be particularly effective in interactive situations, where the user typically looks at a few pages of data and then abandons the search. I doubt this is very relevant to Postgres. A relational database has to be general purpose, and it's hard to give it hints that would tell it when to use this particular optimization. Actually, LIMIT does nicely for that hint; the PG planner has definitely got a concept of preferring fast-start plans for limited queries. The real problem in applying bucket-sort ideas is the lack of any datatype-independent way of setting up the buckets. Once or twice we've kicked around the idea of having some datatype-specific sorting code paths alongside the general-purpose one, but I can't honestly see this as being workable from a code maintenance standpoint. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
On Feb 16, 2006, at 8:32 AM, Ron wrote: Let's pretend that we have the typical DB table where rows are ~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in such a table. A 32b hash code can be assigned to each row value such that only exactly equal rows will have the same hash code. A 32b pointer can locate any of the 256M-512M rows. Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b = 64b*(2^28 to 2^29) = 2-4GB of pointers+keys, followed by an optional pass to rearrange the actual rows if we so wish.

I don't understand this. This is a true statement: (H(x) != H(y)) => (x != y). This is not: (H(x) < H(y)) => (x < y).

Hash keys can tell you there's an inequality, but they can't tell you how the values compare. If you want 32-bit keys that compare in the same order as the original values, here's how you have to get them: (1) sort the values into an array, (2) use each value's array index as its key. It reduces to the problem you're trying to use it to solve.
-- Scott Lamb http://www.slamb.org/
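Scott's two statements are easy to check with any ordinary hash. A small sketch (the multiplicative constant is an arbitrary illustrative choice, not anything Postgres uses):

```python
# A hash preserves equality but not order.
def h(x):
    # Knuth-style multiplicative hash, truncated to 32 bits.
    return (x * 2654435761) & 0xFFFFFFFF

assert h(7) == h(7)            # (x == y)  =>  H(x) == H(y)
assert 1 < 2 and h(1) > h(2)   # ...but hash order can invert value order
```

So H(x) != H(y) does prove x != y, while the relative order of two hashes says nothing about the relative order of the values, which is exactly why a sort still needs access to the data for anything beyond equality.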
Re: [HACKERS] Generating config stuff from single source
Peter Eisentraut [EMAIL PROTECTED] writes: On Thursday, 16 February 2006 02:50, Tom Lane wrote: The m4 idea seems attractive to me because that's already effectively required as part of the autoconf infrastructure (and I think bison uses it too these days). That is true, but I'm afraid that this will lead to code that only a few people will be able to maintain. (Try programming a loop in m4 to start.) A similar issue that's been bothering me for awhile is that it'd be a lot less error prone if keywords.c and the keyword list productions in gram.y were generated off a common declarative source (which might as well produce the keyword documentation appendix too). That could be a job for m4, I think.

Hmm ... well, if we are going to do this other thing with xsltproc, the keyword problem should probably be solved with the same tool. I hesitate to mention this, but: we have several other derived files in the tarball (postgres.bki, fmgroids.h, etc) and those are all built with shell/awk/sed/perl scripts. How out of the question is it to solve the GUC problem with that technology? regards, tom lane
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Markus Schaber Sent: Thursday, February 16, 2006 5:45 AM To: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

Hi, Ron, Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non comparison based O(N) sorting algorithm rather than any of the general comparison based O(NlgN) methods. Sounds interesting, could you give us some pointers (names, URLs, papers) to such algorithms?

He refers to counting sort and radix sort (which comes in most significant digit and least significant digit format). These are also called distribution (as opposed to comparison) sorts. These sorts are O(n) as a function of the data size, but really they are O(M*n) where M is the average key length and n is the data set size. (In the case of MSD radix sort, M is the average length needed to completely differentiate strings.)

So if you have an 80 character key, then 80*n will only be faster than n*log(n) when you have more than 2^80 elements -- in other words -- never. If you have short keys, on the other hand, distribution sorts will be dramatically faster. On an unsigned integer, for instance, it requires 4 passes with 8 bit buckets, and so 16 elements is the crossover point at which radix becomes faster than an O(n*log(n)) sort. Of course, there is a fixed constant of proportionality and so it will really be higher than that, but for large data sets distribution sorting is the best thing that there is for small keys.

You could easily have an introspective MSD radix sort. The nice thing about MSD radix sort is that you can retain the ordering that has occurred so far and switch to another algorithm. An introspective MSD radix sort could call an introspective quick sort algorithm once it has processed enough buckets of key data to pass a crossover point.
In order to have distribution sorts that will work with a database system, you will need, for each and every data type, a function that returns the k-th bucket of bits of significance of a key. For a character string, you could return key[bucket]. For an unsigned integer, which byte to return is a function of the endianness of the CPU. For each other distinct data type a bucket function would be needed, or a sort could not be generated for that type using the distribution method.
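A minimal LSD radix sort over 32-bit unsigned keys, as the two messages above describe it: four passes, one per 8-bit "bucket", each pass a stable counting sort. Python sketch for illustration only; in a database this would of course be C, and the per-byte extraction is exactly the per-type "bucket function" the message calls for.

```python
# LSD radix sort for 32-bit unsigned keys: O(4*n) regardless of n, which
# is why it wins over O(n log n) comparison sorts once n is large and
# keys are short.
def radix_sort_u32(keys):
    for shift in (0, 8, 16, 24):          # one pass per 8-bit bucket
        counts = [0] * 256
        for k in keys:
            counts[(k >> shift) & 0xFF] += 1
        total = 0
        for b in range(256):              # exclusive prefix sums give each
            counts[b], total = total, total + counts[b]   # bucket's offset
        out = [0] * len(keys)
        for k in keys:                    # stable scatter pass
            b = (k >> shift) & 0xFF
            out[counts[b]] = k
            counts[b] += 1
        keys = out
    return keys
```

Stability of each counting pass is what lets the least-significant-digit ordering survive the later, more significant passes.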
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
At 12:19 PM 2/16/2006, Scott Lamb wrote: On Feb 16, 2006, at 8:32 AM, Ron wrote: Let's pretend that we have the typical DB table where rows are ~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in such a table. A 32b hash code can be assigned to each row value such that only exactly equal rows will have the same hash code. A 32b pointer can locate any of the 256M-512M rows. Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b = 64b*(2^28 to 2^29) = 2-4GB of pointers+keys, followed by an optional pass to rearrange the actual rows if we so wish.

I don't understand this. This is a true statement: (H(x) != H(y)) => (x != y). This is not: (H(x) < H(y)) => (x < y). Hash keys can tell you there's an inequality, but they can't tell you how the values compare. If you want 32-bit keys that compare in the same order as the original values, here's how you have to get them:

For most hash codes, you are correct. There is a class of hash or hash-like codes that maintains the mapping to support that second statement. More later when I can get more time. Ron
[HACKERS] how solve diff of API construct_md_array between 8.1 and 8.2?
Hello, I use the construct_md_array function in my Orafunc module. The CVS version now has a different definition. I am finding a way for a simple solution to maintaining the source code for both versions. I have the PG_VERSION variable, but it's unusable. Is there a way for contrib authors to differentiate PostgreSQL versions? I don't want to have two versions of the source code. Thank you, Pavel Stehule
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 12:35 +0100, Steinar H. Gunderson wrote: glibc-2.3.5/stdlib/qsort.c: /* Order size using quicksort. This implementation incorporates four optimizations discussed in Sedgewick: I can't see any references to merge sort in there at all. stdlib/qsort.c defines _quicksort(), not qsort(), which is defined by msort.c. On looking closer, it seems glibc actually tries to determine the physical memory in the machine -- if it is sorting a single array that exceeds 1/4 of the machine's physical memory, it uses quick sort, otherwise it uses merge sort. -Neil
Re: [HACKERS] how solve diff of API construct_md_array between 8.1 and 8.2?
On Thu, Feb 16, 2006 at 08:36:34PM +0100, Pavel Stehule wrote: Hello, I use the construct_md_array function in my Orafunc module. The CVS version now has a different definition. I am finding a way for a simple solution to maintaining the source code for both versions. I have the PG_VERSION variable, but it's unusable. Is there a way for contrib authors to differentiate PostgreSQL versions? I don't want to have two versions of the source code.

For my stuff I've generally used CATALOG_VERSION_NO. It's not very easy, but by looking through CVS you can find when the function was created and in your code use:

#if CATALOG_VERSION_NO >= yyyymmddN
/* New stuff */
#else
/* Old stuff */
#endif

Hope this helps,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
[HACKERS] time waiting on patch queue
[redirecting to -hackers] Tom Lane wrote: At the moment it's not unusual for nontrivial patches to sit around for a month or two, unless they happen to attract special interest of one of the committers. Yeah. That's just horrible. It makes people lose interest and can hurt because of bitrot too. I wish we could aim at some maximum time at least for a first cut review. If there are areas where only one or two people have sufficient expertise, that's also a worry. cheers andrew
Re: [HACKERS] how solve diff of API construct_md_array between
Martijn van Oosterhout wrote: On Thu, Feb 16, 2006 at 08:36:34PM +0100, Pavel Stehule wrote: I use the construct_md_array function in my Orafunc module. The CVS version now has a different definition. I am finding a way for a simple solution to maintaining the source code for both versions. I have the PG_VERSION variable, but it's unusable. Is there a way for contrib authors to differentiate PostgreSQL versions? I don't want to have two versions of the source code.

For my stuff I've generally used CATALOG_VERSION_NO. It's not very easy, but by looking through CVS you can find when the function was created and in your code use: #if CATALOG_VERSION_NO >= yyyymmddN /* New stuff */ #else /* Old stuff */ #endif

I do pretty much the same thing in PL/R. The good news is that CATALOG_VERSION_NO doesn't change for each major release once it is released. The following hasn't been updated since the 8.1 release, but you could use it as a starting point:

#if (CATALOG_VERSION_NO <= 200211021)
#define PG_VERSION_73_COMPAT
#elif (CATALOG_VERSION_NO <= 200310211)
#define PG_VERSION_74_COMPAT
#elif (CATALOG_VERSION_NO <= 200411041)
#define PG_VERSION_80_COMPAT
#else
#define PG_VERSION_81_COMPAT
#endif

HTH, Joe
[HACKERS] In Japan with Josh Berkus
FYI, Josh Berkus and I are in Japan to give some presentations. We return to the USA on February 23. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote: Once or twice we've kicked around the idea of having some datatype-specific sorting code paths alongside the general-purpose one, but I can't honestly see this as being workable from a code maintenance standpoint. regards, tom lane

It seems that instead of maintaining a different sorting code path for each data type, you could get away with one generic path and one (hopefully faster) path if you allowed data types to optionally support a 'sortKey' interface by providing a function f which maps inputs to 32-bit int outputs, such that the following two properties hold:

f(a) <= f(b) iff a <= b
if a == b then f(a) == f(b)

So if a data type supports the sortKey interface you could perform the sort on f(value) and only refer back to the actual element comparison functions when two sortKeys have the same value. Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). Depending on the overhead, you might not even need to maintain 2 independent sort code paths, since you could always use f(x)=0 as the default sortKey function, which would degenerate to the exact same sort behavior in use today.
-- Mark Lewis
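A sketch of how the proposed sortKey interface could look, in Python for brevity (function names here are hypothetical, not from any Postgres source): the sort compares f-values first and falls back to the real comparison only on key ties, and the default f(x) = 0 makes every pair tie, which degenerates to today's behaviour.

```python
import functools

# Hypothetical sortKey machinery: compare precomputed keys first, and
# call the datatype's real comparison function only when keys tie.
def sort_with_keyfunc(values, compare, f=lambda x: 0):
    def cmp_pair(a, b):
        ka, kb = f(a), f(b)
        if ka != kb:
            return -1 if ka < kb else 1
        return compare(a, b)          # key tie: full comparison
    return sorted(values, key=functools.cmp_to_key(cmp_pair))

def int_compare(a, b):                # stand-in for a per-type comparator
    return (a > b) - (a < b)
```

With `f=lambda x: x` (a perfect key, as int2/int4/oid could supply) the expensive comparator is never consulted; with the default `f` the result is identical but every compare goes through it.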
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Hi, Mark, Mark Lewis wrote: It seems that instead of maintaining a different sorting code path for each data type, you could get away with one generic path and one (hopefully faster) path if you allowed data types to optionally support a 'sortKey' interface by providing a function f which maps inputs to 32-bit int outputs, such that the following two properties hold: f(a) <= f(b) iff a <= b; if a == b then f(a) == f(b).

Hmm, to remove redundancy, I'd change the <= to a < and define:

if a == b then f(a) == f(b)
if a < b then f(a) <= f(b)

Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). With int2 or some restricted ranges of oid and int4, we could even implement a bucket sort. Markus
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, Feb 16, 2006 at 02:17:36PM -0800, Mark Lewis wrote: It seems that instead of maintaining a different sorting code path for each data type, you could get away with one generic path and one (hopefully faster) path if you allowed data types to optionally support a 'sortKey' interface by providing a function f which maps inputs to 32-bit int outputs, such that the following two properties hold: f(a) <= f(b) iff a <= b; if a == b then f(a) == f(b).

Note this is a property of the collation, not the type. For example strings can be sorted in many ways and the sortKey must reflect that. So in postgres terms it's a property of the btree operator class. It's something I'd like to do if I get A Round Tuit. :)

Have a nice day,
-- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
Markus Schaber [EMAIL PROTECTED] writes: Hmm, to remove redundancy, I'd change the <= to a < and define: if a == b then f(a) == f(b); if a < b then f(a) <= f(b). Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

How exactly do you imagine doing this for text? I could see doing it for char(n)/varchar(n) where n <= 4 in SQL_ASCII though.
-- greg
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote: Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). How exactly do you imagine doing this for text? I could see doing it for char(n)/varchar(n) where n <= 4 in SQL_ASCII though.

In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit sortKey as elsewhere suggested). The sorting key doesn't need to be a one-to-one mapping.
-- Mark Lewis
Re: [HACKERS] In Japan with Josh Berkus
Drink Sake and eat some Yakitori for us folks in the west. Maybe shake a robot hand or two while you're at it :-) - Luke

On 2/16/06 2:14 PM, Bruce Momjian pgman@candle.pha.pa.us wrote: FYI, Josh Berkus and I are in Japan to give some presentations. We return to the USA on February 23. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 16 Feb 2006, Mark Lewis wrote: On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote: Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). How exactly do you imagine doing this for text? I could see doing it for char(n)/varchar(n) where n <= 4 in SQL_ASCII though. In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit sortKey as elsewhere suggested). The sorting key doesn't need to be a one-to-one mapping.

that would violate your second constraint ( f(a)==f(b) iff (a==b) ). if you could drop that constraint (the cost of which would be extra 'real' compares within a bucket) then a helper function per datatype could work as you are talking. David Lang
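David's objection is easy to demonstrate. A first-4-characters key (SQL_ASCII assumed, big-endian byte packing as a sketch) preserves order but is not one-to-one, so equal keys do not imply equal values and a real comparison is still needed within a "bucket" of key ties:

```python
# Pack the first 4 ASCII bytes of a string into one integer key,
# zero-padding short strings. Order-preserving, but collides on any
# strings sharing a 4-byte prefix.
def prefix_key(s):
    b = s.encode("ascii")[:4].ljust(4, b"\x00")
    return int.from_bytes(b, "big")

assert prefix_key("abcdX") == prefix_key("abcdY")   # collision: keys tie
assert "abcdX" != "abcdY"                           # ...yet values differ
assert (prefix_key("abc") < prefix_key("abd")) == ("abc" < "abd")
```

The last line shows the property that survives: strict key order does follow value order, so the key can cheaply decide most compares and only the colliding pairs cost a full comparison.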
Re: [HACKERS] In Japan with Josh Berkus
Hi all, Josh's talk is now available at: http://snaga.org/01_Josh_Berkus.mp3 This file is very long, and an interpreter's voice interpreting into Japanese is also recorded. If you want to learn Japanese, please try it! :) Thanks.

Luke Lonergan wrote: Drink Sake and eat some Yakitori for us folks in the west. Maybe shake a robot hand or two while you're at it :-) - Luke On 2/16/06 2:14 PM, Bruce Momjian pgman@candle.pha.pa.us wrote: FYI, Josh Berkus and I are in Japan to give some presentations. We return to the USA on February 23. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
At 01:47 PM 2/16/2006, Ron wrote: At 12:19 PM 2/16/2006, Scott Lamb wrote: On Feb 16, 2006, at 8:32 AM, Ron wrote: Let's pretend that we have the typical DB table where rows are ~2-4KB apiece. 1TB of storage will let us have 256M-512M rows in such a table. A 32b hash code can be assigned to each row value such that only exactly equal rows will have the same hash code. A 32b pointer can locate any of the 256M-512M rows. Now instead of sorting 1TB of data we can sort 2^28 to 2^29 32b+32b = 64b*(2^28 to 2^29) = 2-4GB of pointers+keys, followed by an optional pass to rearrange the actual rows if we so wish.

I don't understand this. This is a true statement: (H(x) != H(y)) => (x != y). This is not: (H(x) < H(y)) => (x < y). Hash keys can tell you there's an inequality, but they can't tell you how the values compare. If you want 32-bit keys that compare in the same order as the original values, here's how you have to get them:

For most hash codes, you are correct. There is a class of hash or hash-like codes that maintains the mapping to support that second statement. More later when I can get more time. Ron

OK, so here's _a_ way (there are others) to obtain a mapping such that if a < b then f(a) < f(b), and if a == b then f(a) == f(b).

Pretend each row is an integer of row size (so a 2KB row becomes a 16Kb integer; a 4KB row becomes a 32Kb integer; etc). Since even a 1TB table made of such rows can only have 256M - 512M possible values even if each row is unique, a 28b or 29b key is large enough to represent each row's value and relative rank compared to all of the others even if all row values are unique. By scanning the table once, we can map say 001h (Hex used to ease typing) to the row with the minimum value and 111h to the row with the maximum value, as well as mapping everything in between to their appropriate keys. That same scan can be used to assign a pointer to each record's location.
We can now sort the key+pointer pairs instead of the actual data and use an optional final pass to rearrange the actual rows if we wish. That initial scan to set up the keys is expensive, but if we wish that cost can be amortized over the life of the table so we don't have to pay it all at once. In addition, once we have created those keys, they can be saved for later searches and sorts. Further space savings can be obtained whenever there are duplicate keys and/or when compression methods are used on the key+pointer pairs. Ron
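In miniature, the rank-key scheme described above looks like the following sketch. Note that building the rank keys is itself a full sort over the distinct values (Scott's earlier objection that it reduces to the original problem), which is why the message frames it as a cost amortized over the table's lifetime:

```python
# One expensive initial pass assigns each distinct row value its rank;
# afterwards, sorting the table touches only small (rank, pointer) pairs.
rows = ["pear", "apple", "pear", "fig"]

# Initial scan + sort: this is the amortized setup cost.
rank = {v: r for r, v in enumerate(sorted(set(rows)))}

# Subsequent sorts work on (rank, index) pairs only; equal rows share a
# rank, so duplicates cost nothing extra.
pairs = sorted((rank[rows[i]], i) for i in range(len(rows)))
ordered = [rows[i] for _, i in pairs]   # optional final rearrangement pass
```

The saved `rank` mapping plays the role of the reusable keys: any later sort over the same values needs only the cheap pair sort.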
Re: [HACKERS] In Japan with Josh Berkus
Arigato gozaimas! - Luke

From: [EMAIL PROTECTED] on behalf of Satoshi Nagayasu Sent: Thu 2/16/2006 10:17 PM To: Luke Lonergan Cc: Bruce Momjian; PostgreSQL-development Subject: Re: [HACKERS] In Japan with Josh Berkus

Hi all, Josh's talk is now available at: http://snaga.org/01_Josh_Berkus.mp3 This file is very long, and an interpreter's voice to interpret into Japanese is also recorded. If you want to learn Japanese, please try it! :) Thanks. Luke Lonergan wrote: Drink Sake and eat some Yakitori for us folks in the west. Maybe shake a robot hand or two while you're at it :-) - Luke On 2/16/06 2:14 PM, Bruce Momjian pgman@candle.pha.pa.us wrote: FYI, Josh Berkus and I are in Japan to give some presentations. We return to the USA on February 23. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] In Japan with Josh Berkus
Transcript: introduction

Josh: Can people in the back hear me? Thank you for hosting me in Tokyo, it's a lot of fun for me to come over here. It is also an extremely exciting time to be a PostgreSQL developer. It's just amazing how something that was a hobby, a sideline, um a ... thing that I and people like Bruce did in their spare time has become a major business. So, I'm going to go over a little of where we're going and where we've been.

On November 8th of last year at the Open Source Database Conference in Frankfurt, Germany, Peter Eisentraut, another member of the Postgres core team, announced the availability of Postgres 8.1. This release was a milestone for us, a high point in a lot of ways. From the features, from the amount of adoption and excitement that it's created, news coverage, the news coverage of Postgres in general.

Now, we didn't start out with what we have in 8.1. As you know, Postgres has been under development for a long time, over 20 years. Since my involvement with PostgreSQL started in 1998, I'm just going to talk about what we've done since we went on the Internet in 1996. In fact, our 10th anniversary of going on the internet is coming up in July, and we'll be holding a small conference for PostgreSQL developers, for contributors, in Toronto.

So, when Postgres, actually as Postgres95, first became available for people to download, from Postgres.org, from Postgres95.org, I don't remember what our website was called. The first goal at that time was to stop it from crashing. A lot of that work was done by Bruce, who's in the back, and by Vadim, one of our key developers at the time. Now, I of course waited and joined the community after Postgres stopped crashing. After that, the next thing we had to do was implement a lot of features that were considered standard on other SQL databases. Things like left joins, the schema object, and stored procedures.
Once we were good enough in terms of implementing business features and standard database features, we focused on making the database perform better. Because most of our users at that time were telecommunications companies and internet service providers and similar companies, we focused on what's called online transaction processing. And thanks partly to the design of Postgres, with a few improvements, things like the free space map, we were quickly able to make Postgres measure up to even the largest commercial databases for online transaction processing. The other big thing that started in those years, and didn't peak till recently, was the port to the Windows operating system. That port was led by Bruce Momjian and involved a lot of Japanese contributors to the Postgres database.

Having conquered online transaction processing, in the last year to year and a half we've moved on to data warehousing and very large databases. And a lot of the features in 8.1, and some that will be coming in 8.2, will be about data warehousing. Now, what's coming in 2007 and beyond I'm not quite sure - a lot of it is up to you. As an open source project, we go where our users and contributors want us to go. I have a feeling that that place is going to involve specialty database purposes and application integration. But we'll see.

So, what's in 8.1 is a whole lot of features that we're pretty excited about. This includes major SQL features like 2-phase commit and user roles, large database and data warehousing features like index bitmap scan and table partitioning, faster transactions through improved multi-processor performance, shared row locks and faster GiST indexes. We also were able to take care of a couple of things on our todo list that some of our users have been asking us for a very long time. That includes more powerful and more standard functions and stored procedures, and integrating the autovacuum utility into the backend of the Postgres database.
So, 2-phase commit, which took a couple of developers about 2 years, is heavily in demand by a small group, mostly in financial processing. The concept is pretty simple: instead of simply committing a transaction on a single server, you have two phases where both servers coordinate committing a transaction together. The way we implemented it is when one server is ready to commit a transaction it sends a prepare-to-commit message to the other server, the other server acknowledges that with a prepared message, then when the transaction is ready, then when it's acknowledged, the master server sends a commit message. And the second server acknowledges it. Now the tricky part is what happens when one of these servers fails in the middle of this process. For example, what happens if this server fails, well the answer is that if this acknowledgement has not been received ... 18:58 into the recording. I'll get the rest transcribed and maybe put it on the new bizgres network site at
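The message flow Josh describes can be sketched as a toy coordinator loop. This is an illustration only (class and function names are made up for the demo); in Postgres 8.1 the user-visible form of this feature is the PREPARE TRANSACTION / COMMIT PREPARED SQL commands:

```python
# Toy two-phase commit: phase 1 asks every participant to prepare (vote);
# phase 2 commits only if every participant promised it can commit.
class Node:
    def __init__(self, ok=True):
        self.ok, self.state = ok, "idle"
    def prepare(self):                 # "prepare to commit" message
        self.state = "prepared" if self.ok else "idle"
        return self.ok                 # "prepared" acknowledgement (vote)
    def commit(self):                  # "commit" message
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: any refusal to prepare aborts everyone.
    if not all(p.prepare() for p in participants):
        for p in participants:
            p.abort()
        return "aborted"
    # Phase 2: all promised, so commit is now safe everywhere.
    for p in participants:
        p.commit()
    return "committed"
```

The "tricky part" in the talk is exactly what this sketch omits: a prepared participant must durably remember its promise so that, after a crash, the coordinator can still drive it to the same commit-or-abort outcome as everyone else.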