Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic

2006-02-16 Thread Luke Lonergan
Christopher,

On 2/15/06 11:14 PM, Christopher Kings-Lynne [EMAIL PROTECTED]
wrote:

 Any comments on this?  Is he referring to EnterpriseDB extensions that
 they don't make public?

I've noticed that a lot of the press lately mentions their name next to
Ingres as an alternative to MySQL, so the MySQL folks might be feeling
some Postgres heat from their direction.

I also wonder where their project stands - they seem publicly opaque about
progress, etc.  From the web site's statements it looks like they've written
a tool to tune the postgresql.conf file, from which they claim a 50%
speed-up, but that's not new or unique fork-level functionality.

- Luke



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic

2006-02-16 Thread Rick Gigger
Any comments on this?  Is he referring to EnterpriseDB extensions that
they don't make public?


I've noticed that a lot of the press lately mentions their name next to
Ingres as an alternative to MySQL, so the MySQL folks might be feeling
some Postgres heat from their direction.

I also wonder where their project stands - they seem publicly opaque
about progress, etc.  From the web site's statements it looks like
they've written a tool to tune the postgresql.conf file, from which they
claim a 50% speed-up, but that's not new or unique fork-level
functionality.


What they don't say is whether that is a 50% speed-up from the default
settings or a 50% increase over a carefully hand-tuned file.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Generating config stuff from single source

2006-02-16 Thread Peter Eisentraut
On Thursday, 16 February 2006 02:50, Tom Lane wrote:
 That's fine for users, but what new demands are you about to place on
 developers?  Does this require tools not already needed in order to
 build from a CVS pull?  (There's sure no xsltproc on this machine...)

It is to be expected that sooner or later we'll move from SGML to XML 
documentation builds, at which point xsltproc will become a semi-requirement 
anyway.  I don't think this requirement is too onerous; libxslt is portable 
and easy to install.

 The m4 idea seems attractive to me because that's already effectively
 required as part of the autoconf infrastructure (and I think bison
 uses it too these days).

That is true, but I'm afraid that this will lead to code that only a few 
people will be able to maintain.  (Try programming a loop in m4 to start.)

 A similar issue that's been bothering me for awhile is that it'd be a
 lot less error prone if keywords.c and the keyword list productions in
 gram.y were generated off a common declarative source (which might as
 well produce the keyword documentation appendix too).

That could be a job for m4, I think.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Blog post on EnterpriseDB...maybe off topic

2006-02-16 Thread Lukas Smith

Christopher Kings-Lynne wrote:
http://www.flamingspork.com/blog/2006/02/16/enterprisedb-where-is-the-source/ 



Any comments on this?  Is he referring to EnterpriseDB extensions that 
they don't make public?


I think so. He's trying to battle the perception that EnterpriseDB is an
open source database. It seems, though, that little effort is made to
understand the actual relationship between EnterpriseDB and PostgreSQL.


Looks like an attempt at pitting dual-license GPL/closed source vs.
proprietary BSD-based.


regards,
Lukas

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Generating config stuff from single source

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 02:36:01AM +0100, Peter Eisentraut wrote:
 We are currently maintaining information about configuration parameters 
 in at least three places: the documentation, guc.c, and 
 postgresql.conf.sample.  I would like to generate these from a single 
 source.  Computationally, this is not very challenging, it's just a bit 
 of work.  I imagine as the source an XML file with a custom schema; see 
 below for an example.  I think this is the best source format because 
 it allows integrating the DocBook-formatted descriptions without too 
 much trouble and it allows for file format validation.  An alternative 
 might be m4 but that would not offer these features.  To process this 
 we'd use XSLT stylesheets run through xsltproc.  We'd run this part 
 during the tarball building phase, so users would not need it.  
 Obviously, all of this will need some fine-tuning, but can we agree on 
 this general direction?

Is there any reason why we can't just use something like awk? It's
installed almost everywhere (it may be required, not sure) and a lot
more people know how to use it. I haven't managed to wrap my brain
around XSLT yet.

The input file could be simply line based. Attached is a simple script
that takes the input below and produces something resembling what you
suggested. It wouldn't be too hard to get it to produce multiple output
formats and dump the output to different files...


Group: Query Tuning
Subgroup: Planner Method Configuration

Name: enable_hashagg
Context: userset

... etc ...


-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.
BEGIN {
   havedata = 0;
   group = subgroup = "";
   print "<parameters>\n";
}

/^[A-Za-z]+: *(.*)/ {
    keyword = tolower( substr( $0, 1, index( $0, ":" ) - 1 ) );
    data = substr( $0, index( $0, ":" ) + 1 );
    sub( /^ +/, "", data );
    sub( / +$/, "", data );
    if( keyword == "group" )
    {
      if( group != "" ) { print "</group>\n" }
      printf "<group><title>%s</title>\n", data;
      group = data;
      subgroup = "";
      next;
    }

    if( keyword == "subgroup" )
    {
      if( subgroup != "" ) { printf "</subgroup>[%s]\n", subgroup }
      printf "<subgroup><title>%s</title>\n", data;
      subgroup = data;
      next;
    }

    havedata = 1;
    fields[keyword] = data;
    next;
}
/^ *$/ {
    if( havedata == 0 ) { next }

    print "<parameter>\n";
    for( keyword in fields )
    { printf "<%s>%s</%s>\n", keyword, fields[keyword], keyword }
    print "</parameter>\n";
    havedata = 0;
    delete fields;
    next;
}

{ printf "Invalid input: %s\n", $0; exit; }

END {
    if( havedata == 1 )
    {
      print "<parameter>\n";
      for( keyword in fields )
      { printf "<%s>%s</%s>\n", keyword, fields[keyword], keyword }
      print "</parameter>\n";
    }
    if( subgroup != "" )
    { print "</subgroup>\n" }
    if( group != "" )
    { print "</group>\n" }
    print "</parameters>\n";
}
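
(Presumably it would be run along the lines of: awk -f guc2xml.awk
guc.dat > guc.xml -- both file names invented here -- with one script,
or one output mode, per target format.)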


signature.asc
Description: Digital signature


Re: [HACKERS] qsort again

2006-02-16 Thread Florian Weimer
* Neil Conway:

 On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
 It seems clear that our qsort.c is doing a pretty awful job of picking
 qsort pivots, while glibc is mostly managing not to make that mistake.
 I haven't looked at the glibc code yet to see what they are doing
 differently.

 glibc qsort is actually merge sort, so I'm not surprised it avoids this
 problem.

qsort also performs twice as many key comparisons as the theoretical
minimum.  If key comparison is not very cheap, other schemes (like
heapsort, for example) are more attractive.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Doubt in parser

2006-02-16 Thread Dhanaraj

Hi,

Currently I am looking at the Postgres source code. I saw the scanner and
parser implementations in two different places (src/backend/parser/ and
src/backend/bootstrap/). Can anybody tell me the purpose of having two
parsers? Or does this help to parse the queries at different levels?


Thanks
Dhanaraj

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Doubt in parser

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 06:07:25PM +0530, Dhanaraj wrote:
 Hi,
 
 Currently I am looking at the Postgres source code. I saw the scanner and
 parser implementations in two different places (src/backend/parser/ and
 src/backend/bootstrap/). Can anybody tell me the purpose of having two
 parsers? Or does this help to parse the queries at different levels?

The first one is the actual parser for queries you send. The latter is
the bootstrap parser, which is used only during the initial bootstrap of
a database. It needs to be separate because of things like this: the
names of columns are stored in pg_attribute, yet how can you fill that
table if you don't know what the columns are called?

The latter is basically a glorified data loader to handle this special
case. It can't do queries or anything like that. You can basically
ignore it for normal development.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] qsort again

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 01:10:48PM +0100, Florian Weimer wrote:
 * Neil Conway:
 
  On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
  It seems clear that our qsort.c is doing a pretty awful job of picking
  qsort pivots, while glibc is mostly managing not to make that mistake.
  I haven't looked at the glibc code yet to see what they are doing
  differently.
 
  glibc qsort is actually merge sort, so I'm not surprised it avoids this
  problem.
 
 qsort also performs twice as many key comparisons as the theoretical
 minimum.  If key comparison is not very cheap, other schemes (like
 heapsort, for example) are more attractive.

Last time around there were a number of different algorithms tested.
Did anyone run those tests while getting it to count the number of
actual comparisons (which could easily swamp the time taken to do the
actual sort in some cases)?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [PERFORM] [HACKERS] qsort again

2006-02-16 Thread Sven Geisler

Martijn van Oosterhout schrieb:


Last time around there were a number of different algorithms tested.
Did anyone run those tests while getting it to count the number of
actual comparisons (which could easily swamp the time taken to do the
actual sort in some cases)?



The last time I did such tests was almost 10 years ago. I used Metrowerks
CodeWarrior C/C++, which had quicksort as the algorithm in its libc.
Anyhow, I tested a few algorithms including merge sort and heapsort. I
ended up with heapsort because it was the fastest algorithm for our
problem: we joined two arrays, each of which was already sorted, and ran
qsort to sort the new array.


Sven.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Ron

At 06:35 AM 2/16/2006, Steinar H. Gunderson wrote:

On Wed, Feb 15, 2006 at 11:30:54PM -0500, Ron wrote:
 Even better (and more easily scaled as the number of GPR's in the CPU
 changes) is to use
 the set {L; L+1; L+2; t1; R-2; R-1; R}
 This means that instead of 7 random memory accesses, we have 3; two
 of which result in a burst access for three elements each.

Isn't that improvement going to disappear completely if you choose a bad
pivot?
Only if you _consistently_ (read: the vast majority of the time;
quicksort is actually darn robust) choose a _pessimal_, not just
bad, pivot will quicksort degenerate to the O(N^2) behavior
everyone worries about.  See Cormen & Rivest for a proof of this.


Even then, doing things as above has benefits:
1= The worst case is less bad since the guaranteed O(lg s!) pivot
choosing algorithm puts s elements into final position.

Worst case becomes better than O(N^2/(s-1)).

2= With more traditional methods, the overhead of pivot choosing can
overshadow its benefits for even moderate values of s.  See
discussions on the quicksort variant known as samplesort and
Sedgewick's PhD thesis for details.  Using a pivot choosing algorithm
that actually does some of the partitioning (and does it more
efficiently than the usual partitioning algorithm does) plus using
partition-in-place (rather than Lomuto's method) reduces overhead
very effectively (at the cost of more complicated / delicate-to-get-right
partitioning code).  The above reduces the number of moves used in a
quicksort pass considerably, regardless of the number of compares used.


3= Especially in modern systems where the gap between internal CPU 
bandwidth and memory bandwidth is so great, the overhead of memory 
accesses for comparisons and moves is the majority of the overhead 
for both the pivot choosing and the partitioning algorithms within 
quicksort.  Particularly random memory accesses.  The reason (#GPRs - 
1) is a magic constant is that it's the most you can compare and move 
using only register-to-register operations.


In addition, replacing as many of the memory accesses you must do 
with sequential rather than random memory accesses is a big deal: 
sequential memory access is measured in 10's of CPU cycles while 
random memory access is measured in hundreds of CPU cycles.  It's no 
accident that the advances in Gray's sorting contest have involved 
algorithms that are both register and cache friendly, minimizing 
overall memory access and using sequential memory access as much as 
possible when said access can not be avoided.  As caches grow larger 
and memory accesses more expensive, it's often worth it to use a 
BucketSort+QuickSort hybrid rather than just QuickSort.


...and of course if you know enough about the data to be sorted so as 
to constrain it appropriately, one should use a non comparison based 
O(N) sorting algorithm rather than any of the general comparison 
based O(NlgN) methods.




 SIDE NOTE: IIRC glibc's qsort is actually merge sort.  Merge sort
 performance is insensitive to all inputs, and there are ways to
 optimize it as well.

glibc-2.3.5/stdlib/qsort.c:

  /* Order size using quicksort.  This implementation incorporates
 four optimizations discussed in Sedgewick:

I can't see any references to merge sort in there at all.

Well, then I'm not the only person on the lists whose memory is faulty ;-)

The up side of MergeSort is that its performance is always O(NlgN).
The down sides are that it is far more memory-hungry than QuickSort, and slower.


Ron




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [HACKERS] qsort again

2006-02-16 Thread Ron

At 07:10 AM 2/16/2006, Florian Weimer wrote:

* Neil Conway:

 On Wed, 2006-02-15 at 18:28 -0500, Tom Lane wrote:
 It seems clear that our qsort.c is doing a pretty awful job of picking
 qsort pivots, while glibc is mostly managing not to make that mistake.
 I haven't looked at the glibc code yet to see what they are doing
 differently.

 glibc qsort is actually merge sort, so I'm not surprised it avoids this
 problem.

qsort also performs twice as many key comparisons as the theoretical
minimum.


The theoretical minimum number of comparisons for a general purpose
comparison based sort is O(lg(N!)).
QuickSort uses 2NlnN ~= 1.38NlgN, or ~1.38x the optimum, without tuning
(see Knuth, Sedgewick, Cormen, etc.)
OTOH, QuickSort uses ~2x as many =moves= as the theoretical minimum
unless tuned, and moves are more expensive than compares in modern systems.
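
For reference, the arithmetic behind those constants, via Stirling's
approximation:

    \lg(N!) = N \lg N - N \lg e + O(\lg N) \approx N \lg N - 1.44 N

while quicksort's expected comparison count expands as

    2 N \ln N = (2 \ln 2)\, N \lg N \approx 1.386\, N \lg N

i.e. roughly 1.4x the information-theoretic minimum.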


See my other posts for QuickSort tuning methods that attempt to 
directly address both issues.



Ron 




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Doubt in parser

2006-02-16 Thread Alvaro Herrera
Dhanaraj wrote:
 Hi,
 
 Currently I am looking at the Postgres source code. I saw the scanner and
 parser implementations in two different places (src/backend/parser/ and
 src/backend/bootstrap/). Can anybody tell me the purpose of having two
 parsers? Or does this help to parse the queries at different levels?

The bootstrap parser is used only in bootstrap mode, which is when the
template1 database is initially created.  It has a completely different
syntax from the main parser.

If what you are looking for is to implement a new command or modify an
existing one, ignore the bootstrap parser.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, Ron,

Ron wrote:

 ...and of course if you know enough about the data to be sorted so as to
 constrain it appropriately, one should use a non comparison based O(N)
 sorting algorithm rather than any of the general comparison based
 O(NlgN) methods.

Sounds interesting, could you give us some pointers (names, URLs,
papers) to such algorithms?

Thanks a lot,
Markus



-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Jonah H. Harris
Last night I implemented a non-recursive introsort in C... let me test it
a bit more and then I'll post it here for everyone else to try out.

On 2/16/06, Markus Schaber [EMAIL PROTECTED] wrote:

 Hi, Ron,

 Ron wrote:
  ...and of course if you know enough about the data to be sorted so as to
  constrain it appropriately, one should use a non comparison based O(N)
  sorting algorithm rather than any of the general comparison based
  O(NlgN) methods.

 Sounds interesting, could you give us some pointers (names, URLs,
 papers) to such algorithms?

 Thanks a lot,
 Markus

-- 
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
 3= Especially in modern systems where the gap between internal CPU 
 bandwidth and memory bandwidth is so great, the overhead of memory 
 accesses for comparisons and moves is the majority of the overhead 
 for both the pivot choosing and the partitioning algorithms within 
 quicksort.  Particularly random memory accesses.  The reason (#GPRs - 
 1) is a magic constant is that it's the most you can compare and move 
 using only register-to-register operations.

But how much of this applies to us? We're not sorting arrays of
integers, we're sorting pointers to tuples. So while moves cost very
little, a comparison costs hundreds, maybe thousands of cycles. A tuple
can easily be two or three cachelines and you're probably going to
access all of it, not to mention the Fmgr structures and the Datums
themselves.

None of this is cache friendly. The actual tuples themselves could be
spread all over memory (I don't think any particular effort is expended
trying to minimize fragmentation).

Do these algorithms discuss the case where a comparison is more than
1000 times the cost of a move?

Where this does become interesting is where we can convert a datum to
an integer such that if f(A) < f(B) then A < B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
comparison only if f(A) == f(B). This would be much more cache-coherent
and make these algorithms much more applicable in my mind.
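
As a minimal sketch of that idea in C (all names made up for
illustration -- this is not the actual tuplesort code -- assuming the
tuple is a NUL-terminated string and the sortkey an order-preserving
32-bit prefix):

#include <string.h>

typedef struct
{
    unsigned int sortkey;   /* f(tuple): order-preserving 32-bit key */
    const char  *tuple;     /* pointer to the actual (larger) value */
} SortItem;

static int
sortitem_cmp(const void *pa, const void *pb)
{
    const SortItem *a = pa;
    const SortItem *b = pb;

    /* cheap integer compare first; touches no tuple memory */
    if (a->sortkey < b->sortkey)
        return -1;
    if (a->sortkey > b->sortkey)
        return 1;

    /* sortkeys tie: fall back to the full, expensive comparison */
    return strcmp(a->tuple, b->tuple);
}

Sorting an array of SortItem with qsort(items, n, sizeof(SortItem),
sortitem_cmp) then pays the full comparison cost only on ties.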

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] Generating config stuff from single source

2006-02-16 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Thursday, 16 February 2006 02:50, Tom Lane wrote:
 That's fine for users, but what new demands are you about to place on
 developers?  Does this require tools not already needed in order to
 build from a CVS pull?  (There's sure no xsltproc on this machine...)

 It is to be expected that sooner or later we'll move from SGML to XML 
 documentation builds, at which point xsltproc will become a semi-requirement 
 anyway.  I don't think this requirement is too onerous; libxslt is portable 
 and easy to install.

Forgot to mention, but: I don't find the above argument very convincing.
The buildfarm machines are not expected to build documentation, and many
developers seem not to have installed doc tools either.  So I think this
would be raising the bar another notch in terms of what's required to do
development or testing, even if it does overlap with docs-build needs.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:

On Thu, Feb 16, 2006 at 08:22:55AM -0500, Ron wrote:
 3= Especially in modern systems where the gap between internal CPU
 bandwidth and memory bandwidth is so great, the overhead of memory
 accesses for comparisons and moves is the majority of the overhead
 for both the pivot choosing and the partitioning algorithms within
 quicksort.  Particularly random memory accesses.  The reason (#GPRs -
 1) is a magic constant is that it's the most you can compare and move
 using only register-to-register operations.

But how much of this applies to us? We're not sorting arrays of
integers, we're sorting pointers to tuples. So while moves cost very
little, a comparison costs hundreds, maybe thousands of cycles. A tuple
can easily be two or three cachelines and you're probably going to
access all of it, not to mention the Fmgr structures and the Datums
themselves.
Pointers are simply fixed size 32b or 64b quantities.  They are 
essentially integers.  Comparing and moving pointers or fixed size 
keys to those pointers is exactly the same problem as comparing and 
moving integers.


Comparing =or= moving the actual data structures is a much more 
expensive and variable cost proposition.  I'm sure that pg's sort 
functionality is written intelligently enough that the only real data 
moves are done in a final pass after the exact desired order has been 
found using pointer compares and (re)assignments during the sorting 
process.  That's a standard technique for sorting data whose key or 
pointer is much smaller than a datum.


Your cost comment basically agrees with mine regarding the cost of 
random memory accesses.  The good news is that the number of datums 
to be examined during the pivot choosing process is small enough that 
the datums can fit into CPU cache while the pointers to them can be 
assigned to registers: making pivot choosing +very+ fast when done correctly.


As you've noted, actual partitioning is going to be more expensive 
since it involves accessing enough actual datums that they can't all 
fit into CPU cache.  The good news is that QuickSort has a very 
sequential access pattern within its inner loop.  So while we must go 
to memory for compares, we are at least keeping the cost for it down
to a minimum.  In addition, said access is nice enough to be very 
prefetch and CPU cache hierarchy friendly.




None of this is cache friendly. The actual tuples themselves could be
spread all over memory (I don't think any particular effort is expended
trying to minimize fragmentation).
It probably would be worth it to spend some effort on memory layout 
just as we do for HD layout.




Do these algorithms discuss the case where a comparison is more than
1000 times the cost of a move?
A move is always more expensive than a compare when the datum is 
larger than its pointer or key.  A move is always more expensive than 
a compare when it involves memory to memory movement rather than CPU 
location to CPU location movement.  A move is especially more 
expensive than a compare when it involves both factors.  Most moves 
do involve both.


What I suspect you meant is that a key comparison that involves 
accessing the data in memory is more expensive than reassigning the 
pointers associated with those keys.   That is certainly true.


Yes.  The problem has been extensively studied. ;-)



Where this does become interesting is where we can convert a datum to
an integer such that if f(A) < f(B) then A < B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
comparison only if f(A) == f(B). This would be much more cache-coherent
and make these algorithms much more applicable in my mind.

In fact we can do better.
Using hash codes or what-not to map datums to keys and then sorting 
just the keys and the pointers to those datums followed by an 
optional final pass where we do the actual data movement is also a 
standard technique for handling large data structures.



Regardless of what tweaks beyond the basic algorithms we use, the 
algorithms themselves have been well studied and their performance 
well established.  QuickSort is the best performing of the O(nlgn) 
comparison based sorts, and it uses fewer resources than HeapSort or MergeSort.


Ron



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Tom Lane
Ron [EMAIL PROTECTED] writes:
 Your cost comment basically agrees with mine regarding the cost of 
 random memory accesses.  The good news is that the number of datums 
 to be examined during the pivot choosing process is small enough that 
 the datums can fit into CPU cache while the pointers to them can be 
 assigned to registers: making pivot choosing +very+ fast when done correctly.

This is more or less irrelevant given that comparing the pointers is not
the operation we need to do.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Craig A. James

Markus Schaber wrote:

Ron wrote:

...and of course if you know enough about the data to be sorted so as to
constrain it appropriately, one should use a non comparison based O(N)
sorting algorithm rather than any of the general comparison based
O(NlgN) methods.


Sounds interesting, could you give us some pointers (names, URLs,
papers) to such algorithms?


Most of these techniques boil down to good ol' bucket sort.  A simple
example: suppose you have a column of integer percentages, range zero to
100.  You know there are only 101 distinct values.  So create 101 buckets
(e.g. linked lists), make a single pass through your data and drop each
row's ID into the right bucket, then make a second pass through the
buckets and write the row IDs out in bucket order.  This is an O(N) sort
technique.
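
A rough sketch of that example in C (types and names invented for
illustration; prepending reverses order within a bucket, which is
harmless here since every row in a bucket shares the same value):

#include <stdlib.h>

#define NBUCKETS 101            /* one bucket per distinct percentage */

typedef struct Node { int row_id; struct Node *next; } Node;

/* Sort n row IDs by their percentage value (0..100) in O(N). */
static void
bucket_sort(const int *pct, const int *row_id, int n, int *out)
{
    Node **bucket = calloc(NBUCKETS, sizeof(Node *));
    Node  *nodes  = malloc(n * sizeof(Node));
    int    i, b, k = 0;

    /* pass 1: drop each row ID into its value's bucket */
    for (i = 0; i < n; i++)
    {
        nodes[i].row_id = row_id[i];
        nodes[i].next   = bucket[pct[i]];
        bucket[pct[i]]  = &nodes[i];
    }

    /* pass 2: walk the buckets in value order, emitting row IDs */
    for (b = 0; b < NBUCKETS; b++)
        for (Node *p = bucket[b]; p != NULL; p = p->next)
            out[k++] = p->row_id;

    free(nodes);
    free(bucket);
}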

Any time you have a restricted data range, you can do this.  Say you have 100 
million rows of scientific results known to be good to only three digits -- it 
can have at most 1,000 distinct values (regardless of the magnitude of the 
values), so you can do this with 1,000 buckets and just two passes through the 
data.

You can also use this trick when the optimizer is asked for a "fastest
first result."  Say you have a cursor on a column of numbers with good
distribution.  If you do a bucket sort on the first two or three digits
only, you know the first page of results will be in the first bucket.
So you only need to apply qsort to that first bucket (which is very
fast, since it's small), and you can deliver the first page of data to
the application.  This can be particularly effective in interactive
situations, where the user typically looks at a few pages of data and
then abandons the search.


I doubt this is very relevant to Postgres.  A relational database has to be general 
purpose, and it's hard to give it hints that would tell it when to use this 
particular optimization.

Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 10:52 AM 2/16/2006, Ron wrote:

At 09:48 AM 2/16/2006, Martijn van Oosterhout wrote:


Where this does become interesting is where we can convert a datum to
an integer such that if f(A)  f(B) then A  B. Then we can sort on
f(X) first with just integer comparisons and then do a full tuple
comparison only if f(A) = f(B). This would be much more cache-coherent
and make these algorithms much more applicable in my mind.

In fact we can do better.
Using hash codes or what-not to map datums to keys and then sorting 
just the keys and the pointers to those datums followed by an 
optional final pass where we do the actual data movement is also a 
standard technique for handling large data structures.

I thought some follow up might be in order here.

Let's pretend that we have the typical DB table where rows are ~2-4KB 
apiece.  1TB of storage will let us have 256M-512M rows in such a table.


A 32b hash code can be assigned to each row value such that only 
exactly equal rows will have the same hash code.

A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 (32b+32b =
64b) entries: 64b * (2^28 to 2^29) = 2-4GB of pointers+keys, followed by
an optional pass to rearrange the actual rows if we so wish.


We get the same result while only examining and manipulating roughly
1/500 to 1/250 as much data during the sort.


If we want to spend more CPU time in order to save more space, we can 
compress the key+pointer representation.  That usually reduces the 
amount of data to be manipulated to ~1/4 the original key+pointer 
representation, reducing things to ~512MB-1GB worth of compressed 
pointers+keys.  Or roughly ~1/2000 - ~1/1000 the original amount of
data we were discussing.


Either representation is small enough to fit within RAM rather than 
requiring HD IO, so we solve the HD IO bottleneck in the best 
possible way: we avoid ever doing it.


Ron   




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 11:32:55AM -0500, Ron wrote:
 At 10:52 AM 2/16/2006, Ron wrote:
 In fact we can do better.
 Using hash codes or what-not to map datums to keys and then sorting 
 just the keys and the pointers to those datums followed by an 
 optional final pass where we do the actual data movement is also a 
 standard technique for handling large data structures.

Or in fact required if the Datums are not all the same size, which is
the case in PostgreSQL.

 I thought some follow up might be in order here.
 
 Let's pretend that we have the typical DB table where rows are ~2-4KB 
 apiece.  1TB of storage will let us have 256M-512M rows in such a table.
 
 A 32b hash code can be assigned to each row value such that only 
 exactly equal rows will have the same hash code.
 A 32b pointer can locate any of the 256M-512M rows.

That hash code is impossible the way you state it, since the set of
strings is not mappable to a 32-bit integer. You probably meant that a
hash code can be assigned such that equal rows have equal hashes (drop
the "only").

 Now instead of sorting 1TB of data we can sort 2^28 to 2^29 (32b+32b =
 64b) entries: 64b * (2^28 to 2^29) = 2-4GB of pointers+keys, followed by
 an optional pass to rearrange the actual rows if we so wish.
 
 We get the same result while only examining and manipulating roughly
 1/500 to 1/250 as much data during the sort.

But this is what we do now. The tuples are loaded, we sort an array of
pointers, then we write the output. Except we don't have the hash, so
we require access to the 1TB of data to do the actual comparisons. Even
if we did have the hash, we'd *still* need access to the data to handle
tie-breaks.

That's why your comment about moves always being more expensive than
compares makes no sense. A move can be achieved simply by swapping two
pointers in the array. A compare actually needs to call all sorts of
functions. If and only if we have functions for every data type to
produce an ordered hash, we can optimise sorts based on single
integers.

For reference, look at comparetup_heap(). It's just 20 lines, but each
function call there expands to maybe a dozen lines of code. And it has
a loop. I don't think we're anywhere near the stage where locality of
reference makes much difference.

We very rarely need the tuples actualised in memory in the required
order; just the pointers are enough.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Tom Lane
Craig A. James [EMAIL PROTECTED] writes:
 You can also use this trick when the optimizer is asked for fastest first 
 result.  Say you have a cursor on a column of numbers with good 
 distribution.  If you do a bucket sort on the first two or three digits only, 
 you know the first page of results will be in the first bucket.  So you 
 only need to apply qsort to that first bucket (which is very fast, since it's 
 small), and you can deliver the first page of data to the application.  This 
 can be particularly effective in interactive situations, where the user 
 typically looks at a few pages of data and then abandons the search.  

 I doubt this is very relevant to Postgres.  A relational database has to be 
 general purpose, and it's hard to give it hints that would tell it when to 
 use this particular optimization.

Actually, LIMIT does nicely for that hint; the PG planner has definitely
got a concept of preferring fast-start plans for limited queries.  The
real problem in applying bucket-sort ideas is the lack of any
datatype-independent way of setting up the buckets.

Once or twice we've kicked around the idea of having some
datatype-specific sorting code paths alongside the general-purpose one,
but I can't honestly see this as being workable from a code maintenance
standpoint.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Scott Lamb

On Feb 16, 2006, at 8:32 AM, Ron wrote:
Let's pretend that we have the typical DB table where rows are  
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in  
such a table.


A 32b hash code can be assigned to each row value such that only  
exactly equal rows will have the same hash code.

A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 (32b+32b =
64b) entries: 64b * (2^28 to 2^29) = 2-4GB of pointers+keys, followed by
an optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you  
how the values compare. If you want 32-bit keys that compare in the  
same order as the original values, here's how you have to get them:


(1) sort the values into an array
(2) use each value's array index as its key

It reduces to the problem you're trying to use it to solve.


--
Scott Lamb http://www.slamb.org/



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Generating config stuff from single source

2006-02-16 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Thursday, 16 February 2006 02:50, Tom Lane wrote:
 The m4 idea seems attractive to me because that's already effectively
 required as part of the autoconf infrastructure (and I think bison
 uses it too these days).

 That is true, but I'm afraid that this will lead to code that only a few 
 people will be able to maintain.  (Try programming a loop in m4 to start.)

 A similar issue that's been bothering me for awhile is that it'd be a
 lot less error prone if keywords.c and the keyword list productions in
 gram.y were generated off a common declarative source (which might as
 well produce the keyword documentation appendix too).

 That could be a job for m4, I think.

Hmm ... well, if we are going to do this other thing with xsltproc, the
keyword problem should probably be solved with the same tool.

I hesitate to mention this, but: we have several other derived files in
the tarball (postgres.bki, fmgroids.h, etc) and those are all built with
shell/awk/sed/perl scripts.  How out of the question is it to solve the
GUC problem with that technology?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Markus Schaber
 Sent: Thursday, February 16, 2006 5:45 AM
 To: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
Index
 
 Hi, Ron,
 
 Ron wrote:
 
  ...and of course if you know enough about the data to be sorted so
as to
  constrain it appropriately, one should use a non comparison based
O(N)
  sorting algorithm rather than any of the general comparison based
  O(NlgN) methods.
 
 Sounds interesting, could you give us some pointers (names, URLs,
 papers) to such algorithms?

He refers to counting sort and radix sort (which comes in
most-significant-digit and least-significant-digit formats).  These are
also called distribution (as opposed to comparison) sorts.

These sorts are O(n) as a function of the data size, but really they are
O(M*n) where M is the average key length and n is the data set size.
(In the case of MSD radix sort M is the average length to completely
differentiate strings)

So if you have an 80 character key, then 80*n will only be faster
than n*log(n) when you have 2^80 elements -- in other words -- never.

If you have short keys, on the other hand, distribution sorts will be
dramatically faster.  On an unsigned integer, for instance, it requires
4 passes with 8 bit buckets, and so 16 elements is the crossover above
which radix is faster than an O(n*log(n)) sort.  Of course, there is a
fixed constant of proportionality and so it will really be higher than
that, but for large data sets distribution sorting is the best thing
that there is for small keys.

You could easily have an introspective MSD radix sort.  The nice thing
about MSD radix sort is that you can retain the ordering that has
occurred so far and switch to another algorithm.

An introspective MSD radix sort could call an introspective quick sort
algorithm once it processed a crossover point of buckets of key data.

In order to have distribution sorts that will work with a database
system, for each and every data type you will need a function to return
the bucket of bits of significance for the kth bucket.  For a character
string, you could return key[bucket].  For an unsigned integer, which
byte to return will be a function of the endianness of the CPU.  And for
each other distinct data type a bucket function would be needed, or a
sort could not be generated for that type using the distribution method.
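
For concreteness, a sketch of the LSD variant on unsigned 32-bit keys,
using the 4 passes of 8-bit buckets described above (one counting pass
plus one stable distribution pass per byte):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

void
radix_sort_u32(uint32_t *a, size_t n)
{
    uint32_t *tmp = malloc(n * sizeof(uint32_t));
    int pass, b;
    size_t i;

    for (pass = 0; pass < 4; pass++)    /* least significant byte first */
    {
        size_t count[257] = { 0 };
        int    shift = pass * 8;

        /* counting pass: histogram of this byte's values */
        for (i = 0; i < n; i++)
            count[((a[i] >> shift) & 0xFF) + 1]++;

        /* prefix sums turn the counts into bucket start offsets */
        for (b = 0; b < 256; b++)
            count[b + 1] += count[b];

        /* distribution pass: stable scatter into tmp, then copy back */
        for (i = 0; i < n; i++)
            tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
        memcpy(a, tmp, n * sizeof(uint32_t));
    }
    free(tmp);
}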

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 12:19 PM 2/16/2006, Scott Lamb wrote:

On Feb 16, 2006, at 8:32 AM, Ron wrote:

Let's pretend that we have the typical DB table where rows are
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in
such a table.

A 32b hash code can be assigned to each row value such that only
exactly equal rows will have the same hash code.
A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 (32b+32b =
64b) entries: 64b * (2^28 to 2^29) = 2-4GB of pointers+keys, followed by
an optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you
how the values compare. If you want 32-bit keys that compare in the
same order as the original values, here's how you have to get them:
For most hash codes, you are correct.  There is a class of hash or 
hash-like codes that maintains the mapping to support that second statement.


More later when I can get more time.
Ron 




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] how to solve diff of API construct_md_array between 8.1 and 8.2?

2006-02-16 Thread Pavel Stehule

Hello

I use the construct_md_array function in my Orafunc module. The CVS
version has a different definition now. I am looking for a simple way to
maintain the source code for both versions. I have the PG_VERSION
variable, but it's unusable. Is there a way for contrib authors to
differentiate PostgreSQL versions? I don't want to have two versions of
the source code.


Thank you
Pavel Stehule

_
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com. 
http://www.msn.cz/



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Neil Conway
On Thu, 2006-02-16 at 12:35 +0100, Steinar H. Gunderson wrote:
 glibc-2.3.5/stdlib/qsort.c:
 
   /* Order size using quicksort.  This implementation incorporates
  four optimizations discussed in Sedgewick:
 
 I can't see any references to merge sort in there at all.

stdlib/qsort.c defines _quicksort(), not qsort(), which is defined by
msort.c. On looking closer, it seems glibc actually tries to determine
the physical memory in the machine -- if it is sorting a single array
that exceeds 1/4 of the machine's physical memory, it uses quick sort,
otherwise it uses merge sort.

-Neil



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] how to solve diff of API construct_md_array between 8.1 and 8.2?

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 08:36:34PM +0100, Pavel Stehule wrote:
 Hello
 
 I use the construct_md_array function in my Orafunc module. The CVS
 version has a different definition now. I am looking for a simple way to
 maintain the source code for both versions. I have the PG_VERSION
 variable, but it's unusable. Is there a way for contrib authors to
 differentiate PostgreSQL versions? I don't want to have two versions of
 the source code.

For my stuff I've generally used CATALOG_VERSION_NO. It's not very easy,
but by looking through CVS you can find when the function was created
and in your code use:

#if CATALOG_VERSION_NO > yyyymmddN
/* New stuff */
#else
/* Old stuff */
#endif

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


[HACKERS] time waiting on patch queue

2006-02-16 Thread Andrew Dunstan

[redirecting to -hackers]

Tom Lane wrote:


At the moment it's not unusual for nontrivial patches to sit around
for a month or two, unless they happen to attract special interest of
one of the committers.


 



Yeah. That's just horrible. It makes people lose interest and can hurt 
because of bitrot too. I wish we could aim at some maximum time at least 
for a first cut review. If there are areas where only one or two people 
have sufficient expertise, that's also a worry.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] how to solve diff of API construct_md_array between

2006-02-16 Thread Joe Conway

Martijn van Oosterhout wrote:

On Thu, Feb 16, 2006 at 08:36:34PM +0100, Pavel Stehule wrote:
I use the construct_md_array function in my Orafunc module. The CVS
version has a different definition now. I am looking for a simple way to
maintain the source code for both versions. I have the PG_VERSION
variable, but it's unusable. Is there a way for contrib authors to
differentiate PostgreSQL versions? I don't want to have two versions of
the source code.


For my stuff I've generally used CATALOG_VERSION_NO. It's not very easy,
but by looking through CVS you can find when the function was created
and in your code use:

#if CATALOG_VERSION_NO > yyyymmddN
/* New stuff */
#else
/* Old stuff */
#endif


I do pretty much the same thing in PL/R. The good news is that 
CATALOG_VERSION_NO doesn't change for each major release once it is 
released. The following hasn't been updated since the 8.1 release, but 
you could use it as a starting point:


#if (CATALOG_VERSION_NO <= 200211021)
#define PG_VERSION_73_COMPAT
#elif (CATALOG_VERSION_NO <= 200310211)
#define PG_VERSION_74_COMPAT
#elif (CATALOG_VERSION_NO <= 200411041)
#define PG_VERSION_80_COMPAT
#else
#define PG_VERSION_81_COMPAT
#endif
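
Extending that ladder, a hypothetical PG_VERSION_82_COMPAT guard could
isolate Pavel's construct_md_array call. The assumption here -- worth
verifying against CVS -- is that the 8.2devel change is a new null-flags
argument, and yyyymmddN stands for the catalog version of the commit
that changed it:

#if (CATALOG_VERSION_NO > yyyymmddN)    /* fill in the real boundary */
#define PG_VERSION_82_COMPAT
#endif

#ifdef PG_VERSION_82_COMPAT
    /* CVS HEAD (8.2devel): assumed extra null-flags argument */
    result = construct_md_array(elems, NULL, ndims, dims, lbs,
                                elmtype, elmlen, elmbyval, elmalign);
#else
    result = construct_md_array(elems, ndims, dims, lbs,
                                elmtype, elmlen, elmbyval, elmalign);
#endif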

HTH,

Joe

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] In Japan with Josh Berkus

2006-02-16 Thread Bruce Momjian
FYI, Josh Berkus and I are in Japan to give some presentations.  We
return to the USA on February 23.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Mark Lewis
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote:
 Once or twice we've kicked around the idea of having some
 datatype-specific sorting code paths alongside the general-purpose one,
 but I can't honestly see this as being workable from a code maintenance
 standpoint.
 
   regards, tom lane


It seems that instead of maintaining a different sorting code path for
each data type, you could get away with one generic path and one
(hopefully faster) path if you allowed data types to optionally support
a 'sortKey' interface by providing a function f which maps inputs to 32-
bit int outputs, such that the following two properties hold:

f(a)<=f(b) iff a<=b
if a==b then f(a)==f(b)

So if a data type supports the sortKey interface you could perform the
sort on f(value) and only refer back to the actual element comparison
functions when two sortKeys have the same value.

Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

Depending on the overhead, you might not even need to maintain 2
independent search code paths, since you could always use f(x)=0 as the
default sortKey function which would degenerate to the exact same sort
behavior in use today.

-- Mark Lewis

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Markus Schaber
Hi, Mark,

Mark Lewis schrieb:

 It seems that instead of maintaining a different sorting code path for
 each data type, you could get away with one generic path and one
 (hopefully faster) path if you allowed data types to optionally support
 a 'sortKey' interface by providing a function f which maps inputs to 32-
 bit int outputs, such that the following two properties hold:
 
 f(a)<=f(b) iff a<=b
 if a==b then f(a)==f(b)

Hmm, to remove redundancy, I'd change the <= to a < and define:

if a==b then f(a)==f(b)
if a<b  then f(a)<=f(b)

 Data types which could probably provide a useful function for f would be
 int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

With int2 or some restricted ranges of oid and int4, we could even
implement a bucket sort.

Markus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Martijn van Oosterhout
On Thu, Feb 16, 2006 at 02:17:36PM -0800, Mark Lewis wrote:
 It seems that instead of maintaining a different sorting code path for
 each data type, you could get away with one generic path and one
 (hopefully faster) path if you allowed data types to optionally support
 a 'sortKey' interface by providing a function f which maps inputs to 32-
 bit int outputs, such that the following two properties hold:
 
  f(a)<=f(b) iff a<=b
  if a==b then f(a)==f(b)

Note this is a property of the collation, not the type. For example
strings can be sorted in many ways and the sortKey must reflect that.
So in postgres terms it's a property of the btree operator class.

It's something I'd like to do if I get A Round Tuit. :)

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Greg Stark
Markus Schaber [EMAIL PROTECTED] writes:

 Hmm, to remove redundancy, I'd change the <= to a < and define:
 
 if a==b then f(a)==f(b)
 if a<b  then f(a)<=f(b)
 
  Data types which could probably provide a useful function for f would be
  int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).

How exactly do you imagine doing this for text?

I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Mark Lewis
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:
   Data types which could probably provide a useful function for f would be
   int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
 
 How exactly do you imagine doing this for text?
 
 I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.


In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
sortKey as elsewhere suggested).  The sorting key doesn't need to be a
one-to-one mapping.
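
As a sketch (function name invented), packing those first bytes
big-endian makes unsigned integer order agree with bytewise string
order:

#include <stdint.h>
#include <string.h>

/* sortKey for SQL_ASCII text: first 4 bytes, zero-padded, big-endian.
 * Not one-to-one: equal keys still require a full comparison. */
static uint32_t
text_sortkey(const char *s, size_t len)
{
    unsigned char buf[4] = { 0, 0, 0, 0 };

    memcpy(buf, s, len < 4 ? len : 4);
    return ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16) |
           ((uint32_t) buf[2] << 8)  |  (uint32_t) buf[3];
}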

-- Mark Lewis

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] In Japan with Josh Berkus

2006-02-16 Thread Luke Lonergan
Drink Sake and eat some Yakitori for us folks in the west.  Maybe shake a
robot hand or two while you're at it :-)

- Luke

On 2/16/06 2:14 PM, Bruce Momjian pgman@candle.pha.pa.us wrote:

FYI, Josh Berkus and I are in Japan to give some presentations. We
return to the USA on February 23.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread David Lang

On Thu, 16 Feb 2006, Mark Lewis wrote:


On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:

Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).


How exactly do you imagine doing this for text?

I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.



In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
sortKey as elsewhere suggested).  The sorting key doesn't need to be a
one-to-one mapping.


that would violate your second constraint ( f(a)==f(b) iff (a==b) )

if you could drop that constraint (the cost of which would be extra 'real' 
compares within a bucket) then a helper function per datatype could work 
as you suggest.


David Lang

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] In Japan with Josh Berkus

2006-02-16 Thread Satoshi Nagayasu

Hi all,

Josh's talk is now available at:

http://snaga.org/01_Josh_Berkus.mp3

This file is very long, and an interpreter's voice translating it
into Japanese is also recorded.

If you want to learn Japanese, please try it! :)

Thanks.

Luke Lonergan wrote:
Drink Sake and eat some Yakitori for us folks in the west.  Maybe shake 
a robot hand or two while you’re at it :-)


- Luke

On 2/16/06 2:14 PM, Bruce Momjian pgman@candle.pha.pa.us wrote:

FYI, Josh Berkus and I are in Japan to give some presentations.  We
return to the USA on February 23.

--
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square,
Pennsylvania 19073

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-16 Thread Ron

At 01:47 PM 2/16/2006, Ron wrote:

At 12:19 PM 2/16/2006, Scott Lamb wrote:

On Feb 16, 2006, at 8:32 AM, Ron wrote:

Let's pretend that we have the typical DB table where rows are
~2-4KB apiece.  1TB of storage will let us have 256M-512M rows in
such a table.

A 32b hash code can be assigned to each row value such that only
exactly equal rows will have the same hash code.
A 32b pointer can locate any of the 256M-512M rows.

Now instead of sorting 1TB of data we can sort 2^28 to 2^29 (32b+32b =
64b) entries: 64b * (2^28 to 2^29) = 2-4GB of pointers+keys, followed by
an optional pass to rearrange the actual rows if we so wish.


I don't understand this.

This is a true statement: (H(x) != H(y)) => (x != y)
This is not: (H(x) < H(y)) => (x < y)

Hash keys can tell you there's an inequality, but they can't tell you
how the values compare. If you want 32-bit keys that compare in the
same order as the original values, here's how you have to get them:
For most hash codes, you are correct.  There is a class of hash or 
hash-like codes that maintains the mapping to support that second statement.


More later when I can get more time.
Ron


OK, so here's _a_ way (there are others) to obtain a mapping such that
 if a < b then f(a) < f(b) and
 if a == b then f(a) == f(b)

Pretend each row is an integer of row size (so a 2KB row becomes a
16Kb integer; a 4KB row becomes a 32Kb integer; etc.)
Since even a 1TB table made of such rows can only have 256M - 512M
rows, a 28b or 29b key is large enough to represent each row's value
and relative rank compared to all of the others, even if all row
values are unique.


By scanning the table once, we can map, say, 001h (hex used to ease 
typing) to the row with the minimum value and 111h to the row 
with the maximum value, as well as map everything in between to 
its appropriate key.  That same scan can be used to assign a 
pointer to each record's location.


We can now sort the key+pointer pairs instead of the actual data and 
use an optional final pass to rearrange the actual rows if we wish.


That initial scan to set up the keys is expensive, but if we wish, 
that cost can be amortized over the life of the table so we don't 
have to pay it all at once.  In addition, once we have created those 
keys, they can be saved for later searches and sorts.
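
As a rough C sketch of that setup pass (illustrative only; the types, names,
and sample data are made up): once the (value, pointer) pairs have been
ordered by full value during the initial scan, each distinct value gets a
dense rank, and later sorts and searches can work on the small rank+pointer
pairs instead of multi-KB rows.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* a row value and its location, as seen by the setup scan */
typedef struct
{
    const char *value;
    uint64_t    ptr;
} Row;

/* small order-preserving key plus row pointer */
typedef struct
{
    uint32_t    rank;   /* a < b implies rank(a) < rank(b) */
    uint64_t    ptr;
} KeyPtr;

/*
 * Assign dense ranks to rows already sorted by full value.  Duplicates
 * share a rank, so f(a) == f(b) iff a == b is preserved.
 */
static void
assign_ranks(const Row *rows, KeyPtr *out, size_t n)
{
    uint32_t    rank = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (i > 0 && strcmp(rows[i].value, rows[i - 1].value) != 0)
            rank++;
        out[i].rank = rank;
        out[i].ptr = rows[i].ptr;
    }
}

int
main(void)
{
    /* pre-sorted sample data; ptr values are placeholders */
    Row     rows[] = {{"apple", 10}, {"apple", 42}, {"pear", 7}};
    KeyPtr  out[3];

    assign_ranks(rows, out, 3);
    for (int i = 0; i < 3; i++)
        printf("rank=%u ptr=%llu\n", out[i].rank,
               (unsigned long long) out[i].ptr);
    return 0;
}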


Further space savings can be obtained whenever there are duplicate 
keys and/or when compression methods are used on the Key+pointer pairs.


Ron







---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] In Japan with Josh Berkus

2006-02-16 Thread Luke Lonergan
Arigato gozaimasu! (Thank you!)
 
- Luke



From: [EMAIL PROTECTED] on behalf of Satoshi Nagayasu
Sent: Thu 2/16/2006 10:17 PM
To: Luke Lonergan
Cc: Bruce Momjian; PostgreSQL-development
Subject: Re: [HACKERS] In Japan with Josh Berkus



Hi all,

Josh's talk is now available at:

http://snaga.org/01_Josh_Berkus.mp3

This file is quite long, and an interpreter's voice
translating into Japanese is also recorded.

If you want to learn Japanese, please try it! :)

Thanks.

Luke Lonergan wrote:
 Drink Sake and eat some Yakitori for us folks in the west.  Maybe shake
 a robot hand or two while you're at it :-)

 - Luke

 On 2/16/06 2:14 PM, Bruce Momjian pgman@candle.pha.pa.us wrote:

 FYI, Josh Berkus and I are in Japan to give some presentations.  We
 return to the USA on February 23.

 --
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square,
 Pennsylvania 19073

 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster





---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] In Japan with Josh Berkus

2006-02-16 Thread Luke Lonergan
Transcript:
 
introduction
 
Josh: Can people in the back hear me?  Thank you for hosting me in Tokyo; it's 
a lot of fun for me to come over here.  It is also an extremely exciting time 
to be a PostgreSQL developer.  It's just amazing how something that was a 
hobby, a sideline, um a ... thing that I and people like Bruce did in their 
spare time has become a major business.
 
So, I'm going to go over a little of where we're going and where we've been.
 
On November 8th of last year, at the Open Source Database Conference in 
Frankfurt, Germany, Peter Eisentraut, another member of the Postgres core team, 
announced the availability of Postgres 8.1.  This release was a milestone for 
us, a high point in a lot of ways: from the features, from the amount of 
adoption and excitement that it's created, to the news coverage of 
Postgres in general.
 
Now, we didn't start out with what we have in 8.1.  As you know, Postgres has 
been under development for a long time, over 20 years.  Since my involvement 
with PostgreSQL started in 1998, I'm just going to talk about what we've done 
since we went on the Internet in 1996.  In fact, our 10th anniversary of going 
on the Internet is coming up in July, and we'll be holding a small conference 
for PostgreSQL developers and contributors in Toronto.
 
So, when Postgres, actually as Postgres95 first became available for people to 
download, from Postgres.org, from Postgres95.org, I don't remember what our 
website was called.  The first goal at that time was to stop it from crashing.  
A lot of that work was done by Bruce, who's in the back, and by Vadim, one of our 
key developers at the time.  Now, I of course waited and joined the community 
after Postgres stopped crashing.  After that, the next thing we had to do was 
implement a lot of features that were considered standard on other SQL 
databases.  Things like left joins, the schema object, and stored procedures. 
Once we were good enough in terms of implementing business features and 
standard database features, we focused on making the database perform better.  
Because most of our users at that time were telecommunications companies and 
internet service providers and similar companies, we focused on what's called 
online transaction processing.  And thanks partly to the design of Postgres 
with a few improvements, things like the free space map, we were quickly able 
to make Postgres measure up to even the largest commercial databases for online 
transaction processing.  The other big thing that started in those years, and 
didn't peak till recently, was the port to the Windows operating system.  That 
port was led by Bruce Momjian and involved a lot of Japanese contributors to 
the Postgres database.  Having conquered online transaction processing, in the 
last year to year and a half we've moved on to data warehousing and very large 
databases.  And a lot of the features in 8.1 and some that will be coming in 
8.2 will be about data warehousing.
 
Now, what's coming in 2007 and beyond I'm not quite sure - a lot of it is up to 
you.  As an open source project, we go where our users and contributors want us 
to go.  I have a feeling that the that place is going to involve specialty 
database purposes and application integration.  But we'll see.
 
So, what's in 8.1 is a whole lot of features that we're pretty excited about.  
This includes major SQL features like 2-phase commit and user roles, large 
database and data warehousing features like index bitmap scan and table 
partitioning, faster transactions through improved multi-processor performance, 
shared row locks and faster GIST indexes.  We also were able to take care of a 
couple of things on our todo list that some of our users have been asking 
about for a very long time.  That includes more powerful and more standard functions 
and stored procedures.  And integrating the autovacuum utility into the backend 
of the Postgres database.  So, 2-phase commit, which took a couple of 
developers about 2 years, is heavily in demand by a small group, mostly in 
financial processing.  The concept is pretty simple: instead of simply 
committing a transaction on a single server, you have two phases where both 
servers coordinate committing a transaction together.  The way we implemented 
it is: when one server is ready to commit a transaction, it sends a 
prepare-to-commit message to the other server; the other server acknowledges 
that with a prepared message; then, once that's acknowledged, the master 
server sends a commit message.  And the second server 
acknowledges it.  Now the tricky part is what happens when one of these servers 
fails in the middle of this process.  For example, what happens if this server 
fails?  Well, the answer is that if this acknowledgement has not been received
 
... 18:58 into the recording.  I'll get the rest transcribed and maybe put it 
on the new bizgres network site at
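
For anyone who wants to try the two-phase commit flow described above, here
is a hedged C sketch using libpq and 8.1's PREPARE TRANSACTION / COMMIT
PREPARED commands.  The connection strings, the accounts table, and the
global transaction id 'demo_gid' are made up; error handling is abbreviated;
a real coordinator must durably remember prepared gids so it can COMMIT
PREPARED or ROLLBACK PREPARED after a crash; and both servers need
max_prepared_transactions > 0.

#include <stdio.h>
#include <libpq-fe.h>

/* run one command and report whether it succeeded */
static int
run(PGconn *conn, const char *sql)
{
    PGresult   *res = PQexec(conn, sql);
    int         ok = res && (PQresultStatus(res) == PGRES_COMMAND_OK);

    if (!ok)
        fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
    if (res)
        PQclear(res);
    return ok;
}

int
main(void)
{
    PGconn *a = PQconnectdb("dbname=node_a");   /* hypothetical servers */
    PGconn *b = PQconnectdb("dbname=node_b");

    /* phase 1: do the work, then prepare on both servers; a prepared
     * transaction survives a crash of either server */
    run(a, "BEGIN");
    run(a, "UPDATE accounts SET bal = bal - 100 WHERE id = 1");
    run(b, "BEGIN");
    run(b, "UPDATE accounts SET bal = bal + 100 WHERE id = 2");

    if (run(a, "PREPARE TRANSACTION 'demo_gid'") &&
        run(b, "PREPARE TRANSACTION 'demo_gid'"))
    {
        /* phase 2: only after both have acknowledged do we commit */
        run(a, "COMMIT PREPARED 'demo_gid'");
        run(b, "COMMIT PREPARED 'demo_gid'");
    }

    PQfinish(a);
    PQfinish(b);
    return 0;
}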