Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Jens-Wolfhard Schicke
--On Donnerstag, Februar 16, 2006 10:39:45 -0800 Dann Corbit 
[EMAIL PROTECTED] wrote:



He refers to counting sort and radix sort (which comes in most
significant digit and least significant digit format).  These are also
called distribution (as opposed to comparison) sorts.

These sorts are O(n) as a function of the data size, but really they are
O(M*n) where M is the average key length and n is the data set size.
(In the case of MSD radix sort M is the average length to completely
differentiate strings)

So if you have an 80 character key, then 80*log(n) will only be faster

I suppose you meant 80 * n here?


than n*log(n) when you have 2^80th elements -- in other words -- never.
I think this is wrong. You can easily improve Radix sort by a constant if 
you don't take single bytes as the digits but rather k-byte values. At 
least 2 byte should be possible without problems. This would give you 40 * 
n time, not 80 * n. And you assume that the comparision of an 80-byte wide 
value only costs 1, which might (and in many cases will be imho) wrong. 
Actually it migh mean to compare 80 bytes as well, but I might be wrong.


What I think as the biggest problem is the digit representation necessary 
for Radix-Sort in cases of locales which sort without looking at spaces. I 
assume that would be hard to implement. The same goes for the proposed 
mapping of string values onto 4/8-byte values.


Mit freundlichem Gruß
Jens Schicke
--
Jens Schicke  [EMAIL PROTECTED]
asco GmbH http://www.asco.de
Mittelweg 7   Tel 0531/3906-127
38106 BraunschweigFax 0531/3906-400

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Martijn van Oosterhout
On Fri, Feb 17, 2006 at 09:18:39AM +0100, Jens-Wolfhard Schicke wrote:
 What I think as the biggest problem is the digit representation necessary 
 for Radix-Sort in cases of locales which sort without looking at spaces. I 
 assume that would be hard to implement. The same goes for the proposed 
 mapping of string values onto 4/8-byte values.

Actually, this is easy. The standard C library provides strxfrm() and
other locale toolkits like ICU provide ucol_getSortKey(). Windows
provides LCMapString(). Just pass each string through this and take the
first four bytes of the result to form your integer key.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Markus Schaber
Hi, David,

David Lang schrieb:

 In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
 sortKey as elsewhere suggested).  The sorting key doesn't need to be a
 one-to-one mapping.

 that would violate your second contraint ( f(a)==f(b) iff (a==b) )

no, it doesn't.

When both strings are equal, then the first characters are equal, too.

If they are not equal, the constraint condition does not match.

The first characters of the strings may be equal as f(a) may be equal to
f(b) as to the other constraint.

Markus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Markus Schaber
Hi, Ron,

Ron schrieb:

 OK, so here's _a_ way (there are others) to obtain a mapping such that
  if a  b then f(a)  f (b) and
  if a == b then f(a) == f(b)
 
 Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
 integer; a 4KB row becomes a 32Kb integer; etc)
 Since even a 1TB table made of such rows can only have 256M - 512M
 possible values even if each row is unique, a 28b or 29b key is large
 enough to represent each row's value and relative rank compared to all
 of the others even if all row values are unique.
 
 By scanning the table once, we can map say 001h (Hex used to ease
 typing) to the row with the minimum value and 111h to the row with
 the maximum value as well as mapping everything in between to their
 appropriate keys.  That same scan can be used to assign a pointer to
 each record's location.

But with a single linear scan, this cannot be accomplished, as the table
contents are neither sorted nor distributed linearly between the minimum
and the maximum.

For this mapping, you need a full table sort.

 That initial scan to set up the keys is expensive, but if we wish that
 cost can be amortized over the life of the table so we don't have to pay
 it all at once.  In addition, once we have created those keys, then can
 be saved for later searches and sorts.

But for every update or insert, you have to resort the keys, which is
_very_ expensive as it basically needs to update a huge part of the table.

Markus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Ron

At 04:24 AM 2/17/2006, Ragnar wrote:

On fös, 2006-02-17 at 01:20 -0500, Ron wrote:

 OK, so here's _a_ way (there are others) to obtain a mapping such that
   if a  b then f(a)  f (b) and
   if a == b then f(a) == f(b)

 By scanning the table once, we can map say 001h (Hex used to ease
 typing) to the row with the minimum value and 111h to the row
 with the maximum value as well as mapping everything in between to
 their appropriate keys.  That same scan can be used to assign a
 pointer to each record's location.

This step is just as expensive as the original 
sort you want to replace/improve.


Why do you think that?  External sorts involve 
the equivalent of multiple scans of the table to 
be sorted, sometimes more than lgN (where N is 
the number of items in the table to be 
sorted).  Since this is physical IO we are 
talking about, each scan is very expensive, and 
therefore 1 scan is going to take considerably 
less time than = lgN scans will be.



If you want to keep this mapping saved as a sort 
of an index, or as part ot each row data, this 
will make the cost of inserts and updates enormous.


Not sure you've got this right either.  Looks to 
me like we are adding a = 32b quantity to each 
row.  Once we know the mapping, incrementally 
updating it upon insert or update would seem to 
be simple matter of a fast search for the correct 
ranking [Interpolation search, which we have all 
the needed data for, is O(lglgN).  Hash based 
search is O(1)]; plus an increment/decrement of 
the key values greater/less than the key value of 
the row being inserted / updated.  Given than we 
are updating all the keys in a specific range 
within a tree structure, that update can be done 
in O(lgm) (where m is the number of records affected).



  We can now sort the key+pointer pairs instead of the actual data and
 use an optional final pass to rearrange the actual rows if we wish.

How are you suggesting this mapping be accessed? 
If the mapping is kept separate from the tuple 
data, as in an index, then how will you look up the key?
???  We've effectively created a data set where 
each record is a pointer to a DB row plus its 
key.  We can now sort the data set by key and 
then do an optional final pass to rearrange the 
actual DB rows if we so wish.  Since that final 
pass is very expensive, it is good that not all 
use scenarios will need that final pass.


The amount of storage required to sort this 
representation of the table rather than the 
actual table is so much less that it turns an 
external sorting problem into a internal sorting 
problem with an optional final pass that is =1= 
scan (albeit one scan with a lot of seeks and 
data movement).  This is a big win.  It is a 
variation of a well known technique.  See Sedgewick, Knuth, etc.




 That initial scan to set up the keys is expensive, but if we wish
 that cost can be amortized over the life of the table so we don't
 have to pay it all at once.  In addition, once we have created those
 keys, then can be saved for later searches and sorts.

What is the use case where this would work better than a
regular btree index ?
Again, ???  btree indexes address different 
issues.  They do not in any way help create a 
compact data representation of the original data 
that saves enough space so as to turn an external 
ranking or sorting problem into an internal one.



Ron 




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Ron

At 05:19 AM 2/17/2006, Markus Schaber wrote:

Hi, Ron,

Ron schrieb:

 OK, so here's _a_ way (there are others) to obtain a mapping such that
  if a  b then f(a)  f (b) and
  if a == b then f(a) == f(b)

 Pretend each row is a integer of row size (so a 2KB row becomes a 16Kb
 integer; a 4KB row becomes a 32Kb integer; etc)
 Since even a 1TB table made of such rows can only have 256M - 512M
 possible values even if each row is unique, a 28b or 29b key is large
 enough to represent each row's value and relative rank compared to all
 of the others even if all row values are unique.

 By scanning the table once, we can map say 001h (Hex used to ease
 typing) to the row with the minimum value and 111h to the row with
 the maximum value as well as mapping everything in between to their
 appropriate keys.  That same scan can be used to assign a pointer to
 each record's location.

But with a single linear scan, this cannot be accomplished, as the table
contents are neither sorted nor distributed linearly between the minimum
and the maximum.
So what?  We are talking about key assignment here, not anything that 
requires physically manipulating the actual DB rows.

One physical IO pass should be all that's needed.



For this mapping, you need a full table sort.
One physical IO pass should be all that's needed.  However, let's 
pretend you are correct and that we do need to sort the table to get 
the key mapping.  Even so, we would only need to do it =once= and 
then we would be able to use and incrementally update the results 
forever afterward.  Even under this assumption, one external sort to 
save all subsequent such sorts seems well worth it.


IOW, even if I'm wrong about the initial cost to do this; it is still 
worth doing ;-)




 That initial scan to set up the keys is expensive, but if we wish that
 cost can be amortized over the life of the table so we don't have to pay
 it all at once.  In addition, once we have created those keys, then can
 be saved for later searches and sorts.

But for every update or insert, you have to resort the keys, which is
_very_ expensive as it basically needs to update a huge part of the table.


??? You do not need to resort already ordered data to insert a new 
element into it such that the data stays ordered!  Once we have done 
the key ordering operation once, we should not ever need to do it 
again on the original data.  Else most sorting algorithms wouldn't work ;-)



Ron 




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] Updated email signature

2006-02-17 Thread Bruce Momjian
I have updated my email signature because I think it is no longer
necessary to list contact information now that my home page has all that
information.  (At the time I created it, email was becoming popular but
personal web sites were rare.)

Also, I wanted to mention that I work for SRA OSS more prominently.  I
hope that is OK with everyone.

-- 
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Martijn van Oosterhout
On Fri, Feb 17, 2006 at 08:23:40AM -0500, Ron wrote:
 For this mapping, you need a full table sort.
 One physical IO pass should be all that's needed.  However, let's 
 pretend you are correct and that we do need to sort the table to get 
 the key mapping.  Even so, we would only need to do it =once= and 
 then we would be able to use and incrementally update the results 
 forever afterward.  Even under this assumption, one external sort to 
 save all subsequent such sorts seems well worth it.
 
 IOW, even if I'm wrong about the initial cost to do this; it is still 
 worth doing ;-)

I think you're talking about something different here. You're thinking
of having the whole table sorted and when you add a new value you add
it in such a way to keep it sorted. The problem is, what do you sort it
by? If you've sorted the table by col1, then when the user does ORDER
BY col2 it's useless.

Indeed, this is what btrees do, you store the order of the table
seperate from the data. And you can store multiple orders. But even
then, when someone does ORDER BY lower(col1), it's still useless.

And you're right, we still need to do the single mass sort in the
beginning, which is precisely what we're trying to optimise here.

 ??? You do not need to resort already ordered data to insert a new 
 element into it such that the data stays ordered!  Once we have done 
 the key ordering operation once, we should not ever need to do it 
 again on the original data.  Else most sorting algorithms wouldn't work ;-)

We already do this with btree indexes. I'm not sure what you are
proposing that improves on that.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Scott Lamb
On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). ...and with some work, floats (I think just the exponent would work, if nothing else). bytea. Probably just about anything.Interesting. If you abandon the idea that collisions should be impossible (they're not indexes) or extremely rare (they're not hashes), it's pretty easy to come up with a decent hint to avoid a lot of dereferences. --Scott Lamb http://www.slamb.org/ 

Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Martijn van Oosterhout
On Fri, Feb 17, 2006 at 08:18:41AM -0800, Scott Lamb wrote:
 On Feb 16, 2006, at 2:17 PM, Mark Lewis wrote:
 Data types which could probably provide a useful function for f  
 would be
 int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).
 
 ...and with some work, floats (I think just the exponent would work,  
 if nothing else). bytea. Probably just about anything.
 
 Interesting. If you abandon the idea that collisions should be  
 impossible (they're not indexes) or extremely rare (they're not  
 hashes), it's pretty easy to come up with a decent hint to avoid a  
 lot of dereferences.

Yep, pretty much for any datatype you create a mapping function to map
it to a signed int32. All you have to guarentee is that f(a)  f(b)
implies that a  b. Only if f(a) == f(b) do you need to compare a and
b.

You then change the sorting code to have an array of (Datum,int32)
(ouch, double the storage) where the int32 is the f(Datum). And in the
comparison routines you first check the int32. If they give an order
you're done. On match you do the full comparison.

For integer types (int2,int4,int8,oid) the conversion is straight
forward. For float you'd use the exponent and the first few bits of the
mantissa. For strings you'd have to bail, or use a strxfrm equivalent.
NULL would be INT_MAX pr INT_MIN depending on where you want it. Thing
is, even if you don't have such a function and always return zero, the
results will still be right.

Not a new idea, but it would be very nice to implement. If would
produce nice speedups for type where comparisons are expensive. And
more importantly, the bulk of the comparisons can be moved inline and
make the whole cache-friendlyness discussed here much more meaningful.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.


signature.asc
Description: Digital signature


Re: [HACKERS] Updated email signature

2006-02-17 Thread Joshua D. Drake

Bruce Momjian wrote:

I have updated my email signature because I think it is no longer
necessary to list contact information now that my home page has all that
information.  (At the time I created it, email was becoming popular but
personal web sites were rare.)

Also, I wanted to mention that I work for SRA OSS more prominently.  I
hope that is OK with everyone.
  

I can see why anyone would have a problem with this. Have you
see my signature ;)

Joshua D. Drake

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Ron

At 10:53 AM 2/17/2006, Martijn van Oosterhout wrote:

On Fri, Feb 17, 2006 at 08:23:40AM -0500, Ron wrote:
 For this mapping, you need a full table sort.
 One physical IO pass should be all that's needed.  However, let's
 pretend you are correct and that we do need to sort the table to get
 the key mapping.  Even so, we would only need to do it =once= and
 then we would be able to use and incrementally update the results
 forever afterward.  Even under this assumption, one external sort to
 save all subsequent such sorts seems well worth it.

 IOW, even if I'm wrong about the initial cost to do this; it is still
 worth doing ;-)

I think you're talking about something different here. You're thinking
of having the whole table sorted and when you add a new value you add
it in such a way to keep it sorted. The problem is, what do you sort it
by? If you've sorted the table by col1, then when the user does ORDER
BY col2 it's useless.

No, I'm thinking about how to represent DB row data in such a way that
a= we use a compact enough representation that we can sort internally 
rather than externally.
b= we do the sort once and avoid most of the overhead upon subsequent 
similar requests.


I used the example of sorting on the entire row to show that the 
approach works even when the original record being sorted by is very large.
All my previous comments on this topic hold for the case where we are 
sorting on only part of a row as well.


If all you are doing is sorting on a column or a few columns, what 
I'm discussing is even easier since treating the columns actually 
being used a sort criteria as integers rather than the whole row as 
an atomic unit eats less resources during the key creation and 
mapping process.  If the row is 2-4KB in size, but we only care about 
some combination of columns that only takes on = 2^8 or = 2^16 
different values, then what I've described will be even better than 
the original example I gave.


Basically, the range of a key is going to be restricted by how
a= big the field is that represents the key (columns and such are 
usually kept narrow for performance reasons) or
b= big each row is (the more space each row takes, the fewer rows fit 
into any given amount of storage)

c= many rows there are in the table
Between the conditions, the range of a key tends to be severely 
restricted and therefore use much less space than sorting the actual 
DB records would.  ...and that gives us something we can take advantage of.




Indeed, this is what btrees do, you store the order of the table
seperate from the data. And you can store multiple orders. But even
then, when someone does ORDER BY lower(col1), it's still useless.

And you're right, we still need to do the single mass sort in the
beginning, which is precisely what we're trying to optimise here.

Sigh.  My points were:
1= we have information available to us that allows us to map the rows 
in such a way as to turn most external sorts into internal sorts, 
thereby avoiding the entire external sorting problem in those 
cases.  This is a huge performance improvement.


2= that an external sort is =NOT= required for initial key 
assignment, but even if it was it would be worth it.


3= that this is a variation of a well known technique so I'm not 
suggesting heresy here.



Ron 




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[HACKERS] Need pointers to standard pg database(s) for testing

2006-02-17 Thread Ron

I assume we have such?

Ron



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Need pointers to standard pg database(s) for

2006-02-17 Thread Scott Marlowe
On Fri, 2006-02-17 at 10:51, Ron wrote:
 I assume we have such?

Depends on what you wanna do.
For transactional systems, look at some of the stuff OSDL has done.

For large geospatial type stuff, the government is a good source, like
www.usgs.gov or the fcc transmitter database.

There are other ones out there.  Really depends on what you wanna test.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Need pointers to standard pg database(s) for testing

2006-02-17 Thread Michael Paesold

Ron wrote:

I assume we have such?


You could look at the Sample Databases project on pgfoundry:
http://pgfoundry.org/projects/dbsamples/

Best Regards,
Michael Paesold


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Mark Lewis
On Thu, 2006-02-16 at 21:33 -0800, David Lang wrote:
  In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
  sortKey as elsewhere suggested).  The sorting key doesn't need to be a
  one-to-one mapping.
 
 that would violate your second contraint ( f(a)==f(b) iff (a==b) )
 
 if you could drop that constraint (the cost of which would be extra 'real' 
 compares within a bucket) then a helper function per datatype could work 
 as you are talking.

I think we're actually on the same page here; you're right that the
constraint above ( f(a)==f(b) iff a==b ) can't be extended to data types
with more than 32 bits of value space.  But the constraint I listed was
actually:

if a==b then f(a)==f(b)

Which doesn't imply 'if and only if'.  It's a similar constraint to
hashcodes; the same value will always have the same hash, but you're not
guaranteed that the hashcodes for two distinct values will be unique.

-- Mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] who is pgsql in cvs

2006-02-17 Thread Christian Bird
Hello,

I'm a grad student doing some work correlating mailinglist activity
with cvs activity for a datamining conference and I'm using postgres
as one of the OSS projects under study.  For most names of cvs
committers, I've been able to determine what e-mail addresses they
used on this mailinglist.  The one cvs identity that I'm having a
tough time with is pgsql.  Is this cvs committer a person or an
account used for some special purpose?  It looks to have a number of
commits over the life of the repository.  If anyone could help me out
by letting me know who pgsql is or if it isn't a person and fulfills
some role, that would be greatly appreciated!  Thanks a bunch!

-- Chris

--
Christian Bird
[EMAIL PROTECTED]
Computer Science Dept.
UC Davis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Ragnar
On fös, 2006-02-17 at 08:01 -0500, Ron wrote:
 At 04:24 AM 2/17/2006, Ragnar wrote:
 On fös, 2006-02-17 at 01:20 -0500, Ron wrote:
  
   OK, so here's _a_ way (there are others) to obtain a mapping such that
 if a  b then f(a)  f (b) and
 if a == b then f(a) == f(b)
 
   By scanning the table once, we can map say 001h (Hex used to ease
   typing) to the row with the minimum value and 111h to the row
   with the maximum value as well as mapping everything in between to
   their appropriate keys.  That same scan can be used to assign a
   pointer to each record's location.
 
 This step is just as expensive as the original 
 sort you want to replace/improve.
 
 Why do you think that?  External sorts involve 
 the equivalent of multiple scans of the table to 
 be sorted, sometimes more than lgN (where N is 
 the number of items in the table to be 
 sorted).  Since this is physical IO we are 
 talking about, each scan is very expensive, and 
 therefore 1 scan is going to take considerably 
 less time than = lgN scans will be.

Call me dim, but please explain exactly how you are going
to build this mapping in one scan. Are you assuming
the map will fit in memory? 

 
 
 If you want to keep this mapping saved as a sort 
 of an index, or as part ot each row data, this 
 will make the cost of inserts and updates enormous.
 
 Not sure you've got this right either.  Looks to 
 me like we are adding a = 32b quantity to each 
 row.  Once we know the mapping, incrementally 
 updating it upon insert or update would seem to 
 be simple matter of a fast search for the correct 
 ranking [Interpolation search, which we have all 
 the needed data for, is O(lglgN).  Hash based 
 search is O(1)]; plus an increment/decrement of 
 the key values greater/less than the key value of 
 the row being inserted / updated.  Given than we 
 are updating all the keys in a specific range 
 within a tree structure, that update can be done 
 in O(lgm) (where m is the number of records affected).

Say again ?
Let us say you have 1 billion rows, where the
column in question contains strings like 
baaaaaa
baaaaab
baaaaac
...
not necessarily in this order on disc of course

The minimum value would be keyed as 0001h,
the next one as 0002h and so on.

Now insert new value 'a'

Not only will you have to update 1 billion records,
but also all the values in your map.

please explain

gnari



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-17 Thread Tom Lane
Mark Lewis [EMAIL PROTECTED] writes:
 I think we're actually on the same page here; you're right that the
 constraint above ( f(a)==f(b) iff a==b ) can't be extended to data types
 with more than 32 bits of value space.  But the constraint I listed was
 actually:

 if a==b then f(a)==f(b)

I believe Martijn had it right: the important constraint is

f(a)  f(b) implies a  b

which implies by commutativity

f(a)  f(b) implies a  b

and these two together imply

a == b implies f(a) == f(b)

Now you can't do any sorting if you only have the equality rule, you
need the inequality rule.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] who is pgsql in cvs

2006-02-17 Thread Marc G. Fournier

On Fri, 17 Feb 2006, Christian Bird wrote:


Hello,

I'm a grad student doing some work correlating mailinglist activity
with cvs activity for a datamining conference and I'm using postgres
as one of the OSS projects under study.  For most names of cvs
committers, I've been able to determine what e-mail addresses they
used on this mailinglist.  The one cvs identity that I'm having a
tough time with is pgsql.  Is this cvs committer a person or an
account used for some special purpose?  It looks to have a number of
commits over the life of the repository.  If anyone could help me out
by letting me know who pgsql is or if it isn't a person and fulfills
some role, that would be greatly appreciated!  Thanks a bunch!


pgsql == scrappy (me) ... its what I use for doing release packaging / 
tagging ...



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] who is pgsql in cvs

2006-02-17 Thread Tom Lane
Christian Bird [EMAIL PROTECTED] writes:
 I'm a grad student doing some work correlating mailinglist activity
 with cvs activity for a datamining conference and I'm using postgres
 as one of the OSS projects under study.  For most names of cvs
 committers, I've been able to determine what e-mail addresses they
 used on this mailinglist.  The one cvs identity that I'm having a
 tough time with is pgsql.  Is this cvs committer a person or an
 account used for some special purpose?

That's Marc Fournier in his capacity as packager.  He also commits
as scrappy --- not sure if he has any hard-and-fast rule about which
account to use for what, but most of the time pgsql commits are just
for activities associated with tagging releases.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create

2006-02-17 Thread Gregory Maxwell
On 2/17/06, Ragnar [EMAIL PROTECTED] wrote:
 Say again ?
 Let us say you have 1 billion rows, where the
 column in question contains strings like
 baaaaaa
 baaaaab
 baaaaac
 ...
 not necessarily in this order on disc of course

 The minimum value would be keyed as 0001h,
 the next one as 0002h and so on.

 Now insert new value 'a'

 Not only will you have to update 1 billion records,
 but also all the values in your map.

 please explain

No comment on the usefulness of the idea overall.. but the solution
would be to insert with the colliding value of the existing one lesser
than it..

It will falsly claim equal, which you then must fix with a second
local sort which should be fast because you only need to sort the
duplicates/false dupes.  If you insert too much then this obviously
becomes completely useless.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updated email signature

2006-02-17 Thread Bruce Momjian
Joshua D. Drake wrote:
 Bruce Momjian wrote:
  I have updated my email signature because I think it is no longer
  necessary to list contact information now that my home page has all that
  information.  (At the time I created it, email was becoming popular but
  personal web sites were rare.)
 
  Also, I wanted to mention that I work for SRA OSS more prominently.  I
  hope that is OK with everyone.

 I can see why anyone would have a problem with this. Have you
 see my signature ;)
 
 -- 
 The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support
 Managed Services, Shared and Dedicated Hosting
 Co-Authors: PLphp, PLperl - http://www.commandprompt.com/

What!  Can't you embed an image in there too!  :-)

Anyway, thinking about it, it isn't that personal web pages weren't
popular when I created the original signature, but that the web itself
didn't exist yet, at least beyond research sites.  

My first post to Usenet was May 1991:


http://groups.google.com/group/comp.unix.admin/browse_frm/thread/e14966d30496969c/e28f1e5bf544f540?lnk=stq=momjianrnum=5hl=en#e28f1e5bf544f540

And the world wide web came later:

http://www.kevcom.com/words/guide/guide.04.html

How off topic can I get?  :-)

-- 
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Updated email signature

2006-02-17 Thread Joshua D. Drake



--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


What!  Can't you embed an image in there too!  :-)


You know... I can which is scary... and every single Thunderbird
and Outlook user would see it unless they turned it off ;)

Joshua D. Drake
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Updated email signature

2006-02-17 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 My first post to Usenet was May 1991:

Newbie ;-)

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Updated email signature

2006-02-17 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
  My first post to Usenet was May 1991:
 
 Newbie ;-)

OK, don't taunt us.  What date do you have?  :-)

-- 
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updated email signature

2006-02-17 Thread Marc G. Fournier

On Fri, 17 Feb 2006, Bruce Momjian wrote:


Joshua D. Drake wrote:

Bruce Momjian wrote:

I have updated my email signature because I think it is no longer
necessary to list contact information now that my home page has all that
information.  (At the time I created it, email was becoming popular but
personal web sites were rare.)

Also, I wanted to mention that I work for SRA OSS more prominently.  I
hope that is OK with everyone.



I can see why anyone would have a problem with this. Have you
see my signature ;)

--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


What!  Can't you embed an image in there too!  :-)

Anyway, thinking about it, it isn't that personal web pages weren't
popular when I created the original signature, but that the web itself
didn't exist yet, at least beyond research sites.

My first post to Usenet was May 1991:


http://groups.google.com/group/comp.unix.admin/browse_frm/thread/e14966d30496969c/e28f1e5bf544f540?lnk=stq=momjianrnum=5hl=en#e28f1e5bf544f540


I can go back to April 1991 :)

http://listserv.nodak.edu/cgi-bin/wa.exe?A2=ind9104dL=minix-lF=S=P=5347

But that was back in the UUCP days ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Updated email signature

2006-02-17 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Tom Lane wrote:
 Bruce Momjian pgman@candle.pha.pa.us writes:
 My first post to Usenet was May 1991:
 
 Newbie ;-)

 OK, don't taunt us.  What date do you have?  :-)

Not sure, but I remember being netnews admin for CMU in '87.
(Grad student slave labor position, mind you, not prestigious.
That was before anyone cared enough about netnews to have a real
staff person take care of it...)  I'd guess I first got involved
in Usenet a year or two before that.  The oldest thing I can
actually document at the moment is the CMU coke-machine info I posted
in 1989, eg http://homepages.inf.ed.ac.uk/jhb/silly/cokemachine.htm
(another slave-labor position, but at least the loaders got free
coke out of it)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Updated email signature

2006-02-17 Thread Tom Lane
I said:
 The oldest thing I can actually document at the moment 

After further digging, I found something older: conclusive proof that
I was present at the invention of the smiley.  I've been heard to assert
that before, but apparently someone has actually managed to unearth
archives that had been thought long gone:

http://www.authentichistory.com/documents/1980s/smiley/complete_smiley_thread.html

See Scott Fahlman's post on 19-Sep-82 11:44, and note the rapidity with
which the idea spread below.  I can be seen answering some unrelated
question about halfway down the page.

The CMU bboard system could legitimately be called a forerunner of
usenet, though since it didn't travel further than the CS department's
local net, it certainly wasn't the sort of community we now associate
with netnews.  Or then again maybe it was --- the fact that the
department members had a need to invent such a symbol should give you
some flavor of the place.

Anyone able to beat that?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updated email signature

2006-02-17 Thread Marc G. Fournier

On Fri, 17 Feb 2006, Tom Lane wrote:


I said:

The oldest thing I can actually document at the moment


After further digging, I found something older: conclusive proof that
I was present at the invention of the smiley.  I've been heard to assert
that before, but apparently someone has actually managed to unearth
archives that had been thought long gone:

http://www.authentichistory.com/documents/1980s/smiley/complete_smiley_thread.html

See Scott Fahlman's post on 19-Sep-82 11:44, and note the rapidity with
which the idea spread below.  I can be seen answering some unrelated
question about halfway down the page.

The CMU bboard system could legitimately be called a forerunner of
usenet, though since it didn't travel further than the CS department's
local net, it certainly wasn't the sort of community we now associate
with netnews.  Or then again maybe it was --- the fact that the
department members had a need to invent such a symbol should give you
some flavor of the place.

Anyone able to beat that?


Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Updated email signature

2006-02-17 Thread Michael Fuhr
On Fri, Feb 17, 2006 at 09:35:32PM -0400, Marc G. Fournier wrote:
 Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)

Anybody know some reasonable postgresql.conf settings for a system
that starts up with

  Cass?
  Memory Size?

'cuz I still have one :-)

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Updated email signature

2006-02-17 Thread Christopher Browne
Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] (Tom Lane) would 
write:
 I said:
 The oldest thing I can actually document at the moment 

 After further digging, I found something older: conclusive proof that
 I was present at the invention of the smiley.  I've been heard to assert
 that before, but apparently someone has actually managed to unearth
 archives that had been thought long gone:

 http://www.authentichistory.com/documents/1980s/smiley/complete_smiley_thread.html

 See Scott Fahlman's post on 19-Sep-82 11:44, and note the rapidity with
 which the idea spread below.  I can be seen answering some unrelated
 question about halfway down the page.

 The CMU bboard system could legitimately be called a forerunner of
 usenet, though since it didn't travel further than the CS department's
 local net, it certainly wasn't the sort of community we now associate
 with netnews.  Or then again maybe it was --- the fact that the
 department members had a need to invent such a symbol should give you
 some flavor of the place.

 Anyone able to beat that?

Crud, I was hoping that my posts listed on Google from July 1986 would
beat the 1987 dates you mentioned, but evidently not :-(.
-- 
output = reverse(gro.gultn @ enworbbc)
http://cbbrowne.com/info/lsf.html
In the case of CAPP, an EAL4 evaluation tells you everything you need
to   know.  It tells you   that  Microsoft spent  millions  of dollars
producing documentation that   shows   that  Windows 2000 meets an
inadequate  set of  requirements,  and  that  you can have  reasonably
strong confidence that this is the case. -- Jonathan S. Shapiro

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Updated email signature

2006-02-17 Thread Marc G. Fournier

On Fri, 17 Feb 2006, Michael Fuhr wrote:


On Fri, Feb 17, 2006 at 09:35:32PM -0400, Marc G. Fournier wrote:

Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)


Anybody know some reasonable postgresql.conf settings for a system
that starts up with

 Cass?
 Memory Size?

'cuz I still have one :-)


I have a PDP-II/360 sitting in a storage locker right now that one of 
these days I'm going to re-wire and find platters for :)  rats got at the 
wires *sigh*



Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updated email signature

2006-02-17 Thread Joshua D. Drake





Anyone able to beat that?


Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)



And Marc hands himself a foot gun... I was 9 years old in 82.

Joshua D. Drake




Marc G. Fournier   Hub.Org Networking Services 
(http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 
7615664



--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Updated email signature

2006-02-17 Thread Joshua D. Drake





Anyone able to beat that?


Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)



At Marc hands himself a foot gun... I was 9 years old in 82.

Joshua D. Drake




Marc G. Fournier   Hub.Org Networking Services 
(http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 
7615664



--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Updated email signature

2006-02-17 Thread Jonah H. Harris
/me was 1 year old in 1982On 2/17/06, Joshua D. Drake [EMAIL PROTECTED] wrote:
 Anyone able to beat that? Sorry, I was still in Junior High in '82 :(Man, you are *old* :)And Marc hands himself a foot gun... I was 9 years old in 82.
Joshua D. Drake  Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: 
[EMAIL PROTECTED] Yahoo!: yscrappyICQ: 7615664--The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated HostingCo-Authors: PLphp, PLperl - http://www.commandprompt.com/---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
-- Jonah H. Harris, Database Internals ArchitectEnterpriseDB Corporation732.331.1324


Re: [HACKERS] Updated email signature

2006-02-17 Thread Marc G. Fournier

On Fri, 17 Feb 2006, Joshua D. Drake wrote:






Anyone able to beat that?


Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)



At Marc hands himself a foot gun... I was 9 years old in 82.


damn, now *I* feel old :)


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Updated email signature

2006-02-17 Thread Oleg Bartunov

On Sat, 18 Feb 2006, Marc G. Fournier wrote:


On Fri, 17 Feb 2006, Joshua D. Drake wrote:






Anyone able to beat that?


Sorry, I was still in Junior High in '82 :(  Man, you are *old* :)



At Marc hands himself a foot gun... I was 9 years old in 82.


damn, now *I* feel old :)


don't feel upset, that time I already learned how to control missile :)




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Updated email signature

2006-02-17 Thread Andrej Ricnik-Bay
 don't feel upset, that time I already learned how to control missile :)
Just curious ... how old does one need to be to be allowed
that? :)  I was of legal drinking age then, btw ..

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match