Re: [HACKERS] Foreign keys

2006-09-10 Thread Chris Mair

 
 First of all, I'd like to apologize for my poor English. That said, I
 believe I brought this problem to the community about a year ago, but I
 don't remember getting any answer. The problem
 is:
  
 Every time a user misses a foreign-key reference, PostgreSQL raises an
 exception.
 As far as I can tell, if we have 5 or 10 foreign keys in an
 INSERT/UPDATE transaction, only one exception should be raised,
 reporting all the errors/messages for the missing referenced fields at
 once, not one by one.
 Well, in order to implement this idea today, we would need to disable
 the built-in referential integrity and build it all by hand: all the
 validation (look-ups etc.) before the insert/update, inside IF
 tg_op = 'INSERT' OR tg_op = 'UPDATE' blocks, as people do with
 non-relational databases - all hand-made. Well, this is very hard to
 believe! I must be missing something.
  
 Please, if I'm wrong, can someone explain what I'm missing?

When there is a constraint violation, the current transaction is rolled
back anyhow. 

What's the purpose of letting you insert 1000 records and then, at the
end, saying: hah, it's all rolled back because the 2nd record was
invalid? PG justly throws the exception immediately to let you know it's
futile to insert the remaining 998 records.

I don't see a problem here.

Bye,
Chris.



-- 

Chris Mair
http://www.1006.org


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Foreign keys

2006-09-10 Thread Stefan Kaltenbrunner
MAR - Secretariado Geral wrote:
 Hi everybody,
  
 First of all, I'd like to apologize for my poor English. That said, I
 believe I brought this problem to the community about a year ago, but I
 don't remember getting any answer. The problem is:
  
 Every time a user misses a foreign-key reference, PostgreSQL raises an
 exception.
 As far as I can tell, if we have 5 or 10 foreign keys in an
 INSERT/UPDATE transaction, only one exception should be raised,
 reporting all the errors/messages for the missing referenced fields at
 once, not one by one.
 Well, in order to implement this idea today, we would need to disable
 the built-in referential integrity and build it all by hand: all the
 validation (look-ups etc.) before the insert/update, inside IF
 tg_op = 'INSERT' OR tg_op = 'UPDATE' blocks, as people do with
 non-relational databases - all hand-made. Well, this is very hard to
 believe! I must be missing something.
  
 Please, if I'm wrong, can someone explain what I'm missing?

I'm not sure what you are complaining about exactly, but maybe you want
to declare your FK constraints as DEFERRABLE INITIALLY DEFERRED?
That way the constraint checking happens at the end of the transaction
and not immediately.
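The same semantics can be sketched outside PostgreSQL: SQLite honors the
identical SQL-standard clause, so a quick Python demo (table and column
names are made up for illustration) shows the check moving from
statement time to COMMIT time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manage transactions explicitly
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE emp ("
    " name TEXT,"
    " dept_id INTEGER REFERENCES dept(id) DEFERRABLE INITIALLY DEFERRED)"
)

conn.execute("BEGIN")
# Child row first: an immediate constraint would fail right here.
conn.execute("INSERT INTO emp VALUES ('Tom', 1)")
conn.execute("INSERT INTO dept VALUES (1)")  # parent arrives before COMMIT
conn.execute("COMMIT")                       # deferred check passes now
commit_ok = True

# If the parent never shows up, the error surfaces only at COMMIT time.
conn.execute("BEGIN")
conn.execute("INSERT INTO emp VALUES ('Ann', 99)")
try:
    conn.execute("COMMIT")
    deferred_failure = False
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    deferred_failure = True
```

Note this only changes *when* the check runs; as discussed downthread, it
still reports one violation, not all of them.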


Stefan



Re: [HACKERS] Foreign keys

2006-09-10 Thread Gregory Stark
Chris Mair [EMAIL PROTECTED] writes:

 What's the purpose of letting you insert 1000 records and then, at the
 end, saying: hah, it's all rolled back because the 2nd record was
 invalid? PG justly throws the exception immediately to let you know
 it's futile to insert the remaining 998 records.

Well there's plenty of cases where people want that and we support it with
deferred constraints.

However, the OP sounds like he wants something else. I think what he
wants is, when he inserts a record and it fails due to foreign key
constraints, to report all the violated constraints, not just the first
one found.

I never run into this problem myself because I think of foreign key
constraints as more akin to C assertions. They're a backstop to make sure the
application is working correctly. I never write code that expects foreign key
constraint errors and tries to handle them.

But there's nothing saying that's the only approach. The feature request
seems pretty reasonable to me. I'm not sure how hard it would be with
the RI triggers as written, and I'm not sure there's anywhere for
triggers to store their return values, so I'm unclear whether this can
even be done using triggers.

But to answer his original question: yes that's the way Postgres works and if
you want to report all the violations together you'll have to check them
yourself.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Foreign keys

2006-09-10 Thread Stephan Szabo
On Sun, 10 Sep 2006, Gregory Stark wrote:

 Chris Mair [EMAIL PROTECTED] writes:

  What's the purpose of letting you insert 1000 records and then, at the
  end, saying: hah, it's all rolled back because the 2nd record was
  invalid? PG justly throws the exception immediately to let you know
  it's futile to insert the remaining 998 records.

 Well there's plenty of cases where people want that and we support it with
 deferred constraints.

 However the OP sounds like he wants something else. I think what he wants is
 when he inserts a record and it fails due to foreign key constraints to report
 all the violated constraints, not just the first one found.

 I never run into this problem myself because I think of foreign key
 constraints as more akin to C assertions. They're a backstop to make sure the
 application is working correctly. I never write code that expects foreign key
 constraint errors and tries to handle them.

 But there's nothing saying that's the only approach. The feature request seems
 pretty reasonable to me. I'm not sure how hard it would be with the ri
 triggers as written. I'm not sure there's anywhere for triggers to store their
 return values so I'm unclear this can even be done using triggers.

I think if we were going to do this, all the constraint violations for
unique, not-null, check, and foreign key constraints should be handled
similarly, so we'd probably want something more general than just a way
for the RI triggers to do this. I don't have a good idea of the right
solution for that, though.



Re: [HACKERS] Foreign keys

2006-09-10 Thread Tom Lane
Stephan Szabo [EMAIL PROTECTED] writes:
 I think if we were going to do this that all the constraint violations for
 unique, not null, check and foreign keys should be handled similarly, so
 we'd probably want something more general than just a way for the ri
 triggers to do this. I don't have a good idea of the right solution for
 that though.

It seems pretty unwieldy to me: it's not hard to imagine a long INSERT
throwing millions of separate foreign-key errors before it's done, for
instance.  And then there's the cascading-errors problem, ie, bogus
reports that happen because some prior step failed ... not least being
your client crashing and failing to tell you anything about what
happened because it ran out of memory for the error list.

My advice is to rethink the client code that wants such a behavior.

regards, tom lane



Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module

2006-09-10 Thread Jeremy Kronuz
 Tom, I've checked the version in the CVS, and I had made significant
 changes from that version.

Hm, it sounds like I guessed wrong about which version was newer ... is
there something flaky about your machine's system clock?  The file
timestamps in the two tarballs definitely pointed the other way.


Nope, not a system clock problem, not that I know of... I don't know what 
the problem was... anyway, with the patch I made it should now be fully 
functional.



 I fixed some major bugs that prevented some ISBN numbers from working.
 I've merged the changes made for the CVS with my own, hoping I didn't
 miss anything that was already fixed in the CVS... I noticed you
 changed some log messages and that you added GET_STR and GET_TEXT.
 Indeed, there was that thing that took me some time to figure out in
 both the ean13_cast_to_text() and isn_cast_to_text() functions: not
 converting the return value to a valid TEXT. But I did not know about
 the other *_cast_from_text() functions having the problem of the
 missing GET_STR() (I was using VARDATA()).

Yeah, your to/from text functions simply did not work --- the VARDATA
hack for example presumes that the data part of a text value is
null-terminated, which it isn't.  In any case it seems pretty likely
that the content of TEXT datums will change someday to allow better
charset/collation support, so you really are better off calling
textin/textout if possible rather than assuming you know the
representation of type TEXT.  I copied the GET_STR/GET_TEXT macros
from some other contrib module ... you're free to do it differently
if you want, but in any case you have to deal with the fact that TEXT
is *not* a C-string.
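Tom's point, that a text datum is length-counted rather than
NUL-terminated, can be sketched with a toy model (this simplifies the
2006-era varlena layout to a 4-byte length word counting header plus
payload; it's an illustration, not the PostgreSQL API):

```python
import struct

# Toy varlena-style "text" datum: 4-byte length word (header + payload),
# payload NOT NUL-terminated -- mirroring why the VARDATA() hack is unsafe.
def make_text(payload: bytes) -> bytes:
    return struct.pack("=i", 4 + len(payload)) + payload

def get_str(datum: bytes) -> bytes:
    # Safe extraction: trust the length word, never scan for a NUL.
    (vl_len,) = struct.unpack_from("=i", datum)
    return datum[4:vl_len]

# Adjacent memory often holds unrelated bytes; a strlen()-style read
# (what treating VARDATA() as a C-string assumed) would run past the end.
datum = make_text(b"978-0") + b"\x7fGARBAGE"
```

Here get_str(datum) recovers exactly b"978-0"; anything assuming NUL
termination would pick up the trailing junk, which is the class of crash
described below.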


Yeah, that's what I painfully learned, especially with the
*_cast_to_text() functions... those were definitely crashing the server.
I didn't know the VARDATA() hack could fail at some point (even in the
patch, I had already replaced it with GET_STR(), just in case)... and
since it took me so long to figure out why the other two functions were
crashing, and not knowing exactly what you did with the GET_TEXT()
macro, I just decided to use my own patched version... I suppose the use
of GET_TEXT() is better, as it's smaller and makes the code easier to
follow; perhaps we might want to change those two functions back to the
way you had them (using GET_TEXT()).



As for the log message changes, there's room for debate about that, but
the way you had it struck me as far too chatty.  The use-case for weak
checking is that you have a large pile of data you don't want to
validate right now, correct?  So why would you want a warning for every
single bad datum during the import?  Also, all the fancy formatting in
that one big warning was directly against our message style guidelines.


That's right, that's the idea behind weak mode... I've been in that
position and I found the weak mode very useful, so I thought it would be
nice to have it as a feature that others could benefit from. And yes, I
know the messages were far too chatty, as you rightfully said. In my
latest version (before the patch) I took away some of the messages, and
I also shortened the *big* warning to a single line… but I think not
having so many messages is truly better; I wasn't sure when the messages
were shown (this is my first module, you know), but it seems those
removed messages were really not needed at all.


I'm very happy to have contributed a bit to the growth and improvement
of PostgreSQL, and I look forward to the day I can make bigger/better
contributions. And Tom, were you able to apply the patch without
problems? ('Cause this is one of the first times I've posted a patch to
a mailing list.)


Regards,
Kronuz





Re: [HACKERS] Foreign keys

2006-09-10 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Stephan Szabo [EMAIL PROTECTED] writes:
  I think if we were going to do this that all the constraint violations for
  unique, not null, check and foreign keys should be handled similarly, so
  we'd probably want something more general than just a way for the ri
  triggers to do this. I don't have a good idea of the right solution for
  that though.
 
 It seems pretty unwieldy to me: it's not hard to imagine a long INSERT
 throwing millions of separate foreign-key errors before it's done, for
 instance.  And then there's the cascading-errors problem, ie, bogus
 reports that happen because some prior step failed ... not least being
 your client crashing and failing to tell you anything about what
 happened because it ran out of memory for the error list.
 
 My advice is to rethink the client code that wants such a behavior.

Well, you're still talking about the case of multiple queries deferring
all constraint checks to the end of the transaction. I'm not sure what
the original poster had in mind, but that's not how I read it, and it
isn't what I was speculating about.

I was only thinking of situations like:

INSERT INTO "user" (name, department, zipcode) VALUES ('Tom', 0, '0');

We'll fail with an error like 'violates foreign key constraint
"user_department_fkey"'. It won't say anything about the zipcode also
being invalid.

I sure hate UIs that give you one error at a time so you have to keep fixing
one problem, clicking ok again, only to have yet another error pop up, rinse,
lather, repeat until you finally get all the problems sorted out.

Now, I've never actually run into this because, as I mentioned, I always
treated the database constraints as assertion checks independent of the
application, which usually enforces stricter conditions anyway. But I
could see someone arguing that having two independent sets of code
implementing the same set of conditions is poor.

In any case, the same logic that makes it desirable to report all the
errors to the user in a UI, rather than one by one, also applies to the
database. I'm not sure it's the most important issue in the world, but
it does seem like it would be a nice feature if it reported all the
errors in the statement, not just the first one it finds.

-- 
greg




Re: [HACKERS] Foreign keys

2006-09-10 Thread Joshua D. Drake



In any case, the same logic that makes it desirable to report all the
errors to the user in a UI, rather than one by one, also applies to the
database. I'm not sure it's the most important issue in the world, but
it does seem like it would be a nice feature if it reported all the
errors in the statement, not just the first one it finds.



Seems kind of extraneous to me. I am guessing it would cause yet further 
overhead with our foreign key checks.


My testing shows that the use of foreign keys on high-velocity,
single-transaction loads can easily cause a 50% reduction in
performance. Why add to that? What we need to be doing is finding a way
to decrease the impact of foreign key checks.


Sincerely,

Joshua D. Drake

--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [HACKERS] Foreign keys

2006-09-10 Thread Stephan Szabo
On Sun, 10 Sep 2006, Gregory Stark wrote:

 Tom Lane [EMAIL PROTECTED] writes:

  Stephan Szabo [EMAIL PROTECTED] writes:
   I think if we were going to do this that all the constraint violations for
   unique, not null, check and foreign keys should be handled similarly, so
   we'd probably want something more general than just a way for the ri
   triggers to do this. I don't have a good idea of the right solution for
   that though.
 
  It seems pretty unwieldy to me: it's not hard to imagine a long INSERT
  throwing millions of separate foreign-key errors before it's done, for
  instance.  And then there's the cascading-errors problem, ie, bogus
  reports that happen because some prior step failed ... not least being
  your client crashing and failing to tell you anything about what
  happened because it ran out of memory for the error list.
 
  My advice is to rethink the client code that wants such a behavior.

 Well you're still talking about the case of multiple queries deferring all
 constraint checks to the end of the transaction.

Well, or INSERT ... SELECT, or UPDATE, or DELETE. Most deferred
conditions can arise within one statement as well.

 In any case, the same logic that makes it desirable to report all the
 errors to the user in a UI, rather than one by one, also applies to the
 database. I'm not sure it's the most important issue in the world, but
 it does seem like it would be a nice feature if it reported all the
 errors in the statement, not just the first one it finds.

SQL seems to have a notion of setting the size of the diagnostics area
for a transaction to hold a number of conditions. There are a few odd
bits; for example, the area is mostly unordered, but the sqlcode
returned must match the first condition, and we presumably want to make
sure that if there are any errors we return an exception sqlcode, not a
completion one.



[HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Tom Lane
Awhile back, someone went around and made sure that all the contrib
modules had uninstall scripts, but most of the new contributions since
then don't have 'em:
adminpack hstore pg_freespacemap pgrowlocks sslinfo
all seem to need one.  Also, I tested a few of the existing uninstall
scripts in passing while fixing the contains/contained mess, and two
out of four were broken (failed to remove everything installed by the
module, as shown by getting errors if you then try to reinstall it).
Seems like this area needs more attention ... anyone want to work on it?

regards, tom lane



Re: [HACKERS] [PATCHES] ISBN/ISSN/ISMN/EAN13 module

2006-09-10 Thread Tom Lane
Jeremy Kronuz [EMAIL PROTECTED] writes:
 ... Tom, were you able to apply the patch without problems 
 ('cause this is one of the first times I post a patch in a mailing list.)

Um, no, doesn't work at all unfortunately:

$ patch --dry ~/ean.patch
patching file isn.c
patch:  malformed patch at line 6: enum isn_type { INVALID, ANY, EAN13, 
ISBN, ISMN, ISSN, UPC };

$ 

It looks like the patch got whitespace-mangled in transit.  With many
mail programs it works better to add a patch as an attachment than
to try to include it directly in the message text.  Also, we usually
ask for diff -c rather than diff -u format ... I for one find it easier
to read.

Please sync the text-conversion issues and resubmit.

regards, tom lane



Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger

Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

No one has mentioned that we pad values on disk to match the CPU
alignment.  This is done for efficiency, but is not strictly required.


Well, it is unless you are willing to give up support of non-Intel CPUs;
most other popular chips are strict about alignment, and will fail an
attempt to do a nonaligned fetch.


Intel CPUs are detectable at compile time, right?  Do we use less 
padding in the layout for tables on Intel-based servers?  If not, could we?


I would be particularly interested in the creation of a 24-bit integer 
if it could pack into only three bytes.  (If the layout forces an extra 
byte of padding per integer, the advantage is lost.)


For argument sake, if I created a contrib extension called int3 which 
stored 24-bit integers, in the int3.source file I could write:


CREATE TYPE int3 (
internallength = 3,
input = int3_in,
output = int3_out,
alignment = ALIGNMENT
);

And then have sed replace ALIGNMENT with either char or int4 
depending on the architecture.


Is there a reason this wouldn't work?

For the example schema which started this thread, a contrib extension 
for ascii fields could be written, with types like ascii1, ascii2, 
ascii3, and ascii4, each with implicit upcasts to text.  A contrib for 
int1 and uint1 could be written to store single byte integers in a 
single byte, performing math on them correctly, etc.


mark


The only way we could pack stuff without alignment is to go over to the
idea that memory and disk representations are different --- where in
this case the conversion might just be a memcpy to a known-aligned
location.  The performance costs of that seem pretty daunting, however,
especially when you reflect that simply stepping over a varlena field
would require memcpy'ing its length word to someplace.

regards, tom lane



Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Joshua D. Drake

Tom Lane wrote:

Awhile back, someone went around and made sure that all the contrib
modules had uninstall scripts, but most of the new contributions since
then don't have 'em:
adminpack hstore pg_freespacemap pgrowlocks sslinfo
all seem to need one.  Also, I tested a few of the existing uninstall
scripts in passing while fixing the contains/contained mess, and two
out of four were broken (failed to remove everything installed by the
module, as shown by getting errors if you then try to reinstall it).
Seems like this area needs more attention ... anyone want to work on it?


I'll take it. How long do I have?

Joshua D. Drake




regards, tom lane





--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [HACKERS] [PATCHES] ISBN/ISSN/ISMN/EAN13 module

2006-09-10 Thread Jeremy Kronuz

 ... Tom, were you able to apply the patch without problems
 ('cause this is one of the first times I post a patch in a mailing 
list.)


Um, no, doesn't work at all unfortunately:

$ patch --dry ~/ean.patch
patching file isn.c
patch:  malformed patch at line 6: enum isn_type { INVALID, ANY, EAN13, 
ISBN, ISMN, ISSN, UPC };


$

It looks like the patch got whitespace-mangled in transit.  With many
mail programs it works better to add a patch as an attachment than
to try to include it directly in the message text.  Also, we usually
ask for diff -c rather than diff -u format ... I for one find it easier
to read.

Please sync the text-conversion issues and resubmit.



Tom, I'm attaching the new patch now... it's now using GET_TEXT(), I've
updated the README, and I've used the -c format as you suggested.

Please, let me know how it goes for you.

Regards,
Kronuz.



isn2c.diff
Description: Binary data



Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Martijn van Oosterhout
On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote:
 Well, it is unless you are willing to give up support of non-Intel CPUs;
 most other popular chips are strict about alignment, and will fail an
 attempt to do a nonaligned fetch.
 
 Intel CPUs are detectable at compile time, right?  Do we use less 
 padding in the layout for tables on Intel-based servers?  If not, could we?

Intel CPUs may not complain about unaligned reads, but they're still
inefficient: internally the CPU does two aligned reads and rearranges
the bytes. On other architectures the OS can emulate unaligned access,
but postgres doesn't rely on that for obvious reasons.

 For the example schema which started this thread, a contrib extension 
 for ascii fields could be written, with types like ascii1, ascii2, 
 ascii3, and ascii4, each with implicit upcasts to text.  A contrib for 
 int1 and uint1 could be written to store single byte integers in a 
 single byte, performing math on them correctly, etc.

The problem is that for each of those ascii types, to actually use them
they would have to be converted, which would amount to allocating some
memory, copying and adding a length header. At some point you have to
wonder whether you're actually saving anything.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Michael Fuhr
On Sun, Sep 10, 2006 at 12:09:24PM -0700, Joshua D. Drake wrote:
 Tom Lane wrote:
 Awhile back, someone went around and made sure that all the contrib
 modules had uninstall scripts, but most of the new contributions since
 then don't have 'em:
  adminpack hstore pg_freespacemap pgrowlocks sslinfo
 all seem to need one.  Also, I tested a few of the existing uninstall
 scripts in passing while fixing the contains/contained mess, and two
 out of four were broken (failed to remove everything installed by the
 module, as shown by getting errors if you then try to reinstall it).
 Seems like this area needs more attention ... anyone want to work on it?
 
 I'll take it. How long do I have?

For reference, here are some comments about the cleanup work I did
a while back:

http://archives.postgresql.org/pgsql-patches/2006-03/msg00163.php

-- 
Michael Fuhr



Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Seems like this area needs more attention ... anyone want to work on it?

 I'll take it. How long do I have?

Since it's contrib, I don't think we need to hold you to being done
before beta1.  But the sooner the better of course.

regards, tom lane



Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger

Martijn van Oosterhout wrote:

On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote:

Well, it is unless you are willing to give up support of non-Intel CPUs;
most other popular chips are strict about alignment, and will fail an
attempt to do a nonaligned fetch.
Intel CPUs are detectable at compile time, right?  Do we use less 
padding in the layout for tables on Intel-based servers?  If not, could we?


Intel CPUs may not complain about unaligned reads, but they're still
inefficient: internally the CPU does two aligned reads and rearranges
the bytes. On other architectures the OS can emulate unaligned access,
but postgres doesn't rely on that for obvious reasons.


This gets back to the CPU vs. I/O bound issue, right?  Might not some 
people (with heavily taxed disks but lightly taxed CPU) prefer that 
trade-off?


For the example schema which started this thread, a contrib extension 
for ascii fields could be written, with types like ascii1, ascii2, 
ascii3, and ascii4, each with implicit upcasts to text.  A contrib for 
int1 and uint1 could be written to store single byte integers in a 
single byte, performing math on them correctly, etc.


The problem is that for each of those ascii types, to actually use them
they would have to be converted, which would amount to allocating some
memory, copying and adding a length header. At some point you have to
wonder whether you're actually saving anything.

Have a nice day,


I'm not sure what you mean by "actually use them".  The types could have
their own comparison operators, so you could use them for sorting and
indexing, and use them in WHERE clauses with these comparisons, without
any conversion to/from text.  I mentioned implicit upcasts to text
merely to handle other cases, such as using them in a LIKE or ILIKE, or
concatenation, etc., where the work of providing this functionality for
each contrib datatype would not really be justified.


I'm not personally as interested in the aforementioned ascii types as I 
am in the int1 and int3 types, but the argument in favor of each is 
about the same.  If a person has a large table made of small data, it 
seems really nuts to have 150% - 400% bloat on that table, when such a 
small amount of work is needed to write the contrib datatypes necessary 
to store the data compactly.  The argument made upthread that a 
quadratic number of conversion operators is necessitated doesn't seem 
right to me, given that each type could upcast to the canonical built in 
type.  (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = 
text, ascii3 = text, etc.)  Operations on data of differing type can be 
done in the canonical type, but the common case for many users would be 
operations between data of the same type, for which no conversion is 
required.


Am I missing something that would prevent this approach from working? I
am seriously considering writing these contrib datatypes for use either
on pgFoundry or in the contrib/ subdirectory for the 8.3 release, but am
looking for advice if I am really off base.
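The bloat argument can be made concrete with Python's struct module,
whose native ("@") mode applies C-style alignment padding while "="
packs fields back to back. This is only a loose analogy to tuple
layout, not PostgreSQL's actual on-disk format:

```python
import struct

# A "row" of one 1-byte flag followed by one 4-byte int:
padded = struct.calcsize("@bi")  # native alignment inserts pad bytes
                                 # (typically 8 on 32/64-bit platforms)
packed = struct.calcsize("=bi")  # no padding: 1 + 4 = 5 bytes

# Three 1-byte ascii-style fields plus a 4-byte int show the same effect.
padded3 = struct.calcsize("@bbbi")
packed3 = struct.calcsize("=bbbi")  # 3 + 4 = 7 bytes
```

On a typical platform the padded row spends 3 of its 8 bytes on
alignment, which is the kind of overhead the proposed int1/int3/ascii
types would try to avoid.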


Thanks,

mark




Re: [HACKERS] [PATCHES] ISBN/ISSN/ISMN/EAN13 module

2006-09-10 Thread Tom Lane
Jeremy Kronuz [EMAIL PROTECTED] writes:
 Tom, I'm attaching the new patch now... it's now using the GET_TEXT(), I've 
 updated the README and I've used the -c format as you suggested.
 Please, let me know how it goes for you.

Looks great, applied.

If you are interested, consider working up a few regression tests to see
if there are any portability issues.  I don't see anything obviously
dangerous here, but you never know.

regards, tom lane



Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Joshua D. Drake

Tom Lane wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:

Tom Lane wrote:

Seems like this area needs more attention ... anyone want to work on it?



I'll take it. How long do I have?


Since it's contrib, I don't think we need to hold you to being done
before beta1.  But the sooner the better of course.


O.k., I will start working through it and report at the end of the week.

Joshua D. Drake




regards, tom lane




--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Mark Dilger [EMAIL PROTECTED] writes:
 ... The argument made upthread that a 
 quadratic number of conversion operators is necessitated doesn't seem 
 right to me, given that each type could upcast to the canonical built in 
 type.  (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = 
 text, ascii3 = text, etc.)

This would work all right for the string-category cases, since TEXT is
the only thing you really care about having them cast to anyway.
It probably won't work all that well for int1/int3, because you really
want them to coerce implicitly to all the wider numeric types.
Otherwise, perfectly sane queries like int8 + int1 fail.

Part of the issue here is that we deliberately keep the parser from
searching for multi-step coercions.  So for example if you only provide
an int1->int2 cast, then the existence of up-casts from int2 doesn't
help you use an int1 with anything except int2.

I am not sure whether any problems would be created if you did provide
the full spectrum of up-casts.  I remember having argued that there
would be problems with trying to invent uint2/uint4 types, but that was
a very long time ago, before we had pg_cast and some other changes in
the type resolution rules.  With the current system it might work OK.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Guido Barosio

Let me know if you need an extra pair of eyes.

G.-

On 9/10/06, Joshua D. Drake [EMAIL PROTECTED] wrote:

Tom Lane wrote:
 Joshua D. Drake [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Seems like this area needs more attention ... anyone want to work on it?

 I'll take it. How long do I have?

 Since it's contrib, I don't think we need to hold you to being done
 before beta1.  But the sooner the better of course.

O.k., I will start working through it and report at the end of the week.

Joshua D. Drake



   regards, tom lane



--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
  http://www.commandprompt.com/



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match




--
Guido Barosio
---
http://www.globant.com
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian

Added to TODO:

* Consider ways of storing rows more compactly on disk

o Store disk pages with no alignment/padding?
o Reorder physical storage order to reduce padding?
o Support a smaller header for short variable-length fields?
o Reduce the row header size?

---

Gregory Stark wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  I think it would be good to see if we can extend the varlena data types
  to support a shorter header for storing short byte values.  Looking at
  the header now we have:
 
 This isn't the first time we've been down that route. There were some
 extensive discussions a while back. I think there were even patches.
 I don't remember why it was eventually rejected. I suspect it simply got too
 complex.
 
 But I think this is a dead-end route. What you're looking at is the number 1
 repeated for *every* record in the table. And what you're proposing amounts to
 noticing that the number 4 fits in a byte and doesn't need a whole word to
 store it. Well sure, but you don't even need a byte if it's going to be the
 same for every record in the table.
 
 If someone popped up on the list asking about whether Postgres compressed
 their data efficiently if they stored a column that was identical throughout
 the whole table you would tell them to normalize their data.
 
 -- 
 greg

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Mark Dilger

Tom Lane wrote:

Mark Dilger [EMAIL PROTECTED] writes:
... The argument made upthread that a 
quadratic number of conversion operators is necessitated doesn't seem 
right to me, given that each type could upcast to the canonical built in 
type.  (int1 = smallint, int3 = integer, ascii1 = text, ascii2 = 
text, ascii3 = text, etc.)


This would work all right for the string-category cases, since TEXT is
the only thing you really care about having them cast to anyway.
It probably won't work all that well for int1/int3, because you really
want them to coerce implicitly to all the wider numeric types.
Otherwise, perfectly sane queries like int8 + int1 fail.

Part of the issue here is that we deliberately keep the parser from
searching for multi-step coercions.  So for example if you only provide
int1 -> int2 then the existence of up-casts from int2 doesn't help you
use an int1 with anything except int2.

I am not sure whether any problems would be created if you did provide
the full spectrum of up-casts.  I remember having argued that there
would be problems with trying to invent uint2/uint4 types, but that was
a very long time ago, before we had pg_cast and some other changes in
the type resolution rules.  With the current system it might work OK.

regards, tom lane


Thanks Tom,

I will try this then.  I won't be proposing to ever put this in core, as 
 the increased code size isn't justified for people who aren't using 
these types (IMHO).  Any further feedback on why this wouldn't work is 
appreciated, as it might save me some time learning on my own.  But 
otherwise I'll post back in a few days when this is finished.


mark

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
   * Consider ways of storing rows more compactly on disk
   o Support a smaller header for short variable-length fields?

With respect to the business of having different on-disk and in-memory
representations, we have that already today: see TOAST.  It strikes me
that it might be useful to think about solving the problem with a
second generation toast mechanism.  The first generation only worried
about storing large values, but the second generation would also address
the problem of storing small values efficiently.

Or you could think about it as a second generation varlena.  This
mindset would make for a slightly different set of choices about where
the work gets done.  I'm not sure which is better.

Either way, I think it would be interesting to consider

(a) length word either one or two bytes, not four.  You can't need more
than 2 bytes for a datum that fits in a disk page ...

(b) alignment either one or two bytes, not four.  TEXT would be
perfectly happy with 1-byte alignment, but for NUMERIC we might want 2.

I'm inclined to bag the idea of storing the length words separately from
the data proper.  Although it probably would make for some marginal gain
in cache efficiency, I don't see any reasonable way at all to fit it
into the current system structure, whereas either the toast or next
gen varlena approaches seem fairly straightforward.  And having to
track an additional pointer inside the inner loops of heap_form_tuple
and heap_deform_tuple could eat up any performance gain anyway.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [COMMITTERS] pgsql: Back out patch added during 8.2.X development: Allow to_char()

2006-09-10 Thread Tom Lane
[EMAIL PROTECTED] (Bruce Momjian) writes:
 Back out patch added during 8.2.X development:
   Allow to_char() D format specifiers for interval/time.
 It doesn't work, and I doubt it is useful enough to fix (D = day of
 week).

Hm, shouldn't there be a documentation update to go with that?  Or was
it never documented anyway :-(

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Michael Fuhr
On Sun, Sep 10, 2006 at 01:50:39PM -0700, Joshua D. Drake wrote:
 O.k., I will start working through it and report at the end of the week.

I spent a few minutes doing the same tests I did a few months ago
and found problems with dblink and ltree; I'll submit patches for
those.  Tom, do you recall which modules gave you trouble?

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 I spent a few minutes doing the same tests I did a few months ago
 and found problems with dblink and ltree; I'll submit patches for
 those.  Tom, do you recall which modules gave you trouble?

Um ... hstore and tsearch2, I think.  I fixed those, but was thinking
that other stuff might crawl out from under similar rocks.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  * Consider ways of storing rows more compactly on disk
  o Support a smaller header for short variable-length fields?
 
 With respect to the business of having different on-disk and in-memory
 representations, we have that already today: see TOAST.  It strikes me
 that it might be useful to think about solving the problem with a
 second generation toast mechanism.  The first generation only worried
 about storing large values, but the second generation would also address
 the problem of storing small values efficiently.
 
 Or you could think about it as a second generation varlena.  This
 mindset would make for a slightly different set of choices about where
 the work gets done.  I'm not sure which is better.
 
 Either way, I think it would be interesting to consider
 
 (a) length word either one or two bytes, not four.  You can't need more
 than 2 bytes for a datum that fits in a disk page ...

That is an interesting observation, though could compressed inline
values exceed two bytes?

 (b) alignment either one or two bytes, not four.  TEXT would be
 perfectly happy with 1-byte alignment, but for NUMERIC we might want 2.
 
 I'm inclined to bag the idea of storing the length words separately from
 the data proper.  Although it probably would make for some marginal gain
 in cache efficiency, I don't see any reasonable way at all to fit it
 into the current system structure, whereas either the toast or next
 gen varlena approaches seem fairly straightforward.  And having to
 track an additional pointer inside the inner loops of heap_form_tuple
 and heap_deform_tuple could eat up any performance gain anyway.

Good point.  How do we do it now?  I assume we store just the fixed-size
toast pointer length in the heap attribute, not the toast length.

Why haven't we investigated shrinking the varlena header before?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] contrib/xml2 and PG_MODULE_MAGIC

2006-09-10 Thread Michael Fuhr
contrib/xml2 currently has PG_MODULE_MAGIC in xslt_proc.c, which
results in a runtime error on systems that built the module without
support for libxslt per the comments in the Makefile.  Should
PG_MODULE_MAGIC be in xpath.c instead?

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [COMMITTERS] pgsql: Back out patch added during

2006-09-10 Thread Bruce Momjian
Tom Lane wrote:
 [EMAIL PROTECTED] (Bruce Momjian) writes:
  Back out patch added during 8.2.X development:
  Allow to_char() D format specifiers for interval/time.
  It doesn't work, and I doubt it is useful enough to fix (D = day of
  week).
 
 Hm, shouldn't there be a documentation update to go with that?  Or was
 it never documented anyway :-(

Not documented.  Using D for an interval/time is more of a corner case
/ don't do that.  ;-)

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Michael Fuhr
On Sun, Sep 10, 2006 at 07:38:24PM -0400, Tom Lane wrote:
 Michael Fuhr [EMAIL PROTECTED] writes:
  I spent a few minutes doing the same tests I did a few months ago
  and found problems with dblink and ltree; I'll submit patches for
  those.  Tom, do you recall which modules gave you trouble?
 
 Um ... hstore and tsearch2, I think.  I fixed those, but was thinking
 that other stuff might crawl out from under similar rocks.

I don't see an uninstall script for hstore; that one needs to be
written.  The latest untsearch2.sql looks good in my tests.

Should untsearch2.sql be renamed to uninstall_tsearch2.sql to be
consistent with all the other uninstall scripts?

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Lock partitions

2006-09-10 Thread Bruce Momjian
I see this in the CVS commits for 8.2.  Did we determine the proper
number of lock partitions?  Should it be based on the number of buffers
or concurrent sessions allowed?

 Divide the lock manager's shared state into 'partitions', so as to
 reduce contention for the former single LockMgrLock.  Per my recent
 proposal.  I set it up for 16 partitions, but on a pgbench test this
 gives only a marginal further improvement over 4 partitions --- we need
 to test more scenarios to choose the number of partitions.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Either way, I think it would be interesting to consider
 
 (a) length word either one or two bytes, not four.  You can't need more
 than 2 bytes for a datum that fits in a disk page ...

 That is an interesting observation, though could compressed inline
 values exceed two bytes?

After expansion, perhaps, but it's the on-disk footprint that concerns
us here.

I thought a bit more about this and came up with a zeroth-order sketch:

The length word for an on-disk datum could be either 1 or 2 bytes;
in the 2-byte case we'd need to be prepared to fetch the bytes
separately to avoid alignment issues.  The high bits of the first byte
say what's up:

* First two bits 00: 2-byte length word, uncompressed inline data
follows.  This allows a maximum on-disk size of 16K for an uncompressed
datum, so we lose nothing at all for standard-size disk pages and not
much for 32K pages (remember the toaster will try to compress any tuple
exceeding 1/4 page anyway ... this just makes it mandatory).

* First two bits 01: 2-byte length word, compressed inline data
follows.  Again, hard limit of 16K, so if your data exceeds that you
have to push it out to the toast table.  Again, this policy costs zero
for standard size disk pages and not much for 32K pages.

* First two bits 10: 1-byte length word, zero to 62 bytes of
uncompressed inline data follows.  This is the case that wins for short
values.

* First two bits 11: 1-byte length word, pointer to out-of-line toast
data follows.  We may as well let the low 6 bits of the length word be
the size of the toast pointer, same as it works now.  Since the toast
pointer is not guaranteed aligned anymore, we'd have to memcpy it
somewhere before using it ... but compared to the other costs of
fetching a toast value, that's surely down in the noise.  The
distinction between compressed and uncompressed toast data would need to
be indicated in the body of the toast pointer, not in the length word as
today, but nobody outside of tuptoaster.c would care.

Notice that heap_deform_tuple only sees 2 cases here: high bit 0 means
2-byte length word, high bit 1 means 1-byte.  It doesn't care whether
the data is compressed or toasted, same as today.

There are other ways we could divvy up the bit assignments of course.
The main issue is keeping track of whether any given Datum is in this
compressed-for-disk format or in the uncompressed 4-byte-length-word
format.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] contrib uninstall scripts need some love

2006-09-10 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 Should untsearch2.sql be renamed to uninstall_tsearch2.sql to be
 consistent with all the other uninstall scripts?

Yeah, that and lo_drop are outliers that probably ought to be renamed.
Offhand I don't see any serious compatibility objection --- who does
module removals except by hand? --- but if anyone sees a problem
speak now ...

I'm also wondering why the heck contrib/lo installs lo_test.sql.
That ought to be a regression test anyway, and we don't install
regression scripts.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Lock partitions

2006-09-10 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I see this in the CVS commits for 8.2.  Did we determine the proper
 number of lock partitions?  Should it be based on the number of buffers
 or concurrent sessions allowed?

No.  NUM_LOCK_PARTITIONS needs to be a compile-time constant for a
number of reasons, and there is absolutely zero evidence to justify
making any effort (and spending any cycles) on a variable value.

It would be nice to see some results from the OSDL tests with, say, 4,
8, and 16 lock partitions before we forget about the point though.
Anybody know whether OSDL is in a position to run tests for us?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Foreign keys

2006-09-10 Thread Kevin Brown
Joshua D. Drake wrote:
 
 In any case the same logic that leads to it being desirable to report all
 the errors to the user in a UI and not just report them one by one also
 applies to the database. I'm not sure it's the most important issue in the
 world, but it does seem like it would be a nice feature if it reported all
 the errors in the statement, not just the first one it finds.
 
 
 Seems kind of extraneous to me. I am guessing it would cause yet further 
 overhead with our foreign key checks.

But in this case, it would be (or should be) overhead only in the case
of failure.  In the case of success, all the constraints are checked
anyway -- they just succeed.

I would expect that the number of applications for which a constraint
violation is the norm and not the exception is very small.


But Tom's concern is a valid one.  I expect a reasonable compromise
would be to record and show the errors for only the non-deferred
constraints in the currently executing statement, because after that
point the transaction is in an error state anyway and thus can't
continue without a rollback to a savepoint.  It probably wouldn't make
sense to evaluate the deferred constraints within the erroring
statement anyway -- they're deferred, which by definition means they
don't get evaluated until commit, so evaluating them at failure time
could easily show errors that are only there because subsequent
statements never got executed.

As for the deferred constraints, it might be reasonable to show errors
only up to some limit (controlled by a GUC, perhaps), with the default
limit being 1, which is what we have now.  Otherwise you run the risk
of throwing millions of errors, which is surely not desirable.  The
downside to this is until you've hit the limit, you have to evaluate
*all* the deferred constraints, which could take a while, whereas the
current setup will return immediately upon encountering the first
constraint error.



 My testing shows that the use of foreign keys on high velocity single 
 transaction loads, can cause easily a 50% reduction in performance. Why 
 add to that? What we need to be doing is finding a way to decrease the 
 impact of foreign key checks.

I definitely agree here, but this should be independent of how foreign
key failures are handled once they're detected.  In other words, what
you're experiencing is the performance hit that comes from evaluating
the constraints, not from reporting the errors afterwards.


-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  Either way, I think it would be interesting to consider
  
  (a) length word either one or two bytes, not four.  You can't need more
  than 2 bytes for a datum that fits in a disk page ...
 
  That is an interesting observation, though could compressed inline
  values exceed two bytes?
 
 After expansion, perhaps, but it's the on-disk footprint that concerns
 us here.

I'm a bit confused by this and how it would be handled in your sketch. I
assumed we needed a bit pattern dedicated to 4-byte length headers because
even though it would never occur on disk it would be necessary for the
uncompressed and/or detoasted data.

In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted
to larger than 16k?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  Bruce Momjian [EMAIL PROTECTED] writes:
   Tom Lane wrote:
   Either way, I think it would be interesting to consider
   
   (a) length word either one or two bytes, not four.  You can't need more
   than 2 bytes for a datum that fits in a disk page ...
  
   That is an interesting observation, though could compressed inline
   values exceed two bytes?
  
  After expansion, perhaps, but it's the on-disk footprint that concerns
  us here.
 
 I'm a bit confused by this and how it would be handled in your sketch. I
 assumed we needed a bit pattern dedicated to 4-byte length headers because
 even though it would never occur on disk it would be necessary for the
 uncompressed and/or detoasted data.

Well, we have to expand the TOAST anyway in memory, so when we do that
we already give it the right length header.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I'm a bit confused by this and how it would be handled in your sketch. I
 assumed we needed a bit pattern dedicated to 4-byte length headers because
 even though it would never occur on disk it would be necessary for the
 uncompressed and/or detoasted data.

 In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted
 to larger than 16k?

I'm imagining that it would give you the same old uncompressed in-memory
representation as it does now, ie, 4-byte length word and uncompressed
data.

The weak spot of the scheme is that it assumes different, incompatible
in-memory and on-disk representations.  This seems to require either
(a) coercing values to in-memory form before they ever get handed to any
datatype manipulation function, or (b) thinking of some magic way to
pass out-of-band info about the contents of the datum.  (b) is the same
stumbling block we have in connection with making typmod available to
datatype manipulation functions.  I don't want to reject (b) entirely,
but it seems to require some pretty major structural changes.

OTOH (a) is not very pleasant either, and so what would be nice is if
we could tell by inspection of the Datum alone which format it's in.

After further thought I have an alternate proposal that does that,
but it's got its own disadvantage: it requires storing uncompressed
4-byte length words in big-endian byte order everywhere.  This might
be a showstopper (does anyone know the cost of ntohl() on modern
Intel CPUs?), but if it's not then I see things working like this:

* If high order bit of datum's first byte is 0, then it's an
uncompressed datum in what's essentially the same as our current
in-memory format except that the 4-byte length word must be big-endian
(to ensure that the leading bit can be kept zero).  In particular this
format will be aligned on 4- or 8-byte boundary as called for by the
datatype definition.

* If high order bit of first byte is 1, then it's some compressed
variant.  I'd propose divvying up the code space like this:

* 0xxx  uncompressed 4-byte length word as stated above
* 10xx  1-byte length word, up to 62 bytes of data
* 110x  2-byte length word, uncompressed inline data
* 1110  2-byte length word, compressed inline data
* 1111  1-byte length word, out-of-line TOAST pointer

This limits us to 8K uncompressed or 4K compressed inline data without
toasting, which is slightly annoying but probably still an insignificant
limitation.  It also means more distinct cases for the heap_deform_tuple
inner loop to think about, which might be a problem.

Since the compressed forms would not be aligned to any boundary,
there's an important special case here: how can heap_deform_tuple tell
whether the next field is compressed or not?  The answer is that we'll
have to require pad bytes between fields to be zero.  (They already are
zeroed by heap_form_tuple, but now it'd be a requirement.)  So the
algorithm for decoding a non-null field is:

* if looking at a byte with high bit 0, then we are either
on the start of an uncompressed field, or on a pad byte before
such a field.  Advance to the declared alignment boundary for
the datatype, read a 4-byte length word, and proceed.

* if looking at a byte with high bit 1, then we are at the
start of a compressed field (which will never have any preceding
pad bytes).  Decode length as per rules above.

The good thing about this approach is that it requires zero changes to
fundamental system structure.  The pack/unpack rules in heap_form_tuple
and heap_deform_tuple change a bit, and the mechanics of
PG_DETOAST_DATUM change, but a Datum is still just a pointer and you
can always tell what you've got by examining the pointed-to data.
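The code-space carve-up above can be sketched as a classifier over the first byte. Again this is an illustrative sketch of the proposal, not real PostgreSQL code; the enum and function names are invented here. Note how the prefix widths fall out of the masks: 13 length bits for the 2-byte uncompressed case (the 8K limit) and 12 for the compressed case (the 4K limit).

```c
#include <stdint.h>

enum var_kind
{
    VK_LONG_UNCOMPRESSED,   /* 0xxx: 4-byte big-endian length word */
    VK_SHORT_INLINE,        /* 10xx: 1-byte length, up to 62 bytes */
    VK_MEDIUM_UNCOMPRESSED, /* 110x: 2-byte length, uncompressed inline */
    VK_MEDIUM_COMPRESSED,   /* 1110: 2-byte length, compressed inline */
    VK_TOAST_POINTER        /* 1111: 1-byte length, toast pointer */
};

/* Classify the datum starting at p; stores the payload length in *len
 * and the header size in *hdr.  Bytes are read one at a time, since
 * the compressed forms carry no alignment guarantee. */
static enum var_kind
classify(const unsigned char *p, uint32_t *len, int *hdr)
{
    unsigned char b = p[0];

    if ((b & 0x80) == 0)
    {
        /* Big-endian 4-byte length word, uncompressed in-memory form. */
        *len = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
             | ((uint32_t) p[2] << 8) | p[3];
        *hdr = 4;
        return VK_LONG_UNCOMPRESSED;
    }
    if ((b & 0xc0) == 0x80)             /* 10xxxxxx */
    {
        *len = b & 0x3f;
        *hdr = 1;
        return VK_SHORT_INLINE;
    }
    if ((b & 0xe0) == 0xc0)             /* 110xxxxx: 13 bits => 8K max */
    {
        *len = ((uint32_t) (b & 0x1f) << 8) | p[1];
        *hdr = 2;
        return VK_MEDIUM_UNCOMPRESSED;
    }
    if ((b & 0xf0) == 0xe0)             /* 1110xxxx: 12 bits => 4K max */
    {
        *len = ((uint32_t) (b & 0x0f) << 8) | p[1];
        *hdr = 2;
        return VK_MEDIUM_COMPRESSED;
    }
    /* 1111xxxx: low 4 bits give the toast-pointer size. */
    *len = b & 0x0f;
    *hdr = 1;
    return VK_TOAST_POINTER;
}
```

In the scheme described above, heap_deform_tuple would additionally skip zeroed pad bytes before an aligned uncompressed field; that scanning step is omitted here.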

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Bruce Momjian
Tom Lane wrote:
 After further thought I have an alternate proposal that does that,
 but it's got its own disadvantage: it requires storing uncompressed
 4-byte length words in big-endian byte order everywhere.  This might
 be a showstopper (does anyone know the cost of ntohl() on modern
 Intel CPUs?), but if it's not then I see things working like this:
 
 * If high order bit of datum's first byte is 0, then it's an
 uncompressed datum in what's essentially the same as our current
 in-memory format except that the 4-byte length word must be big-endian
 (to ensure that the leading bit can be kept zero).  In particular this
 format will be aligned on 4- or 8-byte boundary as called for by the
 datatype definition.
 
 * If high order bit of first byte is 1, then it's some compressed
 variant.  I'd propose divvying up the code space like this:
 
   * 0xxx  uncompressed 4-byte length word as stated above
   * 10xx  1-byte length word, up to 62 bytes of data
   * 110x  2-byte length word, uncompressed inline data
   * 1110  2-byte length word, compressed inline data
   * 1111  1-byte length word, out-of-line TOAST pointer

Great.  I assumed we would have to use a variable-length header, as you
described.  I don't think ntohl() is going to be a problem.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pgsql: Install a cleaner solution to the AIX libpq linking problem, as

2006-09-10 Thread Christopher Browne
After a long battle with technology, [EMAIL PROTECTED] (Tom Lane), an 
earthling, wrote:
 Log Message:
 ---
 Install a cleaner solution to the AIX libpq linking problem, as per
 an earlier discussion.  Centralize assumptions about what libpq depends
 on in one place in Makefile.global.  I am unconvinced that this list
 is complete, but since ecpg seems to have gotten along with just these
 entries, we'll try it this way and see what happens.

Looks good thus far...  It'll be hitting the buildfarm shortly...

I had been getting edgy about the absence of commits on this; glad to
see this.
-- 
(reverse (concatenate 'string moc.liamg @ enworbbc))
http://cbbrowne.com/info/linuxdistributions.html
Rules  of  the Evil  Overlord  #59. I  will  never  build a  sentient
computer smarter than I am. http://www.eviloverlord.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Kevin Brown
Tom Lane wrote:
 (does anyone know the cost of ntohl() on modern
 Intel CPUs?)

I wrote a simple test program to determine this:

#include <arpa/inet.h>
#include <stdint.h>

int main (int argc, char *argv[]) {
unsigned long i;
uint32_t a;

a = 0;
for (i = 0 ; i < 40L ; ++i) {
#ifdef CALL_NTOHL
a = ntohl(i);
#endif
}
return a;
}


I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit
mode, another one with the same processor running in 32-bit mode, a
third running a Pentium 4 1.5 GHz processor, and a fourth running a
pair of 2.8 GHz Xeons in hyperthreading mode.

I compiled the test program on the 32-bit systems with the -std=c9x
option so that the constant would be treated as unsigned.  Other than
that, the compilation method I used was identical: no optimization,
since it would skip the loop entirely in the version without the
ntohl() call.  I compiled it both with and without defining
CALL_NTOHL, and measured the difference in billed CPU seconds.

Based on the above, on both Athlon 64 systems, each ntohl() invocation
and assignment takes 1.04 nanoseconds to complete (I presume the
assignment is to a register, but I'd have to examine the assembly to
know for sure).  On the 1.5 GHz P4 system, each iteration takes 8.49
nanoseconds.  And on the 2.8 GHz Xeon system, each iteration takes
5.01 nanoseconds.


That seems reasonably fast to me...
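For what it's worth, the measurement can also be made self-timing with clock(), which avoids compiling the program twice and diffing billed CPU seconds. A minimal sketch; the function name, iteration count, and the volatile accumulator (used to keep the optimizer from deleting the loop) are choices of this sketch, not part of the original program:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <time.h>

/* Time n iterations of ntohl() plus an assignment; returns the
 * approximate cost per iteration in nanoseconds. */
double
bench_ntohl(unsigned long n)
{
    volatile uint32_t sink = 0;   /* volatile store defeats dead-code removal */
    unsigned long i;
    clock_t t0 = clock();

    for (i = 0; i < n; ++i)
        sink = ntohl((uint32_t) i);

    (void) sink;
    return (double) (clock() - t0) / CLOCKS_PER_SEC / (double) n * 1e9;
}
```

With optimization enabled this times whatever the compiler emits for ntohl(), which on little-endian x86 may be a single bswap instruction, so results comparable to the figures above are plausible but toolchain-dependent.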




-- 
Kevin Brown   [EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixed length data types issue

2006-09-10 Thread Jeremy Drake
On Sun, 10 Sep 2006, Kevin Brown wrote:

 Tom Lane wrote:
  (does anyone know the cost of ntohl() on modern
  Intel CPUs?)

 I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit
 mode, another one with the same processor running in 32-bit mode, a
 third running a Pentium 4 1.5 GHz processor, and a fourth running a
 pair of 2.8 GHz Xeons in hyperthreading mode.

 I compiled the test program on the 32-bit systems with the -std=c9x
 option so that the constant would be treated as unsigned.  Other than
 that, the compilation method I used was identical: no optimization,
 since it would skip the loop entirely in the version without the
 ntohl() call.  I compiled it both with and without defining
 CALL_NTOHL, and measured the difference in billed CPU seconds.

 Based on the above, on both Athlon 64 systems, each ntohl() invocation
 and assignment takes 1.04 nanoseconds to complete (I presume the
 assignment is to a register, but I'd have to examine the assembly to
 know for sure).  On the 1.5 GHz P4 system, each iteration takes 8.49
 nanoseconds.  And on the 2.8 GHz Xeon system, each iteration takes
 5.01 nanoseconds.

Of course, that depends on the particular OS and variant as well.  IIRC,
at some point an instruction was added to the x86 instruction set to do
byte swapping.

This is from /usr/include/netinet/in.h on a gentoo linux box with glibc
2.3

#ifdef __OPTIMIZE__
/* We can optimize calls to the conversion functions.  Either nothing has
   to be done or we are using directly the byte-swapping functions which
   often can be inlined.  */
# if __BYTE_ORDER == __BIG_ENDIAN
/* The host byte order is the same as network byte order,
   so these functions are all just identity.  */
# define ntohl(x)   (x)
# define ntohs(x)   (x)
# define htonl(x)   (x)
# define htons(x)   (x)
# else
#  if __BYTE_ORDER == __LITTLE_ENDIAN
#   define ntohl(x) __bswap_32 (x)
#   define ntohs(x) __bswap_16 (x)
#   define htonl(x) __bswap_32 (x)
#   define htons(x) __bswap_16 (x)
#  endif
# endif
#endif


And from bits/byteswap.h

/* To swap the bytes in a word the i486 processors and up provide the
   `bswap' opcode.  On i386 we have to use three instructions.  */
#  if !defined __i486__ && !defined __pentium__ && !defined __pentiumpro__ \
      && !defined __pentium4__
#   define __bswap_32(x)						\
     (__extension__							\
      ({ register unsigned int __v, __x = (x);				\
	 if (__builtin_constant_p (__x))				\
	   __v = __bswap_constant_32 (__x);				\
	 else								\
	   __asm__ ("rorw $8, %w0;"					\
		    "rorl $16, %0;"					\
		    "rorw $8, %w0"					\
		    : "=r" (__v)					\
		    : "0" (__x)						\
		    : "cc");						\
	 __v; }))
#  else
#   define __bswap_32(x)						\
     (__extension__							\
      ({ register unsigned int __v, __x = (x);				\
	 if (__builtin_constant_p (__x))				\
	   __v = __bswap_constant_32 (__x);				\
	 else								\
	   __asm__ ("bswap %0" : "=r" (__v) : "0" (__x));		\
	 __v; }))
#  endif


/me searches around his hard drive for the ia32 developers reference

BSWAP
Opcode  Instruction Description
0F C8+rdBSWAP r32   Reverse the byte order of a 32-bit register

...

The BSWAP instruction is not supported on IA-32 processors earlier than
the Intel486 processor family. ...


I have read some odd stuff about instructions like these.  Apparently the
fact that this is a prefixed instruction (the 0F byte at the beginning)
costs an extra clock cycle, so though this instruction should take 1
cycle, it ends up taking 2.  I am unclear whether or not this is rectified
in later pentium chips.

So to answer the question about how much ntohl costs on recent Intel
boxes, a properly optimized build with a friendly libc like I quoted
should be able to do it in 2 cycles.


-- 
In Ohio, if you ignore an orator on Decoration day to such an extent as
to publicly play croquet or pitch horseshoes within one mile of the
speaker's stand, you can be fined $25.00.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] contrib uninstall script fixes

2006-09-10 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 The uninstall scripts for dblink and ltree were missing some
 functions.

Applied, thanks.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings