date:20060908

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread mark

On Sat, Sep 09, 2006 at 01:03:24AM -0400, Jan de Visser wrote:
> On Saturday 09 September 2006 00:42, [EMAIL PROTECTED] wrote:
> > But if the input isn't 32 hexadecimal characters - I don't see how
> > it fits the UUID/GUID type.

> Again, it wasn't about that particular *value* (which, as I
> concurred, is not a [GU]UID). It was about the fact that different
> tools spit out stuff in different formats, and that it would suck
> royally if you would have to convert because Gevik didn't think of
> the particular format of your particular tool.

> But I guess your suggestion of just ignoring punctuation would work
> just fine.

I don't think so. If it isn't 128 bits - and you want to fit it into
128 bits, it means padding. Where should the padding go? As application
specific, it is up to the application to convert.

I don't agree that a GUID/UUID type should do much more than allow a
GUID/UUID to fit into it. It is not a type that can be used to
represent any UID you happen to have. With a 64 bit UID or 96 bit UID
- why would you want to store it in 128-bits anyways? This would be
a waste of space.

For the same reason I wouldn't want to store an MD5SUM into a UUID
type. It's an abuse of the type. If the type was called hexstring32
or something like that, sure.

Philosophy is involved. Making UUID understand every possible format,
and know how to pad each format that doesn't use 32 hexadecimal
characters is outside the scope of this discussion, and guarantees
that it will never be implemented. There will always be one more
format somebody wants to cram into it.

UUID is not a UID. UUID is a 128-bit number with a fairly specific
format that includes a version number, and each of the bit segments
within it are used to signify a different type of data, based upon the
version number.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Alvaro Herrera

Jan de Visser wrote:
> On Saturday 09 September 2006 00:42, [EMAIL PROTECTED] wrote:
> > But if the input isn't 32 hexadecimal characters - I don't see how
> > it fits the UUID/GUID type.
> 
> Again, it wasn't about that particular *value* (which, as I concurred, is not 
> a [GU]UID). It was about the fact that different tools spit out stuff in 
> different formats, and that it would suck royally if you would have to 
> convert because Gevik didn't think of the particular format of your 
> particular tool. 
> 
> But I guess your suggestion of just ignoring punctuation would work just fine.

Would it?  The output you showed was

3b732da7:10d9029b3eb:-8000

What to do with the :-8000 part?  Do you discard the whole :-8000?  Do
you discard the :- and keep the 8000?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Jan de Visser

On Saturday 09 September 2006 00:42, [EMAIL PROTECTED] wrote:
> But if the input isn't 32 hexadecimal characters - I don't see how
> it fits the UUID/GUID type.

Again, it wasn't about that particular *value* (which, as I concurred, is not 
a [GU]UID). It was about the fact that different tools spit out stuff in 
different formats, and that it would suck royally if you would have to 
convert because Gevik didn't think of the particular format of your 
particular tool. 

But I guess your suggestion of just ignoring punctuation would work just fine.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 10:49:21PM -0400, Jan de Visser wrote:
> On Friday 08 September 2006 21:34, [EMAIL PROTECTED] wrote:
> > On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote:
> > > On Friday 08 September 2006 15:18, Gevik Babakhani wrote:
> > > > 2a) Three input formats are supported.
> > > > example:
> > > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce');
> > > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}');
> > > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce');
> > >
> > > Please extend your list. java's RMI package generates UIDs with the
> > > following format:
> > >
> > > [head order 21:19]$ bsh.sh
> > > BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED])
> > > bsh % print(new java.rmi.server.UID());
> > > 3b732da7:10d9029b3eb:-8000
> > > bsh %
> > That doesn't look like a UUID/GUID - unless it trims leading zeroes?
> > UUID/GUID = 128-bits = 32 hexadecimal characters.

> I know, it's a UID. Not a GUID. But many people (including myself)
> use it as a base to generate GUIDs. And that's not really the
> point. The point is that I'm afraid he's going to restrict himself
> to much.

I'm not sure what it would do with this input though. Where do the
bits go in a fixed 128-bit field? I think with and with dashes is
fine. { and } starts to get into the unnecessary. At this point, it
may as well ignore all punctuation characters, which stays fine.
Less error checking required.

But if the input isn't 32 hexadecimal characters - I don't see how
it fits the UUID/GUID type.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module

2006-09-08 Thread Tom Lane

"Jeremy Kronuz" <[EMAIL PROTECTED]> writes:
>> I'm reviewing this for addition to contrib/ now.  I notice that there is
>> no clear license statement.  Is it OK to put the following into the
>> README file?

> Tom, yes. Also, I just put copyright in the files that contain a significant 
> amount of code written by me. (i.e. isn.h and isn.c).

Excellent, thanks.

> Also, did you get the 'isn-1_0_beta_20060924' version from my ftp? 'cause 
> thats the last one I updated (2006-09-24).  It's at: 
> ftp://ftp.kronuz.com/pub/programming/isn-1_0_beta_20060924.tar.bz2 (just in 
> case)

You had submitted two slightly different versions to the mailing list
awhile back.  I took the one that seemed to have later file dates and
did some fixes/editorializations on that.  Please look at what I've
just committed to PG CVS and see if you want to make any adjustments
--- if so, submit a patch through the usual pgsql-patches channel.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module

2006-09-08 Thread Jeremy Kronuz



"Jeremy Kronuz" <[EMAIL PROTECTED]> writes:
> Hello again,This is an usable version of my EAN13/UPC/ISBN module.

I'm reviewing this for addition to contrib/ now.  I notice that there is
no clear license statement.  Is it OK to put the following into the
README file?


EAN13 - UPC - ISBN (books) - ISMN (music) - ISSN (serials)
--

Copyright Germán Méndez Bravo (Kronuz), 2004 - 2006
This module is released under the same BSD license as the rest of 
PostgreSQL.


The information to implement this module was collected through ...


Tom, yes. Also, I just put copyright in the files that contain a significant 
amount of code written by me. (i.e. isn.h and isn.c).
If you check the other files, I just put the information was recompiled from 
several sources and I also included the websites from where the information 
came.


I suppose I just forgot to put the BSD license legend there in the code, so 
please add the line you suggested about the PostgreSQL BSD license if you 
want.


Also, did you get the 'isn-1_0_beta_20060924' version from my ftp? 'cause 
thats the last one I updated (2006-09-24).  It's at: 
ftp://ftp.kronuz.com/pub/programming/isn-1_0_beta_20060924.tar.bz2 (just in 
case)


Kronuz.



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries

2006-09-08 Thread Tom Lane

"Albe Laurenz" <[EMAIL PROTECTED]> writes:
> Here is a new patch that replaces the previous one; it adds two
> macros LDAP_LIBS_FE and LDAP_LIBS_BE for frontend and backend,
> respectively.

> I did not only add them to the Makefile for interfaces/libpq,
> but also everywhere something is linked against libpq in case
> somebody links static.

Applied, but without that last part.  It builds OK for me on Darwin,
which is moderately picky about that sort of thing, but someone should
try AIX.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Jan de Visser

On Friday 08 September 2006 21:34, [EMAIL PROTECTED] wrote:
> On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote:
> > On Friday 08 September 2006 15:18, Gevik Babakhani wrote:
> > > 2a) Three input formats are supported.
> > > example:
> > > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce');
> > > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}');
> > > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce');
> >
> > Please extend your list. java's RMI package generates UIDs with the
> > following format:
> >
> > [head order 21:19]$ bsh.sh
> > BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED])
> > bsh % print(new java.rmi.server.UID());
> > 3b732da7:10d9029b3eb:-8000
> > bsh %
>
> That doesn't look like a UUID/GUID - unless it trims leading zeroes?
>
> UUID/GUID = 128-bits = 32 hexadecimal characters.

I know, it's a UID. Not a GUID. But many people (including myself) use it as a 
base to generate GUIDs. And that's not really the point. The point is that 
I'm afraid he's going to restrict himself to much.

>
> Cheers,
> mark

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Build v8.1.4 with VC++ 2005

2006-09-08 Thread Merlin Moncure


On 9/8/06, Xiaofeng Zhao <[EMAIL PROTECTED]> wrote:



I tried to build postgresql from src using vc++ 2005 but was not able to go
far before hitting errors related to inline function in the "wchar.c" file.

The source file I downloaded is "postgresql-8.1.4.tar.gz".  I tried to build
it by runing "nmake -f win32.mak"  in the "src" directory.

Can v8.1.4 be build using VC++ 2005?  If so, what is the build procedure and
if any patches is required.


there is a patch in the queue which has a decent chance of making it
into 8.2.  from the stock  8.1 sources it is impossible.

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] Build v8.1.4 with VC++ 2005

2006-09-08 Thread Xiaofeng Zhao




I tried to build postgresql from src using vc++ 
2005 but was not able to go far before hitting errors related to inline function 
in the "wchar.c" file.
 
The source file I downloaded is 
"postgresql-8.1.4.tar.gz".  I tried to build it by runing "nmake -f 
win32.mak"  in the "src" directory.
 
Can v8.1.4 be build using VC++ 2005?  If 
so, what is the build procedure and if any patches is required.  

 
Thanks!

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 09:24:19PM -0400, Jan de Visser wrote:
> On Friday 08 September 2006 15:18, Gevik Babakhani wrote:
> > 2a) Three input formats are supported.
> > example:
> > insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce');
> > insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}');
> > insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce');
> Please extend your list. java's RMI package generates UIDs with the
> following format:

> [head order 21:19]$ bsh.sh
> BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED])
> bsh % print(new java.rmi.server.UID());
> 3b732da7:10d9029b3eb:-8000
> bsh %  

That doesn't look like a UUID/GUID - unless it trims leading zeroes?

UUID/GUID = 128-bits = 32 hexadecimal characters.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Jan de Visser

On Friday 08 September 2006 15:18, Gevik Babakhani wrote:
> 2a) Three input formats are supported.
> example:
> insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce');
> insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}');
> insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce');

Please extend your list. java's RMI package generates UIDs with the following 
format:

[head order 21:19]$ bsh.sh
BeanShell 1.3.0 - by Pat Niemeyer ([EMAIL PROTECTED])
bsh % print(new java.rmi.server.UID());
3b732da7:10d9029b3eb:-8000
bsh %  

So forms that use colons instead of dashes seem appropriate.

Or better still, make it configurable.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread elein

On Fri, Sep 08, 2006 at 05:20:18PM -0400, Tom Lane wrote:
> elein <[EMAIL PROTECTED]> writes:
> > I think what you are saying is that the domain checking (proposed constraint
> > existence checking) would need to be done in more places and I'm not sure I 
> > understand this.
> 
> What I'm complaining about is the need to search the catalogs to see if
> a datatype has constraints.  At the moment we need to do that only
> for operations yielding domain types.  Your proposal appears to require
> that it be done for *every* operation on *every* datatype, right down
> to int4 and bool (which at the very least creates some interesting
> circularity issues).  I'm not willing to accept that much overhead on
> the strength of what is frankly a pretty weak case.  If you want a
> constraint, what's wrong with putting a domain on your base type to
> enforce it?
> 
> > And checking for constraint <> NULL should be equivalent to the 
> > current check *typtype != 'd'.
> 
> Not without an amazingly complicated substructure to the "constraint"
> column (multiple entries, names as well as expressions, etc).  At the
> very least that's a violation of relational theory, and I'm not sure how
> we're going to handle dependencies of the constraint expressions at all
> if they aren't separate catalog entries.
> 
>   regards, tom lane

I'm seeing the constraint column as an attribute of the type. It would
at least flag existence of a constraint on a type.  This is necessary for
domains as they work now (but we check typtype).  This would also be a 
catalog change, i.e. non-trivial.  It would link types to constraints only 
and replace the 'd' value of typtype.  Perhaps my ignorance is showing 
and I'm missing something.  How you have the ability to select the constraint 
from the domain name now is fuzzy to me.  But I'm trying to move "domains" 
from a type of type to just an attribute of type with the goal of simplifying
the behaviour.

The single value result of an expression, on assignment is the only candidate
for constraint checking.  Other expression evaluation would behave as is
until assignment to a final result.  Theoretically, you would not have to add 
constraints other than those defined now.  And this should already be working
though the implementation would change slightly with my proposal.

elein
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 04:49:22PM -0400, Andrew Dunstan wrote:
> [EMAIL PROTECTED] wrote:
> >Only ASCII values store more space efficiently in UTF-8. All values
> >over 127 store more space efficiently using UTF-16. 
> This second statement is demonstrably not true. Only values above 0x07ff 
> require more than 2 bytes in UTF-8. All chars up to that point are 
> stored in UTF-8 with greater or equal efficiency than that of UTF-16.  
> See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html

You are correct - I should have said "All values over 127 store
at least as space efficiently using UTF-16 as UTF-8."

>From the ICU page: "Most of the time, the memory throughput of the
hard drive and RAM is the main performance constraint. UTF-8 is 50%
smaller than UTF-16 or US-ASCII, but UTF-8 is 50% larger than UTF-16
or East and South Asian scripts. There is no memory difference for
Latin extensions, Greek, Cyrillic, Hebrew, and Arabic.

For processing Unicode data, UTF-16 is much easier to handle. You get
a choice between either one or two units per character, not a choice
among four lengths. UTF-16 also does not have illegal 16-bit unit
values, while you might want to check or illegal bytes in UTF-8.
Incomplete character sequences in UTF-16 are less important and more
benign. If you want to quickly convert small strings between the
different UTF encodings or get a UChar32 value, you can use the macros
provided in utf.h and ..."

I didn't think of the iterators for simple uses.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] Mapping arbitriary and heirachical XML to tuple

2006-09-08 Thread Mark Woodward

I have a system by which I store complex data in PostgreSQL as an XML
string. I have a simple function that can return a single value.

I would like to return sets and sets of rows from the data. This is not a
huge problem, as I've written a few of these functions. The question I'd
like to put out there, is how would you represent heirarchical data as:


  
0
1.00
2.5
3.5

5
10

  
  
1
1.10
2.2
3.53

3
9

  



The biggest problem with XML is storing data is easy, getting it back out
in a sane way is less so. How would you guys think to represent this?
(Obviously, this is a bogus example, real life would be much worse!)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread Tom Lane

elein <[EMAIL PROTECTED]> writes:
> I think what you are saying is that the domain checking (proposed constraint
> existence checking) would need to be done in more places and I'm not sure I 
> understand this.

What I'm complaining about is the need to search the catalogs to see if
a datatype has constraints.  At the moment we need to do that only
for operations yielding domain types.  Your proposal appears to require
that it be done for *every* operation on *every* datatype, right down
to int4 and bool (which at the very least creates some interesting
circularity issues).  I'm not willing to accept that much overhead on
the strength of what is frankly a pretty weak case.  If you want a
constraint, what's wrong with putting a domain on your base type to
enforce it?

> And checking for constraint <> NULL should be equivalent to the 
> current check *typtype != 'd'.

Not without an amazingly complicated substructure to the "constraint"
column (multiple entries, names as well as expressions, etc).  At the
very least that's a violation of relational theory, and I'm not sure how
we're going to handle dependencies of the constraint expressions at all
if they aren't separate catalog entries.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 04:49:49PM -0400, Alvaro Herrera wrote:
> Actually he muttered something about iterators, and not needing to
> convert anything.

Yes, many of the useful functions accept strings in two forms, either
UTF-16 or CharacterIterators. The iterator pretty much only has to know
how to step forward through the string and return the code point at
each point.

Here's the docs for C++ class, but there's a equivalent C interface.

http://icu.sourceforge.net/apiref/icu4c/classCharacterIterator.html#_details

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread elein

On Fri, Sep 08, 2006 at 03:47:23PM -0400, Tom Lane wrote:
> elein <[EMAIL PROTECTED]> writes:
> > a) if subtypes/domains can have constraints then the model should
> >not be different for domains only but for all types.  Constraint 
> >checking would only
> >occur at check constraint points--and there for any type.  You
> >already check for the existance of a domain.  Change that test
> >to the existence of a constraint only and eliminate domain specific
> >code.
> 
> Au contraire, the test whether a constraint actually exists occurs at
> runtime, not at the time we check for domain-ness.  Your proposal would
> force such checks to be introduced into every single expression
> evaluation.  It's not feasible at all without plan invalidation, and
> even with that I foresee fairly enormous added overhead.  Our experience
> with domains so far is that looking up those constraints is *expensive*.

For domain checking isn't expression evaluation required anyway?
email := email_value || email_value should fail on a constraint check for
result value at assignment time.  

I think what you are saying is that the domain checking (proposed constraint
existence checking) would need to be done in more places and I'm not sure I 
understand this.

I believe constraints checking should done less often than input
types. And checking for constraint <> NULL should be equivalent to the 
current check *typtype != 'd'. I could be wrong base on the current
implementation.  There may be more to it, but I suspect making sure the 
constraint value is available when you fetch a type would be necessary.

Turn the thing around a bit.  The contraint is an attribute on anytype.
All type code, only where appropriate, should check for existence of
the constraint attribute.  This is different from saying domains as 
special types and need special casing in places (other than constraint 
checking).
I'm trying to remove the specialness from domains so that the type
code can pretty well work as is in all places execpt checking for the
constraint attribute of a type.  This should solve some of the existing
domain problems.

We're pretty close to this as is, but there is still a lot of special
casing going on.  The ability to add constraints to any type should
only be considered as a logical extension made easier by the change
in what you test when you test for constraints.

elein
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] log_duration is redundant, no?

2006-09-08 Thread Tom Lane

"Guillaume Smet" <[EMAIL PROTECTED]> writes:
> Do you want me to propose a patch or do you prefer to work on it
> yourself?

It's done already ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] log_duration is redundant, no?

2006-09-08 Thread Guillaume Smet


Tom,

On 9/8/06, Tom Lane <[EMAIL PROTECTED]> wrote:

It seems like we should either remove the separate log_duration boolean
or make it work as he suggests.  I'm leaning to the second answer now.


Do you want me to propose a patch or do you prefer to work on it
yourself? If so, do we keep the log_duration name or do we change it
to log_all_duration or another more appropriate name?

I attached the little patch I use to apply on our packages. I can work
on it to make it apply to HEAD and update the documentation.

I suppose we should also change the FE/BE protocol logging accordingly
but ISTM you already planned to change it for other reasons.

--
Guillaume
Index: src/backend/tcop/postgres.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.483
diff -u -r1.483 postgres.c
--- src/backend/tcop/postgres.c	4 Apr 2006 19:35:35 -	1.483
+++ src/backend/tcop/postgres.c	6 Apr 2006 12:12:00 -
@@ -1092,14 +1092,6 @@
 		usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 100 +
 			(long) (stop_t.tv_usec - start_t.tv_usec);
 
-		/* Only print duration if we previously printed the statement. */
-		if (was_logged && save_log_duration)
-			ereport(LOG,
-	(errmsg("duration: %ld.%03ld ms",
-			(long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
-  (stop_t.tv_usec - start_t.tv_usec) / 1000),
-		 (long) (stop_t.tv_usec - start_t.tv_usec) % 1000)));
-
 		/*
 		 * Output a duration_statement to the log if the query has exceeded
 		 * the min duration, or if we are to print all durations.
@@ -1107,6 +1099,7 @@
 		if (save_log_min_duration_statement == 0 ||
 			(save_log_min_duration_statement > 0 &&
 			 usecs >= save_log_min_duration_statement * 1000))
+		{
 			ereport(LOG,
 	(errmsg("duration: %ld.%03ld ms  statement: %s%s",
 			(long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
@@ -1114,6 +1107,17 @@
 			(long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
 			query_string,
 			prepare_string ? prepare_string : "")));
+		}
+		else
+		{
+			/* Print duration if we did not print it before. */
+			if (save_log_duration)
+ereport(LOG,
+		(errmsg("duration: %ld.%03ld ms",
+(long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
+	  (stop_t.tv_usec - start_t.tv_usec) / 1000),
+			 (long) (stop_t.tv_usec - start_t.tv_usec) % 1000)));
+		}
 	}
 
 	if (save_log_statement_stats)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan


[EMAIL PROTECTED] wrote:

Only ASCII values store more space efficiently in UTF-8. All values
over 127 store more space efficiently using UTF-16. 

  


This second statement is demonstrably not true. Only values above 0x07ff 
require more than 2 bytes in UTF-8. All chars up to that point are 
stored in UTF-8 with greater or equal efficiency than that of UTF-16.  
See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera

[EMAIL PROTECTED] wrote:
> On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote:

> > But Martijn already clarified that ICU does not actually force you to
> > switch everything to UTF-16, so this is not an issue anyway.
> 
> If my memory is correct, it does this by converting it to UTF-16 first.
> This is a performance disadvantage (although it may not be worse than
> PostgreSQL's current implementation :-) ).

Actually he muttered something about iterators, and not needing to
convert anything.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote:
> [EMAIL PROTECTED] wrote:
> > The authors of the library in question? Java? Anybody whose primary
> > alphabet isn't LATIN1 based? :-)
> Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than
> UTF-8.  That covers a lot of the world.  Forcing those people to change
> to UTF-16 does not strike me as a very good idea.

Ah. Thought you were talking UTF-8 vs UTF-16.

> But Martijn already clarified that ICU does not actually force you to
> switch everything to UTF-16, so this is not an issue anyway.

If my memory is correct, it does this by converting it to UTF-16 first.
This is a performance disadvantage (although it may not be worse than
PostgreSQL's current implementation :-) ).

> > Only ASCII values store more space efficiently in UTF-8. All values
> > over 127 store more space efficiently using UTF-16. UTF-16 is easier
> > to process. UTF-8 requires too many bit checks with single character
> > offsets. I'm not an expert - I had this question before a year or two
> > ago, and read up on the ideas of experts.
> Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs.
> UTF-16".  I don't much like UTF-8 myself, but that's not a very informed
> opinion, just like a feeling of "fly-killing-cannon" (when it's used to
> store Latin-9-fitting text).

*nod*

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Gevik Babakhani

On Fri, 2006-09-08 at 16:17 -0400, Tom Lane wrote:
> Gevik Babakhani <[EMAIL PROTECTED]> writes:
> > typreceive = not supported
> > typsend = not supported
> 
> Really?  Why not?

You are right, typreceive/typsend are also needed.
How would you advice to test this?

> I would suggest that the default output format just be 32 hex
> characters, since that would render the type useful for purposes
> other than one narrow definition of UUID.

Agreed.

> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
> 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera

[EMAIL PROTECTED] wrote:
> On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote:
> > [EMAIL PROTECTED] wrote:
> > > I think I've been involved in a discussion like this in the past. Was
> > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
> > > means that UTF-8 applications are at a disadvantage when using the
> > > library. UTF-16 is considered more efficient to work with for everybody
> > > except ASCII users. :-)
> > Uh, is it?  By whom?  And why?
> 
> The authors of the library in question? Java? Anybody whose primary
> alphabet isn't LATIN1 based? :-)

Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than
UTF-8.  That covers a lot of the world.  Forcing those people to change
to UTF-16 does not strike me as a very good idea.

But Martijn already clarified that ICU does not actually force you to
switch everything to UTF-16, so this is not an issue anyway.

> Only ASCII values store more space efficiently in UTF-8. All values
> over 127 store more space efficiently using UTF-16. UTF-16 is easier
> to process. UTF-8 requires too many bit checks with single character
> offsets. I'm not an expert - I had this question before a year or two
> ago, and read up on the ideas of experts.

Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs.
UTF-16".  I don't much like UTF-8 myself, but that's not a very informed
opinion, just like a feeling of "fly-killing-cannon" (when it's used to
store Latin-9-fitting text).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote:
> [EMAIL PROTECTED] wrote:
> > I think I've been involved in a discussion like this in the past. Was
> > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
> > means that UTF-8 applications are at a disadvantage when using the
> > library. UTF-16 is considered more efficient to work with for everybody
> > except ASCII users. :-)
> Uh, is it?  By whom?  And why?

The authors of the library in question? Java? Anybody whose primary
alphabet isn't LATIN1 based? :-)

Only ASCII values store more space efficiently in UTF-8. All values
over 127 store more space efficiently using UTF-16. UTF-16 is easier
to process. UTF-8 requires too many bit checks with single character
offsets. I'm not an expert - I had this question before a year or two
ago, and read up on the ideas of experts.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Martijn van Oosterhout

Just a few comments,

On Fri, Sep 08, 2006 at 09:18:20PM +0200, Gevik Babakhani wrote:
> 5) support functions:
> because uuid could also be used as PK or unique values, additional
> function(s) will be available to produce a uuid value to be used in 
> a field's default value like sequences or PL/pgSQL etc.. etc...
> 
> example;
> 
> create table tbl( 
>   ID uuid default ('new_uuid()'),
>   
> );

That would be: ID uuid default new_uuid();

> typstorage = m // stored compressed inline

Compression is not going to work on such short values, the header will
almost longer, just use 'p' like every other fixed length type.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Tom Lane

Gevik Babakhani <[EMAIL PROTECTED]> writes:
> typreceive = not supported
> typsend = not supported

Really?  Why not?

I would suggest that the default output format just be 32 hex
characters, since that would render the type useful for purposes
other than one narrow definition of UUID.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Proposal for GUID datatype

2006-09-08 Thread Gevik Babakhani


Martijn van Oosterhout wrote:

Just a few comments,

On Fri, Sep 08, 2006 at 09:18:20PM +0200, Gevik Babakhani wrote:
  

5) support functions:
because uuid could also be used as PK or unique values, additional
function(s) will be available to produce a uuid value to be used in 
a field's default value like sequences or PL/pgSQL etc.. etc...


example;

create table tbl( 
	ID uuid default ('new_uuid()'),


);



That would be: ID uuid default new_uuid();

  

typstorage = m // stored compressed inline



Compression is not going to work on such short values, the header will
almost longer, just use 'p' like every other fixed length type.

Have a nice day,
  

Point taken, thank you.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > No one has mentioned that we page value on disk to match the CPU
> > alignment.  This is done for efficiency, but is not strictly required.
> 
> Well, it is unless you are willing to give up support of non-Intel CPUs;
> most other popular chips are strict about alignment, and will fail an
> attempt to do a nonaligned fetch.
> 
> The only way we could pack stuff without alignment is to go over to the
> idea that memory and disk representations are different --- where in
> this case the "conversion" might just be a memcpy to a known-aligned
> location.  The performance costs of that seem pretty daunting, however,
> especially when you reflect that simply stepping over a varlena field
> would require memcpy'ing its length word to someplace.

Agreed, but I thought I would point it out.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 03:20:00PM -0400, Tom Lane wrote:
> "Albe Laurenz" <[EMAIL PROTECTED]> writes:
> > I did not only add them to the Makefile for interfaces/libpq,
> > but also everywhere something is linked against libpq in case
> > somebody links static.

> If we do need to start mentioning all of libpq's dependencies everywhere
> it's linked, I think it's time for a generic solution to that, instead
> of hacking each such place over again every time a new dependency pops up.

The business of having to include every single dependancy when linking
static is quite irritating. It has almost reached the point where
people are just giving up static linking because it's too much of a
pain.

However, if we do want to support it, the way you do it is by extending
pg_config to do something like:

pg_config --dynamic-lick   => returns -lpq
pg_config --static-link=> returns -lpq 

That way only people who actually want static linking need be bothered.

Have a ncie day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread Tom Lane

elein <[EMAIL PROTECTED]> writes:
> a) if subtypes/domains can have constraints then the model should
>not be different for domains only but for all types.  Constraint 
>checking would only
>occur at check constraint points--and there for any type.  You
>already check for the existance of a domain.  Change that test
>to the existence of a constraint only and eliminate domain specific
>code.

Au contraire, the test whether a constraint actually exists occurs at
runtime, not at the time we check for domain-ness.  Your proposal would
force such checks to be introduced into every single expression
evaluation.  It's not feasible at all without plan invalidation, and
even with that I foresee fairly enormous added overhead.  Our experience
with domains so far is that looking up those constraints is *expensive*.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread elein

On Fri, Sep 08, 2006 at 02:33:13PM -0400, Tom Lane wrote:
> elein <[EMAIL PROTECTED]> writes:
> > Domains and subtypes.
> 
> >* Create new child type from values in parent type.
> >* Maintain only checks for constraints
> >* Create implicit casts from child to parent
> 
> This seems a bit content-free, because it's not clear how it differs
> from what we do now.  We already have implicit child-to-parent casts.

I guess the key point here was to treat the domains as proper udt types
except where constraint checking is required.  And yes, this is already
done, but it needed to be included for context.

> 
> > Constraints on types:
> 
> >* Change the pg_types to hold a NULLABLE constraint text column
> >  OR add a type constraint lookup table (pg_domains?)
> 
> I understand that you are arguing to allow constraints to be associated
> with any type not only domains, but
> (a) I don't see why we should want to add that overhead, and
> (b) I don't see what that has to do with the problem you actually
> need to solve, specifically limiting the application of implicit
> domain-to-base-type casts.

This is a new feature idea, derived from the implementation of domains.  
Usually people store type checking in the input functions, but this is 
a nice addition to UDTs that require a constraint checking model. 
It allows the constraints to be in plperl which is nice for parsing 
complex object stored at strings. (My example was email and the constraint 
was a plperl function that validated the format and legitimacy of the value.)  

Other complex objects (stored as strings) such as key value lists and 
your ordinary weirdly constructed values can use the (more expensive) 
constraint at constraint time only instead of the input function which 
should remain fast and may be a borrowed or inherited input function.

This is not a drop dead required feature but it should flow from the
cleaner implementation of domains. Changing the check from domain type
to constraint exists on any type should be cleaner.  Changing
the SQL for CREATE TYPE should be the added work to get this feature
available.

It just seems simpler and cleaner.  We want to treat all types the
same and maintain a type blind database server.

a) if subtypes/domains can have constraints then the model should
   not be different for domains only but for all types.  Constraint 
   checking would only
   occur at check constraint points--and there for any type.  You
   already check for the existance of a domain.  Change that test
   to the existence of a constraint only and eliminate domain specific
   code.
b) It is not part of the problem but a logical stretch given the
   changes required.  It will also reduce the domain checking.

Other than my existing tests (published previously) I do not have a good
idea code wise the extent of the changes.  This discussion may help
us get to that point.  

> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 2: Don't 'kill -9' the postmaster
> 

elein
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote:
> > > But that won't help in the example you posted upthread, because char(N) 
> > > is not fixed-length.
> > 
> > It can be fixed-length, or at least, have an upper bound. If marked
> > up to contain only ascii characters, it doesn't, at least in theory,
> > and even if it is unicode, it's not going to need more than 4 bytes
> > per character. char(2) through char(16) only require 4 bits to
> > store the length header, leaving 4 bits for encoding information.
> > bytea(2) through bytea(16), at least in theory, should require none.
> 
> If your talking about an upper-bound, then it's not fixed length
> anymore, and you need to expend bytes storing the length. ASCII bytes
> only take one byte in most encodings, include UTF8.
> 
> Doodling this morning I remember why the simple approach didn't work.
> If you look at the varlena header, 2 bits are reserved. Say you take
> one bit to indicate "short header". Then lengths 0-31 bytes can be
> represented with a one byte header, yay!
> 
> However, now you only have enough bits leftover to store 29 bits for
> the length, so we've just cut the maximum datum size from 1GB to 512MB.
> Is that a fair trade? Probably not, so you'd need a more sophisticated
> scheme.

I was hoping we could have both bits true mean short header, but that is
also used by our system to indicate compressed and TOAST usage.  For
testing, I would just grab a bit and see how thing go.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries

2006-09-08 Thread Tom Lane

"Albe Laurenz" <[EMAIL PROTECTED]> writes:
> I did not only add them to the Makefile for interfaces/libpq,
> but also everywhere something is linked against libpq in case
> somebody links static.

I intensely dislike that part of the patch, but will work on applying
the rest.

If we do need to start mentioning all of libpq's dependencies everywhere
it's linked, I think it's time for a generic solution to that, instead
of hacking each such place over again every time a new dependency pops up.
But at the moment I'm unconvinced that we need to do it.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

[HACKERS] Proposal for GUID datatype

2006-09-08 Thread Gevik Babakhani

Folks,

I would like to submit the following proposal regarding the
implementation of the GUID datatype. Based on the findings, thoughts and
the discussion we have had in the past, I am going to propose the
following:

1) Datatype name would be "uuid" or "guid".
example: create table tbl (fld uuid, fld2 );

2) Accepted input/output datatype and formats:

The input/output datatype would be string(36)

2a) Three input formats are supported.
example:
insert into tbl (fld) values('1dfb39af-b56a-40b8-a903-b5b31567c3ce');
insert into tbl (fld) values('{1dfb39af-b56a-40b8-a903-b5b31567c3ce}');
insert into tbl (fld) values('1dfb39afb56a40b8a903b5b31567c3ce');

2b) Only one default output format is supported.
example:

# select fld from tbl;

  fld
--+
1dfb39af-b56a-40b8-a903-b5b31567c3ce  | 

2b.a) An additional function will be available to provide other
output formats or an existing function like to_char will support the
additional formatting options.
 

3) Internal datatype
Because there is not going to be any kind of (mathematically meaningful)
calculation on the guid values, the internal datatype would be just a
simple 16 byte unsigned char (uint8). This would help when comparing
these values and can also be compressed inline

Proposed data structure would be:

typedef struct uuid_t {
char  data[16];
} uuid_t;

4) Comparing functions and operators

The default comparing functions and operators like
= < != > etc, etc.. would be implemented as required.
Note that guid >= guid would not mean anything. The values will
internally be compared as strings.

5) support functions:
because uuid could also be used as PK or unique values, additional
function(s) will be available to produce a uuid value to be used in 
a field's default value like sequences or PL/pgSQL etc.. etc...

example;

create table tbl( 
ID uuid default ('new_uuid()'),

);

5.a) If needed an additional macro-type like SERIAL could also
be developed in later stage.

6) pg_type layout:

typname = uuid
typnamespace = pg_catalog
typowner = (default) // db owner
typlen = 16
typbyval = FALSE // type is byref
typtype = b // built-in type
typisdefiled = true
typdelim = ',' // ',' seperator for array of uuid
typrelid = 0
typelem = 0
typinput = to be defined later
typoutput = to be defined later
typreceive = not supported
typsend = not supported
typanalyze = 0 // default analyze
typalign = c 
typstorage = m // stored compressed inline
typnotnull = false // can be null

other pg_type attributes are set to default values.


Please send your comments and suggestions to complete or modify this
proposal.

Regards,
Gevik



   


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan


Bruce Momjian wrote:


No one has mentioned that we page value on disk to match the CPU
alignment.  This is done for efficiency, but is not strictly required.

  


From time to time the idea of a logical vs physical mapping for columns 
has been mentioned. Among other benefits, that might allow us to do some 
rearrangement of physical ordering to reduce space wasted on alignment 
in some cases. There might be a small addition on computation required, 
but I suspect it would be lost in the noise, and swamped by any 
increased efficiency we got from putting more tuples in a page.


cheers

andrew


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
> No one has mentioned that we page value on disk to match the CPU
> alignment.  This is done for efficiency, but is not strictly required.

Well, it is unless you are willing to give up support of non-Intel CPUs;
most other popular chips are strict about alignment, and will fail an
attempt to do a nonaligned fetch.

The only way we could pack stuff without alignment is to go over to the
idea that memory and disk representations are different --- where in
this case the "conversion" might just be a memcpy to a known-aligned
location.  The performance costs of that seem pretty daunting, however,
especially when you reflect that simply stepping over a varlena field
would require memcpy'ing its length word to someplace.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Bruce Momjian

Gregory Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> 
> > Gregory Stark wrote:
> > > But I think this is a dead-end route. What you're looking at is the 
> > > number "1"
> > > repeated for *every* record in the table. And what your proposing amounts 
> > > to
> > > noticing that the number "4" fits in a byte and doesn't need a whole word 
> > > to
> > > store it. Well sure, but you don't even need a byte if it's going to be 
> > > the
> > > same for every record in the table.
> > > 
> > > If someone popped up on the list asking about whether Postgres compressed
> > > their data efficiently if they stored a column that was identical 
> > > throughout
> > > the whole table you would tell them to normalize their data.
> > 
> > I am confused.  You don't want to shrink the header but instead compress
> > duplicate values in the same row to a single entry?
> 
> I think we have to find a way to remove the varlena length header entirely for
> fixed length data types since it's going to be the same for every single
> record in the table.

What fixed-length data type has a header?

> It might be useful to find a way to have 1-byte or 2-byte length headers too
> since I suspect most legitimately variable columns like text or array[] are
> also gong to be under 256 bytes.

I think the point you are making is that fixed length fields, like GUID,
don't need a header, while short fields like VARCHAR() and NUMERIC()
need some shorter header.

No one has mentioned that we page value on disk to match the CPU
alignment.  This is done for efficiency, but is not strictly required.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Query returning tuples that does not satisfy the WHERE clause

2006-09-08 Thread Tom Lane

Manuel Sugawara  writes:
> I don't think so, search_path has its default value and also I'm able
> to reproduce it in a fresh cluster (tried 3 different machines to
> discard hardware problems).

Hm, well I'm willing to take a look if you can provide me access to the
problem database running on a debug-enabled Postgres build.  The
fresh-cluster test seems to rule out my other idea about a corrupt
index (though that was shaky anyway considering both plans use the
same index...)

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Alvaro Herrera

[EMAIL PROTECTED] wrote:

> I think I've been involved in a discussion like this in the past. Was
> it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
> means that UTF-8 applications are at a disadvantage when using the
> library. UTF-16 is considered more efficient to work with for everybody
> except ASCII users. :-)

Uh, is it?  By whom?  And why?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote:
> Ah, you're right, I did misunderstand that.  However, it's still
> apparently the case that ICU works mostly with UTF16 and handles other
> encodings only via conversion to UTF16.  That's a pretty serious
> mismatch with our needs --- we'll end up converting to UTF16 all the
> time.  We're certainly not going to change to using UTF16 as the actual
> native string representation inside the backend, both because of the
> space penalty and incompatibility with tools like bison.

No need to do anything like that. We'd probably use the
u_strCompareIter() interface, where the two strings are defined as
iterators. We setup the iterator to understand whatever charset
postgres is currently running.

Many of the other function have iterator versions also, so you can
avoid UTF-16 entirely if you like.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread Tom Lane

elein <[EMAIL PROTECTED]> writes:
> Domains and subtypes.

>* Create new child type from values in parent type.
>* Maintain only checks for constraints
>* Create implicit casts from child to parent

This seems a bit content-free, because it's not clear how it differs
from what we do now.  We already have implicit child-to-parent casts.

> Constraints on types:

>* Change the pg_types to hold a NULLABLE constraint text column
>  OR add a type constraint lookup table (pg_domains?)

I understand that you are arguing to allow constraints to be associated
with any type not only domains, but
(a) I don't see why we should want to add that overhead, and
(b) I don't see what that has to do with the problem you actually
need to solve, specifically limiting the application of implicit
domain-to-base-type casts.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Query returning tuples that does not satisfy the WHERE clause

2006-09-08 Thread Manuel Sugawara

Tom Lane <[EMAIL PROTECTED]> writes:

> You seem to have worse problems than that, because as given the
> insc_registra_grupo function never returns non-NULL

Actually the function (and the database) is quite complex and was
trimed just to test the problem.

> I suspect pilot error --- perhaps looking at the wrong schema or
> some such?

I don't think so, search_path has its default value and also I'm able
to reproduce it in a fresh cluster (tried 3 different machines to
discard hardware problems).

Regards,
Manuel.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> >> AFAICT, most of the useful operations work on UChar, which is uint16:
> >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b30=
> > 5324ef288165e2ac
> > Oh, you're confusing UCS-2 with UTF-16,
> Ah, you're right, I did misunderstand that.  However, it's still
> apparently the case that ICU works mostly with UTF16 and handles other
> encodings only via conversion to UTF16.  That's a pretty serious
> mismatch with our needs --- we'll end up converting to UTF16 all the
> time.  We're certainly not going to change to using UTF16 as the actual
> native string representation inside the backend, both because of the
> space penalty and incompatibility with tools like bison.

I think I've been involved in a discussion like this in the past. Was
it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
means that UTF-8 applications are at a disadvantage when using the
library. UTF-16 is considered more efficient to work with for everybody
except ASCII users. :-)

No opinion on the matter though. Changing PostgreSQL to UTF-16 would
be an undertaking... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Query returning tuples that does not satisfy the WHERE clause

2006-09-08 Thread Tom Lane

Manuel Sugawara  writes:
> Using the second plan the query is returning tuples where
>a=F1o_semestre <> 20071

You seem to have worse problems than that, because as given the
insc_registra_grupo function never returns non-NULL, and so the
query ought not be returning any tuples at all.  I suspect pilot
error --- perhaps looking at the wrong schema or some such?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Domains and subtypes, a brief proposal

2006-09-08 Thread elein

On Thu, Sep 07, 2006 at 07:12:17PM -0700, Josh Berkus wrote:
> Elein,
> 
> > I may have missed some stuff here. Obviously.  For example how to divide
> > and conquer the various aspects of the issues raised here. But this is a
> > high, high level proposal at this time.
> 
> I'm not quite clear on what in your proposal is different from current Domain 
> behavior.  Or are you just looking to remove the limitations on where Domains 
> can be used?
> 
> -- 
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
> 

I'm looking to make domains proper types and eliminate the need for 
special domain checking in the case where it is not a check constraint
check.  I'm trying to influence a more logical implementation of 
domains that removes some code and gains us features.  Also it
should eliminate some (all?) of the current limitations with regards
to using domains as subtypes.  It will also pave the way for 
implementation of create type under type.

--elein
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

[HACKERS] Query returning tuples that does not satisfy the WHERE clause

2006-09-08 Thread Manuel Sugawara

Hi all,

I'm running PostgreSQL v 8.1.4 and found a query that returns tuples
that does not satisfy the WHERE clause, the query is:

select * into errores_20071 from (
   select r.id, r.trayectoria_id, r.grupo_id, 
regacd.insc_registra_grupo(trayectoria_id, grupo_id, true, false, true) as 
error 
 from regacd.registro r join regacd.grupo g on (g.id = r.grupo_id)
   where g.año_semestre = 20071 and g.tipo_id = 'a') x 
 where error is not null;

A self-contained database schema is here:



schema-registro.sql
Description: Binary data

Unfortunately I cannot post the data set but I'm willing to give
access to my machine to test the problem.

While trying to create a self contained test case I found that the
query returns the correct answer before analyzing:

 QUERY PLAN
  

   Nested Loop  (cost=41.61..31193.44 rows=36 width=12)
 ->  Index Scan using "AsignaturClaveGrupoÚnicaPorAñoSemestre2" on grupo g  
(cost=0.00..14.03 rows=3 width=4)
   Index Cond: (("año_semestre" = 20071) AND (tipo_id = 'a'::"char"))
 ->  Bitmap Heap Scan on registro r  (cost=41.61..10305.22 rows=7031 
width=12)
   Recheck Cond: ("outer".id = r.grupo_id)
   Filter: (insc_registra_grupo(trayectoria_id, grupo_id, true, false, 
true) IS NOT NULL)
   ->  Bitmap Index Scan on registro_grupo  (cost=0.00..41.61 rows=7031 
width=0)
 Index Cond: ("outer".id = r.grupo_id)
  (8 filas)

but does not after I run analyze:

 QUERY PLAN
  

   Hash Join  (cost=1166.75..44109.74 rows=34184 width=12)
 Hash Cond: ("outer".grupo_id = "inner".id)
 ->  Seq Scan on registro r  (cost=0.00..28538.85 rows=1397684 width=12)
   Filter: (insc_registra_grupo(trayectoria_id, grupo_id, true, false, 
true) IS NOT NULL)
 ->  Hash  (cost=1159.54..1159.54 rows=2883 width=4)
   ->  Bitmap Heap Scan on grupo g  (cost=31.30..1159.54 rows=2883 
width=4)
 Recheck Cond: (("año_semestre" = 20071) AND (tipo_id = 
'a'::"char"))
 ->  Bitmap Index Scan on 
"AsignaturClaveGrupoÚnicaPorAñoSemestre2"  (cost=0.00..31.30 rows=2883 width=0)
   Index Cond: (("año_semestre" = 20071) AND (tipo_id = 
'a'::"char"))
  (9 filas)

Using the second plan the query is returning tuples where
  
   año_semestre <> 20071

Any help will be appreciated.

Best regards,
Manuel.




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane

Martijn van Oosterhout  writes:
>> AFAICT, most of the useful operations work on UChar, which is uint16:
>> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b30=
> 5324ef288165e2ac

> Oh, you're confusing UCS-2 with UTF-16,

Ah, you're right, I did misunderstand that.  However, it's still
apparently the case that ICU works mostly with UTF16 and handles other
encodings only via conversion to UTF16.  That's a pretty serious
mismatch with our needs --- we'll end up converting to UTF16 all the
time.  We're certainly not going to change to using UTF16 as the actual
native string representation inside the backend, both because of the
space penalty and incompatibility with tools like bison.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Alvaro Herrera

Praveen Kumar N wrote:
> 
> one more doubt.in the following example we dont need sort right.But in 
> this case also relations were scanned sequentially one by one.

In this case it's because the result from one seqscan was materialized.

> So is it the 
> case that any relation is accessed only once from database while executing 
> a given query?

Not in general -- you'll see that behavior only in particular cases.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N



one more doubt.in the following example we dont need sort right.But in 
this case also relations were scanned sequentially one by one.So is it the 
case that any relation is accessed only once from database while executing 
a given query?




praveen=# explain select count(*) from a_9000_0,b_9000_0;
QUERY PLAN
--
 Aggregate  (cost=537566595.00..537566595.01 rows=1 width=0)
   ->  Nested Loop  (cost=7616.00..485726595.00 rows=2073600 width=0)
 ->  Seq Scan on a_9000_0  (cost=0.00..6979.00 rows=144000 
width=0)

 ->  Materialize  (cost=7616.00..9549.00 rows=144000 width=0)
   ->  Seq Scan on b_9000_0  (cost=0.00..6979.00 rows=144000 
width=0)

(5 rows)


Regards,
Praveen


On Fri, 8 Sep 2006, Heikki Linnakangas wrote:


Date: Fri, 08 Sep 2006 15:30:37 +0100
From: Heikki Linnakangas <[EMAIL PROTECTED]>
To: Praveen Kumar N <[EMAIL PROTECTED]>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:

On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a merge 
join on the sorted inputs.


could you tell me how can we trace that? I mean which functions shall I 
checkout for that.


The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. Can't 
remember function names from the top of my head.





--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 12:19:19PM -0400, Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote:
> >> what's more, the docs suggest that it doesn't support anything wider
> >> than UTF16.
> 
> > Well, that's not true, which part of the docs were you looking at?
> 
> AFAICT, most of the useful operations work on UChar, which is uint16:
> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac

Oh, you're confusing UCS-2 with UTF-16, UCS-2 is a subset of UTF-16
that only handles the basic plane. Just like no-one is surprised that
UTF-8 handles more than 256 characters, it shouldn't surprise you that
UTF-16 handles more than 65536. ICU hasn't used UCS-2 since 1996.

It's in the FAQ:
http://icu.sourceforge.net/userguide/icufaq.html

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane

Martijn van Oosterhout  writes:
> On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote:
>> what's more, the docs suggest that it doesn't support anything wider
>> than UTF16.

> Well, that's not true, which part of the docs were you looking at?

AFAICT, most of the useful operations work on UChar, which is uint16:
http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote:
> The reason this is a relevant consideration: we are talking about
> changes that would remove existing functionality for people who don't
> have that library.

Huh? If you don't select ICU at compile time you get no difference from
what we have now. I'm not sure I'm seeing your point. My COLLATE
patches did allow both to coexist, but no-one appeared to like that
idea either.

> I suppose it might be possible to do
>   #ifdef HAVE_ICU
>   ... new code ...
>   #else
>   ... existing code ...
>   #endif
> but given the differences in API I can't believe this would be readable
> or maintainable.

That's what the patch does. And the api differences are marginal. They
even have C compatability functions to make it easier.

> Another problem is that AFAICT, depending on ICU would force us to
> standardize on Unicode as the *only* server internal encoding; 

Huh? You can use whatever encoding you like... Actual collations are
determined on the basis of unicode properties, but I don't think that
is what you're referring to. 

> what's more, the docs suggest that it doesn't support anything wider
> than UTF16.

Well, that's not true, which part of the docs were you looking at?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

[HACKERS] BF failure - kookaburra - ecpg

2006-09-08 Thread Rocco Altier

One of the new ecpg tests for comments fails to compile the resulting .c
file, because my vendor C compiler doesn't like '//' style comments,
when running in C mode.

Specifically its line 12 of comment.pgc:
// we also understand this style

It seems like ecpg should translate the comment from '//' to '/* */'
style, for the output .c file.

Thanks,
-rocco


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark

Martijn van Oosterhout  writes:

> I'm still missing the argument of why you can't just make a 16-byte
> type. Around half the datatypes in postgresql are fixed-length and have
> no header. I'm completely confused about why people are hung up about
> bytea(16) not being fixed length when it's trivial to create a type
> that is.

Because by the time you have a CHAR(1), CHAR(2), CHAR(4), and CHAR(8) your
head is already swimming trying to keep track of all the casts and
cross-data-type comparators and you haven't even covered all the cases. If you
define types just for the lengths up to 128 you would have 16,384 casts and
114,688 different cross-data-type comparisons just between them.

Without them you wouldn't be able to have things like 
 phone_number char(10)
 area_code char(3)
and do things like:
 WHERE phone_number LIKE area_code||'%'

And before you say so, sure this isn't the only way to do this and there are
reasons why this may not be the best. But if you were shipping separate data
types for char(3) and char(10) I think it would be a bug if the above didn't
work.

The problem is worse with numeric in that it would definitely be a bug if you
couldn't use an index when comparing two numeric columns just because one had
less precision than the other. There wouldn't be nearly as many types but even
with just three such types you're already talking about hundreds of
cross-data-type comparisons.

Would others really consider shipping hundreds of new types to take care of
this problem? I was looking for a more general solution.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Tom Lane

Martijn van Oosterhout  writes:
> On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote:
>> Uhm, an ICU source tree is over 40 *megabytes*.

> I don't understand this argument. No-one asked what size the LDAP
> libraries were when we added support for them. No-one cares that
> libssl/libcrypto is as large as glibc.

The reason this is a relevant consideration: we are talking about
changes that would remove existing functionality for people who don't
have that library.  People who don't have LDAP don't care that the PG
sources have some LDAP functionality they're not getting, people who
don't have SSL evidently don't care about that, etc.  But there is
existing, portable locale and multi-charset support in PG, and even
though it's rather limited it's still useful.  So you're telling people
"to maintain the same functionality you have today, you will have to add
this rather large library".  That is only zero-cost from the perspective
of someone who already has ICU installed; from everyone else, you should
expect pushback.

I suppose it might be possible to do
#ifdef HAVE_ICU
... new code ...
#else
... existing code ...
#endif
but given the differences in API I can't believe this would be readable
or maintainable.

Another problem is that AFAICT, depending on ICU would force us to
standardize on Unicode as the *only* server internal encoding; what's
more, the docs suggest that it doesn't support anything wider than
UTF16.  From the point of view of some of our far eastern users, both
of those are serious steps backward.  "Add large library, get *less*
functionality" is an even harder sell.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas


Praveen Kumar N wrote:

On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a 
merge join on the sorted inputs.


could you tell me how can we trace that? I mean which functions shall 
I checkout for that.


The sort code is in src/backend/utils/sort/tuplesort.c and logtape.c. 
Can't remember function names from the top of my head.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N

On Fri, 8 Sep 2006, Heikki Linnakangas wrote:

Date: Fri, 08 Sep 2006 15:22:19 +0100
From: Heikki Linnakangas <[EMAIL PROTECTED]>
To: Praveen Kumar N <[EMAIL PROTECTED]>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:

Following is the output of query i have executed.

Looks like Alvaro guessed right. It reads both relations in sequence, sorts 
them in temporary storage, outside bufmgr, and then does a merge join on the 
sorted inputs.

could you tell me how can we trace that? I mean which functions shall I 
checkout for that.

If you want to see the behavior you expected, I think you need to define 
indexes on a2 and b2, if you don't have them already, and coerce the planner 
to choose a nested loop join. I'd suggest using "SET enable_seqscan=false; 
SET enable_bitmapscan=false;" and see if that gets you a nested loop join.

--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas


Praveen Kumar N wrote:


Following is the output of query i have executed.


Looks like Alvaro guessed right. It reads both relations in sequence, 
sorts them in temporary storage, outside bufmgr, and then does a merge 
join on the sorted inputs.


If you want to see the behavior you expected, I think you need to define 
indexes on a2 and b2, if you don't have them already, and coerce the 
planner to choose a nested loop join. I'd suggest using "SET 
enable_seqscan=false; SET enable_bitmapscan=false;" and see if that gets 
you a nested loop join.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N



Following is the output of query i have executed.

praveen=# explain select count(*) from a_9000_0,b_9000_0 where a2=b2;
 QUERY PLAN

 Aggregate  (cost=1924635.42..1924635.43 rows=1 width=0)
   ->  Merge Join  (cost=109515.42..1665435.42 rows=10368 width=0)
 Merge Cond: (a_9000_0.a2 = b_9000_0.b2)
 ->  Sort  (cost=54757.71..55117.71 rows=144000 width=260)
   Sort Key: a_9000_0.a2
   ->  Seq Scan on a_9000_0  (cost=0.00..6979.00 rows=144000 
width=260)

 ->  Sort  (cost=54757.71..55117.71 rows=144000 width=260)
   Sort Key: b_9000_0.b2
   ->  Seq Scan on b_9000_0  (cost=0.00..6979.00 rows=144000 
width=260)

(9 rows)




On Fri, 8 Sep 2006, Heikki Linnakangas wrote:


Date: Fri, 08 Sep 2006 14:57:57 +0100
From: Heikki Linnakangas <[EMAIL PROTECTED]>
To: Praveen Kumar N <[EMAIL PROTECTED]>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:

Let me explain once more.

I have two relations which are 10 times more than bufferpool size.I have 
observed the following things when joined that two relations(it using merge 
join to join both relations)


1.It first accessed system catalog tables
2.Relation 1
3.Relation 2

my doubt is one whole relation cant fit in the main memory.That too when we 
use merge join, it should keep some part of 1st relations and should scan 
second relation as bufferpool size is less compared to size of each 
relation.similarly for the remainin part of 1st  relation.But it is not 
happening here.First whole Relation1 is scanned and then Relation 2 is 
scanned. Then how is it joining two relations using merge join? Am I 
missing something?


Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what it 
really does, otherwise we're just guessing.


I traced scanning of relation by editing the functions ReadBuffer() and 
BufferAlloc(),StrategyGetBuffer().


That sounds valid.




--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Alvaro Herrera

Heikki Linnakangas wrote:
> Praveen Kumar N wrote:
> >Let me explain once more.
> >
> >I have two relations which are 10 times more than bufferpool size.I 
> >have observed the following things when joined that two relations(it 
> >using merge join to join both relations)
> >
> >1.It first accessed system catalog tables
> >2.Relation 1
> >3.Relation 2
> >
> >my doubt is one whole relation cant fit in the main memory.That too 
> >when we use merge join, it should keep some part of 1st relations and 
> >should scan second relation as bufferpool size is less compared to 
> >size of each relation.similarly for the remainin part of 1st  
> >relation.But it is not happening here.First whole Relation1 is scanned 
> >and then Relation 2 is scanned. Then how is it joining two relations 
> >using merge join? Am I missing something?
> 
> Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what 
> it really does, otherwise we're just guessing.

Another option would be that the tuples are written to a sort tape, I
think.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas


Praveen Kumar N wrote:

Let me explain once more.

I have two relations which are 10 times more than bufferpool size.I 
have observed the following things when joined that two relations(it 
using merge join to join both relations)


1.It first accessed system catalog tables
2.Relation 1
3.Relation 2

my doubt is one whole relation cant fit in the main memory.That too 
when we use merge join, it should keep some part of 1st relations and 
should scan second relation as bufferpool size is less compared to 
size of each relation.similarly for the remainin part of 1st  
relation.But it is not happening here.First whole Relation1 is scanned 
and then Relation 2 is scanned. Then how is it joining two relations 
using merge join? Am I missing something?


Hmm. A hash join, maybe? You should do EXPLAIN on the query to see what 
it really does, otherwise we're just guessing.


I traced scanning of relation by editing the functions ReadBuffer() 
and BufferAlloc(),StrategyGetBuffer().


That sounds valid.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 09:28:21AM -0400, [EMAIL PROTECTED] wrote:
> > But that won't help in the example you posted upthread, because char(N) 
> > is not fixed-length.
> 
> It can be fixed-length, or at least, have an upper bound. If marked
> up to contain only ascii characters, it doesn't, at least in theory,
> and even if it is unicode, it's not going to need more than 4 bytes
> per character. char(2) through char(16) only require 4 bits to
> store the length header, leaving 4 bits for encoding information.
> bytea(2) through bytea(16), at least in theory, should require none.

If your talking about an upper-bound, then it's not fixed length
anymore, and you need to expend bytes storing the length. ASCII bytes
only take one byte in most encodings, include UTF8.

Doodling this morning I remember why the simple approach didn't work.
If you look at the varlena header, 2 bits are reserved. Say you take
one bit to indicate "short header". Then lengths 0-31 bytes can be
represented with a one byte header, yay!

However, now you only have enough bits leftover to store 29 bits for
the length, so we've just cut the maximum datum size from 1GB to 512MB.
Is that a fair trade? Probably not, so you'd need a more sophisticated
scheme.

> For my own uses, I would like for bytea(16) to have no length header.
> The length is constant. UUID or MD5SUM. Store the length at the head
> of the table, or look up the information from the schema.

I'm still missing the argument of why you can't just make a 16-byte
type. Around half the datatypes in postgresql are fixed-length and have
no header. I'm completely confused about why people are hung up about
bytea(16) not being fixed length when it's trivial to create a type
that is.

> I see the complexity argument. Existing code is too heavy to change
> completely. People talking about compromises such as allowing the
> on disk layout to be different from the in memory layout. 

The biggest cost of having differing memory and disk layouts is that
you have to "unpack" each disk page as it's read it. This means an
automatic doubling of memory usage for the buffer cache. If you're RAM
limited, that's the last thing you want.

Currently, the executor will use the contents of the actual disk page
when possible, saving a lot of copying.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N


Let me explain once more.

I have two relations which are 10 times more than bufferpool 
size.I have observed the following things when joined that two 
relations(it using merge join to join both relations)


1.It first accessed system catalog tables
2.Relation 1
3.Relation 2

my doubt is one whole relation cant fit in the main memory.That too when 
we use merge join, it should keep some part of 1st relations and should 
scan second relation as bufferpool size is less compared to size 
of each relation.similarly for the remainin part of 1st  relation.But 
it is not happening here.First whole Relation1 is scanned and then 
Relation 2 is scanned. Then how is it joining two relations using merge 
join? Am I missing something?
I traced scanning of relation by editing the functions ReadBuffer() and 
BufferAlloc(),StrategyGetBuffer().


I hope now it is clear.

thanks in anticipation.

bye


On Fri, 8 Sep 2006, Heikki Linnakangas wrote:


Date: Fri, 08 Sep 2006 14:30:01 +0100
From: Heikki Linnakangas <[EMAIL PROTECTED]>
To: Praveen Kumar N <[EMAIL PROTECTED]>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] postgresql shared buffers

Praveen Kumar N wrote:

I have installed postgresql from sourcecode.I would like to know
how pages are replaced in the bufferpool when we join two relations.I tried 
to trace it by editing files 
pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
am missing some information after observing extracted information abt 
buffer replacement.My input datasize is 10times more than main memory/RAM 
size.When I joined two relations,postgresql accessed both relations 
sequentially one by one and that too only once.Then how is it joining two 
relations by accessing only once? Is it storing that accessed relations 
some where other than main memory/bufferpool(Becos they cant fit into main 
memory).


What kind of a join is it? If it's a merge join, using indexes, it would only 
have to visit each heap page once.


So can anybdy tell me is there is ne thing I am missing? Is there any 
concept like postgresql cache similar to kernel cache otherthan 
sharedbuffers.If so how can we figure it out.


No. All access to relations (except temporary relations) go through bufmgr 
and the shared memory buffer cache.


Is there any way by which postgresql is accessing database relations 
through,other than rotines in bufmgr.c nd freelist.c(I mean any other 
routines like ReadBuffer,StrategyGet etc.)


No.




--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 08:50:57AM +0200, Peter Eisentraut wrote:
> Gregory Stark wrote:
> > But it's largely true for OLTP applications too. The more compact the
> > data the more tuples fit on a page and the greater the chance you
> > have the page you need in cache.
> But a linear amount of more RAM is still more affordable than a CPU that 
> is 100 times faster, which is about what some of the proposed schemes 
> would require.

100 times faster?

I don't think it has been proven that a change in how data is stored
would result in an increase in CPU usage. It's an assumption. It might
be correct. It might not.

I guess this is where patches speak louder than words... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Gregory Stark

Praveen Kumar N <[EMAIL PROTECTED]> writes:

> hai...
>
> I have installed postgresql from sourcecode.I would like to know
> how pages are replaced in the bufferpool when we join two relations.I tried to
> trace it by editing files
> pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
> am missing some information after observing extracted information abt buffer
> replacement.

Try

explain select ...

The output may be enlightening.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] postgresql shared buffers

2006-09-08 Thread Heikki Linnakangas


Praveen Kumar N wrote:

I have installed postgresql from sourcecode.I would like to know
how pages are replaced in the bufferpool when we join two relations.I 
tried to trace it by editing files 
pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
am missing some information after observing extracted information abt 
buffer replacement.My input datasize is 10times more than main 
memory/RAM size.When I joined two relations,postgresql accessed both 
relations sequentially one by one and that too only once.Then how is 
it joining two relations by accessing only once? Is it storing that 
accessed relations some where other than main memory/bufferpool(Becos 
they cant fit into main memory).


What kind of a join is it? If it's a merge join, using indexes, it would 
only have to visit each heap page once.


So can anybdy tell me is there is ne thing I am missing? Is there any 
concept like postgresql cache similar to kernel cache otherthan 
sharedbuffers.If so how can we figure it out.


No. All access to relations (except temporary relations) go through 
bufmgr and the shared memory buffer cache.


Is there any way by which postgresql is accessing database relations 
through,other than rotines in bufmgr.c nd freelist.c(I mean any other 
routines like ReadBuffer,StrategyGet etc.)


No.

--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread mark

On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote:
> Gregory Stark wrote:
> > I think we have to find a way to remove the varlena length header
> > entirely for fixed length data types since it's going to be the same
> > for every single record in the table.
> But that won't help in the example you posted upthread, because char(N) 
> is not fixed-length.

It can be fixed-length, or at least, have an upper bound. If marked
up to contain only ascii characters, it doesn't, at least in theory,
and even if it is unicode, it's not going to need more than 4 bytes
per character. char(2) through char(16) only require 4 bits to
store the length header, leaving 4 bits for encoding information.
bytea(2) through bytea(16), at least in theory, should require none.

For my own uses, I would like for bytea(16) to have no length header.
The length is constant. UUID or MD5SUM. Store the length at the head
of the table, or look up the information from the schema.

I see the complexity argument. Existing code is too heavy to change
completely. People talking about compromises such as allowing the
on disk layout to be different from the in memory layout. I wonder
whether the change could be small enough to not significantly
increase CPU, while still having significant effect. I find myself
doubting the CPU bound numbers. If even 20% data is saved, this
means 20% more RAM for caching, 20% less pages touched when
scanning, and 20% less RAM read. When people say CPU-bound, are we
sure they do not mean RAM speed bound? How do they tell the
difference between the two? RAM lookups count as CPU on most
performance counters I've ever used. RAM speed is also slower than
CPU speed, allowing for calculations between accesses assuming
that the loop allows for prefetching to be possible and accurate.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[HACKERS] postgresql shared buffers

2006-09-08 Thread Praveen Kumar N


hai...

I have installed postgresql from sourcecode.I would like to know
how pages are replaced in the bufferpool when we join two relations.I 
tried to trace it by editing files 
pgsql/src/backend/storage/buffer/(bufmgr.c,freelist.c).But i feel still I
am missing some information after observing extracted information abt 
buffer replacement.My input datasize is 10times more than main memory/RAM 
size.When I joined two relations,postgresql accessed both relations 
sequentially one by one and that too only once.Then how is it joining two 
relations by accessing only once? Is it storing that accessed relations 
some where other than main memory/bufferpool(Becos they cant fit into main 
memory).


So can anybdy tell me is there is ne thing I am 
missing? Is there any concept like postgresql cache similar to kernel 
cache otherthan sharedbuffers.If so how can we figure it out.


Is there any way by which postgresql is accessing database 
relations through,other than rotines in bufmgr.c nd freelist.c(I 
mean any other routines like ReadBuffer,StrategyGet etc.)


thanks in anticipation.

--
 N Praveen Kumar
 Btech-IV CSE
 IIIT,Hyd
 AP,India

Imagination is more important than knowledge...
--Albert Einstein

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 02:14:58PM +0200, Peter Eisentraut wrote:
> So "mathematically", you are right, the collation is a property of the 
> operation, not of the operands.  But semantically, the operands do 
> carry the information of what collation order they would like to be 
> compared under, and if two pieces of data with different choices meet, 
> you need an override.

Sure, but SQL COLLATE handles all that just fine. At no point is the
collation a property of the operands. At best is a property of the
source of the operands but can be overridden at any point. SQL also
covers the case where there is ambiguity, and the writer of the query
has to clarify.

Collation is hard precisly because it's not a property of the operands,
which makes it very difficult to make postgresql do it.

> Incidentally, if you buy into that, this would also neatly solve the 
> problem of how to arrange for column-specific case conversion rules, 
> which SQL does not address at all.

SQL does say that UPPER and LOWER should be handled by Unicode rules,
however the notes do mention that they should probably pay attention to
the collation and character set, since the results are dependant on
them.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas


Peter Eisentraut wrote:

The real problem is that the established method dividing up the locale
categories ignores both the technological and the linguistic reality.
In reality, all properties like lc_collate, lc_ctype, and lc_numeric
are dependent on the property "language of the text". 


I don't buy that. lc_collate, lc_ctype and lc_numeric are certainly 
related, but they're not a property of the "language of the text". For 
example, imagine an employee database for an international company. When 
a user wants to print out a sorted list of employees, the language of 
the text in the database (name of an employee) is irrelevant. A german 
user would like to see the names in different order than an 
English-speaking user.


I've seen this in practice. Also, see:
http://www.unicode.org/unicode/reports/tr10/#Common_Misperceptions
for another example.


In general, it
doesn't make sense to sort a text by Spanish rules, downcase by Turkish
rules, and embed numbers using English punctuation. Of course you can
do all that, but it's generally not very useful and might give
inconsistent results. (For extra credit: how do you do
case-insensitive sorts with inconsistent lc_collate and lc_ctype
settings?)


Sure. Don't do that, that's just silly. But I don't see how that's relevant.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut

Heikki Linnakangas wrote:
> have a default set per-database, per-table or per-column, but it's
> not a property of the actual value of a field. I think that the
> phrase "collation of a string" doesn't make sense.

The real problem is that the established method dividing up the locale 
categories ignores both the technological and the linguistic reality.  
In reality, all properties like lc_collate, lc_ctype, and lc_numeric 
are dependent on the property "language of the text".  In general, it 
doesn't make sense to sort a text by Spanish rules, downcase by Turkish 
rules, and embed numbers using English punctuation.  Of course you can 
do all that, but it's generally not very useful and might give 
inconsistent results.  (For extra credit: how do you do 
case-insensitive sorts with inconsistent lc_collate and lc_ctype 
settings?)

So "mathematically", you are right, the collation is a property of the 
operation, not of the operands.  But semantically, the operands do 
carry the information of what collation order they would like to be 
compared under, and if two pieces of data with different choices meet, 
you need an override.

Incidentally, if you buy into that, this would also neatly solve the 
problem of how to arrange for column-specific case conversion rules, 
which SQL does not address at all.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] large object regression tests

2006-09-08 Thread Markus Schaber

Hi, Jeremy,
Jeremy Drake wrote:

> I am considering, and I think that in order to get a real test of the
> large objects, I would need to load data into a large object which would
> be sufficient to be loaded into more than one block (large object blocks
> were 1 or 2K IIRC) so that the block boundary case could be tested.  Is
> there any precedent on where to grab such a large chunk of data from?

You could generate such data on the fly, as part of the test scripts.

E. G. a blob of zero bytes, blob of 0xff bytes, a blob of pseudo random
data...

Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

signature.asc
Description: OpenPGP digital signature

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote:
> Uhm, an ICU source tree is over 40 *megabytes*. That's almost as much as the
> rest of Postgres itself and that doesn't even include documentation. Even if
> you exclude the data and regression tests you're still talking about depending
> on the portability and correctness of over 10 megabytes of new code.

I don't understand this argument. No-one asked what size the LDAP
libraries were when we added support for them. No-one cares that
libssl/libcrypto is as large as glibc. What size the libraries are that
postgresql uses is somewhat irrelevent. It's not like we're forcing
people to install them.

> Neither is ICU available on most platforms. In any case we only need strcoll_l
> as a performance optimization, the regular interface works, it's just slow.

Can you point me to a common platform where postgresql runs and ICU doesn't?

http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/readme.html?rev=HEAD#HowToBuildSupported

The only one I can see in the buildfarm that isn't mentioned is
Unixware.

> Well equal is part of collation at least in the sense you mean. What it
> doesn't help with is things like tolower or regexp matching. These are the
> things that I would suggest you usually want to be doing on the client because
> SQL's string manipulation facilities are so poor compared to most client
> languages.

If I specify a collation where case and accents are ignored, then GROUP
BY should ignore them too, and regexps should honour that. Moving all
this to the client doesn't seem like a good move at all.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 11:58:59AM +0100, Heikki Linnakangas wrote:
> Martijn van Oosterhout wrote:
> >I think that if SQL COLLATE gets in we'll get this almost for free.
> >Collation and charset are both properties of strings. Once you've got a
> >mechanism to know the collation of a string, you just attach the
> >charset to the same place. The only difference is that changing charsets
> >requires recoding, wheres changing collation does not.
> 
> Not quite. Collation is a property of the operation that you're doing. 
> For example, if you're doing a sort, you might do it in different 
> collation depending on the user that's doing it, or it might even be 
> chosen by the user case-by-case. Of course, usually you have a default 
> set per-database, per-table or per-column, but it's not a property of 
> the actual value of a field. I think that the phrase "collation of a 
> string" doesn't make sense.

Sorry, you're quite right. The collation is the property of an executor
node, or at least that's how I thought of it while I was working on it.
By that I mean that each source (say column) has a defined value for
collation and charset (the SQL defines the rules for determining
collation, don't know about charset). At each point in the query you
can point at the charset and collation applying to that node.

However, I think my point that charset and collations could be treated
via the same mechanism is still valid. If I get time I might rework the
COLLATE spec I wrote to include charset stuff.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] A note about buildfarm ecpg-check

2006-09-08 Thread Michael Meskes

On Fri, Sep 08, 2006 at 05:38:27AM -0400, Andrew Dunstan wrote:
> We can, although I have tried to keep the order fairly logical. Are we 
> seeing so many ECPG failures that it's a major blocker?  bustard has had 
> one ECPG failure in the last 5 days. We are currently only reporting 
> failures on OpenBSD and the mipsel box.

The mipsel one is fixed too. So except for the OpenBSD strtod bug
ecpg-check seems to run cleanly on all machines. However, I plan to do
some more changes to the test suite. Nothing major, just trying to get
rid of the complex tests that test so many features at the same time and
divide them into several testcases. This might give us a red from time
to time, but hopefully nothing major.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas


Martijn van Oosterhout wrote:

I think that if SQL COLLATE gets in we'll get this almost for free.
Collation and charset are both properties of strings. Once you've got a
mechanism to know the collation of a string, you just attach the
charset to the same place. The only difference is that changing charsets
requires recoding, wheres changing collation does not.


Not quite. Collation is a property of the operation that you're doing. 
For example, if you're doing a sort, you might do it in different 
collation depending on the user that's doing it, or it might even be 
chosen by the user case-by-case. Of course, usually you have a default 
set per-database, per-table or per-column, but it's not a property of 
the actual value of a field. I think that the phrase "collation of a 
string" doesn't make sense.


--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Martijn van Oosterhout

On Fri, Sep 08, 2006 at 05:54:01AM -0400, Andrew Dunstan wrote:
> >The encoding is set per-database. Even if you need UTF-8 to encode 
> >user-supplied strings, there can still be many small ASCII fields in 
> >the database. Country code, currency code etc.
> 
> ISTM we should revisit this when we get per-column encoding.

I think that if SQL COLLATE gets in we'll get this almost for free.
Collation and charset are both properties of strings. Once you've got a
mechanism to know the collation of a string, you just attach the
charset to the same place. The only difference is that changing charsets
requires recoding, wheres changing collation does not.

I think it'd just become a special case of the Relabel node.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew Dunstan




Heikki Linnakangas wrote:

Gregory Stark wrote:

But why would you use UTF8 to encode fixed length ascii strings?


The encoding is set per-database. Even if you need UTF-8 to encode 
user-supplied strings, there can still be many small ASCII fields in 
the database. Country code, currency code etc.





ISTM we should revisit this when we get per-column encoding.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Heikki Linnakangas


Gregory Stark wrote:

But why would you use UTF8 to encode fixed length ascii strings?


The encoding is set per-database. Even if you need UTF-8 to encode 
user-supplied strings, there can still be many small ASCII fields in the 
database. Country code, currency code etc.


--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] A note about buildfarm ecpg-check

2006-09-08 Thread Andrew Dunstan




Tom Lane wrote:

I see that the buildfarm script seems to be running ecpg-check pretty
early in the sequence.  Considering that the ecpg tests are still far
from stable, this seems to be taking away the opportunity to learn as
much as we can from a buildfarm run.  Could we run the ecpg tests last?
  


We can, although I have tried to keep the order fairly logical. Are we 
seeing so many ECPG failures that it's a major blocker?  bustard has had 
one ECPG failure in the last 5 days. We are currently only reporting 
failures on OpenBSD and the mipsel box.


Anyway, I have made the change in CVS and buildfarm members can upgrade 
to CVS version 1.69 of run_build.pl. I guess that would mainly matter 
for the owners of the currently failing 3 boxes. If it proves to be 
necessary beyond a small time frame I will cut a new release.



An even better idea would be to teach the script about test dependencies
so that it could run test steps even when an earlier-but-unrelated test
had failed.  But I'm sure that's a lot more work.


  



Yes. It might be possible with the modularisation work I'm 
contemplating, but that's some way off, and I'm not going to hack it in 
right now.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Peter Eisentraut

Gregory Stark wrote:
> > But that won't help in the example you posted upthread, because
> > char(N) is not fixed-length.
>
> Sure it is because any sane database--certainly any sane database
> using char(N)--is in C locale anyways.

This matter is completely independent of the choice of locale and 
therefore any unilateral redefinition of sanity that you might come up 
with.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark

Gregory Stark <[EMAIL PROTECTED]> writes:

> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> 
> > Gregory Stark wrote:
> > > > But that won't help in the example you posted upthread, because
> > > > char(N) is not fixed-length.
> > >
> > > Sure it is because any sane database--certainly any sane database
> > > using char(N)--is in C locale anyways.
> > 
> > This matter is completely independent of the choice of locale and 
> > therefore any unilateral redefinition of sanity that you might come up 
> > with.
> 
> Except it isn't. If you're dealing with fixed length ascii codes from existing
> databases you interoperate with then you will have problems if you initialize
> your database in a non-C locale. Interpreting those codes in your locale will
> be do incorrect things like treat them as case insensitive or ignore spaces in
> collation, etc.

Oh, I think I misread your comment. You're saying the choice of encoding is
independent of the choice of locale.

Sure, if you're using UTF8 then how efficiently Postgres stores fixed length
data types isn't terribly relevant to you. Just as it isn't relevant if you're
storing other variable length data types.

But why would you use UTF8 to encode fixed length ascii strings?

-- 
greg

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Andrew - Supernews

On 2006-09-08, Gregory Stark <[EMAIL PROTECTED]> wrote:
>> But that won't help in the example you posted upthread, because char(N) 
>> is not fixed-length.
>
> Sure it is because any sane database--certainly any sane database using
> char(N)--is in C locale anyways.

You're confusing locale and charset.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark

Peter Eisentraut <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
> > > But that won't help in the example you posted upthread, because
> > > char(N) is not fixed-length.
> >
> > Sure it is because any sane database--certainly any sane database
> > using char(N)--is in C locale anyways.
> 
> This matter is completely independent of the choice of locale and 
> therefore any unilateral redefinition of sanity that you might come up 
> with.

Except it isn't. If you're dealing with fixed length ascii codes from existing
databases you interoperate with then you will have problems if you initialize
your database in a non-C locale. Interpreting those codes in your locale will
be do incorrect things like treat them as case insensitive or ignore spaces in
collation, etc.

-- 
greg

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Fixed length data types issue

2006-09-08 Thread Gregory Stark

Peter Eisentraut <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
> > I think we have to find a way to remove the varlena length header
> > entirely for fixed length data types since it's going to be the same
> > for every single record in the table.
> 
> But that won't help in the example you posted upthread, because char(N) 
> is not fixed-length.

Sure it is because any sane database--certainly any sane database using
char(N)--is in C locale anyways.

In any case if you disagree about that (and you're wrong) then substitute some
other data type. Defining such a data type may be part of the problem that has
to be solved here.

-- 
greg

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

87 matches

Mail list logo