Re: [HACKERS] Release notes

2006-09-09 Thread Joshua D. Drake

Bruce Momjian wrote:

I know I promised the release notes for today/Saturday, but a few other
things have taken my time.  I will have a draft posted by
Monday/Tuesday.  Sorry.


Thanks for the update.

J







--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module

2006-09-09 Thread Jeremy Kronuz
I also added other functionality to the module but I forgot to update the 
README... contrary to what it says in the README.isn, you do can cast from 
ean13 to other types, as long as the cast is valid (i.e. you can't cast a 
EAN13 of a UPC or a ISMN code to a ISBN number, for instance.)


So, the text starting at line 184 in the README.isn should say something 
like this instead:


-- note that you can only cast from ean13 to other type when the casted
-- number would be valid in the realm of the casted type;
-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
-- but these will:
select upc(ean13('0220356483481'));
select ean13(upc('220356483481'));

(this is not in my patch, I just realized about this)

Kronuz.
"Fools rush in where fools have been before" – Unknown



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] ISBN/ISSN/ISMN/EAN13 module

2006-09-09 Thread Tom Lane
"Jeremy Kronuz" <[EMAIL PROTECTED]> writes:
> Tom, I've checked the version in the cvs and I had made significant changes 
> from that version.

Hm, it sounds like I guessed wrong about which version was newer ... is
there something flaky about your machine's system clock?  The file
timestamps in the two tarballs definitely pointed the other way.

> I fixed some major bugs that prevented some ISBN numbers 
> from working. I've merged the changes made for the cvs with my own, hoping I 
> didn't miss anything that was already fixed in the cvs... I noticed you 
> changed some log messages and that you added GET_STR and GET_TEXT. indeed 
> there was that thing that took me some time to figure in both 
> ean13_cast_to_text() and isn_cast_to_text() functions, not changing to a 
> valid TEXT the return value; but I did not know about the other 
> *_cast_from_text() functions having the problem of the missing GET_STR() (I 
> was using VARDATA()).

Yeah, your to/from text functions simply did not work --- the VARDATA
hack for example presumes that the data part of a text value is
null-terminated, which it isn't.  In any case it seems pretty likely
that the content of TEXT datums will change someday to allow better
charset/collation support, so you really are better off calling
textin/textout if possible rather than assuming you know the
representation of type TEXT.  I copied the GET_STR/GET_TEXT macros
from some other contrib module ... you're free to do it differently
if you want, but in any case you have to deal with the fact that TEXT
is *not* a C-string.

As for the log message changes, there's room for debate about that, but
the way you had it struck me as far too chatty.  The use-case for weak
checking is that you have a large pile of data you don't want to
validate right now, correct?  So why would you want a warning for every
single bad datum during the import?  Also, all the fancy formatting in
that one big warning was directly against our message style guidelines.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Foreign keys

2006-09-09 Thread MAR - Secretariado Geral



Hi everybody,
 
First of all i'de like to apolagize cause my poor 
english. After this, i shuould say that i 
beleavee a year ago i brought this problem to the community but i donn't 
remember some answering about it. The problem is:
 
Every time a users misses a external refrenced key 
the PGSql raises an exception. 
Well as far as i realise if we had 5 or 10 Foreign 
keys during an Insert/Update transaction only exception 
should be raised reporting all erros/messages after last external refrenced 
field missed at one time,not one by one.
Well, in order to implement this idea we will 
need to desable the built-in refencial integrety and build it all by your self- 
all the validation (look-ups etc..) before 
insert/update If tg_op='insert' or tg_op='update'  then as people do with 
non relational Databases - all hand-made. Well, this is very hard to beleave!!! 
I must be missing something.
 
Please i'must be wrong can some one explain me what 
i'm missing?
 
Thanks in Advance 
 
Mário Reis


Re: [HACKERS] @ versus ~, redux

2006-09-09 Thread Michael Glaesemann


On Sep 10, 2006, at 5:21 , Tom Lane wrote:

In the category of "egad", I found out that the tinterval datatype  
uses

<< for "contains"!  This is simply bizarre --- whether you approve of
the inet-like notation or not, surely the arrows are pointing the  
wrong
way.  Given that tinterval is deprecated and not even documented,  
maybe

this isn't worth fixing.  Any thoughts?


I'd say leave tinterval as it is. If it's ever updated to use the  
modern time types (rather than abstime), that would be a logical time  
to change it, it seems to me.


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] TODO item: GUID

2006-09-09 Thread Tom Lane
> On Sat, Sep 09, 2006 at 07:47:19PM +0200, Aleksandar Dezelin wrote:
>> Using time based GUIDs in database tables is not a good choice for
>> performance reasons because they can not be indexed properly - every newly
>> created time-based GUID is guaranteed to be larger than all previously
>> created, so RDBMS engine must re balance b-tree every time a new GUID item
>> is added to data table.

Only if you have a particularly bad b-tree implementation.  Do you also
not believe in indexing timestamp or serial columns?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] ISBN/ISSN/ISMN/EAN13 module

2006-09-09 Thread Jeremy Kronuz
>> I'm reviewing this for addition to contrib/ now.  I notice that there 
is

>> no clear license statement.  Is it OK to put the following into the
>> README file?

> Tom, yes. Also, I just put copyright in the files that contain a 
significant

> amount of code written by me. (i.e. isn.h and isn.c).

Excellent, thanks.

> Also, did you get the 'isn-1_0_beta_20060924' version from my ftp? 
'cause

> that’s the last one I updated (2006-09-24).  It's at:
> ftp://ftp.kronuz.com/pub/programming/isn-1_0_beta_20060924.tar.bz2 (just 
in

> case)

You had submitted two slightly different versions to the mailing list
awhile back.  I took the one that seemed to have later file dates and
did some fixes/editorializations on that.  Please look at what I've
just committed to PG CVS and see if you want to make any adjustments
--- if so, submit a patch through the usual pgsql-patches channel.



Tom, I've checked the version in the cvs and I had made significant changes 
from that version. I fixed some major bugs that prevented some ISBN numbers 
from working. I've merged the changes made for the cvs with my own, hoping I 
didn't miss anything that was already fixed in the cvs... I noticed you 
changed some log messages and that you added GET_STR and GET_TEXT. indeed 
there was that thing that took me some time to figure in both 
ean13_cast_to_text() and isn_cast_to_text() functions, not changing to a 
valid TEXT the return value; but I did not know about the other 
*_cast_from_text() functions having the problem of the missing GET_STR() (I 
was using VARDATA()).


As I changed so much things with the patch (I'm sending the patch to 
pgsql-patches too), Could you please check that those two functions 
(ean13_cast_to_text() and isn_cast_to_text()) to see if the use of 
GET_TEXT() would be a better option from the one I'm using? (I hope I left 
all the improved messages okay)


Regards,
Kronuz.

Patch follows next:
diff -ruN oldisn/isn.c isn/isn.c
--- oldisn/isn.c2006-09-08 23:07:52.0 -0500
+++ isn/isn.c   2006-09-09 16:58:43.098148200 -0500
@@ -31,7 +31,7 @@

enum isn_type { INVALID, ANY, EAN13, ISBN, ISMN, ISSN, UPC };

-static const char *isn_names[] = { "ISN", "ISN", "EAN13", "ISBN", "ISMN", 
"ISSN", "UPC" };
+static const char *isn_names[] = { "EAN13/UPC/ISxN", "EAN13/UPC/ISxN", 
"EAN13", "ISBN", "ISMN", "ISSN", "UPC" };


static bool g_weak = false;
static bool g_initialized = false;
@@ -43,11 +43,11 @@

/***
 **
- **Routines for ISNs.
+ **Routines for EAN13/UPC/ISxNs.
 **
 ** Note:
 **  In this code, a normalized string is one that is known to be a valid
- **  ISN number containing only digits and hyphens and with enough space
+ **  ISxN number containing only digits and hyphens and with enough space
 **  to hold the full 13 digits plus the maximum of four hyphens.
 ***/

@@ -217,7 +217,7 @@
}

/*
- * weight_checkdig -- Receives a buffer with a normalized ISN string 
number,
+ * weight_checkdig -- Receives a buffer with a normalized ISxN string 
number,

 * and the length to weight.
 *
 * Returns the weight of the number (the check digit value, 0-10)
@@ -239,7 +239,7 @@


/*
- * checkdig --- Receives a buffer with a normalized ISN string number,
+ * checkdig --- Receives a buffer with a normalized ISxN string number,
 *   and the length to check.
 *
 * Returns the check digit value (0-9)
@@ -267,8 +267,94 @@
}

/*
- * ean2isn --- Convert in-place a normalized EAN13 string to the 
corresponding

- *ISN string number. Assumes the input string is normalized.
+ * ean2isn --- Try to convert an ean13 number to a UPC/ISxN number.
+ * This doesn't verify for a valid check digit.
+ *
+ * If errorOK is false, ereport a useful error message if the ean13 is bad.
+ * If errorOK is true, just return "false" for bad input.
+ */
+static
+bool ean2isn(ean13 ean, bool errorOK, ean13 *result, enum isn_type accept)
+{
+   enum isn_type type = INVALID;
+
+   char buf[MAXEAN13LEN + 1];
+   char *firstdig, *aux;
+   unsigned digval;
+   unsigned search;
+   ean13 ret = ean;
+
+   ean >>= 1;
+   /* verify it's in the EAN13 range */
+   if(ean > UINT64CONST(9))
+   goto eantoobig;
+
+   /* convert the number */
+   search = 0;
+   firstdig = aux = buf + 13;
+   *aux = '\0';/* terminate string; aux points to last digit */
+   do {
+   digval = (unsigned)(ean % 10);  /* get the decimal 
value */
+   ean /= 10;  
/* get next digit */
+   *--aux = (char)(digval + '0');  /* convert to 
ascii and store */
+   } while(ean && search++<12);
+   while(search++<12) *--aux = '0';  

Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM init

2006-09-09 Thread Lamar Owen
On Saturday 26 August 2006 22:08, Matthew T. O'Connor wrote:
> Joshua D. Drake wrote:
> > Matthew T. O'Connor wrote:
> >> script.  If we installed the datadir during the RPM install, it would
> >> still be newbie friendly and would removed initdb from start script
> >> solving that problem.
> >  initdb will not overwrite an existing installation.

> Poorly chosen words.  I meant, the problem where the start script will
> create a new data dir when it doesn't see that one exists even though
> one actually does exist it's just not available at the moment.  Either
> way, if the start scripts never created a data dir, then there is no
> problem.

If a prebuilt datadir tree were available, it would make it even easier for 
people to overwrite their existing data, as RPM does not check 
non-RPM-managed files prior to overwriting them.

This was in fact done several years ago by Red Hat, and was speedily removed.

I understand the needs from both angles; so I'll ask just a simple question: 
which is worse, annoying all the newbies who just want to get started 
quickly, or annoying the small number of people who NFS mount their datadirs?  
(I personally wouldn't in a million years trust NFS for ACID compliance; 
maybe iSCSI, but NFS?!?!).

The behavior, in my opinion, should be configurable and ON by default.
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] TODO item: GUID

2006-09-09 Thread mark
On Sat, Sep 09, 2006 at 07:47:19PM +0200, Aleksandar Dezelin wrote:
> Hello,
> you just have to make random 128 bits and set version bits. And that's all.

> This is the way this data type is implemented in Mono
> (http://svn.myrealbox.com/source/trunk/mcs/class/corlib/System/Guid.cs).
> 
> Using time based GUIDs in database tables is not a good choice for
> performance reasons because they can not be indexed properly - every newly
> created time-based GUID is guaranteed to be larger than all previously
> created, so RDBMS engine must re balance b-tree every time a new GUID item
> is added to data table.
> 
> Sorry, for sending this message three times - problem with Gmail.

Depends how badly you want to skew the odds that a newly generated ID
is actually new, and how much you trust the distribution of your random
number generator.

There are several ways to generate a UUID - and I think it is wrong to
say that only one is the right way. Different applications choose
different generation routines. I *like* sorting by time, as it allows
the UUID to be used similar to sequence, leaving older, lesser accessed
UUIDs in the past. You and Mono might prefer something else. Some choose
random numbers over the MAC address as well - better? Depends on how big
your system is.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Move completed ...

2006-09-09 Thread Marc G. Fournier


All the servers have been moved, and everything *appears* to have come up 
fine from what I can tell ...


If anyone notices any problems, please let me know ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Release notes

2006-09-09 Thread Bruce Momjian
I know I promised the release notes for today/Saturday, but a few other
things have taken my time.  I will have a draft posted by
Monday/Tuesday.  Sorry.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] large object regression tests

2006-09-09 Thread Lamar Owen
On Tuesday 05 September 2006 02:59, Jeremy Drake wrote:
> I am considering, and I think that in order to get a real test of the
> large objects, I would need to load data into a large object which would
> be sufficient to be loaded into more than one block (large object blocks
> were 1 or 2K IIRC) so that the block boundary case could be tested.  Is
> there any precedent on where to grab such a large chunk of data from?  I
> was thinking about using an excerpt from a public domain text such as Moby
> Dick, but on second thought binary data may be better to test things with.

A 5 or 6 megapixel JPEG image.  Maybe a photograph of an elephant.
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Fixed length data types issue

2006-09-09 Thread Gregory Stark

Gregory Stark <[EMAIL PROTECTED]> writes:

> Tom Lane <[EMAIL PROTECTED]> writes:
> 
> > The performance costs of that seem pretty daunting, however, especially when
> > you reflect that simply stepping over a varlena field would require
> > memcpy'ing its length word to someplace.
> 
> I think if you give up on disk and in-memory representations being the same
> then there are ways of finessing that. For example you could have all the
> lengths together in the header prior to the variable length fields.

Hm, this might have nice cache effects when reading in a tuple too. Since all
the lengths would likely fit in a single cache line and probably the same
cache line as the null bitmap even it means you can find all the offsets
without actually having to bring in the rest of the tuple into the processor.

I don't think that alone would be enough to outweigh the costs of having to
convert to an in-memory representation though. Even if that was still just a
pointer to the buffer memory in the simple case.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] log_duration is redundant, no?

2006-09-09 Thread Martijn van Oosterhout
On Sat, Sep 09, 2006 at 06:33:10PM +0200, Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > #2, I think, but I am confused if you don't know the query, how
> > > valuable is the log_duration.
> >
> > Statistics?
> 
> I doubt that there is a statistical merit to calculating aggregate 
> values over the duration of an anonymous set of queries.

Eh? Sure there is, if you want totals per database, number of queries
and where they were spread over the day. A perfectly reasonable example
was given in this thread.

If you're a service provider you don't care about what queries the
users are doing, just how many resources they're taking.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] @ versus ~, redux

2006-09-09 Thread Tom Lane
[ just when you thought it was safe to go back in the water ... ]

"Zeugswetter Andreas DCP SD" <[EMAIL PROTECTED]> writes:
>> But maybe we should just stick with @> and <@ as per the ltree precedent,
>> and not worry about leaving room for strict inclusion tests.

> +1

OK, nobody objected to that, so I guess we have consensus on the names.
I started looking through the code to see exactly what will be affected,
and here's what I find:

contrib/cube:
a @ b   contains
a ~ b   is contained by
contrib/intarray:
a @ b   contains
a ~ b   is contained by
contrib/hstore:
a @ b   contains
a ~ b   is contained by
contrib/ltree:
a @> b  ancestor or equal
a <@ b  descendant or equal
ltree also uses ~ and @ for "tree satisfies query", which is pretty close to
the core string types' usage of ~
ltree appears not to need changes

contrib/seg:
a @ b   contains
a ~ b   is contained by (misdocumented!)
contrib/tsearch2:
a @ b   contains
a ~ b   is contained by

Core operators:

 @(anyarray,anyarray) | arraycontains
 @(box,box)   | box_contained
 @(circle,circle) | circle_contained
 @(polygon,polygon)   | poly_contained
 ~(anyarray,anyarray) | arraycontained
 ~(box,box)   | box_contain
 ~(circle,circle) | circle_contain
 ~(polygon,polygon)   | poly_contain

I was pleased to find out that every one of the above ops is non-strict
containment (ie, equality yields 'true').  So at least we don't have
inconsistency on that point.

I also realized that there are several more core operators using @ and ~
for containment-like comparisons:

 @(lseg,box)  | on_sb
 @(lseg,line) | on_sl
 @(point,box) | on_pb
 @(point,circle)  | pt_contained_circle
 @(point,line)| on_pl
 @(point,lseg)| on_ps
 @(point,path)| on_ppath
 @(point,polygon) | pt_contained_poly
 ~(aclitem[],aclitem) | aclcontains
 ~(circle,point)  | circle_contain_pt
 ~(path,point)| path_contain_pt
 ~(polygon,point) | poly_contain_pt

Since these are cross-data-type comparisons, they are arguably not quite
the same animal as the operators listed earlier, which all compare two
objects of the same datatype.  I'm inclined to think that these ops
should be renamed too, but does anyone want to argue differently?

In the category of "egad", I found out that the tinterval datatype uses
<< for "contains"!  This is simply bizarre --- whether you approve of
the inet-like notation or not, surely the arrows are pointing the wrong
way.  Given that tinterval is deprecated and not even documented, maybe
this isn't worth fixing.  Any thoughts?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] TODO item: GUID

2006-09-09 Thread Aleksandar Dezelin
Hello, you just have to make random 128 bits and set version bits. And that's all. This is the way this data type is implemented in Mono (

http://svn.myrealbox.com/source/trunk/mcs/class/corlib/System/Guid.cs). Using
time based GUIDs in database tables is not a good choice for
performance reasons because they can not be indexed properly - every
newly created time-based GUID is guaranteed to be larger than all
previously created, so RDBMS engine must re balance b-tree every time a
new GUID item is added to data table.
Sorry, for sending this message three times - problem with Gmail.Cheers,Aleksandar Dezelin


Re: [HACKERS] log_duration is redundant, no?

2006-09-09 Thread David Fetter
On Sat, Sep 09, 2006 at 06:33:10PM +0200, Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > > #2, I think, but I am confused if you don't know the query, how
> > > valuable is the log_duration.
> >
> > Statistics?
> 
> I doubt that there is a statistical merit to calculating aggregate
> values over the duration of an anonymous set of queries.

"How heavily loaded is the server" is a perfectly legitimate metric to
have available, especially when more detailed, i.e. more invasive
probes could bring it down.

Cheers,
D
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Proposal for GUID datatype

2006-09-09 Thread Stephan Szabo
On Sat, 9 Sep 2006, Jan de Visser wrote:

> On Saturday 09 September 2006 01:33, [EMAIL PROTECTED] wrote:
> > I don't think so. If it isn't 128 bits - and you want to fit it into
> > 128 bits, it means padding. Where should the padding go? As application
> > specific, it is up to the application to convert.
>
> I am not saying that. I am just saying that you shouldn't limit yourself to
> any particular input formats.

I'd wonder if it'd be better to have a set of literal formats and "input"
functions like to_guid(text, text) for more complicated cases. The broader
we make the literal format, the harder it is to determine if the input
actually is what was intended. For example, did the user mean to put that
ipv6 address in this guid column and what about this other ipv6 address
looking thing which is abbreviated, are we putting in what the user
expects?

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] log_duration is redundant, no?

2006-09-09 Thread Peter Eisentraut
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > #2, I think, but I am confused if you don't know the query, how
> > valuable is the log_duration.
>
> Statistics?

I doubt that there is a statistical merit to calculating aggregate 
values over the duration of an anonymous set of queries.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Schedualed Server Downtime

2006-09-09 Thread Marc G. Fournier


Should have sent this out last night, apologies ... in order to move our 
servers to a new rack (so that we can add more servers), the servers are 
going down this afternoon for a couple of hours.


Will post as soon as they are back up again ...




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.orgICQ . 7615664

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixed length data types issue

2006-09-09 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> The performance costs of that seem pretty daunting, however, especially when
> you reflect that simply stepping over a varlena field would require
> memcpy'ing its length word to someplace.

I think if you give up on disk and in-memory representations being the same
then there are ways of finessing that. For example you could have all the
lengths together in the header prior to the variable length fields.

In a separate unrelated thought, if we bring back the idea of having logical
and physical field orders be distinct then we could also have the initial
table creation sort the fields to minimize padding. It won't always be perfect
but sometimes it could help quite a bit. It also wouldn't help much if you
start altering the table afterward but even then the next time you pg_dump and
reload you'll get a more efficient layout.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposal for GUID datatype

2006-09-09 Thread mark
On Sat, Sep 09, 2006 at 08:29:16AM -0400, [EMAIL PROTECTED] wrote:
> On Sat, Sep 09, 2006 at 07:06:23AM -0400, Jan de Visser wrote:
> > On Saturday 09 September 2006 01:33, [EMAIL PROTECTED] wrote:
> > > I don't think so. If it isn't 128 bits - and you want to fit it into
> > > 128 bits, it means padding. Where should the padding go? As application
> > > specific, it is up to the application to convert.
> > I am not saying that. I am just saying that you shouldn't limit yourself to 
> > any particular input formats. I understand that the example I gave is not a 
> > full GUID. As I said, I use that result as a base for a 128 bit GUID.
> > Aargh.
> ...
> It ignores punctuation - great - so now what? You provided a number that
> used an odd number of hexadecimal characters, that was less than 128-bits
> worth of data.
> Exactly which 128-bit value would you expect it to represent? Where does
> the padding go? The beginning? The end? Around the punctuation? When you
> say you use it as a base - what do you mean, and may your intention match
> the intention of anybody else who wishes to stuff a non-UUID into a UUID?

I'll be more specific, because I obviously wasn't getting through before.

According to the JavaDoc, it is:

32-bit intprocess identifier
64-bit inttime identiier
16-bit intcounter

If non-standard, and possibly allowing for overlap with a standard UUID,
I believe this could map to a UUID.

The 32-bit value would fit where a random number or MAC address is
normally represented as a 48-bit value.  Some 16-bit prefix, possibly
all-zero might be used. The 64-bit value would fit where the time
value is normally put into the UUID. The counter could be where the
counter is normally put. A somewhat compatible version number could
be put in the version number part.

You see now? Punctuation is actually *significant* in this case, but
only the Java RMI libraries know what the value represents, and how it
might be mapped to UUID. If you look at your original value, you
should note the odd number of hexadecimal characters. This probably
means leading zeroes were stripped, and they would need to be restored
in the correct place. This is intimate knowledge of a very specific
UID type, with a specific number of bits, and automatic conversion to
a different UUID, with a greater number of bits, where the bits are
in the wrong order and not properly padded if interpretted raw.

Make sense now?

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Proposal for GUID datatype

2006-09-09 Thread mark
On Sat, Sep 09, 2006 at 07:06:23AM -0400, Jan de Visser wrote:
> On Saturday 09 September 2006 01:33, [EMAIL PROTECTED] wrote:
> > I don't think so. If it isn't 128 bits - and you want to fit it into
> > 128 bits, it means padding. Where should the padding go? As application
> > specific, it is up to the application to convert.
> I am not saying that. I am just saying that you shouldn't limit yourself to 
> any particular input formats. I understand that the example I gave is not a 
> full GUID. As I said, I use that result as a base for a 128 bit GUID.
> Aargh.

You say Aargh - but you still haven't explained what the parser would do.

It ignores punctuation - great - so now what? You provided a number that
used an odd number of hexadecimal characters, that was less than 128-bits
worth of data.

Exactly which 128-bit value would you expect it to represent? Where does
the padding go? The beginning? The end? Around the punctuation? When you
say you use it as a base - what do you mean, and may your intention match
the intention of anybody else who wishes to stuff a non-UUID into a UUID?

I'm not trying to be difficult - you never answered this, and it is a
decision that the software would need to make, if it were to do as you
suggest. I think it is improper, and that it is up to the application
to pad the non-128 bit value to 128-bits *before* passing it in -
although I still think this is invalid. What "version" of UUID are you
using?

Please don't answer again that you aren't saying something. Three
answers like this hasn't got us anywhere. Just tell me what 128-bit
value you think your original suggestion would represent (where should
the padding go?), and then justify why it should be a UUID at all.

Thanks,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Proposal for GUID datatype

2006-09-09 Thread Jan de Visser
On Saturday 09 September 2006 01:33, [EMAIL PROTECTED] wrote:
> I don't think so. If it isn't 128 bits - and you want to fit it into
> 128 bits, it means padding. Where should the padding go? As application
> specific, it is up to the application to convert.

I am not saying that. I am just saying that you shouldn't limit yourself to 
any particular input formats. I understand that the example I gave is not a 
full GUID. As I said, I use that result as a base for a 128 bit GUID.

Aargh.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match