Re: [HACKERS] Out parameters handling

2009-03-09 Thread Ryan Bradetich
On Sun, Mar 8, 2009 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ryan Bradetich rbradet...@gmail.com writes:
 This is one of the things I wanted to start looking at for 8.5.
 My idea was to optionally use : or @ (not sure which is more popular) to
 specify this token is only a variable.

 This whole line of thought is really a terrible idea IMHO.  plpgsql is
 supposed to follow Oracle's pl/sql syntax, not invent random syntax of
 its own.  I believe that 80% of the problems here are occurring because
 we used a crude substitution method that got the priorities backwards
 from the way Oracle does it.

Fair enough.  I just hope whatever solution the community decides upon
solves this problem.  It is a very annoying problem to track down, and I tend
to get even more agitated when I figure out that this is the cause.

I do not want to distract from the release efforts, so I will withhold further
comments until the 8.5 development cycle.

Thanks,

- Ryan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Out parameters handling

2009-03-08 Thread Ryan Bradetich
Hello Robert,

I have been bitten by this problem many times as well.

 I wonder whether it would be possible to make PL/pgsql take :foo to
 mean the parameter named foo, and then provide an option to make that
 THE ONLY WAY to refer to the parameter foo.  For
 backward-compatibility, and compatibility with (ahem) other database
 products, we probably don't want to remove the option to have foo
 mean... any damn thing named foo you can put your hands on.  But it
 would be nice to at least have the option of disabling that behavior
 when compatibility is not an issue, and correctness is.

This is one of the things I wanted to start looking at for 8.5.
My idea was to optionally use : or @ (not sure which is more popular) to
mark a token as being only a variable: do not try to match it to columns or
other database objects.  If the variable did not start with : or @, then the
normal rules would apply for backwards compatibility.
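To make the ambiguity concrete, here is a hedged sketch (table, column, and function names are hypothetical, not from the original discussion) of how an unqualified identifier can collide with a column today, and how a prefixed token would disambiguate it:

```sql
-- Hypothetical example: "status" is both a function parameter and a column
-- of "orders", so the unqualified reference below is ambiguous.  Depending
-- on which way the substitution priority goes, the WHERE clause may compare
-- the column to itself instead of to the parameter.
CREATE FUNCTION count_by_status(status text) RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    SELECT count(*) INTO result
      FROM orders
     WHERE orders.status = status;   -- which "status" is meant here?
    RETURN result;
END;
$$ LANGUAGE plpgsql;

-- Under the proposal, a prefix would force variable resolution:
--     WHERE orders.status = :status
```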

No idea how feasible this plan is; I was just hoping to find a way to solve
this problem.

Thanks,

- Ryan



Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.

2009-01-08 Thread Ryan Bradetich
Bruce,

Thanks for adding this to the TODO.
I am planning on continuing to work on this patch after 8.4 releases.
I know we are in feature freeze and I do not want to sidetrack the
release process.

Thanks!

- Ryan


On Thu, Jan 8, 2009 at 8:41 AM, Bruce Momjian br...@momjian.us wrote:

 Added to TODO:

Reduce data row alignment requirements on some 64-bit systems

* http://archives.postgresql.org/pgsql-hackers/2008-10/msg00369.php


 ---

 Ryan Bradetich wrote:
 Hello all,

 Here is a proof-of-concept patch for reducing the alignment
 requirement for heap tuples on 64-bit systems.
 This patch passes the regression tests and a couple of other data sets
 I have thrown at it.

 I am hoping to get some early feedback on this patch so I have time to
 make any corrections, improvements,
 etc before the November commit-fest deadline.

 There are two visible savings in the system catalogs using this patch
 (size was gathered using \S+):

 Catalog          Pre-Patch Size   After-Patch Size
 --------------------------------------------------
 pg_depend        312 kB           296 kB
 pg_description   160 kB           152 kB


 One thing I am considering, but I am not sure if it is worth the code
 uglification, is to wrap the majority of these checks in #ifdefs so they
 are only used on 64-bit systems.  This would completely eliminate the
 impact of this patch on 32-bit systems, which would not gain any benefit
 from it.

 Feedback welcome!

 Thanks,

 - Ryan

 [ Attachment, skipping... ]



 --
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +




Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.

2008-10-09 Thread Ryan Bradetich
Hello Zdenek,

On Wed, Oct 8, 2008 at 10:59 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:
 Just a quick look. At first point: your change introduces a new page layout
 version, which is not acceptable from my point of view for 8.4 (it adds

I would like to see this patch (or some variant) go in if possible.  Since
the in-place upgrades are a concern to you, is there anything I can do to
help offset the disruption this patch causes you?

 another complexity to in-place upgrade). And I guess that it may work fine
 on 64-bit x86, but it will fail on SPARC and other machines which require
 aligned data.

Did I miss something?  My intention was to keep the data aligned, so it
should work on any platform.  The patch checks the user-defined data to see
if any column requires the double storage type.  If the double storage type
is required, it uses the MAXALIGN() macro, which preserves the alignment for
64-bit data types.  If no columns require the double storage type, then the
data will be INTALIGN(), which still preserves the alignment requirements.
If I have a complete misunderstanding of this issue, please explain it to me
and I will either fix it or withdraw the patch.

Thanks for your feedback!

- Ryan



Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.

2008-10-09 Thread Ryan Bradetich
Hello Zdenek,

On Thu, Oct 9, 2008 at 12:38 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
 Ryan Bradetich wrote:
 On Wed, Oct 8, 2008 at 10:59 PM, Zdenek Kotala [EMAIL PROTECTED]
 wrote:

 I would like to see this patch (or some variant) go in if possible.  Since
 the in-place upgrades are a concern to you, is there anything I can do to
 help offset the disruption this patch causes you?

 Yaah, wait until 8.5 :-). However, currently there is no clear consensus on
 which upgrade method is best. I hope that it will become clearer after the
 Prato developers meeting. Until this meeting I cannot say more.

Heh, understood. :)

I believe the proposed CRC patch also affects the heap page layout (it adds
the pd_checksum field to PageHeaderData). [1]  Just pointing out another
patch that could affect you as well.  My offer to help still stands.

 I overlooked the 'd' test. Your idea seems reasonable to me. Maybe you could
 test 'd' alignment only for NOT NULL values.

Funny you should mention this.  I had just started looking into this
optimization to see if I could convince myself it would be safe.  My initial
conclusion is that it would be safe, but I have not tested nor verified that
yet.  Having an independent proposal for this boosts my confidence even
more.  Thanks!

 The problem there is add_item, which is used for indexes as well, and they
 have an IndexTupleHeader structure. I'm not convinced about the idea of
 having two different alignments for items on a page.

Just to clarify, this patch only affects heap storage (i.e., when the
is_heap flag is set).  I have not had a chance to analyze whether I can
reduce other storage types yet.

 I guess another problem is with MAX_TUPLE_CHUNK_SIZE, which uses MAXALIGN
 for its computation. It seems to me that a toast chunk could waste space
 anyway.

 And of course you should bump page layout version.

Thanks.  Will do.

 I also suggest create function/macro to compute hoff and replace code with
 this function/macro.

Great.  That is some of the feedback I was looking for.  I did not implement
it yet because I wanted to see if the basic implementation was feasible
first.

Thanks again for your feedback!

- Ryan


[1] http://archives.postgresql.org/pgsql-hackers/2008-10/msg00070.php



Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.

2008-10-09 Thread Ryan Bradetich
Hello ITAGAKI-san,

I apologize for not replying earlier, I needed to head out to work.

On Thu, Oct 9, 2008 at 5:01 AM, ITAGAKI Takahiro
[EMAIL PROTECTED] wrote:
 Ryan Bradetich [EMAIL PROTECTED] wrote:
 Here is a proof-of-concept patch for reducing the alignment
 requirement for heap tuples on 64-bit systems.

 pg_depend 312 kB  296 kB
 pg_description160 kB  152 kB

 5 percent of gain seems reasonable for me.

I agree.  I am seeing a ~10% gain in some of my tables where the tuple size
was 44 bytes but, due to the alignment, was being padded out to 48 bytes.

 Is it possible to apply your improvement for indexes?
 I think there are more benefits for small index entries,
 whose sizes are often 12 or 20 bytes.

I am not sure yet whether this improvement will affect indexes.  My
current implementation specifically excludes indexes and only affects heap
tuples.  Now that I have a better understanding of the disk structure for
heap tuples, I am planning to look at the index layout in the near future to
see if there is a possible gain there as well.

 This would completely eliminate the impact for this
 patch on 32-bit systems, which would not gain any benefit from this patch.
 No! There are *also* benefits even on 32-bit systems, because some
 of them have 8-byte alignment. (for example, 32-bit Windows)

Excellent!  I was not aware of that.   Thanks for this information.

Any ideas on what I should use for this check?  I was thinking of using
#if SIZEOF_SIZE_T >= 8, but I guess I should search around for a standard
solution to this problem.

 BTW, there might be a small mistake on the following lines in the patch.
 They do the same thing ;-)

 ***************
 *** 1019,1025 **** toast_flatten_tuple_attribute(Datum value,
  		new_len += BITMAPLEN(numAttrs);
  		if (olddata->t_infomask & HEAP_HASOID)
  			new_len += sizeof(Oid);
 ! 		new_len = MAXALIGN(new_len);
  		Assert(new_len == olddata->t_hoff);
  		new_data_len = heap_compute_data_size(tupleDesc,
  											  toast_values, toast_isnull);
 --- 1025,1034 ----
  		new_len += BITMAPLEN(numAttrs);
  		if (olddata->t_infomask & HEAP_HASOID)
  			new_len += sizeof(Oid);
 ! 		if (olddata->t_infomask & HEAP_INTALIGN)
 ! 			new_len = MAXALIGN(new_len);
 ! 		else
 ! 			new_len = MAXALIGN(new_len);
  		Assert(new_len == olddata->t_hoff);
  		new_data_len = heap_compute_data_size(tupleDesc,
  											  toast_values, toast_isnull);

Thanks for that catch!  I have this fixed in my local GIT tree now.

Thanks for your feedback and review!

- Ryan



[HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.

2008-10-08 Thread Ryan Bradetich
Hello all,

Here is a proof-of-concept patch for reducing the alignment
requirement for heap tuples on 64-bit systems.
This patch passes the regression tests and a couple of other data sets
I have thrown at it.

I am hoping to get some early feedback on this patch so I have time to
make any corrections, improvements,
etc before the November commit-fest deadline.

There are two visible savings in the system catalogs using this patch
(size was gathered using \S+):

Catalog          Pre-Patch Size   After-Patch Size
--------------------------------------------------
pg_depend        312 kB           296 kB
pg_description   160 kB           152 kB


One thing I am considering, but I am not sure if it is worth the code
uglification, is to wrap the majority of these checks in #ifdefs so they are
only used on 64-bit systems.  This would completely eliminate the impact of
this patch on 32-bit systems, which would not gain any benefit from it.

Feedback welcome!

Thanks,

- Ryan


heap-tuple-v1.0.patch
Description: application/download



Re: Fwd: [HACKERS] [Patch Review] TRUNCATE Permission

2008-09-06 Thread Ryan Bradetich
Hello Robert,

On Fri, Sep 5, 2008 at 8:13 PM, Robert Haas [EMAIL PROTECTED] wrote:
 Updated patch attached, based on comments from Ryan Bradetich and Tom
 Lane, and sync'd to latest CVS version.

Thanks for the update.   I am out of town until tomorrow evening.
I will re-review this patch when I get back if it has not been
committed by then.

- Ryan



[HACKERS] [Review] Tests citext casts by David Wheeler.

2008-09-04 Thread Ryan Bradetich
Hello all,

Here is my review of the Test citext casts written by David Wheeler:
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]


1. The patch applies cleanly to the latest GIT repository.

2. The citext type installs, uninstalls, and re-installs cleanly.

3. The coding style is mostly consistent with the existing code.
The only coding style difference I noticed was introduced in this patch:

In the citext.sql.in file the following functions are created using the
non-dollar quoting syntax:
* regex_matches
* regexp_replace
* regexp_split_to_array
* regexp_split_to_table
* strpos

In the citext.sql.in file the following functions are created using the
dollar quoting syntax:
 * replace
 * split_part
 * translate

I do not have a strong preference either way, and I do not even care whether
they are consistent.  I am just interested to see if there was a reason for
using both syntaxes for these functions.

4. The regression tests pass when PostgreSQL is built with --with-libxml.
They fail when PostgreSQL is built without it.

Since the default PostgreSQL configuration does not include --with-libxml,
and the presence of the libxml2 libraries on the system is not detected at
run time, I would recommend adding an additional expected-output file
(citext_1.out) that covers the case where PostgreSQL is compiled without
--with-libxml.

As an experiment, I added a citext_1.out, and the regression tests passed
both with and without the --with-libxml option.

Review of the additional regression tests shows they provide coverage of the
new casts and functions added with this patch.

Overall I think the patch looks good.  After reviewing it, I played with
citext for an hour or so and did not encounter any bugs or other surprises.

Thanks,

- Ryan



Re: [HACKERS] Question regarding the database page layout.

2008-09-02 Thread Ryan Bradetich
Hello all,

On Mon, Aug 11, 2008 at 2:24 AM, Gregory Stark [EMAIL PROTECTED] wrote:

 Ryan Bradetich [EMAIL PROTECTED] writes:

  After a cursory glance at the HeapTupleHeaderData structure, it appears it
  could be aligned with INTALIGN instead of MAXALIGN. The one structure I was
  worried about was the 6-byte t_ctid structure. The comments in
  src/include/storage/itemptr.h indicate the ItemPointerData structure is
  composed of 3 int16 fields. So everything in the HeapTupleHeaderData
  structure is 32 bits or less.

 Sure, but the tuple itself could contain something with double alignment. If
 you have a bigint or double in the tuple then heap_form_tuple needs to know
 where to put it so it ends up at right alignment.

For fun, I looked around in heap_form_tuple() today to see how big a job
this change would be.  It did not seem very hard to implement.  I know there
are probably several other places I missed with this patch, but it does pass
the regression tests and the simple testing I threw at it (including ALTER
TABLE ADD COLUMN, etc.).

The patch concept is fairly simple.

   1. Add a new boolean local variable: require_max_align (initialized to
      false).
   2. When we loop through the attributes, check to see if any attributes
      require double alignment.  If they do, set require_max_align = true.
   3. If the tuple has OIDs (a double alignment requirement), set
      require_max_align = true.
   4. If require_max_align is true, use the MAXALIGN macro; otherwise use
      the INTALIGN macro.

A check of the physical storage for my previous test case does show the
tuple length to be 44 bytes instead of the previous 48 bytes, so the patch
does appear to work.  I have not run any performance tests using this patch,
etc.

Attached is my proof-of-concept patch for this idea, in case someone wants
to inform me of other problems this patch will introduce :)

Thanks,

- Ryan
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
***************
*** 863,868 **** heap_form_tuple(TupleDesc tupleDescriptor,
--- 863,869 ----
  				data_len;
  	int			hoff;
  	bool		hasnull = false;
+ 	bool		require_max_align = false;
  	Form_pg_attribute *att = tupleDescriptor->attrs;
  	int			numberOfAttributes = tupleDescriptor->natts;
  	int			i;
***************
*** 886,891 **** heap_form_tuple(TupleDesc tupleDescriptor,
--- 887,895 ----
  	 */
  	for (i = 0; i < numberOfAttributes; i++)
  	{
+ 		if (att[i]->attalign == 'd')
+ 			require_max_align = true;
+ 
  		if (isnull[i])
  			hasnull = true;
  		else if (att[i]->attlen == -1 &&
***************
*** 907,916 **** heap_form_tuple(TupleDesc tupleDescriptor,
  	if (hasnull)
  		len += BITMAPLEN(numberOfAttributes);
  
! 	if (tupleDescriptor->tdhasoid)
  		len += sizeof(Oid);
  
! 	hoff = len = MAXALIGN(len); /* align user data safely */
  
  	data_len = heap_compute_data_size(tupleDescriptor, values, isnull);
  
--- 911,925 ----
  	if (hasnull)
  		len += BITMAPLEN(numberOfAttributes);
  
! 	if (tupleDescriptor->tdhasoid) {
  		len += sizeof(Oid);
+ 		require_max_align = true;
+ 	}
  
! 	if (require_max_align == true)
! 		hoff = len = MAXALIGN(len); /* align user data safely */
! 	else
! 		hoff = len = INTALIGN(len); /* align user data safely */
  
  	data_len = heap_compute_data_size(tupleDescriptor, values, isnull);



Re: [HACKERS] statement_cost_limit for regression testing.

2008-09-02 Thread Ryan Bradetich
Hello Simon,

On Mon, Sep 1, 2008 at 9:35 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 On Thu, 2008-08-28 at 19:57 -0700, Ryan Bradetich wrote:
 I just wanted to throw out another possible use for this GUC.  There may
 be a better way to solve this problem, but I believe this patch would be
 useful for regression testing.
 It's possible to do this using planner hooks, so no patch needed.

Excellent!

Time for me to read up on using these planner hooks.

Thanks!

- Ryan



Re: [HACKERS] Question regarding the database page layout.

2008-09-02 Thread Ryan Bradetich
Hello Martijn,

 You need to arrange testing on an architecture that has strict alignment
 requirements. For example, i386 doesn't care about alignment at all and
 will read anything from anywhere, with performance degradation.

 Other architectures will simply throw exceptions; that's the smoke test.

From the other comments in this thread, I still have quite a bit of work
ahead of me :)  If I can accomplish this, then I will run it on a different
architecture.  I have access to a parisc-linux and a powerpc system.

 Also, what's the performance cost?

Not sure yet.  My goal would be zero impact on 32-bit systems and something
small on 64-bit systems that would hopefully be outweighed by a performance
gain in I/O.

Thanks for your feedback.

- Ryan



Re: [HACKERS] Question regarding the database page layout.

2008-09-02 Thread Ryan Bradetich
Hello Tom,

On Tue, Sep 2, 2008 at 8:07 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Ryan Bradetich [EMAIL PROTECTED] writes:
 The patch concept is fairly simple.

1.  Add a new boolean local variable: require_max_align
 (initialized to false).

 This really can't possibly work, because you'd need to propagate
 knowledge of the tuple's alignment requirement all over the place.
 In particular, how would code *reading* the tuple know where the
 data starts?  Also, I don't think you get (very much of) the actual
 benefit unless the code that inserts tuples into disk pages knows
 to do something different in the int-align case.

I figured there was something wrong with my proof-of-concept patch because
it was too easy and it passed the regression tests on the first attempt.  I
just didn't know what was wrong.  After reading your comment, I went back
and looked at my test data, and I am embarrassed to admit that I misread it
:(  Sure, the tuple length is 44 bytes, but the data is still aligned on an
8-byte boundary.  So this proof-of-concept patch effectively does nothing :(

As Greg mentioned, the code reading the user data would know where to start
because of the t_hoff field.  In my first email in this thread, I had looked
at the requirements for the HeapTupleHeaderData to be MAXALIGN.  I could not
find any reason the HeapTupleHeaderData needed to be MAXALIGN, but Greg
pointed out that the t_hoff field is required to be MAXALIGN.  I can only
see two reasons why the t_hoff field is required to be MAXALIGN:

   1. The tuple has OIDs (OIDs require MAXALIGN).
   2. The tuple has a column requiring MAXALIGN.

If neither of the above conditions is true, then it seems to me that t_hoff
could safely be INTALIGN.  I was working under the mistaken assumption that
if I addressed this issue in heap_form_tuple, the tuple would be written to
disk this way.  Obviously I was wrong and still have a lot of code reading
and experimentation to do :)

 It's conceivable that we could make this work if we wanted to dedicate
 an infomask bit to showing whether the tuple needs int or double
 alignment.  I don't really think it's worth the trouble though.

I am not sure what an infomask bit is yet, but is it safe to assume (in
theory) that if I get the page written correctly, the tuple-reading code
would work because of the t_hoff field?

Thanks for your feedback!

- Ryan



Re: [HACKERS] Question regarding the database page layout.

2008-09-02 Thread Ryan Bradetich
Hello Greg,

On Tue, Sep 2, 2008 at 8:30 AM, Gregory Stark [EMAIL PROTECTED] wrote:
 Ryan Bradetich [EMAIL PROTECTED] writes:

4. If require_max_align = true, use the MAXALIGN macro; otherwise
 use the INTALIGN macro.

 Huh, I didn't think of doing it like that.

 But I'm confused. You seem to be tweaking the alignment of the data inside the
 tuple? After the tuple header? I thought we had only one byte of wasted space
 in there and that's used by the null bitmap. So unless you have more than 8
 columns and some of them are null I wouldn't expect you to save any space. If
 you do then I guess you could save 4 bytes if the null bitmap is 2-5 bytes
 (mod 8) long.

I was not trying to tweak the alignment of the data inside the tuple header;
I was trying to adjust the alignment of t_hoff so it would not have the
MAXALIGN requirement.  I believe my proof-of-concept patch was bad, and I
need to spend some more time on it tonight with the new knowledge I gained
from this email thread.

 I thought the goal was to save space by aligning the tuples on the page more
 densely. That seems to me to be more fruitful as about half the tuples will
 save four bytes even on tables with small or missing null bitmaps.

That is the goal.  Basically, my test data is 44 bytes in length for each
tuple.  I have no data in the tuple that is required to be MAXALIGN, but
since t_hoff has the MAXALIGN requirement, I throw away 4 bytes for each
tuple (on a 64-bit machine).  This proof-of-concept patch is to help me
understand the PostgreSQL code and to see if I can recover those 4 bytes
per tuple.

Thanks again for your feedback!

- Ryan



Re: [HACKERS] Question regarding the database page layout.

2008-09-02 Thread Ryan Bradetich
Hello Greg,

On Tue, Sep 2, 2008 at 9:54 AM, Gregory Stark [EMAIL PROTECTED] wrote:
 Ryan Bradetich [EMAIL PROTECTED] writes:
 The structure on the page is to have a bunch of tuples stored one after the
 other. Each tuple is maxaligned after the previous (actually before the
 previous since they start from the end but we'll ignore that -- the point is
 that they all *start* at a maxaligned position on the page).

 Within each tuple there's a header which is 23 bytes long. Then an optional
 null bitmap. In the header is t_hoff which points to the first byte of actual
 data -- effectively specifying the length of the null bitmap. t_hoff is
 maxaligned as well.

Thanks for your persistence in explaining this issue to me.  I think I am
starting to understand the issues now, and I appreciate the help.

I think I see the error in my thinking.  You and Tom are saying the tuple is
MAXALIGN on the page as well.  My assumption was that since t_hoff had to be
MAXALIGN, the tuple on the page was MAXALIGN by default, so if I were able
to INTALIGN t_hoff then the tuple on the page would be free to be INTALIGN
as well.

Tom gave me the next place to go look later tonight (PageAddItem).

 So by intaligning t_hoff you're adjusting where the data within the tuple fits
 by making the null bitmap 4 bytes shorter. That seems worth a few cycles but
 isn't going to affect every table. It'll only affect tables that have a null
 bitmap between 2-5 bytes (mod 8), ie, with 9-40 fields at least one of which
 is null. Tables with 9 fields are likely to be relatively wide which means
 saving 4 bytes isn't that much of a gain percentagewise. (Though they might be
 mostly null)

 The other 4 bytes you could save is by packing the whole tuples themselves
 more closely on the page. That happens when the item pointer is added and
 pointed to the tuple. To do that heap_form_tuple would have to return data
 back to the caller about the alignment needed to whomever is going to copy it
 into the page. AFAICS that could be done in the HeapTuple structure itself
 rather than in the tuple though which doesn't seem so bad. Tom may be seeing
 something I'm not though.

 Incidentally, can't you set the alignment during the compute_data_size() call
 instead of doing an extra pass through the attributes? Probably doesn't change
 the cost though.

Actually, I did not do an extra pass through the attributes; I added the
check to an existing pass.  I did look at adding this check to
heap_compute_data_size(), but took the easy path to see if my patch worked.
I will update my patch to do the alignment check in
heap_compute_data_size().

Thanks!

- Ryan



Re: [HACKERS] Question regarding the database page layout.

2008-09-02 Thread Ryan Bradetich
Hello Tom,

 Well, as Greg pointed out, setting t_hoff correctly should be sufficient
 for squeezing out useless padding between the tuple header and the tuple
 data.  The real objection here is that that's leaving most of the
 possible gain still on the table.  The tuple *as a whole* (header and
 data both) is still going to get maxaligned when it's plopped onto a
 disk page.  I think that in real-world cases that's often where the
 main padding cost comes in: with 23-byte tuple headers and no OIDs,
 you aren't saving a thing by int-aligning t_hoff unless you have a nulls
 bitmap and it's the wrong number of bytes long.

 So IMHO the real problem is over near PageAddItem.

Thanks for your feedback as well.  Between you and Greg, I think I finally
understand the error in my thinking.  I will investigate PageAddItem()
later tonight.

- Ryan



[HACKERS] [Patch Review] TRUNCATE Permission

2008-09-01 Thread Ryan Bradetich
Hello all,

Robert Haas submitted the TRUNCATE permissions patch for the September
commit fest:
http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

I had some extra time tonight, so I downloaded, installed, and reviewed this
patch.  Here are my findings:

1. I found the patch style to be consistent with the surrounding code.
2. The patch provides documentation updates and regression test updates.
3. The patch applies (with some fuzz) to the latest GIT tree.

Three issues I found with the patch via code reading and verified via
testing:

1. File: src/backend/catalog/aclchk.c:
Function: pg_class_aclmask():

I believe ACL_TRUNCATE should be added to this check and mask.

	/*
	 * Deny anyone permission to update a system catalog unless
	 * pg_authid.rolcatupdate is set.  (This is to let superusers protect
	 * themselves from themselves.)  Also allow it if allowSystemTableMods.
	 *
	 * As of 7.4 we have some updatable system views; those shouldn't be
	 * protected in this way.  Assume the view rules can take care of
	 * themselves.  ACL_USAGE is if we ever have system sequences.
	 */
	if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_USAGE)) &&
		IsSystemClass(classForm) &&
		classForm->relkind != RELKIND_VIEW &&
		!has_rolcatupdate(roleid) &&
		!allowSystemTableMods)
	{
#ifdef ACLDEBUG
		elog(DEBUG2, "permission denied for system catalog update");
#endif
		mask &= ~(ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_USAGE);
	}


Here is where it is visible via the psql interface:

template1=# select rolname, rolcatupdate from pg_authid;
 rolname | rolcatupdate
-+--
 rbrad   | t
(1 row)

template1=# select has_table_privilege('pg_class', 'delete');
 has_table_privilege
-
 t
(1 row)

template1=# select has_table_privilege('pg_class', 'truncate');
 has_table_privilege
-
 t
(1 row)

template1=# update pg_authid set rolcatupdate = false;
UPDATE 1

template1=# select has_table_privilege('pg_class', 'delete');
 has_table_privilege
-
 f
(1 row)

template1=# select has_table_privilege('pg_class', 'truncate');
 has_table_privilege
-
 t
(1 row)


I do not believe this is a huge issue, since truncate is prohibited on the
system catalogs by truncate_check_rel().

template1=# truncate pg_authid;
ERROR:  permission denied: pg_authid is a system catalog


2. The src/test/regress/sql/privileges.sql regression test has tests for
the has_table_privilege() function.  It looks like all the other permissions
are tested there, but there is no test for the TRUNCATE privilege.


3. I believe you missed a documentation update in the ddl.sgml file:

    There are several different privileges: <literal>SELECT</>,
    <literal>INSERT</>, <literal>UPDATE</>, <literal>DELETE</>,
    <literal>REFERENCES</>, <literal>TRIGGER</>,
    <literal>CREATE</>, <literal>CONNECT</>, <literal>TEMPORARY</>,
    <literal>EXECUTE</>, and <literal>USAGE</>.


I played around with the patch for an hour or so tonight, and I did not
observe any other unusual behaviors.
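For the missing TRUNCATE coverage noted in point 2, here is a minimal sketch of the kind of statements that could be added to privileges.sql; regressuser1 and atest1 are placeholders patterned after the file's style, not its actual identifiers:

```sql
-- Hypothetical additions mirroring the existing has_table_privilege checks;
-- assumes regressuser1 does not own atest1.
GRANT TRUNCATE ON atest1 TO regressuser1;
SELECT has_table_privilege('regressuser1', 'atest1', 'TRUNCATE');  -- expect: t
REVOKE TRUNCATE ON atest1 FROM regressuser1;
SELECT has_table_privilege('regressuser1', 'atest1', 'TRUNCATE');  -- expect: f
```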

Hopefully this review is useful; it is my first patch review for a
commit-fest.  I will update the wiki with a link to this email message for
my review.

Thanks,

- Ryan


Re: [HACKERS] [Patch Review] TRUNCATE Permission

2008-09-01 Thread Ryan Bradetich
Hello all,

On Sun, Aug 31, 2008 at 11:55 PM, Ryan Bradetich [EMAIL PROTECTED]wrote:


 I do not believe this is a huge issue since truncate is prohibited on the
 system catalogs
 by the truncate_check_rel().

 template1=# truncate pg_authid;
 ERROR:  permission denied: pg_authid is a system catalog


I thought about this some more.  I believe my suggestion was incorrect.
Since truncate_check_rel() prevents the use of the truncate command on
system catalogs, the TRUNCATE permission should always be stripped
from the system catalogs.

Here is the inconsistency I observed:

template1=# \z pg_catalog.pg_authid

                Access privileges
   Schema   |   Name    | Type  |  Access privileges
------------+-----------+-------+---------------------
 pg_catalog | pg_authid | table | rbrad=arwdDxt/rbrad
(1 row)

  template1=# select rolname, rolcatupdate from pg_authid;
   rolname | rolcatupdate
  ---------+--------------
   rbrad   | t
  (1 row)

  template1=# select has_table_privilege('pg_authid', 'truncate');
   has_table_privilege
  ---------------------
   t
  (1 row)

  template1=# truncate pg_authid;
  ERROR:  permission denied: pg_authid is a system catalog

The TRUNCATE fails even though \z and has_table_privilege() said I had
permissions.
Compare with the DELETE privilege:

  template1=# select has_table_privilege('pg_authid', 'delete');
   has_table_privilege
  ---------------------
   t
  (1 row)

template1=# delete from pg_authid;
DELETE 1

Thanks,

- Ryan


Fwd: [HACKERS] [Patch Review] TRUNCATE Permission

2008-09-01 Thread Ryan Bradetich
I had intended to send this message to the pgsql-hackers mailing list as
well.

Thanks,

- Ryan

-- Forwarded message --
From: Ryan Bradetich [EMAIL PROTECTED]
Date: Mon, Sep 1, 2008 at 2:20 PM
Subject: Re: [HACKERS] [Patch Review] TRUNCATE Permission
To: Tom Lane [EMAIL PROTECTED]


Hello Tom,

On Mon, Sep 1, 2008 at 1:00 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Ryan Bradetich [EMAIL PROTECTED] writes:
  I do not believe this is a huge issue since truncate is prohibited on
 the
  system catalogs
  by the truncate_check_rel().

 Only when allowSystemTableMods is false.  I think it would be a serious
 mistake for your patch to treat the system catalogs differently from
 other tables.


Good Point.  Looks like I still have more code reading to do :)

This is Robert Haas's patch for the September 2008 commit-fest.
I am just offering my review.   Gives me a good excuse to dig into
the PostgreSQL code base.  Hopefully this review is useful to the
person committing the patch.


   Here is the inconsistency I observed:

 It seems to me that you are assuming that lack of a TRUNCATE permission
 bit is the only valid reason for a permission denied failure.  This is
 fairly obviously not so, since multiple permissions typically enter into
 any command (consider schema-level permissions for instance).


I was just comparing the behavior between DELETE and TRUNCATE.
My last suggestion for the TRUNCATE permission always being removed
on the system tables is obviously bogus because of the allowSystemTableMods
issue you raised.

Does my first suggestion still make sense for removing TRUNCATE in
pg_class_aclmask() when pg_authid.rolcatupdate is not set?

Thanks,

- Ryan


[HACKERS] statement_cost_limit for regression testing.

2008-08-28 Thread Ryan Bradetich
Hello,

Sorry for the new thread on this topic; I did not have a copy in my inbox I
could reply to :(

I am not sure of the status of the patch, but I did read through the thread
at:
   http://archives.postgresql.org/pgsql-hackers/2008-08/msg00054.php


I just wanted to throw out another possible use for this GUC.  There may be
a better way to solve this problem, but I believe this patch would be useful
for regression testing.

Here is the problem I ran into when regression testing the hash index on the
unsigned integer type, and how I would like to use the statement_cost_limit
parameter:

Regression testing steps:

1. Create the table:
CREATE TABLE hash_i4_heap (seqno uint4, random uint4);

2. Create the hash index:
 CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random
uint4_ops);

3. Load data into the hash_i4_heap table (data is from the PostgreSQL
regression suite).
 COPY hash_i4_heap FROM
'/home/rbrad/src/PostgreSQL/src/test/regress/data/hash.data';

4. Perform query:
 SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
 seqno |  random
-------+-----------
    15 | 843938989
(1 row)

This would pass the regression testing, but we did not actually test the
hash index here:

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on hash_i4_heap  (cost=0.00..137.00 rows=1 width=8)
   Filter: (random = 843938989)
(2 rows)

5. Attempt to force an index scan:
SET enable_seqscan = off;

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on hash_i4_heap  (cost=10000000001.00..10000000137.00 rows=1 width=8)
   Filter: (random = 843938989)
(2 rows)

6. It still uses a sequential scan, but this query would have still passed
my regression tests.  Try the statement_cost_limit:
SET statement_cost_limit = ;

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
ERROR:  execution plan is too expensive: 10000000137.00

SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
ERROR:  execution plan is too expensive: 10000000137.00


7. This is good: my regression test finally failed, since the disabled
sequential scan's cost penalty is larger than the statement_cost_limit.
For those interested, the reason it failed to use the hash index is
that I did not cast the hash_i4_heap.random value to a uint4 type.

   EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989::uint4;
                                     QUERY PLAN
   ---------------------------------------------------------------------------------
    Index Scan using hash_i4_index on hash_i4_heap  (cost=0.00..8.27 rows=1 width=8)
      Index Cond: (random = '843938989'::uint4)
   (2 rows)

 The issue is fixed in my regression tests for the unsigned integer
types, but it would be nice for the regression tests to fail
 in the future when the index is not used.

I looked at how the main PostgreSQL regression tests handle this problem and
as far as I can tell they do not.
Maybe this is not a likely problem, but it seems we do not have a good way
to ensure the indexes are actually being used during regression testing.

Thanks,

- Ryan

P.S. There appears to be a bug in the statement_cost_limit1.patch:

   SET statement_cost_limit = 0;
   ERROR:  0 is outside the valid range for parameter statement_cost_limit
(100 .. 2147483647)


Re: [HACKERS] [PgFoundry] Unsigned Data Types

2008-08-17 Thread Ryan Bradetich
On Sat, Aug 16, 2008 at 10:53 AM, Decibel! [EMAIL PROTECTED] wrote:

 On Aug 15, 2008, at 1:00 AM, Ryan Bradetich wrote:

 Here is the first pass at the unsigned data type I have been working on.

 I am planning on adding these to the September 2008 commitfest wiki page.
 The unsigned data type is not targeted for core, but for the uint
 PgFoundry project.



 Is the intention for the types to go into pg_catalog? It'd be nice if you
 could specify what schema they should be installed in. An uninstall would
 also be good.


Using pg_catalog made sense to me at first (especially for my application),
but on reflection I believe you are right.  I will remove the references to
the pg_catalog schema and allow the user to add the unsigned data types to
any schema.  Good catch on the uninstall script; I should have written this
as well.  I will post an update to the wiki later tonight.



 Thanks for doing this, I've wished we had uint types in the past, and I'm
 sure I will again in the future!


I am glad it is useful.  I needed it for my current project, and I was
hoping others could use it as well.

Thanks,

- Ryan


[HACKERS] [PgFoundry] Unsigned Data Types

2008-08-15 Thread Ryan Bradetich
Hello all,

Here is the first pass at the unsigned data type I have been working on.

I am planning on adding these to the September 2008 commitfest wiki page.
The unsigned data type is not targeted for core, but for the uint PgFoundry
project.

The uint.c.gz file is the main source file for the uint1, uint2, and uint4
data types.
The uint.sql.gz file contains the SQL statements to add the unsigned data
types to the database.
The pg_atoui.c.gz file is based on a function in the PostgreSQL source
code but works for unsigned data types instead of signed data types.
The Makefile is used to build the unsigned data type shared library on
Linux.

The tests.tar.gz file is my unit test suite that I worked on to make sure
the unsigned integer types worked as expected.

The tests cover cases like:
* table creation with the unsigned integer types.
* comparison operations.
* INSERT statements (binary and text forms).
* COPY statements (binary and text forms).
* unique btree index support.

In addition to correctness issues, I would also appreciate feedback on best
practices and portability concerns.

For example:
   I doubt my Makefiles are very portable.
   What is the proper solution to handle this?  pgxs?
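pgxs is indeed the usual answer for modules maintained outside the tree.
A minimal sketch of what such a Makefile could look like (the file names
are taken from the attachments above, and the pg_config first on PATH is
assumed to belong to the target installation):

```make
# Build the uint module against an installed PostgreSQL via PGXS.
MODULE_big = uint
OBJS = uint.o pg_atoui.o
DATA = uint.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

With this, `make` and `make install` build and install portably on any
platform where the server itself builds.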

Thanks,

- Ryan


uint.c.gz
Description: GNU Zip compressed data


uint.sql.gz
Description: GNU Zip compressed data


pg_atoui.c.gz
Description: GNU Zip compressed data


Makefile
Description: Binary data


tests.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Question regarding the database page layout.

2008-08-11 Thread Ryan Bradetich
Hello all,

I have been digging into the database page layout (specifically the tuples)
to ensure the unsigned integer types were consuming the proper storage.
While digging around, I found one thing surprising:

It appears the heap tuples are padded at the end to the MAXALIGN distance.

Below is my data that I used to come to this conclusion.
(This test was performed on a 64-bit system with --with-blocksize=32).

The goal was to compare data from comparable type sizes.
The first column indicates the type (char, uint1, int2, uint2, int4, and
uint4),
the number in () indicates the number of columns in the table.

The Length is from the .lp_len field in the ItemId structure.
The Offset is from the .lp_off field in the ItemId structure.
The Size is the difference between successive offsets.

Table       Length  Offset (successive tuples)      Size
----------  ------  ------------------------------  ----
char (1)    25      32736, 32704, 32672, 32640      32
char (9)    33      32728, 32688, 32648, 32608      40
uint1 (1)   25      32736, 32704, 32672, 32640      32
uint1 (9)   33      32728, 32688, 32648, 32608      40
int2 (1)    26      32736, 32704, 32672, 32640      32
int2 (5)    34      32728, 32688, 32648, 32608      40
uint2 (1)   26      32736, 32704, 32672, 32640      32
uint2 (5)   34      32728, 32688, 32648, 32608      40
int4 (1)    28      32736, 32704, 32672, 32640      32
int4 (3)    36      32728, 32688, 32648, 32608      40
uint4 (1)   28      32736, 32704, 32672, 32640      32
uint4 (3)   36      32728, 32688, 32648, 32608      40

From the documentation at:
http://www.postgresql.org/docs/8.3/static/storage-page-layout.html
and from the comments in src/include/access/htup.h, I understand the user
data (indicated by t_hoff) must be a multiple of the MAXALIGN distance, but
I did not find anything suggesting the heap tuple itself had this
requirement.

After a cursory glance at the HeapTupleHeaderData structure, it appears it
could be aligned with INTALIGN instead of MAXALIGN.  The one structure I was
worried about was the 6-byte t_ctid structure.  The comments in the
src/include/storage/itemptr.h file indicate the ItemPointerData structure is
composed of 3 int16 fields, so everything in the HeapTupleHeaderData
structure is 32 bits or less.

I am interested in attempting to generate a patch if this idea appears
feasible.  For the current data set I am playing with, it would save over
3 GB of disk space.  (Back-of-the-envelope calculations indicate that 5% of
my current storage is consumed by this padding; my tuple length is 44
bytes.)

Thanks,

- Ryan


Re: [HACKERS] Question regarding the database page layout.

2008-08-11 Thread Ryan Bradetich
Hello Greg,

On Mon, Aug 11, 2008 at 2:24 AM, Gregory Stark [EMAIL PROTECTED]wrote:

 Ryan Bradetich [EMAIL PROTECTED] writes:

  After a cursory glance at the HeapTupleHeaderData structure, it appears
 it
  could be aligned with INTALIGN instead of MAXALIGN. The one structure I
 was
  worried about was the 6 byte t_ctid structure. The comments in
  src/include/storage/itemptr.h file indicate the ItemPointerData structure
 is
  composed of 3 int16 fields. So everything in the HeapTupleHeaderData
  structure is 32-bits or less.

 Sure, but the tuple itself could contain something with double alignment.
 If
 you have a bigint or double in the tuple then heap_form_tuple needs to know
 where to put it so it ends up at right alignment.


My first thought was that we could still figure this out because the user
data is already forced to be MAXALIGNed.  Then I realized that, since that
is true, I am still going to eat the padding anyhow.  The padding would just
move to one of two places:
1. To MAXALIGN the user data, or
2. To MAXALIGN the heap tuple header.

Thanks for the sanity check.  I was so wrapped up in looking for alignment
requirements in the HeapTupleHeaderData structure that I overlooked the
significance of the MAXALIGN on the user data.

Thanks,

- Ryan

P.S.  Just for the archive, it seems this idea may still be workable (I need
to look at heap_form_tuple in significantly more detail) if/when someone
implements the proposal to separate the physical storage order from the
column order.  That solution is a bit more than I am ready to tackle at the
moment :)  Maybe in the future.


[HACKERS] Initial Unsigned Integer data type performance test results.

2008-08-04 Thread Ryan Bradetich
Hello All,

I wanted to pass some performance data on to the group regarding the
unsigned integer data types I am working on.  I tested on two systems
running Ubuntu Hardy.  The first system is an 8 x 2.66 GHz x86-64 processor
system.  The second system is a 2 x 533 MHz Celeron i386 system.  For this
test I disabled autovacuum and manually ran vacuum during times that were
not timed.  I also added some sleep and sync commands to try and stabilize
the test results.  The integer and unsigned integer test results were also
interleaved to compensate for other system activities.  Finally, I have not
done a lot of benchmarking, so feedback and improvement ideas are welcome!

I benchmarked four tests:

32-bit int2 vs uint2 -- My expectation is these results would be
                        approximately equal.  The uint2 casts up to an int4,
                        which is still 32 bits.

32-bit int4 vs uint4 -- My expectation is the > operator should be
                        approximately equal since I explicitly added
                        int4 > uint4 operators.

64-bit int2 vs uint2 -- My expectation is these results would be
                        approximately equal.  The uint2 casts up to an int4,
                        which is less than the 64-bit processor bus width.

64-bit int4 vs uint4 -- My expectation is these results would be
                        approximately equal.  The uint4 casts up to an int8,
                        which is the processor bus width.

Here is the definition of each column (times are displayed in seconds):

Rows    -- The number of rows added to the table.
Test    -- The test number (I ran each benchmark 3 times).
S Load  -- Time to load the table with signed integers.
U Load  -- Time to load the table with unsigned integers.
S > OP  -- Time to perform a > operator on each row in the table
           (signed integers).
U > OP  -- Time to perform a > operator on each row in the table
           (unsigned integers).
S & OP  -- Time to perform an & (AND) operator on each row in the table
           (signed integers).
U & OP  -- Time to perform an & (AND) operator on each row in the table
           (unsigned integers).


I still have some more investigating to do into the results of this
data (e.g. why the signed > OP is cheaper for int2 than for uint2 types),
but I wanted to share this performance data with the community.  Rough
calculations show about a 4% performance hit for 32-bit processors
to use the int8 type and a 6% performance hit for 64-bit processors.
This seems like a reasonable trade-off for the reduced storage space
I am after, and the reduction in operators Greg and Tom were concerned
about.  If there are no objections from the community, I will plan to
complete my implementation of Tom's proposal at:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01204.php.

Thanks!

- Ryan

P.S. The code I have so far is not ready for review.  It was
implemented quickly to get performance numbers to determine if Tom's
proposal
would be acceptable by the community.  I can send it out if people are
interested, but I am planning on submitting it to a future commit fest
once I have it cleaned up and better tested.



Benchmark Data:
==



  32-bit int2 vs. uint2

  Rows     Test  S Load      U Load      S > OP      U > OP      S & OP      U & OP
  ---------------------------------------------------------------------------------
  100      1     000.002888  000.002151  000.010881  000.014691  000.011124  000.011000
  100      2     000.002780  000.002127  000.011729  000.011611  000.012014  000.011925
  100      3     000.002747  000.002085  000.010193  000.010318  000.010588  000.010576

  1000     1     000.003201  000.003870  000.037837  000.037360  000.032064  000.032478
  1000     2     000.003259  000.003912  000.033495  000.036281  000.032502  000.035195
  1000     3     000.003201  000.003913  000.039156  000.035592  000.032405  000.040543

  10000    1     000.024683  000.021306  000.255958  000.329045  000.255887  000.283782
  10000    2     000.020214  000.021224  000.260252  000.290933  000.281468  000.255171
  10000    3     000.020371  000.020940  000.276401  000.264791  000.257598  000.257258

  100000   1     001.669571  001.687523  002.591442  002.682428  003.410724  003.490362
  100000   2     001.682251  001.702598  003.379377  002.855622  002.549476  002.583431
  100000   3     001.693429  001.684732  002.546024  002.641240  002.540556  003.366534

  500000   1     010.138317  011.014532  015.707597  015.61      015.394598  015.502639
  500000   2     010.042176  010.179163  015.290994  015.407479  015.332925  016.321578
  500000   3     010.047930  010.206489  015.016276  015.430527  015.201759  015.411601

  1000000  1     020.762680  022.145950  030.338606

Re: [HACKERS] Type Categories for User-Defined Types

2008-07-30 Thread Ryan Bradetich
Tom,

On Tue, Jul 29, 2008 at 2:33 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Yeah, that's the point of the proposal.  I think the issue has come up
 once or twice before, too, else I'd not be so interested in a general
 solution.  (digs in archives ... there was some discussion of this
 in connection with unsigned integer types, and I seem to recall older
 threads but can't find any right now.)

Anything I should be looking into and/or testing for unsigned integer support?

Thanks!

- Ryan

P.S. I have most of the uint2 and uint4 types implemented.  I am currently
working on the unit and performance testing.  I need to upgrade one of
my Linux boxes first before I can complete my testing. I am hoping to post
my code and performance testing later this week.



Re: [HACKERS] Type Categories for User-Defined Types

2008-07-30 Thread Ryan Bradetich
Hello Tom,

On Wed, Jul 30, 2008 at 7:50 AM, Tom Lane [EMAIL PROTECTED] wrote:
 Dunno, I forget what the conclusion was about implicit casting for the
 unsigned types in your proposal.  Have you experimented with seeing
 whether, eg, UNION'ing an unsigned with some signed-integer value
 behaves sensibly?

That was not one of my tests, but I will add it :)

Here are the performance tests I am planning / working on:

1. Loading of both integer and unsigned integers into a table.  This test is
mainly a sanity check to ensure the unsigned integers do not add
significant time during inserts.   In a perfect world, I believe they should
take the same time when the unsigned integers and integers have equal
widths.  Experimentation is showing the unsigned integer to take slightly
longer.  I suspect this is due to the additional integer-to-unsigned-integer
cast?  I am still digging into this for my personal curiosity.

2. Testing the > operator.  The unsigned integer type provides a native
> operator, so no cast is needed.

3. Testing the & operator.  The unsigned integer type does not provide a
native & operator, so the values are cast up to the next larger size.

I am testing this for both signed and unsigned integers with data sets of the
following sizes: 100, 1000, 10000, 100000, 500000, 1M, 5M, and 10M rows.
I am planning to test on both 32-bit and 64-bit x86 platforms.  If there is
interest, I can also run these tests on 32-bit and 64-bit PowerPC platforms.

I will add the union test to my test plan.  Anything else I should add, or
any holes I am missing with this test plan?

 The thread I mention above was a year or so back and was originated by
 someone who wanted to duplicate mysql's behavior.  Your proposal is
 a lot more limited and might not really need to try to put the unsigned
 types into the numeric category.

Ah, ok.  I will not worry about it for now.

Thanks!

- Ryan



Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Ryan Bradetich
Hello Peter,

On Fri, Jul 25, 2008 at 5:14 AM, Peter Eisentraut [EMAIL PROTECTED] wrote:
 Am Friday, 25. July 2008 schrieb Ryan Bradetich:
 PgFoundry already has an uint project:
 http://pgfoundry.org/projects/uint/

 Unfortunately this project seems to have not gone anywhere.  Last
 activity was late 2006 and there are not any files checked into the
 SCM repository.
 Is it acceptable to hijack this PgFoundry project?  Or should I
 start a new project (assuming there is any interest in publishing this
 work).

 Please hijack the project and develop your code there.  Of course you can
 always ask for advice here.

I will work on getting the PgFoundry project setup.

Thanks!

- Ryan



Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Ryan Bradetich
Hello Greg,

On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark [EMAIL PROTECTED] wrote:
 Ryan Bradetich [EMAIL PROTECTED] writes:

 My plans for the example above would be:

   1. SELECT 1500000000 + 1500000000  -- Throws overflow error.
   2. SELECT 1500000000::uint4 + 1500000000 -- Returns 3000000000::uint4.

 I think that wouldn't actually work. Postgres's parser immediately assigns a
 type to the bare unquoted integral constant so it would end up with a int4
 type. Then when it has to pick an operator for uint4+int4 it wouldn't be able
 to cast the int4 to uint4 because there would be no implicit cast.

 You could make it work by having a uint4+int4 operator which returns uint4 but
 then you're going to need a *lot* of operators

This was my plan.  I performed some testing last night to verify that bare
literals are considered plain integers and would not be implicitly cast to
a different type (i.e. smallint or bigint).  I am seeing three operators
for most operations:

1. uint4 - uint4  = uint4
2. int4 - uint4= uint4
3. uint4 - int4= uint4

Is there something I need to watch out for when adding this number of
operators (i.e.
performance impact, etc)?  Some tests I should be running to measure the impact
of adding these operators?


 One other idea that's been mentioned before is treating integral constants
 like 15 as type unknown like the quoted '15' constant is. That way
 the parser would see uint4+unknown and could pick the uint4 operator. But that
 would be a pretty massive semantics change.

This would require changes to the core PostgreSQL code, correct?  My goal
for this type was to have it as an external project on PgFoundry, since
there does not appear to be much demand for it and unsigned types are not
specified in the SQL standard.
If the community decides this support would be better in core
PostgreSQL code, then
I am willing to help with that work, but I will need a significant
amount of guidance :)

With my limited knowledge, the best (and easiest) path seems to be to take
advantage of the extensible type system in PostgreSQL and support unsigned
integers as a PgFoundry project.

Thanks for your review and comments!

- Ryan

 --
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Ryan Bradetich
Hello Dann,

On Fri, Jul 25, 2008 at 1:06 PM, Dann Corbit [EMAIL PROTECTED] wrote:

 At the cost of one bit of storage, you have compatible types using

Thanks for your review and feedback!  Unfortunately, I do need the full range
of the unsigned types for the project I am looking at.  The reason I started
working on these types is because it seemed wasteful to use the next size
larger signed integer for the storage type of the unsigned integer.

Thanks for the suggestion!

- Ryan



Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Ryan Bradetich
Tom,

On Fri, Jul 25, 2008 at 12:32 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Consider the idea of not having any uint4-specific arithmetic operators,
 but instead providing the following:

* assignment casts from int4 and int8 to uint4
  (these throw error if out of range, of course)
* implicit cast from uint4 to int8 (can never fail)

 The effect of providing the latter cast would be that any arithmetic
 involving a uint4 column would automatically be done in int8.  Which
 would make it a shade slower than a native implementation, but probably
 not enough slower to be a problem --- and you'd avoid having to write
 dozens of operators and underlying support functions.  Storing into the
 uint4 column would work fine with no extra notation because of the
 assignment casts.

This is an interesting idea that I will test out tonight.  I did have
the following
concern looking through src/backend/utils/adt/int8.c:  There is code that is
optionally compiled based on the INT64_IS_BUSTED pre-processor define.
Is this pre-processor define something I should worry about for portability
with this plan?

After I get uint types implemented, for fun I might try some benchmarks
to see if I can detect the int8 overhead on a 32-bit system.
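A minimal standalone sketch of the promote-to-int8 scheme under discussion
(function names here are illustrative, not taken from the uint module):

```c
#include <stdint.h>

/* uint4 values are stored in 32 bits, but arithmetic is done in int64,
 * which represents every uint32 value exactly, so the implicit cast up
 * can never fail. */
static int64_t uint4_to_int8(uint32_t v)
{
    return (int64_t) v;
}

/* The assignment cast back into uint4 is range-checked.  Here -1 stands
 * in for the ereport(ERROR, ...) an out-of-range value would raise. */
static int64_t int8_to_uint4(int64_t v)
{
    if (v < 0 || v > (int64_t) UINT32_MAX)
        return -1;
    return v;
}

/* uint4 + uint4 via the promotion: the int64 sum cannot overflow, and a
 * range error only surfaces when casting back into the column's type. */
static int64_t uint4_add(uint32_t a, uint32_t b)
{
    return int8_to_uint4(uint4_to_int8(a) + uint4_to_int8(b));
}
```

A sum that fits the uint4 range succeeds, while one past UINT32_MAX fails
only at the cast back, matching Tom's description of assignment casts.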

 Moreover, you'd avoid cluttering the system with a pile of cross-type
 operators, which we have recently realized are not a good thing, because
 they increase the likelihood of ambiguous operator problems --- see
 http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php

Good to know.  Thanks for the link.

 For uint8 you'd have to promote to numeric to guarantee no failure
 in the implicit cast; which is going to be a rather bigger performance
 hit, but I don't really see uint8 as being a type with huge demand.

Hopefully I will not need the uint8 type.  Right now for a project I am
looking at I need the uint2 and uint4 types.  uint8 support can come
later if it is needed or requested.

 Now you probably *will* want cross-type comparison operators, if you
 are going to support indexing of unsigned columns, so that something
 like
uint4col > 42
 can be indexed without any casting.  But limiting yourself to the six
 basic comparison operators certainly makes it a much less bulky project.

This sounds excellent!  Hopefully by using these operators I will be able to
avoid most of the casting to int8 for my use, while still providing the
complete functionality for this type.

Thanks again for your review and feedback!

- Ryan



[HACKERS] [RFC] Unsigned integer support.

2008-07-24 Thread Ryan Bradetich
Hello hackers,

I know the development community is in the middle of the July 2008
commit-fest, so I apologize if this design proposal is inappropriate at
this time.

I am looking to take advantage of PostgreSQL's extensible type system and
implement unsigned integer support.  The data I am dealing with contains
many unsigned data types, and I am planning on using this type to reduce
storage consumption.  I am initially looking to add support for 16-bit and
32-bit unsigned integers, with the potential to add 8-bit and 64-bit
unsigned integers if needed or desired by the community.

Searching through the list archives, I found two objections raised in the past:


1. Implicit casts between different data types.

I am hoping the removal of many of the implicit casts in
PostgreSQL 8.3 will simplify this task to where this objection can be
removed.

    My plan (without much experimentation) is to have maybe a handful
of casts (something like):
 * uint4 -> integer
 * integer -> uint4
 * bigint -> uint4
 * integer -> uint2
 * uint2 -> smallint

and then provide operators for a reasonable set of functionality.  My
initial thought is to provide default operators on any type that is not
implicitly cast on the psql command line.

As an example, I am planning for the following SQL statements to
work correctly:

3000000000::uint4 + 10  and
3000000000::uint4 > 10

   My understanding is the SQL standard does not provide support for
unsigned integers, so I am planning on making all casts from unsigned
integers
   to other data types explicit.   Is this acceptable to the community?

   Another question for the community is should we allow the following cast?
   -1::uint4

Even though this is acceptable C code, I am leaning towards
throwing an out-of-range error when this occurs.



Are there some areas I am missing or should investigate further
before working on this project?


2. There is not much demand for unsigned integer types.

 Not much I can do about that :)  I am willing to post my work as
a PgFoundry project.

 PgFoundry already has an uint project:
http://pgfoundry.org/projects/uint/

Unfortunately this project seems to have not gone anywhere.  Last
activity was late 2006 and there are not any files checked into the
SCM repository.
Is it acceptable to hijack this PgFoundry project?  Or should I
start a new project (assuming there is any interest in publishing this
work).


Although I am not targeting inclusion for this type in the core
PostgreSQL code, I would like to post code for review and receive
feedback from the
community on this work.  As I understand this RFC is the first step in
the process :)  Once I have some code ready for review, is it
acceptable to use the
commit-fest wiki for this project?

Thanks much for your time!

- Ryan



Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-24 Thread Ryan Bradetich
Hello Tom,

On Thu, Jul 24, 2008 at 10:10 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Ryan Bradetich [EMAIL PROTECTED] writes:
 I am looking to take advantage of PostgreSQL extensible type system
 and implement unsigned integer support.

 This has been proposed before, and foundered before on the question
 of implicit coercions.  If you're willing to make all coercions *to*
 unsigned types be explicit (or at most assignment), then I think
 it can be made to work without breaking anything.  But usually the
 folk who ask for this feature are hoping that bare integer literals
 like 42 will get interpreted as unsigned when they want them to be.
 The problem with that wish is illustrated by

select 1500000000 + 1500000000;

 These literals might be either int4 or uint4, therefore this command
 might yield either an integer-overflow error or 3000000000::uint4.
 That's not a distinction you can fuzz over --- it's got to be one
 or the other, and backwards compatibility says it'd better be the
 first.

I am in agreement with you on this.  Since SQL does not specify
unsigned types, I was assuming only explicit and assignment casts.
I should have probably mentioned that in the RFC.  Thanks for
pointing this out.

My main goal for this type is the reduced storage space.  I am fine
with people needing to cast to the unsigned types to benefit from the
reduced storage space.

My plans for the example above would be:

  1. SELECT 1500000000 + 1500000000;          -- Throws an overflow error.
  2. SELECT 1500000000::uint4 + 1500000000;   -- Returns 3000000000::uint4.
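
The intended difference between those two statements can be sketched in
Python (a purely illustrative model of the type resolution, with
hypothetical helper names and large example literals chosen so that the
sum overflows int4 but fits uint4; this is not PostgreSQL internals):

```python
# Sketch: why a bare literal sum overflows while an explicitly cast sum
# does not, under the proposed rules.

INT4_MAX = 2**31 - 1    # 2147483647
INT4_MIN = -2**31
UINT4_MAX = 2**32 - 1   # 4294967295

def int4_add(a: int, b: int) -> int:
    """Both literals default to int4, so the sum must fit in int4."""
    result = a + b
    if result > INT4_MAX or result < INT4_MIN:
        raise OverflowError("integer out of range")
    return result

def uint4_add(a: int, b: int) -> int:
    """With one operand explicitly cast, the addition resolves as
    uint4 + uint4, so only the uint4 range is enforced."""
    result = a + b
    if result < 0 or result > UINT4_MAX:
        raise OverflowError("uint4 out of range")
    return result
```

Here int4_add(1500000000, 1500000000) raises, while
uint4_add(1500000000, 1500000000) returns 3000000000, matching the two
SELECT statements above.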



 I am hoping the removal of many of the implicit casts in
 PostgreSQL 8.3 will simplify this task to where this objection can be
 removed.

 The implicit casts we removed were cross-type-category cases.
 If you hope for unsigned types to be considered part of the numeric
 category, there's no guidance for you there.  In fact, the real nub
 of the problem is what type shall be initially assigned to an
 integer-looking literal, and how will you get things to behave sanely
 if that initial choice wasn't what was desired.  We still have some
 issues around the fact that 42 isn't considered a smallint.  Throwing
 in another possible meaning isn't going to help.

My understanding is the SQL standard does not provide support for
 unsigned integers, so I am planning on making all casts from unsigned
 integers to other data types explicit.

 It's really the other direction that would be contentious ...

regards, tom lane

Thanks for your comments!  I have already started to play around a bit with
the types and will hopefully have some code ready for review / feedback soon.

- Ryan



[HACKERS] Questions about indexes?

2003-02-16 Thread Ryan Bradetich
Hello postgres hackers,

Been a while since I have participated on this list ... but I have a new
itch to scratch

Although the table schema is immaterial, I will provide it so we have a
common framework for this discussion:


host_id     integer    (not null)
timestamp   datetime   (not null)
category    text       (not null)  [<= 5 chars]
anomaly     text       (not null)  [<= 1024 chars]

This table is used to store archived data, so each row in the table must
be unique.  Currently I am using a primary key across all the columns to
enforce this uniqueness.  This table currently has ~86 million rows and
is 16+ GB in size.  The primary key index is also 16+ GB in size,
because it appears all the data is duplicated in the index.  (I have
only done some preliminary looking at the database file with strings,
etc., so this assumption is purely based on those observations.)

I am not sure why all the data is duplicated in the index ... but I bet
it has to do with performance, since it saves a lookup in the main
table.  Are there any benchmarks or papers related to this topic I should
locate and read?  I am curious about this because it seems the only
advantage gained is while searching the index for the specified values.
Once the entry is found, the full row needs to be pulled from the main
table anyhow, since the index does not contain all the data.  Also, with
the increased size, it seems additional pressure would be put on the
shared memory caches (no idea how this really works, just guessing! :))


Since my only requirement is that the rows be unique, I have developed a
custom MD5 function in C, and created an index on the MD5 hash of the
concatenation of all the fields.  This has reduced the disk space usage
considerably, as shown below against my test database of ~6 million rows
at 1+ GB.
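
The hashing scheme can be sketched as follows (a Python stand-in for the
custom C function; the field separator, encoding, and function name are
illustrative assumptions, not the original implementation):

```python
# Sketch of the MD5-of-concatenated-columns approach: index a fixed-size
# 32-character digest instead of the full (up to ~1 KB) row.
import hashlib

def row_md5(host_id: int, timestamp: str, category: str, anomaly: str) -> str:
    # A separator keeps field pairs like ("ab", "c") and ("a", "bc")
    # from producing the same concatenation.
    payload = "|".join((str(host_id), timestamp, category, anomaly))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

A unique index over this fixed-size digest is far smaller than one over
the full row, which is where the space savings come from.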

All this data is based off the test database running 7.3.2:

Type              Size (bytes)
------------------------------
Database table      1188642816
All-columns pkey    1510252544
MD5-column pkey      370999296

Just using the MD5 hash data instead of all the columns is a considerable
disk-space win, going from 1.5 GB to 370 MB.

Has anyone else solved this problem?  Has anyone else looked into
something like this and would mind sharing, so I do not have to re-invent
the wheel? :)  Also (assuming there are no papers / benchmarks proving that
data in the index is a good idea), how difficult would it be to implement
an index type that extracts the data from the main table?


Thanks for reading.  I will be happy to field any question that I can,
or read any papers, research, etc that relates to this topic.

- Ryan

P.S. the production database is running 7.2.4 if that makes a
difference.

-- 
Ryan Bradetich [EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Questions about indexes?

2003-02-16 Thread Ryan Bradetich
On Mon, 2003-02-17 at 00:15, Tom Lane wrote:
 Ryan Bradetich [EMAIL PROTECTED] writes:
  On Sun, 2003-02-16 at 23:34, Tom Lane wrote:
  It's not real clear to me why you bother enforcing a constraint that the
  complete row be unique.  Wouldn't a useful constraint be that the first
  three columns be unique?
 
  The table holds system policy compliance data.  The category is
  basically the policy, and the anomaly is the detailed text explaining
  why the system is out of compliance.  So the anomaly data is important
  (and often the reason why the key is unique).
 
 Well, sure the anomaly is important: it's the payload, the reason why
 you bother to have the table in the first place.  But that doesn't mean
 it's part of the key.  Generally the key would be the info you use to
 look up a particular anomaly text.  In this example, it's not clear to
 me why you'd need/want two different anomaly texts entered for the same
 host_id and the same category at the same instant of time.  ISTM there's
 something inadequate about your category column if you need that.

Ok, I understand what you are asking now :)

Let me make up a contrived example to show how the table is used.

host_id 1 = hosta.somewhere.com
host_id 2 = hostb.somewhere.com

The categories are coded as follows (made-up examples):
cat p101 = /etc/passwd check
cat f101 = filesystem check.

the table would look like:
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.
etc...

So I do not need the anomaly to be part of the index; I only need it to
keep the rows unique.

I agree with you that I would not normally add the anomaly to the
index, except for the unique row requirement.  Thinking about it now,
maybe I should guarantee unique rows via a check constraint...

Thanks for making me think about this in a different way!

- Ryan

   regards, tom lane
-- 
Ryan Bradetich [EMAIL PROTECTED]


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] a couple of minor itches: RI Trigger Names, and additional Alterownerships commands.

2002-05-06 Thread Ryan Bradetich

Hello postgresql-hackers,

It's been a while since I've participated on this list, so I wanted to say
thank you for the great product that PostgreSQL 7.2.1 is!  I have been
doing some testing in preparation for a database upgrade from 7.0.3
to 7.2.1, and I have a few small itches to scratch, so I thought I'd
get opinions from the experts :)


Itch #1: Referential Integrity trigger names in psql \d output.

Currently the \d output from psql displays non-obvious names for
what the RI trigger actually does.  Reading through the source code
and querying the mailing lists indicates this is easily changed (and
I have already done this on a test database without any ill effects
so far).

Here is what the \d displays from 7.2.1:

Before:

test=# \d foo
  Table foo
 Column  |  Type   | Modifiers 
-+-+---
 blah_id | integer | not null
 foo | text| not null
Primary key: foo_pkey
Triggers: RI_ConstraintTrigger_30670

test=# \d blah 
 Table blah
 Column  |  Type   | Modifiers 
-+-+---
 blah_id | integer | not null
 blah| text| not null
Primary key: blah_pkey
Triggers: RI_ConstraintTrigger_30672,
  RI_ConstraintTrigger_30674


After:

test=# \d foo
  Table foo
 Column  |  Type   | Modifiers 
-+-+---
 blah_id | integer | not null
 foo | text| not null
Primary key: foo_pkey
Triggers: RI_blah_id (insert)

test=# \d blah
 Table blah
 Column  |  Type   | Modifiers 
-+-+---
 blah_id | integer | not null
 blah| text| not null
Primary key: blah_pkey
Triggers: RI_blah_id (delete),
  RI_blah_id (update)


This change was made with a simple update to the pg_trigger
system table for the tgname column.

Searching through the code and the mailing list, it looks like
the only constraint on the tgname column is that it needs to be
unique (although the database schema does not enforce this via
a unique index), since the OID tacked onto the RI_ConstraintTrigger_*
names was designed to preserve this uniqueness.

What I would propose is to base the RI_* name on the constraint name
provided during the RI trigger creation; if the constraint name is not
provided, then default to the current naming scheme.

Can anyone think of side effects of changing the tgname column in the
pg_trigger system table?  Does this proposal seem like an acceptable
solution?  Would there be interest in this if I provided a patch to do
this?



Itch #2: Alter ownership on a sequence, etc.

ALTER TABLE provides the functionality to change the ownership of a
table, but ownership of other structures like sequences, etc., cannot
be changed without dropping and recreating them as the new owner.  Would
there be any interest if I worked on a patch to do this too?



Thanks again for all the hard work and a great database!

- Ryan Bradetich


