Re: [HACKERS] Out parameters handling
On Sun, Mar 8, 2009 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Ryan Bradetich rbradet...@gmail.com writes:
>> This is one of the things I wanted to start looking at for 8.5. My idea was to optionally use : or @ (not sure which is more popular) to specify this token is only a variable.
>
> This whole line of thought is really a terrible idea IMHO. plpgsql is supposed to follow Oracle's pl/sql syntax, not invent random syntax of its own. I believe that 80% of the problems here are occurring because we used a crude substitution method that got the priorities backwards from the way Oracle does it.

Fair enough. I just hope whatever solution the community decides upon solves this problem. It is a very annoying problem to track down, and I tend to get even more agitated when I figure out this is the problem. I do not want to distract from the release efforts, so I will withhold further comments until the 8.5 development cycle.

Thanks,
- Ryan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Out parameters handling
Hello Robert,

> I have been bitten by this problem many times as well. I wonder whether it would be possible to make PL/pgsql take :foo to mean the parameter named foo, and then provide an option to make that THE ONLY WAY to refer to the parameter foo. For backward-compatibility, and compatibility with (ahem) other database products, we probably don't want to remove the option to have foo mean... any damn thing named foo you can put your hands on. But it would be nice to at least have the option of disabling that behavior when compatibility is not an issue, and correctness is.

This is one of the things I wanted to start looking at for 8.5. My idea was to optionally use : or @ (not sure which is more popular) to specify that a token is only a variable, and not to try to match it to columns or other database objects. If the variable did not start with : or @, then the normal rules would apply for backwards compatibility. No idea how feasible this plan is; I was just hoping to find a way to solve this problem.

Thanks,
- Ryan
Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.
Bruce,

Thanks for adding this to the TODO. I am planning to continue working on this patch after 8.4 releases. I know we are in feature freeze and I do not want to sidetrack the release process.

Thanks!
- Ryan

On Thu, Jan 8, 2009 at 8:41 AM, Bruce Momjian br...@momjian.us wrote:
> Added to TODO:
>
>     Reduce data row alignment requirements on some 64-bit systems
>     * http://archives.postgresql.org/pgsql-hackers/2008-10/msg00369.php
>
> ---------------------------------------------------------------------------
> Ryan Bradetich wrote:
>> Hello all,
>>
>> Here is a proof-of-concept patch for reducing the alignment requirement for heap tuples on 64-bit systems. This patch passes the regression tests and a couple of other data sets I have thrown at it. I am hoping to get some early feedback on this patch so I have time to make any corrections, improvements, etc. before the November commit-fest deadline.
>>
>> There are two visible savings in the system catalogs using this patch (size was gathered using \dS+):
>>
>>     Catalog          Pre-Patch Size   After Patch Size
>>     pg_depend        312 kB           296 kB
>>     pg_description   160 kB           152 kB
>>
>> One thing I am considering, but I am not sure if it is worth the code uglification, is to wrap the majority of these checks in #ifdefs so they are only used on 64-bit systems. This would completely eliminate the impact of this patch on 32-bit systems, which would not gain any benefit from it.
>>
>> Feedback welcome! Thanks,
>> - Ryan
>> [ Attachment, skipping... ]

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.
Hello Zdenek,

On Wed, Oct 8, 2008 at 10:59 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:
> Just a quick look. At first point: your change introduces a new page layout version, which is not acceptable from my point of view for 8.4 (it adds another complexity to in-place upgrade).

I would like to see this patch (or some variant) go in if possible. Since the in-place upgrades are a concern to you, is there anything I can do to help with them to offset the disruption this patch causes you?

> And I guess that it maybe works fine on 64-bit x86, but it will fail on SPARC and other machines which require aligned data. Did I miss something?

My intention was to keep the data aligned, so it should work on any platform. The patch checks the user-defined data to see if any column requires the double storage type. If the double storage type is required, it uses the MAXALIGN() macro, which preserves the alignment for 64-bit data types. If no columns require the double storage type, then the data will be INTALIGN()ed, which still preserves the alignment requirements. If I have a complete misunderstanding of this issue, please explain it to me and I will either fix it or withdraw the patch.

Thanks for your feedback!
- Ryan
Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.
Hello Zdenek,

On Thu, Oct 9, 2008 at 12:38 AM, Zdenek Kotala [EMAIL PROTECTED] wrote:
> Ryan Bradetich napsal(a):
>> On Wed, Oct 8, 2008 at 10:59 PM, Zdenek Kotala [EMAIL PROTECTED] wrote:
>> I would like to see this patch (or some variant) go in if possible. Since the in-place upgrades are a concern to you, is there anything I can do to help with them to offset the disruption this patch causes you?
>
> Yaah, wait until 8.5 :-). However, currently there is no clear consensus on which upgrade method is best. I hope that it will become clearer after the Prato developers meeting. Until this meeting I cannot say more.

Heh, understood. :) I believe the proposed CRC patch also affects the heap page layout (it adds a pd_checksum field to PageHeaderData). [1] Just pointing out another patch that could affect you as well. My offer to help still stands.

> I overlooked the 'd' test. Your idea seems reasonable to me. Maybe you could test 'd' alignment only for NOT NULL values.

Funny you should mention this. I had just started looking into this optimization to see if I could convince myself it would be safe. My initial conclusion indicates it would be, but I have not tested or verified that yet. Having an independent proposal for this boosts my confidence even more. Thanks!

> The problem there is add_item, which is used for indexes as well, and they have an IndexTupleHeader structure. I'm not convinced about the idea of having two different alignments for items on a page.

Just to clarify, this patch only affects heap storage (i.e., when the is_heap flag is set). I have not had a chance to analyze whether I can reduce other storage types yet.

> I guess another problem is with MAX_TUPLE_CHUNK_SIZE, which uses MAXALIGN in its computation. It seems to me that a toast chunk could waste space anyway. And of course you should bump the page layout version.

Thanks. I will do.

> I also suggest creating a function/macro to compute hoff and replacing the code with this function/macro.

Great. That is some of the feedback I was looking for. I did not implement it yet, because I wanted to see if the basic implementation was feasible first.

Thanks again for your feedback!
- Ryan

[1] http://archives.postgresql.org/pgsql-hackers/2008-10/msg00070.php
Re: [HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.
Hello ITAGAKI-san,

I apologize for not replying earlier; I needed to head out to work.

On Thu, Oct 9, 2008 at 5:01 AM, ITAGAKI Takahiro [EMAIL PROTECTED] wrote:
> Ryan Bradetich [EMAIL PROTECTED] wrote:
>> Here is a proof-of-concept patch for reducing the alignment requirement for heap tuples on 64-bit systems.
>>
>>     pg_depend        312 kB   296 kB
>>     pg_description   160 kB   152 kB
>
> 5 percent of gain seems reasonable to me.

I agree. I am seeing a ~10% gain in some of my tables where the tuple size was 44 bytes but, due to the alignment, was being padded out to 48 bytes.

> Is it possible to apply your improvement to indexes? I think there are more benefits for small index entries, whose sizes are often 12 or 20 bytes.

I am not sure whether this improvement will affect indexes or not yet. My current implementation specifically excludes indexes and only affects heap tuples. Now that I have a better understanding of the disk structure for heap tuples, I am planning to look at the index layout in the near future to see if there is some possible gain there as well.

>> This would completely eliminate the impact for this patch on 32-bit systems, which would not gain any benefit from this patch.
>
> No! There are *also* benefits even on 32-bit systems, because some of them have 8-byte alignment (for example, 32-bit Windows).

Excellent! I was not aware of that. Thanks for this information. Any ideas on what I should use for this check? I was thinking of using

    #if SIZEOF_SIZE_T >= 8

I guess I should search around for a standard solution to this problem.

> BTW, there might be a small mistake on the following lines in the patch. They do the same things ;-)
>
>     ***************
>     *** 1019,1025 **** toast_flatten_tuple_attribute(Datum value,
>           new_len += BITMAPLEN(numAttrs);
>           if (olddata->t_infomask & HEAP_HASOID)
>               new_len += sizeof(Oid);
>     !     new_len = MAXALIGN(new_len);
>           Assert(new_len == olddata->t_hoff);
>           new_data_len = heap_compute_data_size(tupleDesc, toast_values, toast_isnull);
>     --- 1025,1034 ----
>           new_len += BITMAPLEN(numAttrs);
>           if (olddata->t_infomask & HEAP_HASOID)
>               new_len += sizeof(Oid);
>     !     if (olddata->t_infomask & HEAP_INTALIGN)
>     !         new_len = MAXALIGN(new_len);
>     !     else
>     !         new_len = MAXALIGN(new_len);
>           Assert(new_len == olddata->t_hoff);
>           new_data_len = heap_compute_data_size(tupleDesc, toast_values, toast_isnull);

Thanks for that catch! I have this fixed in my local GIT tree now.

Thanks for your feedback and review!
- Ryan
[HACKERS] [WIP] Reduce alignment requirements on 64-bit systems.
Hello all,

Here is a proof-of-concept patch for reducing the alignment requirement for heap tuples on 64-bit systems. This patch passes the regression tests and a couple of other data sets I have thrown at it. I am hoping to get some early feedback on this patch so I have time to make any corrections, improvements, etc. before the November commit-fest deadline.

There are two visible savings in the system catalogs using this patch (size was gathered using \dS+):

    Catalog          Pre-Patch Size   After Patch Size
    pg_depend        312 kB           296 kB
    pg_description   160 kB           152 kB

One thing I am considering, but I am not sure if it is worth the code uglification, is to wrap the majority of these checks in #ifdefs so they are only used on 64-bit systems. This would completely eliminate the impact of this patch on 32-bit systems, which would not gain any benefit from it.

Feedback welcome! Thanks,
- Ryan

Attachment: heap-tuple-v1.0.patch (application/download)
Re: Fwd: [HACKERS] [Patch Review] TRUNCATE Permission
Hello Robert,

On Fri, Sep 5, 2008 at 8:13 PM, Robert Haas [EMAIL PROTECTED] wrote:
> Updated patch attached, based on comments from Ryan Bradetich and Tom Lane, and sync'd to the latest CVS version.

Thanks for the update. I am out of town until tomorrow evening. I will re-review this patch when I get back if it has not been committed by then.

- Ryan
[HACKERS] [Review] Tests citext casts by David Wheeler.
Hello all,

Here is my review of the "Test citext casts" patch written by David Wheeler: http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

1. The patch applies cleanly to the latest GIT repository.

2. The citext type installs, uninstalls, and re-installs cleanly.

3. The coding style is mostly consistent with the existing code. The only coding style difference I noticed was introduced in this patch. In the citext.sql.in file, the following functions are created using the non-dollar quoting syntax:
   * regexp_matches
   * regexp_replace
   * regexp_split_to_array
   * regexp_split_to_table
   * strpos

   In the citext.sql.in file, the following functions are created using the dollar quoting syntax:
   * replace
   * split_part
   * translate

   I do not have a strong preference either way, and I do not even care if they are consistent. I am interested to see if there was a reason for using both syntaxes for these functions.

4. The regression tests pass when PostgreSQL is built with --with-libxml. They fail when PostgreSQL is built without --with-libxml. Since the default PostgreSQL configuration does not include --with-libxml, and the presence of the libxml2 libraries on the system is not detected at run time, I would recommend adding an additional expected output file (citext_1.out) that covers the case where PostgreSQL is compiled without --with-libxml. As an experiment, I added citext_1.out, and the regression tests passed both with and without the --with-libxml option.

Review of the additional regression tests shows they provide coverage of the new casts and functions added with this patch.

Overall I think the patch looks good. After reviewing the patch, I played with citext for an hour or so and did not encounter any bugs or other surprises.

Thanks,
- Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello all,

On Mon, Aug 11, 2008 at 2:24 AM, Gregory Stark [EMAIL PROTECTED] wrote:
> Ryan Bradetich [EMAIL PROTECTED] writes:
>> After a cursory glance at the HeapTupleHeaderData structure, it appears it could be aligned with INTALIGN instead of MAXALIGN. The one structure I was worried about was the 6-byte t_ctid structure. The comments in src/include/storage/itemptr.h indicate the ItemPointerData structure is composed of 3 int16 fields, so everything in the HeapTupleHeaderData structure is 32 bits or less.
>
> Sure, but the tuple itself could contain something with double alignment. If you have a bigint or double in the tuple then heap_form_tuple needs to know where to put it so it ends up at the right alignment.

For fun, I looked around in heap_form_tuple() today to see how big of a job this change would be. It did not seem very hard to implement. I know there are probably several other places I missed with this patch, but it does pass the regression tests and the simple testing I threw at it (including ALTER TABLE ADD COLUMN, etc.).

The patch concept is fairly simple:

1. Add a new boolean local variable, require_max_align (initialized to false).
2. When we loop through the attributes, check whether any attribute requires double alignment. If one does, set require_max_align = true.
3. If the tuple has OIDs (a double alignment requirement), set require_max_align = true.
4. If require_max_align is true, use the MAXALIGN macro; otherwise use the INTALIGN macro.

A check of the physical storage for my previous test case does show the tuple length to be 44 bytes instead of the previous 48 bytes, so the patch does appear to work. I have not run any performance tests using this patch.
Attached is my proof-of-concept patch for this idea, in case someone wants to inform me of other problems this patch will introduce :)

Thanks,
- Ryan

*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
***************
*** 863,868 **** heap_form_tuple(TupleDesc tupleDescriptor,
--- 863,869 ----
                  data_len;
      int         hoff;
      bool        hasnull = false;
+     bool        require_max_align = false;
      Form_pg_attribute *att = tupleDescriptor->attrs;
      int         numberOfAttributes = tupleDescriptor->natts;
      int         i;
***************
*** 886,891 **** heap_form_tuple(TupleDesc tupleDescriptor,
--- 887,895 ----
       */
      for (i = 0; i < numberOfAttributes; i++)
      {
+         if (att[i]->attalign == 'd')
+             require_max_align = true;
+
          if (isnull[i])
              hasnull = true;
          else if (att[i]->attlen == -1
***************
*** 907,916 **** heap_form_tuple(TupleDesc tupleDescriptor,
      if (hasnull)
          len += BITMAPLEN(numberOfAttributes);
!     if (tupleDescriptor->tdhasoid)
          len += sizeof(Oid);
!     hoff = len = MAXALIGN(len); /* align user data safely */
      data_len = heap_compute_data_size(tupleDescriptor, values, isnull);
--- 911,925 ----
      if (hasnull)
          len += BITMAPLEN(numberOfAttributes);
!     if (tupleDescriptor->tdhasoid)
      {
          len += sizeof(Oid);
+         require_max_align = true;
+     }
!     if (require_max_align == true)
!         hoff = len = MAXALIGN(len);     /* align user data safely */
!     else
!         hoff = len = INTALIGN(len);     /* align user data safely */
      data_len = heap_compute_data_size(tupleDescriptor, values, isnull);
Re: [HACKERS] statement_cost_limit for regression testing.
Hello Simon,

On Mon, Sep 1, 2008 at 9:35 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> On Thu, 2008-08-28 at 19:57 -0700, Ryan Bradetich wrote:
>> I just wanted to throw out another possible use for this GUC. There may be a better way to solve this problem, but I believe this patch would be useful for regression testing.
>
> It's possible to do this using planner hooks, so no patch needed.

Excellent! Time for me to read up on using these planner hooks.

Thanks!
- Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello Martijn,

> You need to arrange testing on an architecture that has strict alignment requirements. For example, i386 doesn't care about alignment at all and will read anything from anywhere, with some performance degradation. Other architectures will simply throw exceptions; that's the smoke test.

From other comments in this thread, I still have quite a bit of work ahead of me :) If I can accomplish this, then I will run it on a different architecture. I have access to a parisc-linux and a powerpc system.

> Also, what's the performance cost?

Not sure yet. My goal would be zero impact on 32-bit systems and something small on 64-bit systems that would hopefully be overcome by a performance gain in I/O.

Thanks for your feedback.
- Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello Tom,

On Tue, Sep 2, 2008 at 8:07 AM, Tom Lane [EMAIL PROTECTED] wrote:
> Ryan Bradetich [EMAIL PROTECTED] writes:
>> The patch concept is fairly simple. 1. Add a new boolean local variable: require_max_align (initialized to false).
>
> This really can't possibly work, because you'd need to propagate knowledge of the tuple's alignment requirement all over the place. In particular, how would code *reading* the tuple know where the data starts? Also, I don't think you get (very much of) the actual benefit unless the code that inserts tuples into disk pages knows to do something different in the int-align case.

I figured there was something wrong with my proof-of-concept patch, because it was too easy and it passed the regression tests on the first attempt. I just didn't know what was wrong. After reading your comment, I went back and looked at my test data, and I am embarrassed to admit that I misread it :( Sure, the tuple length is 44 bytes, but the data is still aligned on an 8-byte boundary. So this proof-of-concept patch effectively does nothing :(

As Greg mentioned, the code reading the user data would know where to start because of the t_hoff field. In my first email in this thread, I had looked at the requirements for the HeapTupleHeaderData to be MAXALIGN. I could not find any reason the HeapTupleHeaderData needed to be MAXALIGN, but Greg pointed out that the t_hoff field is required to be MAXALIGN. I can only see two reasons why the t_hoff field is required to be MAXALIGN:

1. The tuple has OIDs (OIDs require MAXALIGN).
2. The tuple has a column requiring MAXALIGN.

If neither of the above conditions is true, then it seems to me that t_hoff could safely be INTALIGN. I was working under the mistaken assumption that if I addressed this issue in heap_form_tuple, the tuple would be written to disk this way. Obviously I was wrong and still have a lot of code reading and experimentation to do :)

> It's conceivable that we could make this work if we wanted to dedicate an infomask bit to showing whether the tuple needs int or double alignment. I don't really think it's worth the trouble though.

I am not sure what an infomask bit is yet, but is it safe to assume (in theory) that if I get the page written correctly, the tuple-reading code would work because of the t_hoff field?

Thanks for your feedback!
- Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello Greg,

On Tue, Sep 2, 2008 at 8:30 AM, Gregory Stark [EMAIL PROTECTED] wrote:
> Ryan Bradetich [EMAIL PROTECTED] writes:
>> 4. If require_max_align = true, use the MAXALIGN macro; otherwise use the INTALIGN macro.
>
> Huh, I didn't think of doing it like that. But I'm confused. You seem to be tweaking the alignment of the data inside the tuple, after the tuple header? I thought we had only one byte of wasted space in there, and that's used by the null bitmap. So unless you have more than 8 columns and some of them are null, I wouldn't expect you to save any space. If you do, then I guess you could save 4 bytes if the null bitmap is 2-5 bytes (mod 8) long.

I was not trying to tweak the alignment of the data inside the tuple header; I was trying to adjust the alignment of t_hoff so it would not have the MAXALIGN requirement. I believe my proof-of-concept patch was bad, and I need to spend some more time on it tonight with the new knowledge I gained from this email thread.

> I thought the goal was to save space by aligning the tuples on the page more densely. That seems to me to be more fruitful, as about half the tuples will save four bytes even on tables with small or missing null bitmaps.

That is the goal. Basically, my test data is 44 bytes in length for each tuple. I have no data in the tuple that is required to be MAXALIGN, but since t_hoff has the MAXALIGN requirement, I throw away 4 bytes for each tuple (on a 64-bit machine). This proof-of-concept patch is to help me understand the PostgreSQL code and to see if I can recover those 4 bytes per tuple.

Thanks again for your feedback!
- Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello Greg,

On Tue, Sep 2, 2008 at 9:54 AM, Gregory Stark [EMAIL PROTECTED] wrote:
> Ryan Bradetich [EMAIL PROTECTED] writes:
>
> The structure on the page is to have a bunch of tuples stored one after the other. Each tuple is maxaligned after the previous one (actually before the previous one, since they start from the end, but we'll ignore that -- the point is that they all *start* at a maxaligned position on the page). Within each tuple there's a header which is 23 bytes long, then an optional null bitmap. In the header is t_hoff, which points to the first byte of actual data -- effectively specifying the length of the null bitmap. t_hoff is maxaligned as well.

Thanks for your persistence in explaining this issue to me. I think I am starting to understand the issues now, and I appreciate the help. I think I see the error in my thinking. You and Tom are saying the tuple is MAXALIGN on the page as well. My assumption was that since t_hoff had to be MAXALIGN, the tuple on the page was being MAXALIGN by default. If I were able to INTALIGN t_hoff, then the tuple on the page would be free to be INTALIGN as well. Tom gave me the next place to go look later tonight (PageAddItem).

> So by intaligning t_hoff you're adjusting where the data within the tuple fits by making the null bitmap 4 bytes shorter. That seems worth a few cycles, but it isn't going to affect every table. It'll only affect tables that have a null bitmap between 2-5 bytes (mod 8), i.e., with 9-40 fields, at least one of which is null. Tables with 9 fields are likely to be relatively wide, which means saving 4 bytes isn't that much of a gain percentage-wise. (Though they might be mostly null.)
>
> The other 4 bytes you could save is by packing the whole tuples themselves more closely on the page. That happens when the item pointer is added and pointed at the tuple. To do that, heap_form_tuple would have to return data back to the caller about the alignment needed, to whoever is going to copy it into the page. AFAICS that could be done in the HeapTuple structure itself rather than in the tuple, though, which doesn't seem so bad. Tom may be seeing something I'm not, though.
>
> Incidentally, can't you set the alignment during the compute_data_size() call instead of doing an extra pass through the attributes? Probably doesn't change the cost though.

Actually, I did not do an extra pass through the attributes; I added the check to an existing pass through the attributes. I did look at adding this check to heap_compute_data_size(), but took the easy path to see if my patch worked. I will update my patch to do the alignment check in heap_compute_data_size().

Thanks!
- Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello Tom,

> Well, as Greg pointed out, setting t_hoff correctly should be sufficient for squeezing out useless padding between the tuple header and the tuple data. The real objection here is that that's leaving most of the possible gain still on the table. The tuple *as a whole* (header and data both) is still going to get maxaligned when it's plopped onto a disk page. I think that in real-world cases that's often where the main padding cost comes in: with 23-byte tuple headers and no OIDs, you aren't saving a thing by int-aligning t_hoff unless you have a nulls bitmap and it's the wrong number of bytes long. So IMHO the real problem is over near PageAddItem.

Thanks for your feedback as well. Between you and Greg, I think I finally understand the error in my thinking. I will investigate PageAddItem() later tonight.

- Ryan
[HACKERS] [Patch Review] TRUNCATE Permission
Hello all,

Robert Haas submitted the TRUNCATE permissions patch for the September commit fest: http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

I had some extra time tonight, so I downloaded, installed, and reviewed this patch. Here are my findings:

1. I found the patch style to be consistent with the surrounding code.
2. The patch provides documentation updates and regression test updates.
3. The patch applies (with some fuzz) to the latest GIT tree.

Three issues I found with the patch via code reading and verified via testing:

1. File src/backend/catalog/aclchk.c, function pg_class_aclmask(): I believe the ACL_TRUNCATE bit should be added to this check and mask.

    /*
     * Deny anyone permission to update a system catalog unless
     * pg_authid.rolcatupdate is set. (This is to let superusers protect
     * themselves from themselves.) Also allow it if allowSystemTableMods.
     *
     * As of 7.4 we have some updatable system views; those shouldn't be
     * protected in this way. Assume the view rules can take care of
     * themselves. ACL_USAGE is if we ever have system sequences.
     */
    if ((mask & (ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_USAGE)) &&
        IsSystemClass(classForm) &&
        classForm->relkind != RELKIND_VIEW &&
        !has_rolcatupdate(roleid) &&
        !allowSystemTableMods)
    {
#ifdef ACLDEBUG
        elog(DEBUG2, "permission denied for system catalog update");
#endif
        mask &= ~(ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_USAGE);
    }

Here is where it is visible via the psql interface:

    template1=# select rolname, rolcatupdate from pg_authid;
     rolname | rolcatupdate
    ---------+--------------
     rbrad   | t
    (1 row)

    template1=# select has_table_privilege('pg_class', 'delete');
     has_table_privilege
    ---------------------
     t
    (1 row)

    template1=# select has_table_privilege('pg_class', 'truncate');
     has_table_privilege
    ---------------------
     t
    (1 row)

    template1=# update pg_authid set rolcatupdate = false;
    UPDATE 1

    template1=# select has_table_privilege('pg_class', 'delete');
     has_table_privilege
    ---------------------
     f
    (1 row)

    template1=# select has_table_privilege('pg_class', 'truncate');
     has_table_privilege
    ---------------------
     t
    (1 row)

I do not believe this is a huge issue, since truncate is prohibited on the system catalogs by truncate_check_rel():

    template1=# truncate pg_authid;
    ERROR:  permission denied: "pg_authid" is a system catalog

2. The src/test/regress/sql/privileges.sql regression test has tests for the has_table_privilege() function. It looks like all the other permissions are tested there, but there is no test for the TRUNCATE privilege.

3. I believe you missed a documentation update in the ddl.sgml file:

    There are several different privileges: <literal>SELECT</>, <literal>INSERT</>, <literal>UPDATE</>, <literal>DELETE</>, <literal>REFERENCES</>, <literal>TRIGGER</>, <literal>CREATE</>, <literal>CONNECT</>, <literal>TEMPORARY</>, <literal>EXECUTE</>, and <literal>USAGE</>.

I played around with the patch for an hour or so tonight and did not observe any other unusual behaviors.

Hopefully this review is useful. It is my first patch review for a commit-fest. I will update the wiki with a link to this email message for my review.

Thanks,
- Ryan
Re: [HACKERS] [Patch Review] TRUNCATE Permission
Hello all,

On Sun, Aug 31, 2008 at 11:55 PM, Ryan Bradetich [EMAIL PROTECTED] wrote:
> I do not believe this is a huge issue, since truncate is prohibited on the system catalogs by truncate_check_rel().
>
>     template1=# truncate pg_authid;
>     ERROR:  permission denied: "pg_authid" is a system catalog

I thought about this some more, and I believe my suggestion was incorrect. Since truncate_check_rel() prevents the use of the truncate command on system catalogs, the TRUNCATE permission should always be stripped from the system catalogs. Here is the inconsistency I observed:

    template1=# \z pg_catalog.pg_authid
                    Access privileges
       Schema   |   Name    | Type  |  Access privileges
    ------------+-----------+-------+---------------------
     pg_catalog | pg_authid | table | rbrad=arwdDxt/rbrad
    (1 row)

    template1=# select rolname, rolcatupdate from pg_authid;
     rolname | rolcatupdate
    ---------+--------------
     rbrad   | t
    (1 row)

    template1=# select has_table_privilege('pg_authid', 'truncate');
     has_table_privilege
    ---------------------
     t
    (1 row)

    template1=# truncate pg_authid;
    ERROR:  permission denied: "pg_authid" is a system catalog

The TRUNCATE fails even though \z and has_table_privilege() said I had permission. Compare with the DELETE privilege:

    template1=# select has_table_privilege('pg_authid', 'delete');
     has_table_privilege
    ---------------------
     t
    (1 row)

    template1=# delete from pg_authid;
    DELETE 1

Thanks,
- Ryan
Fwd: [HACKERS] [Patch Review] TRUNCATE Permission
I had intended to send this message to the pgsql-hackers mailing list as well. Thanks, - Ryan -- Forwarded message -- From: Ryan Bradetich [EMAIL PROTECTED] Date: Mon, Sep 1, 2008 at 2:20 PM Subject: Re: [HACKERS] [Patch Review] TRUNCATE Permission To: Tom Lane [EMAIL PROTECTED] Hello Tom, On Mon, Sep 1, 2008 at 1:00 PM, Tom Lane [EMAIL PROTECTED] wrote: Ryan Bradetich [EMAIL PROTECTED] writes: I do not believe this is a huge issue since truncate is prohibited on the system catalogs by the truncate_check_rel(). Only when allowSystemTableMods is false. I think it would be a serious mistake for your patch to treat the system catalogs differently from other tables. Good point. Looks like I still have more code reading to do :) This is Robert Haas's patch for the September 2008 commit-fest. I am just offering my review. Gives me a good excuse to dig into the PostgreSQL code base. Hopefully this review is useful to the person committing the patch. Here is the inconsistency I observed: It seems to me that you are assuming that lack of a TRUNCATE permission bit is the only valid reason for a permission denied failure. This is fairly obviously not so, since multiple permissions typically enter into any command (consider schema-level permissions for instance). I was just comparing the behavior between DELETE and TRUNCATE. My last suggestion for the TRUNCATE permission always being removed on the system tables is obviously bogus because of the allowSystemTableMods issue you raised. Does my first suggestion still make sense for removing the TRUNCATE in pg_class_aclmask() when pg_authid.rolcatupdate is not set? Thanks, - Ryan
[HACKERS] statement_cost_limit for regression testing.
Hello,

Sorry for the new thread on this topic, I did not have a copy in my inbox I could reply to :( I am not sure of the status of the patch, but I did read through the thread at: http://archives.postgresql.org/pgsql-hackers/2008-08/msg00054.php

I just wanted to throw out another possible use for this GUC. There may be a better way to solve this problem, but I believe this patch would be useful for regression testing. Here is the problem I ran into when regression testing the hash index on the unsigned integer type, and how I would like to use the statement_cost_limit parameter.

Regression testing steps:

1. Create the table:

CREATE TABLE hash_i4_heap (seqno uint4, random uint4);

2. Create the hash index:

CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random uint4_ops);

3. Load data into the hash_i4_heap table (data is from the PostgreSQL regression suite):

COPY hash_i4_heap FROM '/home/rbrad/src/PostgreSQL/src/test/regress/data/hash.data';

4. Perform query:

SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
 seqno |  random
-------+-----------
    15 | 843938989
(1 row)

This would pass the regression testing, but we did not actually test the hash index here:

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on hash_i4_heap  (cost=0.00..137.00 rows=1 width=8)
   Filter: (random = 843938989)
(2 rows)

5. Attempt to force an index scan:

SET enable_seqscan = off;
EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on hash_i4_heap  (cost=1.00..10137.00 rows=1 width=8)
   Filter: (random = 843938989)
(2 rows)

6. Still uses a sequential scan. But this query would have still passed my regression tests.
Try the statement_cost_limit:

SET statement_cost_limit = ;
EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
ERROR:  execution plan is too expensive: 10137.00
SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
ERROR:  execution plan is too expensive: 10137.00

7. This is good because finally, my regression test failed, since the sequential scan cost bonus is larger than the statement_cost_limit.

For those interested, the reason it failed to use the hash index is because I did not cast the hash_i4_heap.random value to a uint4 type:

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989::uint4;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Index Scan using hash_i4_index on hash_i4_heap  (cost=0.00..8.27 rows=1 width=8)
   Index Cond: (random = '843938989'::uint4)
(2 rows)

The issue is fixed in my regression tests for the unsigned integer types, but it would be nice for the regression tests to fail in the future when the index is not used. I looked at how the main PostgreSQL regression tests handle this problem, and as far as I can tell they do not. Maybe this is not a likely problem, but it seems we do not have a good way to ensure the indexes are actually being used during regression testing.

Thanks, - Ryan

P.S. There appears to be a bug in the statement_cost_limit1.patch:

SET statement_cost_limit = 0;
ERROR:  0 is outside the valid range for parameter statement_cost_limit (100 .. 2147483647)
Re: [HACKERS] [PgFoundry] Unsigned Data Types
On Sat, Aug 16, 2008 at 10:53 AM, Decibel! [EMAIL PROTECTED] wrote: On Aug 15, 2008, at 1:00 AM, Ryan Bradetich wrote: Here is the first pass at the unsigned data type I have been working on. I am planning on adding these to the September 2008 commitfest wiki page. The unsigned data type is not targeted for core, but for the uint PgFoundry project. Is the intention for the types to go into pg_catalog? It'd be nice if you could specify what schema they should be installed in. An uninstall would also be good. Using pg_catalog made sense to me at first (especially for my application), but on reflection I believe you are right. I will remove the references to the pg_catalog schema and allow the user to add the unsigned data type to any schema. Good catch on the uninstall script. I should have written this as well. I will post an update to the wiki later tonight. Thanks for doing this, I've wished we had uint types in the past, and I'm sure I will again in the future! I am glad it is useful. I needed it for my current project, and I was hoping others could use it as well. Thanks, - Ryan
[HACKERS] [PgFoundry] Unsigned Data Types
Hello all,

Here is the first pass at the unsigned data type I have been working on. I am planning on adding these to the September 2008 commitfest wiki page. The unsigned data type is not targeted for core, but for the uint PgFoundry project.

The uint.c.gz file is the main source file for the uint1, uint2, and uint4 data types. The uint.sql.gz file contains the SQL statements to add the unsigned data type to the database. The pg_atoui.c.gz file is based off the function in the PostgreSQL source code, but works for unsigned data types instead of signed data types. The Makefile is used to build the unsigned data type shared library on Linux. The tests.tar.gz is my unit test suite that I worked on to make sure the unsigned integer types worked as expected. The tests cover cases like:

* table creation with the unsigned integer types.
* comparison operations.
* INSERT statements (binary and text forms).
* COPY statements (binary and text forms).
* unique btree index support.

In addition to correctness issues, I would also appreciate feedback on best practices and portability concerns. For example: I doubt my Makefiles are very portable. What is the proper solution to handle this? pgxs?

Thanks, - Ryan

uint.c.gz
Description: GNU Zip compressed data

uint.sql.gz
Description: GNU Zip compressed data

pg_atoui.c.gz
Description: GNU Zip compressed data

Makefile
Description: Binary data

tests.tar.gz
Description: GNU Zip compressed data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Question regarding the database page layout.
Hello all,

I have been digging into the database page layout (specifically the tuples) to ensure the unsigned integer types were consuming the proper storage. While digging around, I found one thing surprising: it appears the heap tuples are padded at the end to the MAXALIGN distance. Below is the data that I used to come to this conclusion. (This test was performed on a 64-bit system with --with-blocksize=32.) The goal was to compare data from comparable type sizes. The first column indicates the type (char, uint1, int2, uint2, int4, and uint4); the number in () indicates the number of columns in the table. The Length is from the .lp_len field in the ItemId structure. The Offset is from the .lp_off field in the ItemId structure. The Size is the offset difference.

 Type      | Length | Offsets                 | Size
-----------+--------+-------------------------+------
 char (1)  |     25 | 32736 32704 32672 32640 |   32
 char (9)  |     33 | 32728 32688 32648 32608 |   40
 uint1 (1) |     25 | 32736 32704 32672 32640 |   32
 uint1 (9) |     33 | 32728 32688 32648 32608 |   40
 int2 (1)  |     26 | 32736 32704 32672 32640 |   32
 int2 (5)  |     34 | 32728 32688 32648 32608 |   40
 uint2 (1) |     26 | 32736 32704 32672 32640 |   32
 uint2 (5) |     34 | 32728 32688 32648 32608 |   40
 int4 (1)  |     28 | 32736 32704 32672 32640 |   32
 int4 (3)  |     36 | 32728 32688 32648 32608 |   40
 uint4 (1) |     28 | 32736 32704 32672 32640 |   32
 uint4 (3) |     36 | 32728 32688 32648 32608 |   40

From the documentation at http://www.postgresql.org/docs/8.3/static/storage-page-layout.html and from the comments in src/include/access/htup.h, I understand the user data (indicated by t_hoff) must be a multiple of the MAXALIGN distance, but I did not find anything suggesting the heap tuple itself had this requirement.
After a cursory glance at the HeapTupleHeaderData structure, it appears it could be aligned with INTALIGN instead of MAXALIGN. The one structure I was worried about was the 6-byte t_ctid structure. The comments in the src/include/storage/itemptr.h file indicate the ItemPointerData structure is composed of 3 int16 fields. So everything in the HeapTupleHeaderData structure is 32 bits or less. I am interested in attempting to generate a patch if this idea appears feasible. For the current data set I am playing with, it would save over 3 GB of disk space. (Back-of-the-envelope calculations indicate that 5% of my current storage is consumed by this padding. My tuple length is 44 bytes.)

Thanks, - Ryan
Re: [HACKERS] Question regarding the database page layout.
Hello Greg, On Mon, Aug 11, 2008 at 2:24 AM, Gregory Stark [EMAIL PROTECTED]wrote: Ryan Bradetich [EMAIL PROTECTED] writes: After a cursory glance at the HeapTupleHeaderData structure, it appears it could be aligned with INTALIGN instead of MAXALIGN. The one structure I was worried about was the 6 byte t_ctid structure. The comments in src/include/storage/itemptr.h file indicate the ItemPointerData structure is composed of 3 int16 fields. So everthing in the HeapTupleHeaderData structure is 32-bits or less. Sure, but the tuple itself could contain something with double alignment. If you have a bigint or double in the tuple then heap_form_tuple needs to know where to put it so it ends up at right alignment. My first thought was we can still figure this out because the user data is already forced to be MAXALIGN. Then I realized oh, since that is true then I am still going to eat the padding anyhow. The padding would just move to one of two places: 1. To MAXALIGN the user data or 2. To MAXALIGN the heap tuple header. Thanks for the sanity check. I was wrapped up in looking for alignment requirements in the HeapTupleHeaderData structure, I overlooked the significance of the MAXALIGN on the user data. Thanks, - Ryan P.S. Just for the archive, it seems this idea still may be workable (need to look at the heap_form_tuple in significant more detail) if/when someone implements the proposal to separate the physical storage from the column order. That solution is a bit more than I am ready to tackle at the moment :) Maybe in the future.
[HACKERS] Initial Unsigned Integer data type performance test results.
Hello All,

I wanted to pass some performance data on to the group regarding the unsigned integer data types I am working on. I tested on two systems running Ubuntu Hardy. The first system is an 8 x 2.66GHz x86-64 processor system. The second system is a 2 x 533MHz Celeron i386 system. For this test I disabled autovacuum and manually ran the vacuum during times that were not timed. I also added some sleep and sync commands to try and stabilize the test results. The integer and unsigned integer test results were also interleaved to compensate for other system activities. Finally, I have not done a lot of benchmarking, so feedback and improvement ideas are welcome!

I benchmarked four tests:

32-bit int2 vs uint2 -- My expectation is these results would be approximately equal. The uint2 casts up to an int4, which is still 32 bits.
32-bit int4 vs uint4 -- My expectation is the operators should be approximately equal, since I explicitly added int4/uint4 operators.
64-bit int2 vs uint2 -- My expectation is these results would be approximately equal. The uint2 casts up to an int4, which is less than the 64-bit processor bus width.
64-bit int4 vs uint4 -- My expectation is these results would be approximately equal. The uint4 casts up to an int8, which is the processor bus width.

Here is the definition of each column (times are displayed in seconds):

Rows   -- The number of rows added to the table.
Test   -- The test number (I ran each benchmark 3 times).
S Load -- Time to load the table with signed integers.
U Load -- Time to load the table with unsigned integers.
S OP   -- Time to perform an operator on each row in the table (signed integers).
U OP   -- Time to perform an operator on each row in the table (unsigned integers).
S OP   -- Time to perform an AND operator on each row in the table (signed integers).
U OP   -- Time to perform an AND operator on each row in the table (unsigned integers).

I still have some more investigating to do into the results of this data (i.e. why is the signed OP cheaper for int2 than uint2 types, etc.), but I wanted to share this performance data with the community. Rough calculations show about a 4% performance hit for 32-bit processors to use the int8 type and a 6% performance hit for 64-bit processors. This seems like a reasonable trade-off for the reduced storage space I am after, and the reduction in operators Greg and Tom were concerned about. If there are no objections from the community, I will plan to complete my implementation of Tom's proposal at: http://archives.postgresql.org/pgsql-hackers/2008-07/msg01204.php.

Thanks! - Ryan

P.S. The code I have so far is not ready for review. It was implemented quickly to get performance numbers to determine if Tom's proposal would be acceptable by the community. I can send it out if people are interested, but I am planning on submitting it to a future commit fest once I have it cleaned up and better tested.

Benchmark Data:
===============

32-bit int2 vs. uint2

   Rows  Test  S Load      U Load      S OP        U OP        S OP        U OP
-------------------------------------------------------------------------------------
    100   1    000.002888  000.002151  000.010881  000.014691  000.011124  000.011000
    100   2    000.002780  000.002127  000.011729  000.011611  000.012014  000.011925
    100   3    000.002747  000.002085  000.010193  000.010318  000.010588  000.010576
   1000   1    000.003201  000.003870  000.037837  000.037360  000.032064  000.032478
   1000   2    000.003259  000.003912  000.033495  000.036281  000.032502  000.035195
   1000   3    000.003201  000.003913  000.039156  000.035592  000.032405  000.040543
  10000   1    000.024683  000.021306  000.255958  000.329045  000.255887  000.283782
  10000   2    000.020214  000.021224  000.260252  000.290933  000.281468  000.255171
  10000   3    000.020371  000.020940  000.276401  000.264791  000.257598  000.257258
 100000   1    001.669571  001.687523  002.591442  002.682428  003.410724  003.490362
 100000   2    001.682251  001.702598  003.379377  002.855622  002.549476  002.583431
 100000   3    001.693429  001.684732  002.546024  002.641240  002.540556  003.366534
 500000   1    010.138317  011.014532  015.707597  015.61      015.394598  015.502639
 500000   2    010.042176  010.179163  015.290994  015.407479  015.332925  016.321578
 500000   3    010.047930  010.206489  015.016276  015.430527  015.201759  015.411601
1000000   1    020.762680  022.145950  030.338606
Re: [HACKERS] Type Categories for User-Defined Types
Tom, On Tue, Jul 29, 2008 at 2:33 PM, Tom Lane [EMAIL PROTECTED] wrote: Yeah, that's the point of the proposal. I think the issue has come up once or twice before, too, else I'd not be so interested in a general solution. (digs in archives ... there was some discussion of this in connection with unsigned integer types, and I seem to recall older threads but can't find any right now.) Anything I should be looking into and/or testing for unsigned integer support? Thanks! - Ryan P.S. I have most of the uint2 and uint4 types implemented. I am currently working on the unit and performance testing. I need to upgrade one of my Linux boxes first before I can complete my testing. I am hoping to post my code and performance testing later this week. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Type Categories for User-Defined Types
Hello Tom,

On Wed, Jul 30, 2008 at 7:50 AM, Tom Lane [EMAIL PROTECTED] wrote: Dunno, I forget what the conclusion was about implicit casting for the unsigned types in your proposal. Have you experimented with seeing whether, eg, UNION'ing an unsigned with some signed-integer value behaves sensibly?

That was not one of my tests, but I will add it :) Here are the performance tests I am planning / working on:

1. Loading of both integers and unsigned integers into a table. This test is mainly a sanity check to ensure the unsigned integers do not add significant time during inserts. In a perfect world, I believe they should take the same time when the unsigned integers and integers have equal widths. Experimentation is showing the unsigned integer to take slightly longer. I suspect this is due to the additional integer-to-unsigned-integer cast? I am still digging into this for my personal curiosity.

2. Testing the operator. The unsigned integer type provides a native cast for the operator.

3. Testing the operator. The unsigned integer type does not provide a native cast for the operator, so they are cast up to the next larger size.

I am testing this for both signed and unsigned integers with data sets of the following sizes: 100; 1,000; 10,000; 100,000; 500,000; 1M; 5M; and 10M rows. I am planning to test on both 32-bit and 64-bit x86 platforms. If there is interest, I can also run these tests on 32-bit and 64-bit PowerPC platforms. I will add the union test to my test plan. Anything else I should add, or any holes I am missing with this test plan?

The thread I mention above was a year or so back and was originated by someone who wanted to duplicate mysql's behavior. Your proposal is a lot more limited and might not really need to try to put the unsigned types into the numeric category.

Ah, ok. I will not worry about it for now. Thanks!
- Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Unsigned integer support.
Hello Peter, On Fri, Jul 25, 2008 at 5:14 AM, Peter Eisentraut [EMAIL PROTECTED] wrote: Am Friday, 25. July 2008 schrieb Ryan Bradetich: PgFoundry already has an uint project: http://pgfoundry.org/projects/uint/ Unfortunately this project seems to have not gone anywhere. Last activity was late 2006 and there are not any files checked into the SCM repository. Is it acceptable to hijack this PgFoundry project? Or should I start a new project (assuming there is any interest in publishing this work). Please hijack the project and develop your code there. Of course you can always ask for advice here. I will work on getting the PgFoundry project setup. Thanks! - Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Unsigned integer support.
Hello Greg,

On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark [EMAIL PROTECTED] wrote: Ryan Bradetich [EMAIL PROTECTED] writes:

 My plans for the example above would be:
 1. SELECT 1500000000 + 1500000000           -- Throws overflow error.
 2. SELECT 1500000000::uint4 + 1500000000    -- Returns 3000000000::uint4.

I think that wouldn't actually work. Postgres's parser immediately assigns a type to the bare unquoted integral constant, so it would end up with an int4 type. Then when it has to pick an operator for uint4+int4 it wouldn't be able to cast the int4 to uint4 because there would be no implicit cast. You could make it work by having a uint4+int4 operator which returns uint4, but then you're going to need a *lot* of operators.

This was my plan. I performed some testing last night to verify that bare literals are considered plain integers and would not be implicitly cast to a different type (i.e. smallint or bigint). I am seeing three operators for most operations:

1. uint4 - uint4 = uint4
2. int4 - uint4 = uint4
3. uint4 - int4 = uint4

Is there something I need to watch out for when adding this number of operators (i.e. performance impact, etc.)? Some tests I should be running to measure the impact of adding these operators?

One other idea that's been mentioned before is treating integral constants like 15 as type unknown like the quoted '15' constant is. That way the parser would see uint4+unknown and could pick the uint4 operator. But that would be a pretty massive semantics change.

This would require changes to the core PostgreSQL code, correct? My goal for this type was to have it as an external project on PgFoundry, since there does not appear to be much demand for it and unsigned types are not specified in the SQL standard.
If the community decides this support would be better in core PostgreSQL code, then I am willing to help with that work, but I will need a significant amount of guidance :) With my limited knowledge, the best (and easiest) path seems to take advantage of the extensible type system in PostgreSQL and support unsigned integers as a PgFoundry project. Thanks for your review and comments! - Ryan -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Unsigned integer support.
Hello Dann, On Fri, Jul 25, 2008 at 1:06 PM, Dann Corbit [EMAIL PROTECTED] wrote: At the cost of one bit of storage, you have compatible types using Thanks for your review and feedback! Unfortunately, I do need the full range of the unsigned types for the project I am looking at. The reason I started working on these types is because it seemed wasteful to use the next size larger signed integer for the storage type of the unsigned integer. Thanks for the suggestion! - Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Unsigned integer support.
Tom, On Fri, Jul 25, 2008 at 12:32 PM, Tom Lane [EMAIL PROTECTED] wrote: Consider the idea of not having any uint4-specific arithmetic operators, but instead providing the following: * assignment casts from int4 and int8 to uint4 (these throw error if out of range, of course) * implicit cast from uint4 to int8 (can never fail) The effect of providing the latter cast would be that any arithmetic involving a uint4 column would automatically be done in int8. Which would make it a shade slower than a native implementation, but probably not enough slower to be a problem --- and you'd avoid having to write dozens of operators and underlying support functions. Storing into the uint4 column would work fine with no extra notation because of the assignment casts. This is an interesting idea that I will test out tonight. I did have the following concern looking through src/backend/utils/adt/int8.c: There is code that is optionally compiled based on the INT64_IS_BUSTED pre-processor define. Is this pre-processor define something I should worry about for portability with this plan? After I get uint types implemented, for fun I might try some benchmarks to see if I can detect the int8 overhead on a 32-bit system. Moreover, you'd avoid cluttering the system with a pile of cross-type operators, which we have recently realized are not a good thing, because they increase the likelihood of ambiguous operator problems --- see http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php Good to know. Thanks for the link. For uint8 you'd have to promote to numeric to guarantee no failure in the implicit cast; which is going to be a rather bigger performance hit, but I don't really see uint8 as being a type with huge demand. Hopefully I will not need the uint8 type. Right now for a project I am looking at I need the uint2 and uint4 types. uint8 support can come later if it is needed or requested. 
Now you probably *will* want cross-type comparison operators, if you are going to support indexing of unsigned columns, so that something like uint4col > 42 can be indexed without any casting. But limiting yourself to the six basic comparison operators certainly makes it a much less bulky project. This sounds excellent! Hopefully by using these operators I will be able to avoid most of the casting to int8 for my use, while still providing the complete functionality for this type. Thanks again for your review and feedback! - Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [RFC] Unsigned integer support.
Hello hackers,

I know the development community is in the middle of the July 2008 commit-fest, so I apologize if design proposals are inappropriate at this time. I am looking to take advantage of PostgreSQL's extensible type system and implement unsigned integer support. The data I am dealing with contains many unsigned data types, and I am planning on using this type to reduce storage consumption. I am initially looking to add support for 16-bit and 32-bit unsigned integers, with the potential to add 8-bit and 64-bit unsigned integers if needed or desired by the community. Searching through the list archives, I found two objections raised in the past:

1. Implicit casts between different data types. I am hoping the removal of many of the implicit casts in PostgreSQL 8.3 will simplify this task to where this objection can be removed. My plan (without much experimentation) is to have maybe a handful of casts (something like):

* uint4 - integer
* integer - uint4
* bigint - uint4
* integer - uint2
* uint2 - smallint

and then provide operators to provide a reasonable set of functionality. My initial thought for this functionality is to provide default operators on any type that is not implicitly casted on the psql command-line. As an example, I am planning for the following SQL statements to work correctly:

30::uint4 + 10 and 30::uint4 > 10

My understanding is the SQL standard does not provide support for unsigned integers, so I am planning on making all casts from unsigned integers to other data types explicit. Is this acceptable to the community? Another question for the community is: should we allow the following cast?

-1::uint4

Even though this is acceptable C code, I am leaning towards throwing an out-of-range error when this occurs. Are there some areas I am missing or should investigate further before working on this project?

2. There is not much demand for unsigned integer types. Not much I can do about that :) I am willing to post my work as a PgFoundry project.
PgFoundry already has an uint project: http://pgfoundry.org/projects/uint/ Unfortunately this project seems to have not gone anywhere. Last activity was late 2006 and there are not any files checked into the SCM repository. Is it acceptable to hijack this PgFoundry project? Or should I start a new project (assuming there is any interest in publishing this work). Although I am not targeting inclusion for this type in the core PostgreSQL code, I would like to post code for review and receive feedback from the community on this work. As I understand this RFC is the first step in the process :) Once I have some code ready for review, is it acceptable to use the commit-fest wiki for this project? Thanks much for your time! - Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Unsigned integer support.
Hello Tom,

On Thu, Jul 24, 2008 at 10:10 PM, Tom Lane [EMAIL PROTECTED] wrote: Ryan Bradetich [EMAIL PROTECTED] writes: I am looking to take advantage of PostgreSQL extensible type system and implement unsigned integer support. This has been proposed before, and foundered before on the question of implicit coercions. If you're willing to make all coercions *to* unsigned types be explicit (or at most assignment), then I think it can be made to work without breaking anything. But usually the folk who ask for this feature are hoping that bare integer literals like 42 will get interpreted as unsigned when they want them to be. The problem with that wish is illustrated by

	select 1500000000 + 1500000000;

These literals might be either int4 or uint4, therefore this command might yield either an integer-overflow error or 3000000000::uint4. That's not a distinction you can fuzz over --- it's got to be one or the other, and backwards compatibility says it'd better be the first.

I am in agreement with you on this. Since SQL does not specify unsigned types, I was assuming only explicit and assignment casts. I should have probably mentioned that in the RFC. Thanks for pointing this out. My main goal for this type is the reduced storage space. I am fine with people needing to cast to the unsigned types to benefit from the reduced storage space. My plans for the example above would be:

1. SELECT 1500000000 + 1500000000           -- Throws overflow error.
2. SELECT 1500000000::uint4 + 1500000000    -- Returns 3000000000::uint4.

I am hoping the removal of many of the implicit casts in PostgreSQL 8.3 will simplify this task to where this objection can be removed. The implicit casts we removed were cross-type-category cases. If you hope for unsigned types to be considered part of the numeric category, there's no guidance for you there. In fact, the real nub of the problem is what type shall be initially assigned to an integer-looking literal, and how will you get things to behave sanely if that initial choice wasn't what was desired.
We still have some issues around the fact that 42 isn't considered a smallint. Throwing in another possible meaning isn't going to help. My understanding is the SQL standard does not provide support for unsigned integers, so I am planning on making all casts from unsigned integers to other data types explicit. It's really the other direction that would be contentious ... regards, tom lane Thanks for your comments! I have already started to play around a bit with the types and will hopefully have some code ready for review / feedback soon. - Ryan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Questions about indexes?
Hello postgres hackers,

Been a while since I have participated on this list ... but I have a new itch to scratch. Although the table schema is immaterial, I will provide it so we have a common framework for this discussion:

host_id    integer   (not null)
timestamp  datetime  (not null)
category   text      (not null) [<= 5 chars]
anomaly    text      (not null) [<= 1024 chars]

This table is used to store archived data, so each row in the table must be unique. Currently I am using a primary key across each column to enforce this uniqueness. This table currently has ~86 million rows and is 16+ GB in size. The primary key index is also 16+ GB in size, because it appears all the data is duplicated in the index. (I have only done some preliminary looking at the database file with strings, etc. ... so this assumption is purely based on these observations.) I am not sure why all the data is duplicated in the index ... but I bet it has to do with performance, since it would save a lookup in the main table. Are there any benchmarks or papers related to this topic I should locate and read? I am curious about this because it seems the only advantage gained is searching the index for the specified values. Once the entry is found, the full entry needs to be pulled from the main table anyhow, since the index does not contain all the data. Also with the increased size, it seems additional pressure would be put on the shared memory caches (no idea how this really works, just guessing! :)) Since my only requirement is that the rows be unique, I have developed a custom MD5 function in C, and created an index on the MD5 hash of the concatenation of all the fields. This has reduced the disk space usage considerably, as shown below against my test database of ~6 million rows at 1+ GB.
All this data is based off the test database running 7.3.2:

    Type              Size
    ----------------  ----------
    Database Table    1188642816
    All columns pkey  1510252544
    MD5 columns pkey   370999296

Just using the MD5 hash data instead of all the columns is a considerable disk space win, going from 1.5 GB to 370 MB.

Has anyone else solved this problem? Has anyone else looked into something like this and mind sharing, so I do not have to re-invent the wheel? :) Also (assuming there are no papers / benchmarks proving data in the index is a good idea), how difficult would it be to implement an index type that extracts the data from the main table?

Thanks for reading. I will be happy to field any question that I can, or read any papers, research, etc. that relate to this topic.

- Ryan

P.S. the production database is running 7.2.4 if that makes a difference.

-- Ryan Bradetich [EMAIL PROTECTED]

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
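The hash-of-all-columns idea above can be sketched outside the database. This is a minimal illustration in Python, not the custom C function from the post; the field names match the schema given earlier, and the assumption is simply that an MD5 digest of the joined fields serves as the fixed-size uniqueness key. A separator between fields (here a NUL byte) avoids ambiguity when adjacent values could concatenate to the same string:

```python
import hashlib

def row_digest(host_id, timestamp, category, anomaly):
    """Build a fixed-size uniqueness key from the full row.

    A field separator prevents two different rows such as
    ('ab', 'c') and ('a', 'bc') from hashing to the same value.
    """
    joined = "\x00".join([str(host_id), timestamp, category, anomaly])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Two distinct rows produce distinct 32-character keys, regardless of
# how long the anomaly text is.
d1 = row_digest(1, "2003-02-17 00:34:24", "p101", "user x has an invalid shell.")
d2 = row_digest(1, "2003-02-17 00:34:24", "p101", "user y has an invalid shell.")
assert d1 != d2 and len(d1) == 32
```

Indexing 32-character digests rather than rows of up to ~1 KB is where the space savings in the table above come from; the trade-off is that such an index can only answer "is this exact row present?", not queries on the column values themselves.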
Re: [HACKERS] Questions about indexes?
On Mon, 2003-02-17 at 00:15, Tom Lane wrote:

Ryan Bradetich [EMAIL PROTECTED] writes: On Sun, 2003-02-16 at 23:34, Tom Lane wrote: It's not real clear to me why you bother enforcing a constraint that the complete row be unique. Wouldn't a useful constraint be that the first three columns be unique?

The table holds system policy compliance data. The category is basically the policy, and the anomaly is the detailed text explaining why the system is out of compliance. So the anomaly data is important (and often the reason why the key is unique).

Well, sure the anomaly is important: it's the payload, the reason why you bother to have the table in the first place. But that doesn't mean it's part of the key. Generally the key would be the info you use to look up a particular anomaly text. In this example, it's not clear to me why you'd need/want two different anomaly texts entered for the same host_id and the same category at the same instant of time. ISTM there's something inadequate about your category column if you need that.

Ok, I understand what you are asking now :) Let me make up a contrived example to show how the table is used.

    host_id 1 = hosta.somewhere.com
    host_id 2 = hostb.somewhere.com

The categories are coded so (made-up examples):

    cat p101 = /etc/passwd check
    cat f101 = filesystem check

The table would look like:

    1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
    1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
    1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
    2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.

etc...

So I do not need the anomaly to be part of the index, I only need it to keep the rows unique. I agree with you that I would not normally add the anomaly to the index, except for the unique row requirement. Thinking about it now, maybe I should guarantee unique rows via a check constraint...

Thanks for making me think about this in a different way!
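The contrived rows above can be modeled with a scratch schema to show why the anomaly column has to participate in the uniqueness rule: three of the four rows are identical in their first three columns. This is a minimal sqlite3 sketch (the engine, DDL, and column name `ts` are illustrative assumptions, not the production 7.2.4 setup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compliance (
        host_id   INTEGER NOT NULL,
        ts        TEXT    NOT NULL,
        category  TEXT    NOT NULL,
        anomaly   TEXT    NOT NULL,
        UNIQUE (host_id, ts, category, anomaly)   -- full-row uniqueness
    )""")

rows = [
    (1, "2003-02-17 00:34:24", "p101", "user x has an invalid shell."),
    (1, "2003-02-17 00:34:24", "p101", "user y has an invalid shell."),
    (1, "2003-02-17 00:34:24", "p101", "user y has expired password."),
    (2, "2003-02-17 00:34:24", "f101", "file /foo has improper owner."),
]
conn.executemany("INSERT INTO compliance VALUES (?, ?, ?, ?)", rows)

# All four rows are accepted: they differ in the anomaly text even though
# three of them share host_id, timestamp, and category.
count = conn.execute("SELECT count(*) FROM compliance").fetchone()[0]
assert count == 4

# Re-inserting an existing row violates the unique constraint.
try:
    conn.execute("INSERT INTO compliance VALUES (?, ?, ?, ?)", rows[0])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
assert duplicate_rejected
```

A key on only (host_id, ts, category), as Tom suggests, would reject the second and third p101 rows here, which is exactly the situation the table needs to allow.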
- Ryan

regards, tom lane

-- Ryan Bradetich [EMAIL PROTECTED]

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] a couple of minor itches: RI Trigger Names, and additional Alter ownership commands.
Hello postgresql-hackers,

Been a while since I've participated on this list, so I wanted to say thank you for the great product postgresql 7.2.1 is! I have been doing some testing in preparation for a database upgrade from 7.0.3 to 7.2.1, and I have a few small itches to scratch, so I thought I'd get opinions from the experts :)

Itch #1: Referential Integrity trigger names in psql \d output.

Currently the \d output from psql displays non-obvious names for what the RI trigger actually does. Reading through the source code and querying the mailing lists indicates this is easily changed (and I have already done this on a test database without any ill effects so far). Here is what \d displays from 7.2.1:

Before:

    test=# \d foo
            Table "foo"
     Column  |  Type   | Modifiers
    ---------+---------+-----------
     blah_id | integer | not null
     foo     | text    | not null
    Primary key: foo_pkey
    Triggers: RI_ConstraintTrigger_30670

    test=# \d blah
            Table "blah"
     Column  |  Type   | Modifiers
    ---------+---------+-----------
     blah_id | integer | not null
     blah    | text    | not null
    Primary key: blah_pkey
    Triggers: RI_ConstraintTrigger_30672,
              RI_ConstraintTrigger_30674

After:

    test=# \d foo
            Table "foo"
     Column  |  Type   | Modifiers
    ---------+---------+-----------
     blah_id | integer | not null
     foo     | text    | not null
    Primary key: foo_pkey
    Triggers: RI_blah_id (insert)

    test=# \d blah
            Table "blah"
     Column  |  Type   | Modifiers
    ---------+---------+-----------
     blah_id | integer | not null
     blah    | text    | not null
    Primary key: blah_pkey
    Triggers: RI_blah_id (delete),
              RI_blah_id (update)

This change was made with a simple update to the pg_trigger system table for the tgname column. Searching through the code and the mailing list, it looks like the only constraint on the tgname column is that it needs to be unique (although the database schema does not enforce this via a unique index), since the OID tacked on to RI_ConstraintTrigger_* was designed to keep this uniqueness.
What I would propose is to base the RI_* name off the constraint name provided during RI trigger creation; if the constraint name is not provided, then default to the current naming scheme. Can anyone think of side effects of changing the tgname column in the pg_trigger system table? Does this proposal seem like an acceptable solution? Would there be interest in this if I provided a patch to do it?

Itch #2: Alter ownership on a sequence, etc.

ALTER TABLE provides the functionality to change the ownership of a table, but ownership of other structures like sequences, etc. cannot be changed without dropping and recreating them as the new owner. Would there be any interest if I worked on a patch to do this too?

Thanks again for all the hard work and a great database!

- Ryan Bradetich

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])