Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Hello Jamie and Tom. Thank you very much for the feedback and reviews. I will attempt to answer all the questions I found in this thread in this one email. If I miss any questions, let me know and I will answer it :) Jamie: Thanks for the feedback on missing comments. I will go back and add more comments to the code. Jamie: Thanks for the patches. I have applied the Makefile patch to my local tree. I am still reviewing the regressions.diffs patch. I definitely see the issue now, I am still reviewing to see/understand if your solution is the proper fix. I am reviewing the main PostgreSQL regression tests to understand how the COPY is handled. Jamie: This patch is targeted for as a PGFoundry module. I wanted it reviewed by the PostgreSQL community for two reasons: 1. To make the data type is correct (i.e. the bugs you and Tom identified, etc). 2. To go through the community review process so other people can have faith/trust the module is correct. After the community is happy with the uint data type, I will commit it to the PGFoundry repository. Jamie: The uint1 is an unsigned 8-bit value. It is the unsigned variant of the char type. The uint8 type (which I did not provide support for in this patch would be the unsigned variant of the int8 (64-bit type). Jamie and Tom: Tom, you were correct. The -255::uint1 is being promoted to the int4 data type. Here is the sample c program I used to verify this: #include stdio.h #include stdlib.h #include string.h #include assert.h #include postgresql/libpq-fe.h int main() { PGconn *conn; PGresult *res; char query[255]; conn = PQconnectdb(host=127.0.0.1 dbname=test user=rbrad); assert(PQstatus(conn) == CONNECTION_OK); res = PQexec(conn, SELECT -255::uint1;); assert(PQresultStatus(res) == PGRES_TUPLES_OK); snprintf(query, 255, SELECT %d::regtype, PQftype(res, 0)); PQclear(res); res = PQexec(conn, query); assert(PQresultStatus(res) == PGRES_TUPLES_OK); printf(Result Type: %s\n, PQgetvalue(res, 0, 0)); PQclear(res); PQfinish(conn); return 0; } Output: Result Type: integer On Sun, Sep 7, 2008 at 9:07 AM, Tom Lane [EMAIL PROTECTED] wrote: Ah. The scalarltsel/scalargtsel stuff has always been a bit bogus for cross-type comparisons; it assumes it will know both or neither of the two datatypes. So you can get away with using those functions for uint uint (although they won't be very bright about it); but using them for uint int fails outright. If you read the comments around that stuff it leaves quite a lot to be desired, but I don't really have better ideas at the moment. The best near-term solution for the uint module is probably not to rely on scalarltsel/scalargtsel for uint comparisons, but to make its own selectivity functions that know the uint types plus whatever standard types you want to have comparisons with. Ok. Looks like I need to review these functions and develop new functions specific for the unsigned type. I will work on this tomorrow night and submit an updated patch. Thanks again for the feedback and reviews! - Ryan -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
On Mon, Sep 8, 2008 at 1:14 AM, Ryan Bradetich [EMAIL PROTECTED] wrote: If you read the comments around that stuff it leaves quite a lot to be desired, but I don't really have better ideas at the moment. The best near-term solution for the uint module is probably not to rely on scalarltsel/scalargtsel for uint comparisons, but to make its own selectivity functions that know the uint types plus whatever standard types you want to have comparisons with. Ok. Looks like I need to review these functions and develop new functions specific for the unsigned type. the same problem happens in joins, unions, hash, etc... so you have to look at those functions as well PS: Jaime not Jamie :) -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. (593) 87171157 -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Also, it would be nice to use B-M(-H) for LIKE as well. Right offhand, that seems impossible, at least in patterns with %. Or were you thinking of trying to separate out the fixed substrings of a pattern and search for them with BMH? Yep, something like that. Even if it only handled the special case of '%foobar%', that would be nice, because that's a pretty common special case. Anyway, it's not material for this patch, since it'd involve pretty fundamental redesign of the LIKE code. Yep. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] hash index improving v3
It is nice test case. It should be part of hash index regression test. Zdenek Alex Hunsaker napsal(a): Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. For example (non lobotomized inthash8, I just brute forced some collisions for 0 so that anyone can try this) create table test_hash (num int8); insert into test_hash (num) values (0), (1805671691), (3294821164), (4294967297); create index test_hash_num_idx on test_hash using hash (num); select * from test_hash where num = 0; num 0 1805671691 3294821164 4294967297 set enable_bitmapscan to off; select * from test_hash where num = 0; num - 0 CVS HEAD, 8_3_STABLE obviously work if I force the index scan -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] hash index improving v3
Tom Lane napsal(a): Alex Hunsaker [EMAIL PROTECTED] writes: On Fri, Sep 5, 2008 at 2:21 PM, Alex Hunsaker [EMAIL PROTECTED] wrote: Ok now that I made it so it actually *test* collisions, with the patch it always returns all rows that matched the hashed key. And here is the fix, we just forget to set the recheck flag for bitmap scans. For the convenience of anyone intending to test, here is an updated patch against CVS HEAD that incorporates Alex's fix. Tom, I attach cleanup patch which we discussed last commit fest. It introduce new macros HashGetMetaPage and HashGetBitmap and of course, it break backward on disk format compatibility which we will break it anyway when Xiao's patch will be committed. Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql bÄžné podadresáÅe: pgsql_hash.be486df40421/src a pgsql_hash/src bÄžné podadresáÅe: pgsql_hash.be486df40421/src/backend a pgsql_hash/src/backend bÄžné podadresáÅe: pgsql_hash.be486df40421/src/include a pgsql_hash/src/include bÄžné podadresáÅe: pgsql_hash.be486df40421/src/backend/access a pgsql_hash/src/backend/access bÄžné podadresáÅe: pgsql_hash.be486df40421/src/backend/access/hash a pgsql_hash/src/backend/access/hash diff -rc pgsql_hash.be486df40421/src/backend/access/hash/hash.c pgsql_hash/src/backend/access/hash/hash.c *** pgsql_hash.be486df40421/src/backend/access/hash/hash.c po záŠ8 16:14:00 2008 --- pgsql_hash/src/backend/access/hash/hash.c po záŠ8 16:14:00 2008 *** *** 527,533 * each bucket. */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); ! metap = (HashMetaPage) BufferGetPage(metabuf); orig_maxbucket = metap-hashm_maxbucket; orig_ntuples = metap-hashm_ntuples; memcpy(local_metapage, metap, sizeof(local_metapage)); --- 527,533 * each bucket. */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); ! metap = HashPageGetMeta(BufferGetPage(metabuf)); orig_maxbucket = metap-hashm_maxbucket; orig_ntuples = metap-hashm_ntuples; memcpy(local_metapage, metap, sizeof(local_metapage)); *** *** 629,635 /* Write-lock metapage and check for split since we started */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_WRITE, LH_META_PAGE); ! metap = (HashMetaPage) BufferGetPage(metabuf); if (cur_maxbucket != metap-hashm_maxbucket) { --- 629,635 /* Write-lock metapage and check for split since we started */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_WRITE, LH_META_PAGE); ! metap = HashPageGetMeta(BufferGetPage(metabuf)); if (cur_maxbucket != metap-hashm_maxbucket) { diff -rc pgsql_hash.be486df40421/src/backend/access/hash/hashinsert.c pgsql_hash/src/backend/access/hash/hashinsert.c *** pgsql_hash.be486df40421/src/backend/access/hash/hashinsert.c po záŠ8 16:14:00 2008 --- pgsql_hash/src/backend/access/hash/hashinsert.c po záŠ8 16:14:00 2008 *** *** 69,75 /* Read the metapage */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); ! metap = (HashMetaPage) BufferGetPage(metabuf); /* * Check whether the item can fit on a hash page at all. (Eventually, we --- 69,75 /* Read the metapage */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); ! metap = HashPageGetMeta(BufferGetPage(metabuf)); /* * Check whether the item can fit on a hash page at all. (Eventually, we diff -rc pgsql_hash.be486df40421/src/backend/access/hash/hashovfl.c pgsql_hash/src/backend/access/hash/hashovfl.c *** pgsql_hash.be486df40421/src/backend/access/hash/hashovfl.c po záŠ8 16:14:00 2008 --- pgsql_hash/src/backend/access/hash/hashovfl.c po záŠ8 16:14:00 2008 *** *** 187,193 _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE); _hash_checkpage(rel, metabuf, LH_META_PAGE); ! metap = (HashMetaPage) BufferGetPage(metabuf); /* start search at hashm_firstfree */ orig_firstfree = metap-hashm_firstfree; --- 187,193 _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE); _hash_checkpage(rel, metabuf, LH_META_PAGE); ! metap = HashPageGetMeta(BufferGetPage(metabuf)); /* start search at hashm_firstfree */ orig_firstfree = metap-hashm_firstfree; *** *** 450,456 /* Read the metapage so we can determine which bitmap page to use */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); ! metap = (HashMetaPage) BufferGetPage(metabuf); /* Identify which bit to set */ ovflbitno = blkno_to_bitno(metap, ovflblkno); --- 450,456 /* Read the metapage so we can determine which bitmap page to use */ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE); ! metap = HashPageGetMeta(BufferGetPage(metabuf)); /* Identify which bit to set */ ovflbitno = blkno_to_bitno(metap, ovflblkno);
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Hello Jaime, the same problem happens in joins, unions, hash, etc... so you have to look at those functions as well Great! Added to the list to check. I am planning to build regression tests for these types to catch these errors in the future. Thanks again for your testing and review! PS: Jaime not Jamie :) Sorry! I will spell your name correctly from now on! -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] hash index improving v3
Zdenek Kotala [EMAIL PROTECTED] writes: I attach cleanup patch which we discussed last commit fest. It introduce new macros HashGetMetaPage and HashGetBitmap and of course, it break backward on disk format compatibility which we will break it anyway when Xiao's patch will be committed. If we accept that patch, I'm okay with including this one too. Still hoping for some performance testing though. (Note that this one isn't going to affect performance, so no need to include it for that purpose.) regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes: Included patch with the following changes: * new postmaster mode known as consistent recovery, entered only when recovery passes safe/consistent point. InRedo is now set in all processes when started/connected in consistent recovery mode. I looked at this and didn't like the InRedo signaling at all. In the first place, it looks like you're expecting the flag to be passed down via fork(), but that won't work in EXEC_BACKEND mode. (Yes, easily fixed, but it's wrong as-is.) In the second place, the method of signaling it to the bgwriter looks to have race conditions: in principle, at least, I think the startup process could try to clear the shared-memory InRedo flag before the bgwriter has gotten as far as setting it. (This might work if the startup process can't exit redo mode without the bgwriter having finished a checkpoint, but such a dependency is too rube goldbergian for me.) ISTM that it would probably be better if there were exactly one InRedo flag in shared memory, probably in xlog.c's shared state, with the postmaster not being responsible for setting or clearing it; rather the startup process should do those things. * bgwriter and stats process starts in consistent recovery mode. bgwriter changes mode when startup process completes. I'm not sure about the interaction of this. In particular, what about recovery restart points before we have reached the safe stop point? I don't think we want to give up the capability of having those. Also, it seems pretty bogus to update the in-memory ControlFile checkpoint values before the restart point is actually done. It looks to me like what you have done is to try to use those fields as signaling for the restart request in addition to their existing purposes, which I think is confusing and probably dangerous. I'd rather there were a different signaling path and ControlFile maintains its currrent definition. I found the changes in bgwriter.c unreadable. You've evidently moved blocks of code around, but exactly what did you mean to change? Why is there so much duplicated code now? I've kept the distinction between restartpoints and checkpoints in code, to avoid convoluted code. Hmm, I dunno, it seems like that might be a bad choice. Are you sure it's not cleaner to just use the regular checkpoint code? regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Included patch with the following changes: * new postmaster mode known as consistent recovery, entered only when recovery passes safe/consistent point. InRedo is now set in all processes when started/connected in consistent recovery mode. I looked at this and didn't like the InRedo signaling at all. In the first place, it looks like you're expecting the flag to be passed down via fork(), but that won't work in EXEC_BACKEND mode. (Yes, easily fixed, but it's wrong as-is.) OK, thanks. I didn't check that. In the second place, the method of signaling it to the bgwriter looks to have race conditions: in principle, at least, I think the startup process could try to clear the shared-memory InRedo flag before the bgwriter has gotten as far as setting it. (This might work if the startup process can't exit redo mode without the bgwriter having finished a checkpoint, but such a dependency is too rube goldbergian for me.) We never interrupt what the bgwriter does. If it is in the middle of doing a checkpoint when recovery finishes, then Startup process just waits behind it and then issues a shutdown checkpoint. If there's loads of blocks to be written it doesn't matter who writes them. There's only one soup spoon, and it doesn't matter who stirs the soup. ISTM that it would probably be better if there were exactly one InRedo flag in shared memory, probably in xlog.c's shared state, with the postmaster not being responsible for setting or clearing it; rather the startup process should do those things. So replace InRedo with an IsInRedo() function that accesses XLogCtl? Sounds good * bgwriter and stats process starts in consistent recovery mode. bgwriter changes mode when startup process completes. I'm not sure about the interaction of this. In particular, what about recovery restart points before we have reached the safe stop point? I don't think we want to give up the capability of having those. Maybe. I felt it made the code cleaner to give that up. Realistically its a fairly short window of time. But shouldn't be too hard to put back. Also, it seems pretty bogus to update the in-memory ControlFile checkpoint values before the restart point is actually done. It looks to me like what you have done is to try to use those fields as signaling for the restart request in addition to their existing purposes, which I think is confusing and probably dangerous. I'd rather there were a different signaling path and ControlFile maintains its currrent definition. OK I found the changes in bgwriter.c unreadable. You've evidently moved blocks of code around, but exactly what did you mean to change? Why is there so much duplicated code now? The patch utility did that. Some code reuse confused it. It's real easy to read though if you apply the patch and then read the main loop. You'll see what I mean. I've kept the distinction between restartpoints and checkpoints in code, to avoid convoluted code. Hmm, I dunno, it seems like that might be a bad choice. Are you sure it's not cleaner to just use the regular checkpoint code? When I tried to write it, it just looked to my eyes like every single line had a caveat which looked ugly and multiplied the testing. You're the code dude, always happy to structure things as you suggest. If you're sure, that is. Thanks for the review. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote: Hmm, I dunno, it seems like that might be a bad choice. Are you sure it's not cleaner to just use the regular checkpoint code? When I tried to write it, it just looked to my eyes like every single line had a caveat which looked ugly and multiplied the testing. You're the code dude, always happy to structure things as you suggest. If you're sure, that is. No, was just wondering if the other way would be better. If you're sure not, that's fine. regards, tom lane -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [HACKERS] TODO item: Implement Boyer-Moore searching (First time hacker)
Heikki Linnakangas Wrote: Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: Also, it would be nice to use B-M(-H) for LIKE as well. Right offhand, that seems impossible, at least in patterns with %. Or were you thinking of trying to separate out the fixed substrings of a pattern and search for them with BMH? Yep, something like that. Even if it only handled the special case of '%foobar%', that would be nice, because that's a pretty common special case. It would be a quick test to check for % at either end. But we'd also need to ensure there were none in the middle. That might slow it down. I guess while looping over the inner chars checking for more %'s we could be building a skiptable. We'd have to abort it if we found any %'s of course I think with out giving it much thought _'s could be handled by BMH, these could just be given a skip distance of 2. Only having a lossy skip table throws that out the window without having a special check for _ David. -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
Hello Jaime, why i need the cast in this case? even if the cast is really necesary (the message seems realy ugly) contrib_regression=# select * from t1 where f1 35; ERROR: unsupported type: 16486 contrib_regression=# select * from t1 where f1 35::uint4; f1 - 36 37 38 Can you send me the test case that generates this error? My regression tests do not include a table t1 so I was not able to reproduce this error directly. I was unable to reproduce this error by guessing. I tried the following tests: contrib_regression=# create table t1 (f1 int4 not null); CREATE TABLE contrib_regression=# insert into t1 values (1), (5), (10), (20); INSERT 0 4 contrib_regression=# select * from t1 where f1 7; f1 10 20 (2 rows) contrib_regression=# drop table t1; DROP TABLE contrib_regression=# create table t1 (f1 uint4 not null); CREATE TABLE contrib_regression=# insert into t1 values (1), (5), (10), (20); INSERT 0 4 contrib_regression=# select * from t1 where f1 7; f1 10 20 (2 rows) contrib_regression=# drop table t1; DROP TABLE contrib_regression=# create table t1 (f1 numeric not null); CREATE TABLE contrib_regression=# insert into t1 values (1), (5), (10), (20); INSERT 0 4 contrib_regression=# select * from t1 where f1 7; f1 10 20 (2 rows) contrib_regression=# drop table t1; DROP TABLE contrib_regression=# create table t1 (f1 int4 primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index t1_pkey for table t1 CREATE TABLE contrib_regression=# insert into t1 select * from generate_series(1, 10); INSERT 0 10 contrib_regression=# analyze t1; ANALYZE contrib_regression=# explain select f1 from t1 where f1 8; QUERY PLAN --- Index Scan using t1_pkey on t1 (cost=0.00..8.43 rows=10 width=4) Index Cond: (f1 8) (2 rows) contrib_regression=# select f1 from t1 where f1 8; f1 9 10 (2 rows) My testing shows this is working correctly. I am very interested in your test case to help me figure out what I am missing! Thanks! - Ryan -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches
Re: [PATCHES] [PgFoundry] Unsigned Data Types [1 of 2]
On Mon, Sep 8, 2008 at 10:08 PM, Ryan Bradetich [EMAIL PROTECTED] wrote: Can you send me the test case that generates this error? My regression tests do not include a table t1 so I was not able to reproduce this error directly. yeah! that table is mine! here are the scripts... contrib_regression=# select f1 from t1 where f1 8; f1 9 10 (2 rows) My testing shows this is working correctly. mmm... i rebuild my test env and it works for me this time... until i execute an analyze. I guess autovacuum executed an auto analyze last time... -- regards, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. (593) 87171157 drop table t1_int4; create table t1_int4 (f1 int4 primary key); insert into t1_int4 select generate_series(1, 255); select * from t1_int4, generate_series(1, 10) as foo where t1_int4.f1 = foo; drop table t1_uint4; create table t1_uint4 (f1 uint4 primary key); insert into t1_uint4 select generate_series(1, 255); select * from t1_uint4, generate_series(1, 10) as foo where t1_uint4.f1 = foo; drop table if exists t1_int4; create table t1_int4 (f1 int4 primary key); insert into t1_int4 select generate_series(1, 255); analyze; select * from t1_int4 where f1 30; drop table if exists t1_uint4; create table t1_uint4 (f1 uint4 primary key); insert into t1_uint4 select generate_series(1, 255); analyze; select * from t1_uint4 where f1 30; explain analyze select * from t1_uint4 where f1 30; -- Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-patches