Re: [HACKERS] Identity projection
Hello, sorry for the long absence.
# I had unexpected and urgent time-consuming tasks... :-(

I have done a somewhat more precise inspection from two aspects, and both
seem to show that the difference is in the execution time of ExecProject.
I'll be back fully next week with a revised patch, and will take some other
patches to review...

Although I said the following, the gain seems a bit larger... I'll recheck
the testing conditions...

I inspected this more precisely on two aspects, magnifying the effect of
this patch by putting 300 columns into the table.

First, explain analyze says that the difference caused by this patch is
only in the actual time of the Result node.

orig$ psql -c 'explain analyze select * from parenta'
                               QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=0.00..176667.00 rows=101 width=1202) (actual time=0.013..*2406.792* rows=100 loops=1)
   ->  Append  (cost=0.00..176667.00 rows=101 width=1202) (actual time=0.011..412.749 rows=100 loops=1)
         ->  Seq Scan on parenta  (cost=0.00..0.00 rows=1 width=1228) (actual time=0.001..0.001 rows=0 loops=1)
         ->  Seq Scan on childa000 parenta  (cost=0.00..176667.00 rows=100 width=1202) (actual time=0.009..334.633 rows=100 loops=1)
 Total runtime: 2446.474 ms
(5 rows)

patched$ psql -c 'explain analyze select * from parenta'
                               QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=0.00..176667.00 rows=101 width=1202) (actual time=0.011..*507.239* rows=100 loops=1)
   ->  Append  (cost=0.00..176667.00 rows=101 width=1202) (actual time=0.011..419.420 rows=100 loops=1)
         ->  Seq Scan on parenta  (cost=0.00..0.00 rows=1 width=1228) (actual time=0.000..0.000 rows=0 loops=1)
         ->  Seq Scan on childa000 parenta  (cost=0.00..176667.00 rows=100 width=1202) (actual time=0.010..335.721 rows=100 loops=1)
 Total runtime: 545.879 ms
(5 rows)

Second, the results of configure --enable-profiling show that the
execution time of ExecProject changes greatly. This is consistent with
what explain showed.

orig:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 60.29      1.26     1.26      105     0.00     0.00  slot_deform_tuple
!30.14      1.89     0.63      100     0.00     0.00  ExecProject
  3.35      1.96     0.07      304     0.00     0.00  ExecProcNode
  0.96      1.98     0.02      102     0.00     0.00  ExecScan
  0.96      2.00     0.02   166379     0.00     0.00  TerminateBufferIO
  0.48      2.01     0.01      304     0.00     0.00  InstrStartNode
  0.48      2.02     0.01      304     0.00     0.00  InstrStopNode
! 0.48      2.03     0.01      101     0.00     0.00  ExecResult
  0.48      2.04     0.01   830718     0.00     0.00  LWLockAcquire
  0.48      2.05     0.01   506834     0.00     0.00  hash_search_with_hash_value
  0.48      2.06     0.01   341656     0.00     0.00  LockBuffer
  0.48      2.07     0.01   168383     0.00     0.00  ReadBuffer_common
  0.48      2.08     0.01                             InstrEndLoop
  0.48      2.09     0.01                             ExecCleanTargetListLength
  0.00      2.09     0.00      205     0.00     0.00  MemoryContextReset

patched:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 23.08      0.03     0.03      304     0.00     0.00  ExecProcNode
 15.38      0.05     0.02      102     0.00     0.00  heapgettup_pagemode
 15.38      0.07     0.02   830718     0.00     0.00  LWLockAcquire
  7.69      0.08     0.01      205     0.00     0.00  MemoryContextReset
  7.69      0.09     0.01      102     0.00     0.00  ExecScan
  7.69      0.10     0.01      100     0.00     0.00  ExecStoreTuple
  7.69      0.11     0.01   841135     0.00     0.00  LWLockRelease
  7.69      0.12     0.01   168383     0.00     0.00  ReadBuffer_common
  7.69      0.13     0.01   168383     0.00     0.00  UnpinBuffer
  0.00      0.13     0.00      304     0.00     0.00  InstrStartNode
...
! 0.00      0.13     0.00      101     0.00     0.00  ExecResult
! 0.00      0.13     0.00      100     0.00     0.00  ExecProject

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Missing OID define
Phil Sorber wrote: Thom Brown and I were doing some hacking the other day and came across this missing define. We argued over who was going to send the patch in and I lost. So here it is. +1 I have been missing this #define. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Deparsing DDL command strings
Hi,

Working on the Event Trigger next patch series, one of the challenges to
address is deparsing the DDL commands so that the User Defined Function
used by the trigger definition has that information. I'm making good
progress on that; it's some amount of code, but pretty straightforward.

The only road blocker for now is best summarized by the following comment
in src/backend/commands/tablecmds.c:

 * Now add any newly specified column default values and CHECK constraints
 * to the new relation.  These are passed to us in the form of raw
 * parsetrees; we need to transform them to executable expression trees
 * before they can be added. The most convenient way to do that is to
 * apply the parser's transformExpr routine, but transformExpr doesn't
 * work unless we have a pre-existing relation. So, the transformation has
 * to be postponed to this final step of CREATE TABLE.

So I have a Node *parsetree containing some CHECK and DEFAULT raw
expressions to work with. Those can reference non-existing tables, either
to-be-created or already-dropped.

Should I work on transformExpr() so that it knows how to work with a
non-existing relation, or should I write a new transformRawExpr() that
knows how to handle this case?

Or do we want to limit the deparsing feature and not accept some CHECK
and DEFAULT expressions (though not being able to cope with T_A_Const is
a bummer)? (I don't mean to do it, I still have to mention the choice).

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] FW: PQntuples and PQgetvalue problem.
Hello,

In my C++ program I want to partition an SQL table; there is sample code
below. It works correctly at first. The problem is that after func is
called 2 or 3 times from func1, PQntuples returns 0 although the
tablename exists. Also, when I disabled the PQntuples check in the if
statement, the next libpq call, PQgetvalue(res), returned the error
"row number 0 is out of range 0..-1". There are a couple of function
calls like func1 (to check the tablename) in func; when I disabled one
of them, PQntuples in another one (funcx) returned 0. I have checked so
many times; it is so weird. So I think you can help me.

Best regards,
Zafer

Sample code:

func1()
{
    char cmd[100] = "";
    snprintf(cmd, sizeof(cmd),
             "select tablename from pg_tables where tablename='%s'",
             tablename);
    PGresult *res = PQexec(conn, cmd);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        logger.error("PQexec: %s", PQerrorMessage(conn));
        PQclear(res);
        wexit(1);
    }
    if (PQntuples(res) == 0)
        // ...
}

func()
{
    PGresult *res;
    func1();
    res = PQexec(conn, "Select count(*) from persons");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        printf("PQexec: %s", PQerrorMessage(conn));
        PQclear(res);
        exit(1);
    }
    if (atoi(PQgetvalue(res, 0, 0)) != 0)
        // some code
        PQexec(...);
}
Re: [HACKERS] Deparsing DDL command strings
Dimitri Fontaine dimi...@2ndquadrant.fr writes: So I have a Node *parsetree containing some CHECK and DEFAULT raw expressions to work with. Those can reference non existing tables, either to-be-created or already-dropped. Why don't you just pass the original query string, instead of writing a mass of maintenance-requiring new code to reproduce it? This would require (1) making sure the query string is still available where needed. I think we are 99% of the way there but maybe not 100%. (2) being able to identify the substring corresponding to the current command, when we're processing a multi-command string. The parser could easily provide that, I think --- we've just never insisted that it do so before. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deparsing DDL command strings
On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote: Dimitri Fontaine dimi...@2ndquadrant.fr writes: So I have a Node *parsetree containing some CHECK and DEFAULT raw expressions to work with. Those can reference non existing tables, either to-be-created or already-dropped. Why don't you just pass the original query string, instead of writing a mass of maintenance-requiring new code to reproduce it? It's not easy to know which tables are referenced in e.g. an ALTER TABLE statement if the original statement didn't schema-qualify everything. It's also not really possible to trigger cascading effects, like the creation of a sequence from a serial column, that way... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deparsing DDL command strings
Andres Freund and...@2ndquadrant.com writes: On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote: Why don't you just pass the original query string, instead of writing a mass of maintenance-requiring new code to reproduce it? Its not easy to know which tables are referenced in e.g. an ALTER TABLE statement if the original statement didn't schema qualify everything. What he's talking about is deparsing the raw grammar output, which by definition contains no more information than is in the query string. Deparsing post-parse-analysis trees is a different problem (for which code already exists, unlike the raw-tree case). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER command reworks
Kohei KaiGai escribió: 2012/8/31 Kohei KaiGai kai...@kaigai.gr.jp: 2012/8/30 Robert Haas robertmh...@gmail.com: On Mon, Aug 13, 2012 at 3:35 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: Was it a right decision to track the dependency of large objects using the oid of pg_largeobject, instead of pg_largeobject_metadata?

IIRC, the reason why we used the oid of pg_largeobject is backward compatibility for applications that try to reference pg_depend with built-in oids.

I think it was a terrible decision and I'm pretty sure I said as much at the time, but I lost the argument, so I'm inclined to think we're stuck with continuing to support that kludge.

OK, we will keep implementing carefully...

After reviewing this patch, I think we need to revisit this decision.

OK, I'll split the patch into three (isn't it?) portions: RENAME, SET OWNER and SET SCHEMA, with all your suggestions above.

As attached, I split the original one into three portions: for set-schema, set-owner and rename-to. Please apply them in order of patch filename. Regarding the error message, RenameErrorMsgAlreadyExists() was added to handle per-object-type messages for cases where aclcheck_error() is not available.

Here's the remaining piece, the RENAME part. I have reworked it a bit, but it needs a bit more work yet. Note this:

alvherre=# alter function foo.g(int, int) rename to g;
ERROR:  function foo.g(integer,integer) already exists in schema "foo"

The previous code would have said:

alvherre=# alter function foo.g(int, int) rename to g;
ERROR:  function g(integer, integer) already exists in schema "foo"

Note that the function name is not qualified here. The difference is that the original code was calling funcname_signature_string() to format the function name, and the new code is calling format_procedure(). This seems wrong to me; please rework.
I haven't checked other object renames, but I think they should be okay, because functions are the only objects for which we pass the OID instead of the name to the error message function.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

*** a/src/backend/commands/aggregatecmds.c
--- b/src/backend/commands/aggregatecmds.c
*************** DefineAggregate(List *name, List *args, bool oldstyle, List *parameters)
*** 206,269 ****
  					   transTypeId,		/* transition data type */
  					   initval);		/* initial condition */
  }
- 
- 
- /*
-  * RenameAggregate
-  *		Rename an aggregate.
-  */
- void
- RenameAggregate(List *name, List *args, const char *newname)
- {
- 	Oid			procOid;
- 	Oid			namespaceOid;
- 	HeapTuple	tup;
- 	Form_pg_proc procForm;
- 	Relation	rel;
- 	AclResult	aclresult;
- 
- 	rel = heap_open(ProcedureRelationId, RowExclusiveLock);
- 
- 	/* Look up function and make sure it's an aggregate */
- 	procOid = LookupAggNameTypeNames(name, args, false);
- 
- 	tup = SearchSysCacheCopy1(PROCOID, ObjectIdGetDatum(procOid));
- 	if (!HeapTupleIsValid(tup))		/* should not happen */
- 		elog(ERROR, "cache lookup failed for function %u", procOid);
- 	procForm = (Form_pg_proc) GETSTRUCT(tup);
- 
- 	namespaceOid = procForm->pronamespace;
- 
- 	/* make sure the new name doesn't exist */
- 	if (SearchSysCacheExists3(PROCNAMEARGSNSP,
- 							  CStringGetDatum(newname),
- 							  PointerGetDatum(&procForm->proargtypes),
- 							  ObjectIdGetDatum(namespaceOid)))
- 		ereport(ERROR,
- 				(errcode(ERRCODE_DUPLICATE_FUNCTION),
- 				 errmsg("function %s already exists in schema \"%s\"",
- 						funcname_signature_string(newname,
- 												  procForm->pronargs,
- 												  NIL,
- 												  procForm->proargtypes.values),
- 						get_namespace_name(namespaceOid))));
- 
- 	/* must be owner */
- 	if (!pg_proc_ownercheck(procOid, GetUserId()))
- 		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_PROC,
- 					   NameListToString(name));
- 
- 	/* must have CREATE privilege on namespace */
- 	aclresult = pg_namespace_aclcheck(namespaceOid, GetUserId(), ACL_CREATE);
- 	if (aclresult != ACLCHECK_OK)
- 		aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
- 					   get_namespace_name(namespaceOid));
- 
- 	/* rename */
- 	namestrcpy(&(((Form_pg_proc) GETSTRUCT(tup))->proname), newname);
- 	simple_heap_update(rel, &tup->t_self, tup);
- 	CatalogUpdateIndexes(rel, tup);
- 
- 	heap_close(rel, NoLock);
- 	heap_freetuple(tup);
- }
--- 206,208 ----
*** a/src/backend/commands/alter.c
--- b/src/backend/commands/alter.c
***************
*** 45,50 ****
--- 45,261 ----
  #include "utils/syscache.h"
  #include "utils/tqual.h"
  
+ static HeapTuple get_catalog_object_by_oid(Relation catalog, Oid objectId);
+ 
+ /*
+  * errmsg_obj_already_exists
+  *
+  * Returns an error message, to be used as errmsg(), indicating that the
+  * supplied object name conflicts with an existing object in the given
+  * namespace, depending on the given object type.
+  *
+  * Because of message translatability, we don't use getObjectDescription()
+  * here.
+  */
+
Re: [HACKERS] Deparsing DDL command strings
On Friday, October 05, 2012 04:24:55 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, October 05, 2012 04:03:03 PM Tom Lane wrote: Why don't you just pass the original query string, instead of writing a mass of maintenance-requiring new code to reproduce it? It's not easy to know which tables are referenced in e.g. an ALTER TABLE statement if the original statement didn't schema-qualify everything. What he's talking about is deparsing the raw grammar output, which by definition contains no more information than is in the query string. Deparsing post-parse-analysis trees is a different problem (for which code already exists, unlike the raw-tree case). Sure. I am not saying Dimitri's approach is perfect. I am just listing some of the reasons why I think just using the raw input string isn't sufficient... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit API for large object
Hi Anzai-san,

The latest patch is fair enough for me, so let me hand over its review to the committers.

Thanks,

2012/10/1 Nozomi Anzai an...@sraoss.co.jp: Here is the 64-bit API for large object version 3 patch.

I checked this patch. It looks good, but there are still some points to be discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED? It seems to me a marker to indicate a platform without 64-bit support. However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Removed INT64_IS_BUSTED.

* At inv_seek(), it seems to me it checks offset correctness the wrong way, as follows:

| case SEEK_SET:
|     if (offset < 0)
|         elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|     obj_desc->offset = offset;
|     break;

That would be a right assumption if large object size were restricted to 2GB. But the largest positive int64 is larger than the expected limitation. So it seems to me it should be compared with (INT_MAX * PAGE_SIZE) instead.

Fixed.

* At inv_write(), it definitely needs a check to prevent writes beyond 4TB. In case obj_desc->offset is a bit below 4TB, an additional 1GB write would break the head of the large object because of pageno overflow.

Added such a check.

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset from unexpectedly overflowing the 4TB boundary.

Added such a check.

* At inv_truncate(), the variable off is re-defined to int64. Is that change really needed? All its usage is to store the result of len % LOBLKSIZE.

Fixed and back to int32.

Thanks,

2012/9/24 Nozomi Anzai an...@sraoss.co.jp: Here is the 64-bit API for large object version 2 patch.

I checked this patch. It can be applied onto the latest master branch without any problems. My comments are below.
2012/9/11 Tatsuo Ishii is...@postgresql.org: Ok, here is the patch to implement a 64-bit API for large objects, to allow using up to 4TB large objects (or 16TB if BLCKSZ is changed to 32KB). The patch is based on Jeremy Drake's patch posted on September 23, 2005 (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php) and reasonably updated/edited to adapt to PostgreSQL 9.3 by Nozomi Anzai for the backend part and Yugo Nagata for the rest (including the documentation patch).

Here are the changes made in the patch:

1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)

lo_initialize() gathers the OIDs of the backend's 64-bit large object handling functions, namely lo_lseek64, lo_tell64, lo_truncate64. If a client calls lo_*64 functions and the backend does not support them, the lo_*64 functions return an error to the caller. There might be an argument since calls to lo_*64 functions could automatically be redirected to the older 32-bit API. I don't know if this is worth the trouble though.

I think it should definitely return an error code when the user tries to use lo_*64 functions against a backend v9.2 or older, because falling back to the 32-bit API can raise unexpected errors if the application intends to seek in the area beyond 2GB.

Currently lo_initialize() throws an error if one of the OIDs is not available. I doubt we should do the same for the 64-bit functions, since this would make 9.3 libpq unable to access large objects stored in pre-9.2 PostgreSQL servers.

It seems to me the situation calls for splitting the pre-9.2 and post-9.3 cases using a condition on conn->sversion >= 90300.

Fixed so, and tested it by deleting lo_tell64's row from pg_proc.

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr is a pointer to the 64-bit integer and the actual data is placed somewhere else. There might be another way: add a new member to union u to store a 64-bit integer:

typedef struct
{
    int len;
    int isint;
    union
    {
        int *ptr;       /* can't use void (dec compiler barfs) */
        int integer;
        int64 bigint;   /* 64-bit integer */
    } u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public interface.

I'm inclined to add a new field to the union; that seems to me the straightforward approach. For example, the manner in lo_lseek64() seems confusing to me. It sets 1 on the isint field even though a pointer is actually delivered:

+ argv[1].isint = 1;
+ argv[1].len = 8;
+ argv[1].u.ptr = (int *) &len;

Your proposal was not adopted per discussion.

Also we add a new type pg_int64:

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in
Re: [HACKERS] Deparsing DDL command strings
Tom Lane t...@sss.pgh.pa.us writes: Why don't you just pass the original query string, instead of writing a mass of maintenance-requiring new code to reproduce it?

Do we have that original query string in all cases, including EXECUTE-like SPI calls from any PL? What about commands that internally set up a parsetree to feed ProcessUtility() directly? Do we want to refactor them all just now as a prerequisite?

Also, we need to normalize that command string. Tools needing to look at it won't want to depend on random white spacing and other oddities. Those tools could also use the Node *parsetree and be written only in C, but then what about giving them a head start by having a parsetree walker in our code base?

Then we want to qualify object names. Some type names have apparently already been taken care of by the parser here; relation names not yet, and we need to cope with non-existing relation names. My freshly grown, limited understanding is that we currently only know how to produce a cooked parse tree from the raw one if all referenced objects exist in the catalogs, so the transformation would have to postpone some cooking (transform*) until the main object in a CREATE command is defined, right? Is that something we want to revisit?

Another option would be to capture search_path and other parse-time-impacting GUCs, call that the query environment, and have a way to serialize the environment and restore it, either on the same host or on another (replication is an important use case here).

Yet another option would be to output both the original query string and something that's meant for easy machine parsing yet is not the internal representation of the query, so that we're free to hack the parser at will between releases, even minor ones. Building that new code-friendly document would require about the same amount of code as spitting out normalized SQL, I believe.

Yet another option would be to go the SAX way rather than the DOM one: instead of spitting out a new command string, have the user register callbacks, and only implement walking down the parsetree and calling those. I'm not sure how much maintenance work we would save there, and I don't see another reason to go that way.

Yet another option would be to only provide a hook and some space in the EventTriggerData structure for extensions to register themselves and provide whatever deparsing they need. But then we need to figure out a way for the user defined function to use the resulting opaque data, from any PL language, if only to be able to call some extension's API to process it. That looks like a very messy way to punt the work outside of core.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Deparsing DDL command strings
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Why don't you just pass the original query string, instead of writing a mass of maintenance-requiring new code to reproduce it? Do we have that original query string in all cases, including EXECUTE like spi calls from any PL? As I said, I believe we are pretty close to that if not there already. There are a number of utility statements that *require* the source string to be provided, because they do parse analysis and since 8.4 or so the original string has been required for that. So we certainly have the string available at ProcessUtility. I've not looked at the event trigger patch at all, so I don't know what amount of refactoring is going to be required below that ... but the point I'm trying to make is that it would be a one-and-done task. Adding a requirement to be able to decompile raw parse trees will be a permanent drag on every type of SQL feature addition. Also, we need to normalize that command string. Tools needing to look at it won't want to depend on random white spacing and other oddities. Instead, they'll get to depend on the oddities of parse transformations (ie, what's done in the raw grammar vs. what's done in parse_analyze), which is something we whack around regularly. Besides which, you've merely asserted this requirement without providing any evidence that it's important at all, let alone important enough to justify the kind of maintenance burden that you want to saddle us with. Those tools could also use the Node *parsetree and be written only in C, but then what about giving them a head start by having a parsetree walker in our code base? Well, as far as a raw parsetree *walker* goes, there already is one in nodeFuncs.c. It does not follow that we need something that tries to reconstruct SQL from that. It's not clear to me that there is any production-grade use-case for which reconstructed SQL is more useful than examining the parse tree. 
Now, if you're talking about half-baked code that gets only some cases right, then yeah grepping reconstructed SQL might serve. But I'm not excited about doing a lot of work to support partial solutions. Then we want to qualify object names. Some type names have already been taken care of apparently by the parser here, relation names not yet and we need to cope with non existing relation names. Which is exactly what you *won't* be able to do anything about when looking at a raw parse tree. It's just a different presentation of the query string. My freshly grown limited understanding is that we currently only know how to produce a cooked parse tree from the raw one if all referenced objects do exist in the catalogs, Well, yeah. Anything else is magic not code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Michael Paquier michael.paqu...@gmail.com writes: OK, so the problem here is that the relcache, as the syscache, are relying on SnapshotNow which cannot be used safely as the false index definition could be read by other backends. That's one problem. It's definitely not the only one, if we're trying to change an index's definition while an index-accessing operation is in progress. I assume that the switch phase is not the longest phase of the concurrent operation, as you also need to build and validate the new index at prior steps. I am just wondering if it is acceptable to you guys to take a stronger lock only during this switch phase. We might be forced to fall back on such a solution, but it's pretty undesirable. Even though the exclusive lock would only need to be held for a short time, it can create a big hiccup in processing. The key reason is that once the ex-lock request is queued, it blocks ordinary operations coming in behind it. So effectively it's stopping operations not just for the length of time the lock is *held*, but for the length of time it's *awaited*, which could be quite long. Note that allowing subsequent requests to jump the queue would not be a good fix for this; if you do that, it's likely the ex-lock will never be granted, at least not till the next system idle time. Which if you've got one, you don't need a feature like this at all; you might as well just reindex normally during your idle time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Tom Lane escribió: Note that allowing subsequent requests to jump the queue would not be a good fix for this; if you do that, it's likely the ex-lock will never be granted, at least not till the next system idle time. Which if you've got one, you don't need a feature like this at all; you might as well just reindex normally during your idle time. Not really. The time to run a complete reindex might be several hours. If the idle time is just a few minutes or seconds long, it may be more than enough to complete the switch operation, but not to run the complete reindex. Maybe another idea is that the reindexing is staged: the user would first run a command to create the replacement index, and leave both present until the user runs a second command (which acquires a strong lock) that executes the switch. Somehow similar to a constraint created as NOT VALID (which runs without a strong lock) which can be later validated separately. -- Álvaro Herrera                http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Alvaro Herrera alvhe...@2ndquadrant.com writes: Maybe another idea is that the reindexing is staged: the user would first run a command to create the replacement index, and leave both present until the user runs a second command (which acquires a strong lock) that executes the switch. Somehow similar to a constraint created as NOT VALID (which runs without a strong lock) which can be later validated separately. Yeah. We could consider CREATE INDEX CONCURRENTLY (already exists) SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies; or maybe just implement as swap of relfilenodes?) DROP INDEX CONCURRENTLY The last might have some usefulness in its own right, anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Michael Paquier michael.paqu...@gmail.com writes: On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane t...@sss.pgh.pa.us wrote: CREATE INDEX CONCURRENTLY (already exists) SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies; or maybe just implement as swap of relfilenodes?) DROP INDEX CONCURRENTLY OK. That is a different approach and would limit strictly the amount of code necessary for the feature, but I feel that it breaks the nature of CONCURRENTLY which should run without any exclusive locks. Hm? The whole point is that the CONCURRENTLY commands don't require exclusive locks. Only the SWAP command would. Until now all the approaches investigated (switch of relfilenode, switch of index OID) need to have an exclusive lock because we try to maintain index OID as consistent. In the patch I submitted, the new index created has a different OID than the old index, and simply switches names. So after the REINDEX CONCURRENTLY the OID of index on the table is different, but seen from user the name is the same. Is it acceptable to consider that a reindex concurrently could change the OID of the index rebuild? That is not going to work without ex-lock somewhere. If you change the index's OID then you will have to change pg_constraint and pg_depend entries referencing it, and that creates race condition hazards for other processes looking at those catalogs. I'm not convinced that you can even do a rename safely without ex-lock. Basically, any DDL update on an active index is going to be dangerous and probably impossible without lock, IMO. To answer your question, I don't think anyone would object to the index's OID changing if the operation were safe otherwise. But I don't think that allowing that gets us to a safe solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bugs in CREATE/DROP INDEX CONCURRENTLY
1. These operations think they can use ordinary heap_update operations to change pg_index entries when they don't have exclusive lock on the parent table. The lack of ex-lock means that another backend could be currently loading up its list of index OIDs for the table --- and since it scans pg_index with SnapshotNow to do that, the heap_update could result in the other backend failing to see this index *at all*. That's okay if it causes the other backend to not use the index for scanning... but not okay if it causes the other backend to fail to make index entries it is supposed to make. I think this could possibly be fixed by using nontransactional update-in-place when we're trying to change indisvalid and/or indisready, but I've not really thought through the details. 2. DROP INDEX CONCURRENTLY doesn't bother to do TransferPredicateLocksToHeapRelation until long after it's invalidated the index. Surely that's no good? Is it even possible to do that correctly, when we don't have a lock that will prevent new predicate locks from being taken out meanwhile? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bad Data back Door
Hackers, I’ve discovered something a bit disturbing at $work. We’re migrating (slowly) from Oracle to PostgreSQL, and in some cases are using oracle_fdw to copy data over. Alas, there are a fair number of text values in the Oracle database that, although the database is UTF-8, are actually something else (CP1252 or Latin1). When we copy from an oracle_fdw foreign table into a PostgreSQL table, PostgreSQL does not complain, but ends up storing the mis-encoded strings, even though the database is UTF-8.

I assume that this is because the foreign table, as a table, is assumed by the system to have valid data, and therefore additional character-encoding validation is skipped, yes? If so, I’m wondering if it might be possible to add some sort of option to the CREATE FOREIGN TABLE statement to the effect that certain values should not be trusted to be in the encoding they say they are. At any rate, I’m spending some quality time re-encoding bogus values I never expected to see in our systems. :-(

Thanks, David
[HACKERS] why repl_gram.h?
src/backend/replication/Makefile calls bison with -d to create a repl_gram.h header file, but that file is not used anywhere. Is this an oversight?
Re: [HACKERS] Bad Data back Door
David E. Wheeler da...@justatheory.com writes: I've discovered something a bit disturbing at $work. We're migrating (slowly) from Oracle to PostgreSQL, and in some cases are using oracle_fdw to copy data over. Alas, there are a fair number of text values in the Oracle database that, although the database is UTF-8, are actually something else (CP1252 or Latin1). When we copy from an oracle_fdw foreign table into a PostgreSQL table, PostgreSQL does not complain, but ends up storing the mis-encoded strings, even though the database is UTF-8. I assume that this is because the foreign table, as a table, is assumed by the system to have valid data, and therefore additional character encoding validation is skipped, yes?

Probably not so much assumed as nobody thought about it. In e.g. plperl we expend the cycles to do encoding validity checking on *every* string entering the system from Perl. I'm not sure why foreign tables ought to get a pass on that, especially when you consider the communication overhead that the encoding check would be amortized against.

Now, having said that, I think it has to be the responsibility of the FDW to apply any required check ... which makes this a bug report against oracle_fdw, not the core system. (FWIW, contrib/file_fdw depends on the COPY code, which will check encoding.)

regards, tom lane
Re: [HACKERS] 64-bit API for large object
As a committer, I have looked into the patch and it seems good to commit. However, I want to make a small enhancement in the documentation part:

1) The lo_open section needs to mention the new 64-bit APIs. It should also include a description of lo_truncate, but that is not the 64-bit API author's fault, since it should have been there when lo_truncate was added.

2) Add a mention that the 64-bit APIs are only available in PostgreSQL 9.3 or later, and that they will fail if requested against an older server.

If there's no objection, I would like to commit the attached patches. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp

Hi Anzai-san, The latest patch is fair enough for me, so let me hand over its reviewing to the committers. Thanks,

2012/10/1 Nozomi Anzai an...@sraoss.co.jp: Here is the 64-bit API for large object, version 3 patch.

I checked this patch. It looks good, but here are still some points to be discussed.

* I have a question. What is the meaning of INT64_IS_BUSTED? It seems to me a marker to indicate a platform without 64-bit support. However, the commit 901be0fad4034c9cf8a3588fd6cf2ece82e4b8ce says as follows:
| Remove all the special-case code for INT64_IS_BUSTED, per decision that
| we're not going to support that anymore.

Removed INT64_IS_BUSTED.

* At inv_seek(), it seems to me it checks offset correctness the wrong way, as follows:
| case SEEK_SET:
|     if (offset < 0)
|         elog(ERROR, "invalid seek offset: " INT64_FORMAT, offset);
|     obj_desc->offset = offset;
|     break;
That check would be right if large object size were restricted to 2GB. But the largest positive int64 is far larger than the intended limit, so it seems to me the offset should be compared with (INT_MAX * PAGE_SIZE) instead.

Fixed.

* At inv_write(), it definitely needs a check to prevent data writes beyond the 4TB boundary. In the case when obj_desc->offset is a bit below 4TB, an additional 1GB write would corrupt the head of the large object because of pageno overflow.
Added such a check.

* Please also add checks in inv_read() to prevent LargeObjectDesc->offset from unexpectedly overflowing the 4TB boundary.

Added such a check.

* At inv_truncate(), the variable off is re-defined as int64. Is that change really needed? All its usage is to store the result of len % LOBLKSIZE.

Fixed and changed back to int32. Thanks,

2012/9/24 Nozomi Anzai an...@sraoss.co.jp: Here is the 64-bit API for large object, version 2 patch.

I checked this patch. It can be applied onto the latest master branch without any problems. My comments are below.

2012/9/11 Tatsuo Ishii is...@postgresql.org: Ok, here is the patch to implement a 64-bit API for large objects, to allow using up to 4TB large objects (or 16TB if BLCKSZ is changed to 32KB). The patch is based on Jeremy Drake's patch posted on September 23, 2005 (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php), reasonably updated/edited to fit PostgreSQL 9.3 by Nozomi Anzai for the backend part and Yugo Nagata for the rest (including the documentation patch). Here are the changes made in the patch:

1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata): lo_initialize() gathers the OIDs of the backend's 64-bit large object handling functions, namely lo_lseek64, lo_tell64, and lo_truncate64. If a client calls the lo_*64 functions and the backend does not support them, the lo_*64 functions return an error to the caller. There might be an argument that calls to lo_*64 functions could automatically be redirected to the older 32-bit API; I don't know whether that is worth the trouble, though.

I think it should definitely return an error code when the user tries to use lo_*64 functions against a v9.2 or older backend, because falling back to the 32-bit API can raise unexpected errors if the application intends to seek beyond 2GB.

Currently lo_initialize() throws an error if one of the OIDs is not available. I doubt we should do the same for the 64-bit functions, since that would make 9.3 libpq unable to access large objects stored in 9.2 or older PostgreSQL servers.
It seems to me the thing to do is to split the pre-9.3 and 9.3-or-later cases using a condition of conn->sversion >= 90300.

Fixed so, and tested it by deleting lo_tell64's row from pg_proc.

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr is a pointer to the 64-bit integer and the actual data is placed somewhere else. There might be another way: add a new member to union u to store a 64-bit integer:

typedef struct
{
    int len;
    int isint;
    union
    {
        int *ptr;   /* can't use void (dec compiler barfs) */
        int integer;
        int64
Re: [HACKERS] Bad Data back Door
On 10/05/12 6:12 PM, Tom Lane wrote: Now, having said that, I think it has to be the responsibility of the FDW to apply any required check ... which makes this a bug report against oracle_fdw, not the core system. (FWIW, contrib/file_fdw depends on the COPY code, which will check encoding.)

I'm not sure of that. What if the FDW is used to connect to (say) a Postgres database that is in POSIX/C? Is that checked for?

I'd like to see some encoding validation and substitution functions in Postgres. For instance, one that can take any supported encoding and convert it to the database encoding, generating an error on any invalid character. This translation could be the identity (e.g., UTF8 to UTF8), whereupon it would just validate. A second function would do the same, but replace errors with the substitution character in the target charset instead of erroring.

-- john r pierce    N 37, W 122    santa cruz ca mid-left coast