Re: [HACKERS] Bug in comparison of empty jsonb arrays to scalars
2016-11-10 13:54 GMT+07:00 Michael Paquier <michael.paqu...@gmail.com>: > On Thu, Nov 10, 2016 at 7:37 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: > > Given that nobody actually cares what that sort order is, I think that > > having to jump through hoops in pg_upgrade in order to fix it is not a > > great tradeoff. I suggest changing the documentation to match the code. > Don't you in this case think we should match sort order in javascript? > Yes, definitely. > =# create table json_data (a jsonb); > CREATE TABLE > =# INSERT INTO json_data values ('{}'::jsonb), ('[]'::jsonb), > ('null'::jsonb), ('true'::jsonb), ('1'::jsonb), ('""'::jsonb); > INSERT 0 6 > =# SELECT * FROM json_data ORDER BY 1 DESC; > a > -- > {} > true > 1 > "" > null > [] > (6 rows) > So that's object > boolean > integer > string > NULL > array. > > a = [{}, [], null, true, 1, '""'] [ {}, [], null, true, 1, '""' ] > a.sort() [ [], '""', 1, {}, null, true ] > a.reverse() [ true, null, {}, 1, '""', [] ] So in this case it's boolean > NULL > Object > integer > string > array (tried in Chromium 53, Firefox 49 and Node v6.9.1) When I tried to search for the ECMA Standard for this behavior, i found this: http://blog.rodneyrehm.de/archives/14-Sorting-Were-Doing-It-Wrong.html. There are problems about automatic conversion in javascript, like this: > a = [{}, [], null, true, 1, 'someotherstring'] [ {}, [], null, true, 1, 'someotherstring' ] > a.sort().reverse() [ true, 'someotherstring', null, {}, 1, [] ] versus this: > a = [{}, [], null, true, 1, 'SomeOtherString'] [ {}, [], null, true, 1, 'SomeOtherString' ] > a.sort().reverse() [ true, null, {}, 'SomeOtherString', 1, [] ] and this: > a = [{}, [], null, true, 1, '2'] [ {}, [], null, true, 1, '2' ] > a.sort().reverse() [ true, null, {}, '2', 1, [] ] So we can't replicate javascript sort order without emulating those. Regards, Ali Akbar
Re: [HACKERS] Postgres service stops when I kill client backend on Windows
Greetings, 2015-10-11 0:18 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>: > > 2015-10-10 18:04 GMT+02:00 Dmitry Vasilyev <d.vasil...@postgrespro.ru>: > >> >> On Сб, 2015-10-10 at 10:55 -0500, Tom Lane wrote: >> > Dmitry Vasilyev <d.vasil...@postgrespro.ru> writes: >> > > I have written, what service stopped. This action is repeatable. >> > > You can run command 'psql -c "do $$ unpack p,1x8 $$ language >> > > plperlu;"' >> > > and after this windows service will stop. >> > >> > > so it is expected behave. After any unexpected client fails, the server is > restarted > I can confirm this too. In linux (i use Fedora 22), this is what happens when a server is killed: === 1. before: $ sudo systemctl status postgresql.service postgresql.service - PostgreSQL database server Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled) Active: active (running) since Jum 2015-10-09 16:25:43 WIB; 1 day 14h ago Process: 778 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS) Process: 747 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS) Main PID: 783 (postgres) CGroup: /system.slice/postgresql.service ├─ 783 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432 ├─ 812 postgres: logger process ├─ 821 postgres: checkpointer process ├─ 822 postgres: writer process ├─ 823 postgres: wal writer process ├─ 824 postgres: autovacuum launcher process ├─ 825 postgres: stats collector process └─17181 postgres: postgres test [local] idle === 2. killing and attempt to reconnect: $ sudo kill 17181 test=# select 1; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. === 3. service status after: $ sudo systemctl status postgresql.service postgresql.service - PostgreSQL database server Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled) Active: active (running) since Jum 2015-10-09 16:25:43 WIB; 1 day 14h ago Process: 778 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS) Process: 747 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS) Main PID: 783 (postgres) CGroup: /system.slice/postgresql.service ├─ 783 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432 ├─ 812 postgres: logger process ├─ 821 postgres: checkpointer process ├─ 822 postgres: writer process ├─ 823 postgres: wal writer process ├─ 824 postgres: autovacuum launcher process ├─ 825 postgres: stats collector process └─17422 postgres: postgres test [local] idle === The service status is still active (running), and new process 17422 handles the client. But this is what happens in Windows (win 7 32 bit, postgres 9.4): === 1. before: C:\Windows\system32>sc queryex postgresql-9.4 SERVICE_NAME: postgresql-9.4 TYPE : 10 WIN32_OWN_PROCESS STATE : 4 RUNNING (STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN) WIN32_EXIT_CODE: 0 (0x0) SERVICE_EXIT_CODE : 0 (0x0) CHECKPOINT : 0x0 WAIT_HINT : 0x0 PID: 3716 FLAGS : === 2. killing & attempt to reconnect: postgres=# select pg_backend_pid(); pg_backend_pid 2080 (1 row) C:\Windows\system32>taskkill /F /PID 2080 SUCCESS: The process with PID 2080 has been terminated. postgres=# select 1; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> === 3. service status after: C:\Windows\system32>sc query postgresql-9.4 SERVICE_NAME: postgresql-9.4 TYPE : 10 WIN32_OWN_PROCESS STATE : 1 STOPPED WIN32_EXIT_CODE: 0 (0x0) SERVICE_EXIT_CODE : 0 (0x0) CHECKPOINT : 0x0 WAIT_HINT : 0x0 === The client cannot reconnect. The service is dead. This is nasty, because any client can exploit some segfault bug like the one in perl Dmitry mentoined upthread, and the postgresql service is down. Note: killing the server process with pg_terminate_backend isn't causing this behavior to happen. The client reconnects normally, and the service is still running. Regards, Ali Akbar
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2015-01-20 18:17 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com: The current patch, which I am evaluating for commit, does away with per-group memory contexts (it uses a common context for all groups), and reduces the initial array allocation from 64 to 8 (but preserves doubling behavior). Jeff Tomas, spotted this comment in v8 patch: @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate, /* * makeArrayResult - produce 1-D final result of accumArrayResult * + * If the array build state was initialized with a separate memory context, + * this also frees all the memory (by deleting the subcontext). If a parent + * context was used directly, the memory may be freed by an explicit pfree() + * call (unless it's meant to be freed by destroying the parent context). + * * astate is working state (must not be NULL) * rcontext is where to construct result */ Simple pfree(astate) call is not enough to free the memory. If it's scalar accumulation (initArrayResult), the user must pfree(astate-dvalues) and pfree(astate-dnulls) before astate. If it's array accumulation, pfree(astate-data) and pfree(astate-nullbitmap), with both can be null if no array accumulated and some other cases. If its any (scalar or array) accumulation, it's more complicated. I suggest it's simpler to just force the API user to destroy the parent context instead. So the comment become like this: @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate, /* * makeArrayResult - produce 1-D final result of accumArrayResult * + * If the array build state was initialized with a separate memory context, + * this also frees all the memory (by deleting the subcontext). If a parent + * context was used directly, the memory is meant to be freed by destroying + * the parent context. + * * astate is working state (must not be NULL) * rcontext is where to construct result */ Sorry, there is another comment of makeMdArrayResult, i suggest also changing it like this: @@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate, * beware: no check that specified dimensions match the number of values * accumulated. * + * beware: if the astate was not initialized within a separate memory + * context (i.e. using subcontext=true when calling initArrayResult), + * using release=true is illegal as it releases the whole context, + * and that may include other memory still used elsewhere (instead use + * release=false and release the memory with the parent context later) + * * astate is working state (must not be NULL) * rcontext is where to construct result
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com: The current patch, which I am evaluating for commit, does away with per-group memory contexts (it uses a common context for all groups), and reduces the initial array allocation from 64 to 8 (but preserves doubling behavior). Jeff Tomas, spotted this comment in v8 patch: @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate, /* * makeArrayResult - produce 1-D final result of accumArrayResult * + * If the array build state was initialized with a separate memory context, + * this also frees all the memory (by deleting the subcontext). If a parent + * context was used directly, the memory may be freed by an explicit pfree() + * call (unless it's meant to be freed by destroying the parent context). + * * astate is working state (must not be NULL) * rcontext is where to construct result */ Simple pfree(astate) call is not enough to free the memory. If it's scalar accumulation (initArrayResult), the user must pfree(astate-dvalues) and pfree(astate-dnulls) before astate. If it's array accumulation, pfree(astate-data) and pfree(astate-nullbitmap), with both can be null if no array accumulated and some other cases. If its any (scalar or array) accumulation, it's more complicated. I suggest it's simpler to just force the API user to destroy the parent context instead. So the comment become like this: @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate, /* * makeArrayResult - produce 1-D final result of accumArrayResult * + * If the array build state was initialized with a separate memory context, + * this also frees all the memory (by deleting the subcontext). If a parent + * context was used directly, the memory is meant to be freed by destroying + * the parent context. + * * astate is working state (must not be NULL) * rcontext is where to construct result */ Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2015-01-18 10:44 GMT+07:00 Peter Eisentraut pete...@gmx.net: On 1/7/15 8:51 PM, Ali Akbar wrote: So now +1 for back-patching this. Done. Was a bit of work to get it into all the old versions. Wow. Thanks. Btw, for bug-fix patches like this, should the patch creator (me) also create patches for back branches? Regards, -- Ali Akbar
Re: [HACKERS] Add generate_series(numeric, numeric)
2014-12-18 19:35 GMT+07:00 Fujii Masao masao.fu...@gmail.com: On Mon, Dec 15, 2014 at 2:38 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: I was thinking something like this, added just after that para: warning para While the actual arguments to the function remain unchanged between calls, if you detoast the argument values (which is normally done transparently by the functionPG_GETARG_replaceablexxx/replaceable/function macro) in the transient context then the detoasted copies will be freed on each cycle. Accordingly, if you keep references to such values in your structfielduser_fctx/, you must either copy them into the structfieldmulti_call_memory_ctx/ after detoasting, or ensure that you detoast the values only in that context. /para /warning I'm OK with this. Wrapping the doc changes in a patch. Will add to next commitfest so it won't be lost. -- Ali Akbar *** a/doc/src/sgml/xfunc.sgml --- b/doc/src/sgml/xfunc.sgml *** *** 2986,2991 SRF_RETURN_DONE(funcctx) --- 2986,3005 structfieldmulti_call_memory_ctx/ while doing the first-call setup. /para + warning + para + While the actual arguments to the function remain unchanged between + calls, if you detoast the argument values (which is normally done + transparently by the + functionPG_GETARG_replaceablexxx/replaceable/function macro) + in the transient context then the detoasted copies will be freed on + each cycle. Accordingly, if you keep references to such values in + your structfielduser_fctx/, you must either copy them into the + structfieldmulti_call_memory_ctx/ after detoasting, or ensure + that you detoast the values only in that context. + /para + /warning + para A complete pseudo-code example looks like the following: programlisting -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Or else we implement what you suggest below (more comments below): Thinking about the 'release' flag a bit more - maybe we could do this instead: if (release astate-private_cxt) MemoryContextDelete(astate-mcontext); else if (release) { pfree(astate-dvalues); pfree(astate-dnulls); pfree(astate); } i.e. either destroy the whole context if possible, and just free the memory when using a shared memory context. But I'm afraid this would penalize the shared memory context, because that's intended for cases where all the build states coexist in parallel and then at some point are all converted into a result and thrown away. Adding pfree() calls is no improvement here, and just wastes cycles. As per Tom's comment, i'm using parent memory context instead of shared memory context below. In the future, if some code writer decided to use subcontext=false, to save memory in cases where there are many array accumulation, and the parent memory context is long-living, current code can cause memory leak. So i think we should implement your suggestion (pfreeing astate), and warn the implication in the API comment. The API user must choose between release=true, wasting cycles but preventing memory leak, or managing memory from the parent memory context. I'm wondering whether this is necessary after fixing makeArrayResult to use the privat_cxt flag. It's still possible to call makeMdArrayResult directly (with the wrong 'release' value). Another option might be to get rid of the 'release' flag altogether, and just use the 'private_cxt' - I'm not aware of a code using release=false with private_cxt=true (e.g. to build the same array twice from the same astate). But maybe there's such code, and another downside is that it'd break the existing API. In one possible use case, for efficiency maybe the caller will create a special parent memory context for all array accumulation. Then uses makeArrayResult* with release=false, and in the end releasing the parent memory context once for all. Yeah, although I'd much rather not break the existing code at all. That is - my goal is not to make it slower unless absolutely necessary (and in that case the code may be fixed per your suggestion). But I'm not convinced it's worth it. OK. Do you think we need to note this in the comments? Something like this: If using subcontext=false, the caller must be careful about memory usage, because makeArrayResult* will not free the memory used. But I think it makes sense to move the error handling into initArrayResultArr(), including the get_element_type() call, and remove the element_type from the signature. This means initArrayResultAny() will call the get_element_type() twice, but I guess that's negligible. And everyone who calls initArrayResultArr() will get the error handling for free. Patch v7 attached, implementing those two changes, i.e. * makeMdArrayResult(..., astate-private_cxt) * move error handling into initArrayResultArr() * remove element_type from initArrayResultArr() signature Reviewing the v7 patch: - applies cleanly to current master. patch format, whitespace, etc is good - make check runs without error - performance memory usage still consistent If you think we don't have to add the comment (see above), i'll mark this as ready for committer Regards, -- Ali Akbar
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2014-12-29 14:38 GMT+07:00 Jeff Davis pg...@j-davis.com: Just jumping into this patch now. Do we think this is worth changing the signature of functions in array.h, which might be used from a lot of third-party code? We might want to provide new functions to avoid a breaking change. V6 patch from Tomas only change initArrayResult* functions. initArrayResult is new API in 9.5 (commit bac2739), with old API still works as-is. -- Ali Akbar
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
Another positive benefit is that this won't break the code unless it uses the new API. This is a problem especially with external code (e.g. extensions), but the new API (initArray*) is not part of 9.4 so there's no such code. So that's nice. The one annoying thing is that this makes the API slighly unbalanced. With the new API you can use a shared memory context, which with the old one (not using the initArray* methods) you can't. But I'm OK with that, and it makes the patch smaller (15kB - 11kB). Yes, with this API, we can backpatch this patch to 9.4 (or below) if we need it there. I think this API is a good compromise of old API and new API. Ideally if we can migrate all code to new API (all code must call initArrayResult* before accumArrayResult*), we can remove parameter MemoryContext rcontext from accumArrayResult. Currently, the code isn't using the rcontext for anything except for old API calls (in first call to accumArrayResult). 2014-12-21 20:38 GMT+07:00 Tomas Vondra t...@fuzzy.cz: On 21.12.2014 02:54, Alvaro Herrera wrote: Tomas Vondra wrote: Attached is v5 of the patch, fixing an error with releasing a shared memory context (invalid flag values in a few calls). The functions that gain a new argument should get their comment updated, to explain what the new argument is for. Right. I've added a short description of the 'subcontext' parameter to all three variations of the initArray* function, and a more thorough explanation to initArrayResult(). With this API, i think we should make it clear if we call initArrayResult with subcontext=false, we can't call makeArrayResult, but we must use makeMdArrayResult directly. Or better, we can modify makeArrayResult to release according to astate-private_cxt: @@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate, dims[0] = astate-nelems; lbs[0] = 1; - return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true); + return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, astate-private_cxt); Or else we implement what you suggest below (more comments below): Thinking about the 'release' flag a bit more - maybe we could do this instead: if (release astate-private_cxt) MemoryContextDelete(astate-mcontext); else if (release) { pfree(astate-dvalues); pfree(astate-dnulls); pfree(astate); } i.e. either destroy the whole context if possible, and just free the memory when using a shared memory context. But I'm afraid this would penalize the shared memory context, because that's intended for cases where all the build states coexist in parallel and then at some point are all converted into a result and thrown away. Adding pfree() calls is no improvement here, and just wastes cycles. As per Tom's comment, i'm using parent memory context instead of shared memory context below. In the future, if some code writer decided to use subcontext=false, to save memory in cases where there are many array accumulation, and the parent memory context is long-living, current code can cause memory leak. So i think we should implement your suggestion (pfreeing astate), and warn the implication in the API comment. The API user must choose between release=true, wasting cycles but preventing memory leak, or managing memory from the parent memory context. In one possible use case, for efficiency maybe the caller will create a special parent memory context for all array accumulation. Then uses makeArrayResult* with release=false, and in the end releasing the parent memory context once for all. Regards, -- Ali Akbar
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2014-12-16 11:01 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2014-12-16 10:47 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2014-12-16 6:27 GMT+07:00 Tomas Vondra t...@fuzzy.cz: Just fast-viewing the patch. The patch is not implementing the checking for not creating new context in initArrayResultArr. I think we should implement it also there for consistency (and preventing future problems). Testing the performance with your query, looks promising: speedup is between 12% ~ 15%. Because i'm using 32-bit systems, setting work_mem to 1024GB failed: ERROR: 1073741824 is outside the valid range for parameter work_mem (64 .. 2097151) STATEMENT: SET work_mem = '1024GB'; psql:/media/truecrypt1/oss/postgresql/postgresql/../patches/array-agg.sql:20: ERROR: 1073741824 is outside the valid range for parameter work_mem (64 .. 2097151) Maybe because of that, in the large groups a test, the speedup is awesome: master: 16,819 ms with patch: 1,720 ms Looks like with master, postgres resort to disk, but with the patch it fits in memory. Note: I hasn't tested the large dataset. As expected, testing array_agg(anyarray), the performance is still the same, because the subcontext hasn't implemented there (test script modified from Tomas', attached). I implemented the subcontext checking in initArrayResultArr by changing the v3 patch like this: +++ b/src/backend/utils/adt/arrayfuncs.c @@ -4797,10 +4797,11 @@ initArrayResultArr(Oid array_type, Oid element_type, MemoryContext rcontext, bool subcontext) { ArrayBuildStateArr *astate; - MemoryContext arr_context; + MemoryContext arr_context = rcontext; /* by default use the parent ctx */ /* Make a temporary context to hold all the junk */ - arr_context = AllocSetContextCreate(rcontext, + if (subcontext) + arr_context = AllocSetContextCreate(rcontext, accumArrayResultArr, ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, Testing the performance, it got the 12%~15% speedup. Good. (patch attached) Looking at the modification in accumArrayResult* functions, i don't really comfortable with: 1. Code that calls accumArrayResult* after explicitly calling initArrayResult* must always passing subcontext, but it has no effect. 2. All existing codes that calls accumArrayResult must be changed. Just an idea: why don't we minimize the change in API like this: 1. Adding parameter bool subcontext, only in initArrayResult* functions but not in accumArrayResult* 2. Code that want to not creating subcontext must calls initArrayResult* explicitly. Other codes that calls directly to accumArrayResult can only be changed in the call to makeArrayResult* (with release=true parameter). In places that we don't want to create subcontext (as in array_agg_transfn), modify it to use initArrayResult* before calling accumArrayResult*. What do you think? As per your concern about calling initArrayResult* with subcontext=false, while makeArrayResult* with release=true: Also, it seems that using 'subcontext=false' and then 'release=true' would be a bug. Maybe it would be appropriate to store the 'subcontext' value into the ArrayBuildState and then throw an error if makeArrayResult* is called with (release=true subcontext=false). Yes, i think we should do that to minimize unexpected coding errors. In makeArrayResult*, i think its better to not throwing an error, but using assertions: Assert(release == false || astate-subcontext == true); Regards, -- Ali Akbar array-agg-anyarray.sql Description: application/sql diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 18ae318..48e66bf 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1271,13 +1271,13 @@ dblink_get_connections(PG_FUNCTION_ARGS) /* stash away current value */ astate = accumArrayResult(astate, CStringGetTextDatum(hentry-name), - false, TEXTOID, CurrentMemoryContext); + false, TEXTOID, CurrentMemoryContext, true); } } if (astate) PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, - CurrentMemoryContext)); + CurrentMemoryContext, true)); else PG_RETURN_NULL(); } diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index c16b38e..bd5eb32 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -671,7 +671,7 @@ transformRelOptions(Datum oldOptions, List *defList, char *namspace, /* No match, so keep old option */ astate = accumArrayResult(astate, oldoptions[i], false, TEXTOID, - CurrentMemoryContext); + CurrentMemoryContext, true); } } } @@ -758,12 +758,12 @@ transformRelOptions(Datum oldOptions, List *defList, char *namspace, astate
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2014-12-16 6:27 GMT+07:00 Tomas Vondra t...@fuzzy.cz: On 15.12.2014 22:35, Jeff Janes wrote: On Sat, Nov 29, 2014 at 8:57 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Hi, Attached is v2 of the patch lowering array_agg memory requirements. Hopefully it addresses the issues issues mentioned by TL in this thread (not handling some of the callers appropriately etc.). Hi Tomas, When configured --with-libxml I get compilation errors: xml.c: In function 'xml_xpathobjtoxmlarray': xml.c:3684: error: too few arguments to function 'accumArrayResult' xml.c:3721: error: too few arguments to function 'accumArrayResult' xml.c: In function 'xpath': xml.c:3933: error: too few arguments to function 'initArrayResult' xml.c:3936: error: too few arguments to function 'makeArrayResult' And when configured --with-perl, I get: plperl.c: In function 'array_to_datum_internal': plperl.c:1196: error: too few arguments to function 'accumArrayResult' plperl.c: In function 'plperl_array_to_datum': plperl.c:1223: error: too few arguments to function 'initArrayResult' Cheers, Thanks, attached is a version that fixes this. Just fast-viewing the patch. The patch is not implementing the checking for not creating new context in initArrayResultArr. I think we should implement it also there for consistency (and preventing future problems). Regards, -- Ali Akbar
Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg
2014-12-16 10:47 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2014-12-16 6:27 GMT+07:00 Tomas Vondra t...@fuzzy.cz: On 15.12.2014 22:35, Jeff Janes wrote: On Sat, Nov 29, 2014 at 8:57 AM, Tomas Vondra t...@fuzzy.cz mailto:t...@fuzzy.cz wrote: Hi, Attached is v2 of the patch lowering array_agg memory requirements. Hopefully it addresses the issues issues mentioned by TL in this thread (not handling some of the callers appropriately etc.). Hi Tomas, When configured --with-libxml I get compilation errors: xml.c: In function 'xml_xpathobjtoxmlarray': xml.c:3684: error: too few arguments to function 'accumArrayResult' xml.c:3721: error: too few arguments to function 'accumArrayResult' xml.c: In function 'xpath': xml.c:3933: error: too few arguments to function 'initArrayResult' xml.c:3936: error: too few arguments to function 'makeArrayResult' And when configured --with-perl, I get: plperl.c: In function 'array_to_datum_internal': plperl.c:1196: error: too few arguments to function 'accumArrayResult' plperl.c: In function 'plperl_array_to_datum': plperl.c:1223: error: too few arguments to function 'initArrayResult' Cheers, Thanks, attached is a version that fixes this. Just fast-viewing the patch. The patch is not implementing the checking for not creating new context in initArrayResultArr. I think we should implement it also there for consistency (and preventing future problems). Looking at the modification in accumArrayResult* functions, i don't really comfortable with: 1. Code that calls accumArrayResult* after explicitly calling initArrayResult* must always passing subcontext, but it has no effect. 2. All existing codes that calls accumArrayResult must be changed. Just an idea: why don't we minimize the change in API like this: 1. Adding parameter bool subcontext, only in initArrayResult* functions but not in accumArrayResult* 2. Code that want to not creating subcontext must calls initArrayResult* explicitly. Other codes that calls directly to accumArrayResult can only be changed in the call to makeArrayResult* (with release=true parameter). In places that we don't want to create subcontext (as in array_agg_transfn), modify it to use initArrayResult* before calling accumArrayResult*. What do you think? Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
I ran a test using postgres-US.fo built in the PostgreSQL source tree, which is 38 MB, and ran select unnest(xpath('//fo:bookmark-title', b, array[array['fo', 'http://www.w3.org/1999/XSL/Format']])) from data; (Table contains one row only.) The timings were basically indistinguishable between the three code versions. I'll try to reproduce your test. How big is your file? Do you have a link to the actual file? Could you share your load script? I use this xml sample: http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml Basically i loaded the xml to table u 100 times. Load script attached. Regards, -- Ali Akbar load_test.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-14 22:18 GMT+07:00 Ali Akbar the.ap...@gmail.com: I ran a test using postgres-US.fo built in the PostgreSQL source tree, which is 38 MB, and ran select unnest(xpath('//fo:bookmark-title', b, array[array['fo', 'http://www.w3.org/1999/XSL/Format']])) from data; (Table contains one row only.) The timings were basically indistinguishable between the three code versions. I'll try to reproduce your test. How big is your file? Do you have a link to the actual file? Could you share your load script? I use this xml sample: http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml Basically i loaded the xml to table u 100 times. Load script attached. Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think i'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. Thanks Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-15 11:02 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! Sorry, typo here. What i mean isn't invalid xml, but incomplete xml. Hence the next call to xpath with the previous result as its input will become a problem because the namespace will not match. What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. Thanks Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-15 11:06 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar the.ap...@gmail.com wrote: 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. OK. Could you then move this patch to the new CF with Needs Review with Peter as reviewer then? He seems to be looking at it anyway seeing the update from 12/11. OK. Moved to the new CF. Regards, -- Ali Akbar
Re: [HACKERS] Add generate_series(numeric, numeric)
2014-12-15 10:25 GMT+07:00 Andrew Gierth and...@tao11.riddles.org.uk: Fujii == Fujii Masao masao.fu...@gmail.com writes: Fujii Pushed. Bug found: regression=# select count(*) from generate_series(1::numeric,10) v, generate_series(1,v) w; count --- 0 (1 row) regression=# select count(*) from generate_series(1::numeric,10) v, generate_series(1,v+0) w; count --- 55 (1 row) The error is in the use of PG_GETARG_NUMERIC and init_var_from_num when setting up the multi-call state; init_var_from_num points at the original num's digits rather than copying them, but start_num and stop_num have just been (potentially) detoasted in the per-call context, in which case the storage will have been freed by the next call. Oops. Obviously this could also be fixed by not detoasting the input until after switching to the multi-call context, but it looks to me like that would be unnecessarily complex. Suggested patch attached. Thanks (Is it also worth putting an explicit warning about this kind of thing in the SRF docs?) I think yes, it will be good. The alternative is restructuring this paragraph in the SRF docs: The memory context that is current when the SRF is called is a transient context that will be cleared between calls. This means that you do not need to call pfree on everything you allocated using palloc; it will go away anyway. However, if you want to allocate any data structures to live across calls, you need to put them somewhere else. The memory context referenced by multi_call_memory_ctx is a suitable location for any data that needs to survive until the SRF is finished running. In most cases, this means that you should switch into multi_call_memory_ctx while doing the first-call setup. The important part However, if you want to allocate any data structures to live across calls, you need to put them somewhere else. is buried in the docs. But i think explicit warning is more noticeable for new developer like me. Regards, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
2014-11-26 0:38 GMT+07:00 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: 2014-10-27 11:20 GMT+01:00 Ali Akbar the.ap...@gmail.com: [ array_agg_anyarray-13.patch ] This patch is ready for commit I've committed this after some significant modifications. I did not like the API chosen, specifically the business about callers needing to deal with both accumArrayResult and accumMdArray, because that seemed pretty messy and error-prone. Thanks! When in the reviewing process, we tried to implement in existing API, and it was messy, so the last patch is with two API. We didn't think what you eventually did. 3 API: existing for scalar, a new API for array, _and_ a new API for both. Great!! Just curious, in accumArrayResultArr, while enlarging array and nullbitmaps, why it's implemented with: astate-abytes = Max(astate-abytes * 2, astate-nbytes + ndatabytes); and astate-aitems = Max(astate-aitems * 2, newnitems); won't it be more consistent if it's implemented just like in the first allocation?: while (astate-aitems = newnitems) astate-aitems *= 2; Anyway, thanks for the big modifications. I learned a lot from that. Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-11-04 22:16 GMT+07:00 Peter Eisentraut pete...@gmx.net: On 10/6/14 10:24 PM, Ali Akbar wrote: While reviewing the patch myself, i spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in first versions, because of the small modification, context patch format obfucates the changes. After reimplementation this isn't the case anymore) I think the problem this patch is addressing is real, and your approach is sound, but I'd ask you to go back to the xmlCopyNode() version, and try to add a test case for why the second argument = 1 is necessary. I don't see any other problems. OK. Because upstream code is fixed in current version, i'll revert to the previous version. Test case added to regression test. With =1 argument, the result is correct: local:piece xmlns:local=\http://127.0.0.1\; xmlns=\http://127.0.0.2\; id=\1\ internalnumber one/internal internal2/ /local:piece without the argument, the result is not correct, all children will be lost. Because of that, the other regression test will fail too because the children is not copied: *** 584,593 -- Text XPath expressions evaluation SELECT xpath('/value', data) FROM xmltest; ! xpath ! -- ! {valueone/value} ! {valuetwo/value} (2 rows) SELECT xpath(NULL, NULL) IS NULL FROM xmltest; --- 584,593 -- Text XPath expressions evaluation SELECT xpath('/value', data) FROM xmltest; !xpath ! ! {value/} ! {value/} (2 rows) SELECT xpath(NULL, NULL) IS NULL FROM xmltest; *** ... cut updated patch attached. I noticed somewhat big performance regression if the xml is large (i use PRODML Product Volume sample from energistics.org): * Without patch (tested 3 times): select unnest(xpath('//a:flow', x, ARRAY[['a',' http://www.prodml.org/schemas/1series']])) from u; unnest --- flow + kindgas lift/kind+ ... Time: 84,012 ms Time: 85,683 ms Time: 88,766 ms * With latest v6 patch (notice the correct result with namespace definition): select unnest(xpath('//a:flow', x, ARRAY[['a',' http://www.prodml.org/schemas/1series']])) from u; unnest --- flow xmlns=http://www.prodml.org/schemas/1series; + ... Time: 108,912 ms Time: 108,267 ms Time: 114,848 ms It's 23% performance regression. * Just curious, i'm also testing v5 patch performance (notice the namespace in the result): select unnest(xpath('//a:flow', x, ARRAY[['a',' http://www.prodml.org/schemas/1series']])) from u; unnest --- flow xmlns=http://www.prodml.org/schemas/1series; + kindgas lift/kind+ Time: 92,552 ms Time: 97,440 ms Time: 99,309 ms The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is much more cleaner than v5patch, should we consider the performance benefit? Anyway, thanks for the review! :) Regards, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
2014-10-27 9:11 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2014-10-27 1:38 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: Hi My idea is using new ArrayBuilder optimized for building multidimensional arrays with own State type. I think so casting to ArrayBuildState is base of our problems, so I don't would to do. Code in array_agg_* is simple, little bit more complex code is in nodeSubplan.c. Some schematic changes are in attachments. Thanks! The structure looks clear, and thanks for the example on nodeSubplan.c. I will restructure the v10 of the patch to this structure. Patch attached. Regards, -- Ali Akbar diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7e5bcd9..f59738a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry row entry indexterm +primaryarray_agg/primary + /indexterm + functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry + any + /entry + entry + the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry + indexterm primaryaverage/primary /indexterm indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680f..8c182a4 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + +SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array +--- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index 401bad4..eb4de3b 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -231,7 +231,10 @@ ExecScanSubPlan(SubPlanState *node, bool found = false; /* TRUE if got at least one subplan tuple */ ListCell *pvar; ListCell *l; + bool use_md_array_builder; + Oid md_array_element_type; ArrayBuildState *astate = NULL; + MdArrayBuildState *mdastate = NULL; /* * MULTIEXPR subplans, when executed, just return NULL; but first we @@ -366,8 +369,25 @@ ExecScanSubPlan(SubPlanState *node, /* stash away current value */ Assert(subplan-firstColType == tdesc-attrs[0]-atttypid); dvalue = slot_getattr(slot, 1, disnull); - astate = accumArrayResult(astate, dvalue, disnull, - subplan-firstColType, oldcontext); + /* + * use a fast array multidimensional builder when input is a array + * only check on first iteration. On subsequent, use the cached values + */ + if (astate == NULL mdastate == NULL) + { +md_array_element_type = get_element_type(subplan-firstColType); +use_md_array_builder = OidIsValid(md_array_element_type); + } + + if (use_md_array_builder) +mdastate = accumMdArray(mdastate, + disnull? NULL : + DatumGetArrayTypeP(dvalue), + disnull, md_array_element_type, + oldcontext); + else +astate = accumArrayResult(astate, dvalue, disnull, + subplan-firstColType, oldcontext); /* keep scanning subplan to collect all values */ continue; } @@ -439,6 +459,8 @@ ExecScanSubPlan(SubPlanState *node, /* We return the result in the caller's context */ if (astate != NULL) result = makeArrayResult(astate, oldcontext); + else if (mdastate != NULL) + result = makeMdArray(mdastate, oldcontext); else result = PointerGetDatum(construct_empty_array(subplan-firstColType)); } @@ -951,7 +973,10 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext) ListCell *pvar; ListCell *l; bool found = false; + bool use_md_array_builder; + Oid md_array_element_type; ArrayBuildState *astate = NULL; + MdArrayBuildState *mdastate = NULL; if (subLinkType == ANY_SUBLINK || subLinkType == ALL_SUBLINK) @@ -1018,8 +1043,25 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext) /* stash away current value */ Assert(subplan-firstColType == tdesc-attrs[0]-atttypid); dvalue = slot_getattr(slot, 1, disnull); - astate = accumArrayResult(astate, dvalue, disnull, - subplan-firstColType, oldcontext); + /* + * use a fast array multidimensional builder when input is a array + * only check on first iteration. On subsequent, use the cached values + */ + if (astate == NULL mdastate == NULL) + { +md_array_element_type = get_element_type(subplan-firstColType); +use_md_array_builder = OidIsValid
Re: [HACKERS] Function array_agg(array)
2014-10-27 16:15 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: Hi I did some minor changes in code * move tests of old or new builder style for array sublink out of main cycles * some API simplification of new builder - we should not to create identical API, mainly it has no sense minor changes in the patch: * remove array_agg_finalfn_internal declaration in top of array_userfuncs.c * fix comment of makeMdArray * fix return of makeMdArray * remove unnecesary changes to array_agg_finalfn Regards, -- Ali Akbar diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7e5bcd9..f59738a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry row entry indexterm +primaryarray_agg/primary + /indexterm + functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry + any + /entry + entry + the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry + indexterm primaryaverage/primary /indexterm indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680f..8c182a4 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + +SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array +--- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c index 401bad4..21b8de1 100644 --- a/src/backend/executor/nodeSubplan.c +++ b/src/backend/executor/nodeSubplan.c @@ -231,7 +231,10 @@ ExecScanSubPlan(SubPlanState *node, bool found = false; /* TRUE if got at least one subplan tuple */ ListCell *pvar; ListCell *l; + bool use_md_array_builder = false; + Oid md_array_element_type = InvalidOid; ArrayBuildState *astate = NULL; + MdArrayBuildState *mdastate = NULL; /* * MULTIEXPR subplans, when executed, just return NULL; but first we @@ -260,6 +263,16 @@ ExecScanSubPlan(SubPlanState *node, } /* + * use a fast array multidimensional builder when input is a array + * only check on first iteration. On subsequent, use the cached values + */ + if (subLinkType == ARRAY_SUBLINK) + { + md_array_element_type = get_element_type(subplan-firstColType); + use_md_array_builder = OidIsValid(md_array_element_type); + } + + /* * We are probably in a short-lived expression-evaluation context. Switch * to the per-query context for manipulating the child plan's chgParam, * calling ExecProcNode on it, etc. @@ -366,8 +379,16 @@ ExecScanSubPlan(SubPlanState *node, /* stash away current value */ Assert(subplan-firstColType == tdesc-attrs[0]-atttypid); dvalue = slot_getattr(slot, 1, disnull); - astate = accumArrayResult(astate, dvalue, disnull, - subplan-firstColType, oldcontext); + + if (use_md_array_builder) +mdastate = accumMdArray(mdastate, + disnull? NULL : + DatumGetArrayTypeP(dvalue), + disnull, md_array_element_type, + oldcontext); + else +astate = accumArrayResult(astate, dvalue, disnull, + subplan-firstColType, oldcontext); /* keep scanning subplan to collect all values */ continue; } @@ -439,6 +460,8 @@ ExecScanSubPlan(SubPlanState *node, /* We return the result in the caller's context */ if (astate != NULL) result = makeArrayResult(astate, oldcontext); + else if (mdastate != NULL) + result = PointerGetDatum(makeMdArray(mdastate, oldcontext, true)); else result = PointerGetDatum(construct_empty_array(subplan-firstColType)); } @@ -951,7 +974,10 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext) ListCell *pvar; ListCell *l; bool found = false; + bool use_md_array_builder = false; + Oid md_array_element_type = InvalidOid; ArrayBuildState *astate = NULL; + MdArrayBuildState *mdastate = NULL; if (subLinkType == ANY_SUBLINK || subLinkType == ALL_SUBLINK) @@ -960,6 +986,16 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext) elog(ERROR, CTE subplans should not be executed via ExecSetParamPlan); /* + * use a fast array multidimensional builder when input is a array + * only check on first iteration. On subsequent, use the cached values + */ + if (subLinkType == ARRAY_SUBLINK) + { + md_array_element_type = get_element_type(subplan-firstColType); + use_md_array_builder = OidIsValid
Re: [HACKERS] Function array_agg(array)
super I tested last version and I have not any objections. 1. We would to have this feature - it is long time number of our ToDo List 2. Proposal and design of multidimensional aggregation is clean and nobody has objection here. 3. There is zero impact on current implementation. From performance reasons it uses new array optimized aggregator - 30% faster for this purpose than current (scalar) array aggregator 4. Code is clean and respect PostgreSQL coding rules 5. There are documentation and necessary regress tests 6. Patching and compilation is clean without warnings. This patch is ready for commit Thank you for patch Thank you for the thorough review process. Regards, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
2014-10-27 1:38 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: Hi My idea is using new ArrayBuilder optimized for building multidimensional arrays with own State type. I think so casting to ArrayBuildState is base of our problems, so I don't would to do. Code in array_agg_* is simple, little bit more complex code is in nodeSubplan.c. Some schematic changes are in attachments. Thanks! The structure looks clear, and thanks for the example on nodeSubplan.c. I will restructure the v10 of the patch to this structure. Regards, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
2014-10-25 10:29 GMT+07:00 Ali Akbar the.ap...@gmail.com: I fixed small issue in regress tests and I enhanced tests for varlena types and null values. Thanks. it is about 15% faster than original implementation. 15% faster than array_agg(scalar)? I haven't verify the performance, but because the internal array data and null bitmap is copied as-is, that will be faster. 2014-10-25 1:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: Hi Ali I checked a code. I am thinking so code organization is not good. accumArrayResult is too long now. makeMdArrayResult will not work, when arrays was joined (it is not consistent now). I don't like a usage of state-is_array_accum in array_userfunc.c -- it is signal of wrong wrapping. Yes, i was thinking the same. Attached WIP patch to reorganizate the code. makeMdArrayResult works now, with supplied arguments act as override from default values calculated from ArrayBuildStateArray. In array_userfunc.c, because we don't want to override ndims, dims and lbs, i copied array_agg_finalfn and only change the call to makeMdArrayResult (we don't uses makeArrayResult because we want to set release to false). Another alternative is to create new makeArrayResult-like function that has parameter bool release. adding makeArrayResult1 (do you have better name for this?), that accepts bool release parameter. array_agg_finalfn becomes more clean, and no duplicate code in array_agg_anyarray_finalfn. next question: there is function array_append(anyarray, anyelement). Isn't time to define array_append(anyarray, anyarray) now? There is array_cat(anyarray, anyarray): /*- * array_cat : * concatenate two nD arrays to form an nD array, or * push an (n-1)D array onto the end of an nD array * */ Regards, -- Ali Akbar diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7e5bcd9..f59738a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry row entry indexterm +primaryarray_agg/primary + /indexterm + functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry + any + /entry + entry + the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry + indexterm primaryaverage/primary /indexterm indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680f..8c182a4 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + +SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array +--- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 41e973b..0261fcb 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -108,12 +108,16 @@ exprType(const Node *expr) type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(exprType((Node *) tent-expr); + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, +format_type_be(exprType((Node *) tent-expr); + } } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) @@ -139,12 +143,16 @@ exprType(const Node *expr) type = subplan-firstColType; if (subplan-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(subplan-firstColType; + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR
Re: [HACKERS] Function array_agg(array)
makeArrayResult1 - I have no better name now I found one next minor detail. you reuse a array_agg_transfn function. Inside is a message array_agg_transfn called in non-aggregate context. It is not correct for array_agg_anyarray_transfn Fixed. probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too. makeMdArrayResultArray and appendArrayDatum should be marked as static ok, marked all of that as static. I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming. I'm thinking it like this: - if we want to accumulate array normally, use accumArrayResult and makeArrayResult. If we accumulate scalar the result will be 1D array. if we accumulate array, the resulting dimension is incremented by 1. - if we want, somehow to affect the normal behavior, and change the dimensions, use makeMdArrayResult. Searching through the postgres' code, other than internal use in arrayfuncs.c, makeMdArrayResult is used only in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl array to postgres array. So if somehow we will accumulate array other than in array_agg, i think the most natural way is using accumArrayResult and then makeArrayResult. CMIIW Regards, -- Ali Akbar diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7e5bcd9..f59738a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry row entry indexterm +primaryarray_agg/primary + /indexterm + functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry + any + /entry + entry + the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry + indexterm primaryaverage/primary /indexterm indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680f..8c182a4 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + +SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array +--- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 41e973b..0261fcb 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -108,12 +108,16 @@ exprType(const Node *expr) type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(exprType((Node *) tent-expr); + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, +format_type_be(exprType((Node *) tent-expr); + } } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) @@ -139,12 +143,16 @@ exprType(const Node *expr) type = subplan-firstColType; if (subplan-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(subplan-firstColType; + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, + format_type_be(subplan-firstColType; + } } } else if (subplan-subLinkType == MULTIEXPR_SUBLINK) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 3e7dc85
Re: [HACKERS] Function array_agg(array)
2014-10-25 15:43 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: 2014-10-25 10:16 GMT+02:00 Ali Akbar the.ap...@gmail.com: makeArrayResult1 - I have no better name now I found one next minor detail. you reuse a array_agg_transfn function. Inside is a message array_agg_transfn called in non-aggregate context. It is not correct for array_agg_anyarray_transfn Fixed. probably specification dim and lbs in array_agg_finalfn is useless, when you expect a natural result. A automatic similar to array_agg_anyarray_finalfn should work there too. makeMdArrayResultArray and appendArrayDatum should be marked as static ok, marked all of that as static. I am thinking so change of makeArrayResult is wrong. It is developed for 1D array. When we expect somewhere ND result now, then we should to use makeMdArrayResult there. So makeArrayResult should to return 1D array in all cases. Else a difference between makeArrayResult and makeMdArrayResult hasn't big change and we have a problem with naming. I'm thinking it like this: - if we want to accumulate array normally, use accumArrayResult and makeArrayResult. If we accumulate scalar the result will be 1D array. if we accumulate array, the resulting dimension is incremented by 1. - if we want, somehow to affect the normal behavior, and change the dimensions, use makeMdArrayResult. Searching through the postgres' code, other than internal use in arrayfuncs.c, makeMdArrayResult is used only in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl array to postgres array. So if somehow we will accumulate array other than in array_agg, i think the most natural way is using accumArrayResult and then makeArrayResult. ok, there is more variants and I can't to decide. But I am not satisfied with this API. We do some wrong in structure. makeMdArrayResult is now ugly. One approach that i can think is we cleanly separate the structures and API. We don't touch existing ArrayBuildState, accumArrayResult, makeArrayResult and makeMdArrayResult, and we create new functions: ArrayBuildStateArray, accumArrayResultArray, makeArrayResultArray and makeArrayResultArrayCustom. But if we do that, the array(subselect) implementation will not be as simple as current patch. We must also change all the ARRAY_SUBLINK-related accumArrayResult call. Regarding current patch implementation, the specific typedefs are declared as: +typedef struct ArrayBuildStateScalar +{ + ArrayBuildState astate; ... +} ArrayBuildStateArray; so it necessities rather ugly code like this: + result-elemtype = astate-astate.element_type; Can we use C11 feature of unnamed struct like this? : +typedef struct ArrayBuildStateScalar +{ + ArrayBuildState; ... +} ArrayBuildStateArray; so the code can be a little less ugly by doing it like this: + result-elemtype = astate-element_type; I don't know whether all currently supported compilers implements this feature.. Can you suggest a better structure for this? If we can't settle on a better structure, i think i'll reimplement array_agg without the performance improvement, using deconstruct_array and unchanged accumArrayResult makeMdArrayResult. Thanks, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
you can check it? We can test, how performance lost we get. As second benefit we can get numbers for introduction new optimized array builder array_agg(anyarray) with deconstruct_array, unchanged accumArrayResult and makeMdArrayResult: INSERT 0 1 Time: 852,527 ms INSERT 0 1 Time: 844,275 ms INSERT 0 1 Time: 858,855 ms INSERT 0 1 Time: 861,072 ms INSERT 0 1 Time: 952,006 ms INSERT 0 1 Time: 953,918 ms INSERT 0 1 Time: 926,945 ms INSERT 0 1 Time: 923,692 ms INSERT 0 1 Time: 940,916 ms INSERT 0 1 Time: 948,700 ms INSERT 0 1 Time: 933,333 ms INSERT 0 1 Time: 948,869 ms INSERT 0 1 Time: 847,113 ms INSERT 0 1 Time: 908,572 ms Total: 12776.83 Avg: 912,63 with last patch (v10): INSERT 0 1 Time: 643,339 ms INSERT 0 1 Time: 608,010 ms INSERT 0 1 Time: 610,465 ms INSERT 0 1 Time: 613,931 ms INSERT 0 1 Time: 616,466 ms INSERT 0 1 Time: 634,754 ms INSERT 0 1 Time: 683,566 ms INSERT 0 1 Time: 656,665 ms INSERT 0 1 Time: 630,096 ms INSERT 0 1 Time: 607,564 ms INSERT 0 1 Time: 610,353 ms INSERT 0 1 Time: 626,816 ms INSERT 0 1 Time: 610,450 ms INSERT 0 1 Time: 614,342 ms Total: 8842,7 Avg: 631,6 It's 30% faster (i tried varlena element - text). I tried several times and it's consistent in +/- 30%. quick dirty non-optimized patch and the test script attached. Regards, -- Ali Akbar diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7e5bcd9..f59738a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry row entry indexterm +primaryarray_agg/primary + /indexterm + functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry + any + /entry + entry + the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry + indexterm primaryaverage/primary /indexterm indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680f..8c182a4 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + +SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array +--- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 41e973b..0261fcb 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -108,12 +108,16 @@ exprType(const Node *expr) type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(exprType((Node *) tent-expr); + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, +format_type_be(exprType((Node *) tent-expr); + } } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) @@ -139,12 +143,16 @@ exprType(const Node *expr) type = subplan-firstColType; if (subplan-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(subplan-firstColType; + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, + format_type_be(subplan-firstColType; + } } } else if (subplan-subLinkType == MULTIEXPR_SUBLINK) diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 3e7dc85..8fc8b49 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -668,10 +668,16 @@ build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, Assert(!te-resjunk); Assert(testexpr == NULL
Re: [HACKERS] Function array_agg(array)
Updated patch attached. 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments to design: 1. patch doesn't hold documentation and regress tests, please append it. i've added some regression tests in arrays.sql and aggregate.sql. I've only found the documentation of array_agg. Where is the doc for array(subselect) defined? 2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState-dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is. aha, so isn't better to fix a performance for accumArrayResult ? Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter. implemented it by modifying ArrayBuildState to ArrayBuildStateArray and ArrayBuildStateScalar (do you have any idea for better struct naming?) In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too postgres=# select array_agg(a) from test; array_agg --- {{1,2,3,4},{1,2,3,4}} (1 row) postgres=# select array(select a from test); ERROR: could not find array type for data type integer[] I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented? where you can start? postgres=# explain select array(select a from test); ERROR: 42704: could not find array type for data type integer[] LOCATION: exprType, nodeFuncs.c:116 look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword attention: probably we don't would to allow arrays everywhere. I've changed the places where i think it's appropriate. 4. why you use a magic constant (64) there? + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate-aitems = 64 * nitems; + astate-nullbitmap = (bits8 *) + repalloc(astate-nullbitmap, (astate-aitems + 7) / 8); just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c): astate-alen = 64; /* arbitrary starting array size */ it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory. you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable. You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger. this patch allocates 1KB (1024 bytes) if the ndatabytes is 512bytes. If it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 * number of items in array. Regards, -- Ali Akbar *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 12046,12051 NULL baz/literallayout(3 rows)/entry --- 12046,12067 row entry indexterm + primaryarray_agg/primary +/indexterm +functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry +any + /entry + entry +the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry +indexterm primaryaverage/primary /indexterm indexterm *** a/src/backend/nodes/nodeFuncs.c --- b/src/backend/nodes/nodeFuncs.c *** *** 108,119 exprType(const Node *expr) type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg(could not find array type for data type %s, ! format_type_be(exprType((Node *) tent-expr); } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) --- 108,123 type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { ! if (!OidIsValid
Re: [HACKERS] Function array_agg(array)
2014-10-24 15:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: Hi it looks well doc: http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS it should be fixed too Regards Pavel doc updated with additional example for array(subselect). patch attached. Regards, -- Ali Akbar *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 12046,12051 NULL baz/literallayout(3 rows)/entry --- 12046,12067 row entry indexterm + primaryarray_agg/primary +/indexterm +functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry +any + /entry + entry +the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry +indexterm primaryaverage/primary /indexterm indexterm *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *** *** 2238,2243 SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); --- 2238,2248 array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + + SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array + --- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting *** a/src/backend/nodes/nodeFuncs.c --- b/src/backend/nodes/nodeFuncs.c *** *** 108,119 exprType(const Node *expr) type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg(could not find array type for data type %s, ! format_type_be(exprType((Node *) tent-expr); } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) --- 108,123 type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { ! if (!OidIsValid(get_element_type(type))) ! { ! /* not array, so check for its array type */ ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg(could not find array type for data type %s, ! format_type_be(exprType((Node *) tent-expr); ! } } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) *** *** 139,150 exprType(const Node *expr) type = subplan-firstColType; if (subplan-subLinkType == ARRAY_SUBLINK) { ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg(could not find array type for data type %s, ! format_type_be(subplan-firstColType; } } else if (subplan-subLinkType == MULTIEXPR_SUBLINK) --- 143,158 type = subplan-firstColType; if (subplan-subLinkType == ARRAY_SUBLINK) { ! if (!OidIsValid(get_element_type(type))) ! { ! /* not array, so check for its array type */ ! type = get_array_type(type); ! if (!OidIsValid(type)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_OBJECT), ! errmsg(could not find array type for data type %s, ! format_type_be(subplan-firstColType; ! } } } else if (subplan-subLinkType == MULTIEXPR_SUBLINK) *** a/src/backend/optimizer/plan/subselect.c --- b/src/backend/optimizer/plan/subselect.c *** *** 668,677 build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot, Assert(!te-resjunk); Assert(testexpr == NULL); ! arraytype = get_array_type(exprType((Node *) te-expr)); ! if (!OidIsValid(arraytype)) ! elog(ERROR, could not find array type for datatype %s, ! format_type_be(exprType((Node *) te-expr))); prm = generate_new_param(root, arraytype, exprTypmod((Node *) te-expr), --- 668,683 Assert(!te-resjunk); Assert(testexpr == NULL); ! ! arraytype = exprType((Node *) te-expr); ! if (!OidIsValid(get_element_type(arraytype))) ! { ! /* not array, so get the array type */ ! arraytype = get_array_type(exprType((Node *) te-expr)); ! if (!OidIsValid(arraytype)) ! elog(ERROR, could not find array type for datatype %s, ! format_type_be(exprType((Node *) te-expr))); ! } prm = generate_new_param(root, arraytype, exprTypmod((Node *) te-expr), *** a/src/backend/utils/adt/array_userfuncs.c --- b/src
Re: [HACKERS] Function array_agg(array)
2014-10-24 16:26 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: Hi some in last patch is wrong, I cannot to compile it: arrayfuncs.c: In function ‘accumArrayResult’: arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’ astate-alen = 64; /* arbitrary starting array size */ ^ arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’ astate-dvalues = (Datum *) palloc(astate-alen * sizeof(Datum)); ^ arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’ astate-dvalues = (Datum *) palloc(astate-alen * sizeof(Datum)); ^ arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’ astate-dnulls = (bool *) palloc(astate-alen * sizeof(bool)); ^ arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’ astate-dnulls = (bool *) palloc(astate-alen * sizeof(bool)); ^ arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’ astate-nelems = 0; ^ arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’ if (astate-nelems = astate-alen) ^ arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’ if (astate-nelems = astate-alen) ^ arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’ astate-alen *= 2; Sorry, correct patch attached. This patch is in patience format (git --patience ..). In previous patches, i use context format (git --patience ... | filterdiff --format=context), but it turns out that some modification is lost. -- Ali Akbar diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7e5bcd9..f59738a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry row entry indexterm +primaryarray_agg/primary + /indexterm + functionarray_agg(replaceable class=parameteranyarray/replaceable)/function + /entry + entry + any + /entry + entry + the same array type as input type + /entry + entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry + /row + + row + entry + indexterm primaryaverage/primary /indexterm indexterm diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 2f0680f..8c182a4 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%'); array --- {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413} + +SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i)); + array +--- + {{1},{2},{3},{4},{5}} (1 row) /programlisting The subquery must return a single column. The resulting diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 41e973b..0261fcb 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -108,12 +108,16 @@ exprType(const Node *expr) type = exprType((Node *) tent-expr); if (sublink-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(exprType((Node *) tent-expr); + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, +format_type_be(exprType((Node *) tent-expr); + } } } else if (sublink-subLinkType == MULTIEXPR_SUBLINK) @@ -139,12 +143,16 @@ exprType(const Node *expr) type = subplan-firstColType; if (subplan-subLinkType == ARRAY_SUBLINK) { - type = get_array_type(type); - if (!OidIsValid(type)) - ereport(ERROR, - (errcode(ERRCODE_UNDEFINED_OBJECT), - errmsg(could not find array type for data type %s, - format_type_be(subplan-firstColType; + if (!OidIsValid(get_element_type(type))) + { + /* not array, so check for its array type */ + type = get_array_type(type); + if (!OidIsValid(type)) +ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg(could not find array type for data type %s, + format_type_be(subplan-firstColType; + } } } else if (subplan-subLinkType
Re: [HACKERS] Function array_agg(array)
Thanks for the review 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments to design: 1. patch doesn't hold documentation and regress tests, please append it. OK, i'll add the documentation and regression test 2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState-dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is. In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too postgres=# select array_agg(a) from test; array_agg --- {{1,2,3,4},{1,2,3,4}} (1 row) postgres=# select array(select a from test); ERROR: could not find array type for data type integer[] I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented? 4. why you use a magic constant (64) there? + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate-aitems = 64 * nitems; + astate-nullbitmap = (bits8 *) + repalloc(astate-nullbitmap, (astate-aitems + 7) / 8); just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c): astate-alen = 64; /* arbitrary starting array size */ it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory. Regards, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
2014-10-22 22:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: 2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com: Thanks for the review 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com: I agree with your proposal. I have a few comments to design: 1. patch doesn't hold documentation and regress tests, please append it. OK, i'll add the documentation and regression test 2. this functionality (multidimensional aggregation) can be interesting more times, so maybe some interface like array builder should be preferred. We already have accumArrayResult and makeArrayResult/makeMdArrayResult, haven't we? Actually array_agg(anyarray) can be implemented by using accumArrayResult and makeMdArrayResult, but in the process we will deconstruct the array data and null bit-flags into ArrayBuildState-dvalues and dnulls. And we will reconstruct it through makeMdArrayResult. I think this way it's not efficient, so i decided to reimplement it and memcpy the array data and null flags as-is. aha, so isn't better to fix a performance for accumArrayResult ? Ok, i'll go this route. While reading the code of array(subselect) as you pointed below, i think the easiest way to implement support for both array_agg(anyarray) and array(subselect) is to change accumArrayResult so it operates both with scalar datum(s) and array datums, with performance optimization for the latter. In other places, i think it's clearer if we just use accumArrayResult and makeArrayResult/makeMdArrayResult as API for building (multidimensional) arrays. 3. array_agg was consistent with array(subselect), so it should be fixed too postgres=# select array_agg(a) from test; array_agg --- {{1,2,3,4},{1,2,3,4}} (1 row) postgres=# select array(select a from test); ERROR: could not find array type for data type integer[] I'm pretty new in postgresql development. Can you point out where is array(subselect) implemented? where you can start? postgres=# explain select array(select a from test); ERROR: 42704: could not find array type for data type integer[] LOCATION: exprType, nodeFuncs.c:116 look to code ... your magic keyword is a ARRAY_SUBLINK .. search in postgresql sources this keyword Found it. Thanks. On a side note in postgres array type for integer[] is still integer[], but in pg_type, integer[] has no array type. I propose we consider to change it so array type of anyarray is itself (not in this patch, of course, because it is a big change). Consider the following code in nodeFuncs.c:109 if (sublink-subLinkType == ARRAY_SUBLINK) { type = get_array_type(type); if (!OidIsValid(type)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), errmsg(could not find array type for data type %s, format_type_be(exprType((Node *) tent-expr); } to implement array(subselect) you pointed above, we must specially checks for array types. Quick search on get_array_type returns 10 places. I'll think more about this later. For this patch, i'll go without changes in pg_type.h. 4. why you use a magic constant (64) there? + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate-aitems = 64 * nitems; + astate-nullbitmap = (bits8 *) + repalloc(astate-nullbitmap, (astate-aitems + 7) / 8); just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c): astate-alen = 64; /* arbitrary starting array size */ it can be any number not too small and too big. Too small, and we will realloc shortly. Too big, we will end up wasting memory. you can try to alloc 1KB instead as start -- it is used more times in Postgres. Then a overhead is max 1KB per agg call - what is acceptable. You take this value from accumArrayResult, but it is targeted for shorted scalars - you should to expect so any array will be much larger. Ok. Regards, -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
So, is there any idea how we will handle NULL and empty array in array_agg(anyarray)? I propose we just reject those input because the output will make no sense: - array_agg(NULL::int[]) -- the result will be indistinguished from array_agg of NULL ints. - array_agg('{}'::int[]) -- how we determine the dimension of the result? is it 0? Or the result will be just an empty array {} ? This updated patch rejects NULL and {} arrays as noted above. Regards, -- Ali Akbar *** a/src/backend/utils/adt/array_userfuncs.c --- b/src/backend/utils/adt/array_userfuncs.c *** *** 16,21 --- 16,51 #include utils/builtins.h #include utils/lsyscache.h + #include utils/memutils.h + + /*- + * ArrayAggAnyArrayState: + * aggregate state for array_agg(anyarray) + *- + */ + typedef struct + { + MemoryContext mcontext; /* where all the temp stuff is kept */ + char *data; /* array of accumulated data */ + bits8 *nullbitmap; /* bitmap of is-null flags for data */ + + int abytes; /* allocated length of above arrays */ + int aitems; /* allocated length of above arrays */ + int nbytes; /* number of used bytes in above arrays */ + int nitems; /* number of elements in above arrays */ + int narray; /* number of array accumulated */ + Oid element_type; /* data type of the Datums */ + int16 typlen; /* needed info about datatype */ + bool typbyval; + char typalign; + + int ndims; /* element dimensions */ + int *dims; + int *lbs; + + bool hasnull; /* any element has null */ + } ArrayAggAnyArrayState; + /*- * array_push : *** *** 544,546 array_agg_finalfn(PG_FUNCTION_ARGS) --- 574,821 PG_RETURN_DATUM(result); } + + /* + * ARRAY_AGG(anyarray) aggregate function + */ + Datum + array_agg_anyarray_transfn(PG_FUNCTION_ARGS) + { + MemoryContext aggcontext, + arr_context, + oldcontext; + ArrayAggAnyArrayState *astate; + + Oid arg_typeid = get_fn_expr_argtype(fcinfo-flinfo, 1); + Oid arg_elemtype = get_element_type(arg_typeid); + ArrayType *arg; + int *dims, + *lbs, + ndims, + nitems, + ndatabytes; + char *data; + + int i; + + if (arg_elemtype == InvalidOid) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(could not determine input data type))); + + if (!AggCheckCallContext(fcinfo, aggcontext)) + elog(ERROR, array_agg_anyarray_transfn called in non-aggregate context); + + if (PG_ARGISNULL(1)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(cannot aggregate null arrays))); + + astate = PG_ARGISNULL(0) ? NULL : (ArrayAggAnyArrayState *) PG_GETARG_POINTER(0); + + if (astate == NULL) + { + arr_context = AllocSetContextCreate(aggcontext, + array_agg_anyarray_transfn, + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); + oldcontext = MemoryContextSwitchTo(arr_context); + astate = (ArrayAggAnyArrayState *) palloc(sizeof(ArrayAggAnyArrayState)); + + astate-mcontext = arr_context; + astate-abytes = 0; + astate-aitems = 0; + astate-data = NULL; + astate-nullbitmap = NULL; + astate-nitems = 0; + astate-narray = 0; + astate-element_type = arg_elemtype; + get_typlenbyvalalign(arg_elemtype, + astate-typlen, + astate-typbyval, + astate-typalign); + } + else + { + oldcontext = MemoryContextSwitchTo(astate-mcontext); + Assert(astate-element_type == arg_elemtype); + } + + arg = PG_GETARG_ARRAYTYPE_P(1); + + ndims = ARR_NDIM(arg); + dims = ARR_DIMS(arg); + lbs = ARR_LBOUND(arg); + data = ARR_DATA_PTR(arg); + nitems = ArrayGetNItems(ndims, dims); + + ndatabytes = ARR_SIZE(arg) - ARR_DATA_OFFSET(arg); + + if (astate-data == NULL) + { + if (ndims == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(cannot aggregate empty arrays))); + + if (ndims + 1 MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg(number of array dimensions (%d) exceeds the maximum allowed (%d), + ndims + 1, MAXDIM))); + + astate-ndims = ndims; + astate-dims = (int *) palloc(ndims * sizeof(int)); + astate-lbs = (int *) palloc(ndims * sizeof(int)); + memcpy(astate-dims, dims, ndims * sizeof(int)); + memcpy(astate-lbs, lbs, ndims * sizeof(int)); + + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes); + astate-aitems = 64 * nitems; + astate-data = (char *) palloc(astate-abytes); + + memcpy(astate-data, data, ndatabytes); + astate-nbytes = ndatabytes; + astate-nitems = nitems; + + if (ARR_HASNULL(arg)) + { + astate-hasnull = true; + astate
Re: [HACKERS] Allow format 0000-0000-0000 in postgresql MAC parser
It has been registered now (https://commitfest.postgresql.org/action/patch_view?id=1585). I've got an updated version of the patch with the documentation fix. Looks like the patch is all good. I'm marking as ready for commiter. On a side note, i'm noticing from http://en.wikipedia.org/wiki/MAC_address, that there is three numbering namespace for MAC: MAC-48, EUI-48 and EUI-64. The last one is 64 bits long (8 bytes). Currently PostgreSQL's macaddr is only 6 bytes long. Should we change it to 8 bytes (not in this patch, of course)? Regards, -- Ali Akbar
Re: [HACKERS] proposal: plpgsql - Assert statement
2014-09-30 10:04 GMT+07:00 Jim Nasby j...@nasby.net: On 9/17/14, 7:40 PM, Jan Wieck wrote: Exactly. Doing something like ASSERT (select count(*) from foo where fk not in (select pk from bar)) = 0; is a perfectly fine, arbitrary boolean expression. It will probably work well in a development environment too. And I am very sure that it will not scale well once that code gets deployed. And I know how DBAs react to the guaranteed following performance problem. They will disable ALL assert ... or was there some sort of assert class system proposed that I missed? Actually, compared with for example Java or C, in production systems, usually all asserts are disabled for performance (in java removed by JIT, in C we define NDEBUG). We're also putting too much weight on the term assert here. C-style asserts are generally not nearly as useful in a database as general sanity-checking or error handling, especially if you're trying to use the database to enforce data sanity. +1. without any query capability, assert will become much less useful. If we cannot query in assert, we will code: -- perform some query ASSERT WHEN some_check_on_query_result; .. and disabling the query in production system will become another trouble. My wish-list for asserts is: - Works at a SQL level - Unique/clear way to identify asserts (so you're not guessing where the assert came from) +1 - Allows for over-riding individual asserts (so if you need to do something you're not supposed to do you still have the protection of all other asserts) - Less verbose than IF THEN RAISE END IF +1 -- Ali Akbar
Re: [HACKERS] Function array_agg(array)
2014-10-11 22:28 GMT+07:00 Tom Lane t...@sss.pgh.pa.us: Seems dangerous as heck; certainly it would have side-effects far more wide-ranging than just making this particular function work. A safer answer is to split array_agg into two functions, array_agg(anynonarray) - anyarray array_agg(anyarray) - anyarray I rather imagine you should do that anyway, because I really doubt that this hack is operating quite as intended. I suspect you are producing arrays containing arrays as elements, not true 2-D arrays. That's not a direction we want to go in I think; certainly there are no other operations that produce such things. Thanks for the review. Yes, it looks like the patch produced array as the elements. So, all array operations behaves wierdly. In this quick dirty patch, I am trying to implement the array_agg(anyarray), introducing two new functions: - array_agg_anyarray_transfn - array_agg_anyarray_finalfn At first, i want to use accumArrayResult and makeMdArrayResult, but it's complicated to work with multi-dimensional arrays with those two functions. So i combined array_cat with those function. Currently, it cannot handle NULL arrays: backend select array_agg(a) from (values(null::int[])) a(a); 1: array_agg(typeid = 1007, len = -1, typmod = -1, byval = f) ERROR: cannot aggregate null arrays Regards, -- Ali Akbar *** a/src/backend/utils/adt/array_userfuncs.c --- b/src/backend/utils/adt/array_userfuncs.c *** *** 16,21 --- 16,51 #include utils/builtins.h #include utils/lsyscache.h + #include utils/memutils.h + + /*- + * ArrayAggAnyArrayState: + * aggregate state for array_agg(anyarray) + *- + */ + typedef struct + { + MemoryContext mcontext; /* where all the temp stuff is kept */ + char *data; /* array of accumulated data */ + bits8 *nullbitmap; /* bitmap of is-null flags for data */ + + int abytes; /* allocated length of above arrays */ + int aitems; /* allocated length of above arrays */ + int nbytes; /* number of used bytes in above arrays */ + int nitems; /* number of elements in above arrays */ + int narray; /* number of array accumulated */ + Oid element_type; /* data type of the Datums */ + int16 typlen; /* needed info about datatype */ + bool typbyval; + char typalign; + + int ndims; /* element dimensions */ + int *dims; + int *lbs; + + bool hasnull; /* any element has null */ + } ArrayAggAnyArrayState; + /*- * array_push : *** *** 544,546 array_agg_finalfn(PG_FUNCTION_ARGS) --- 574,814 PG_RETURN_DATUM(result); } + + /* + * ARRAY_AGG(anyarray) aggregate function + */ + Datum + array_agg_anyarray_transfn(PG_FUNCTION_ARGS) + { + MemoryContext aggcontext, + arr_context, + oldcontext; + ArrayAggAnyArrayState *astate; + + Oid arg_typeid = get_fn_expr_argtype(fcinfo-flinfo, 1); + Oid arg_elemtype = get_element_type(arg_typeid); + ArrayType *arg; + int *dims, + *lbs, + ndims, + nitems, + ndatabytes; + char *data; + + int i; + + if (arg_elemtype == InvalidOid) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(could not determine input data type))); + + if (!AggCheckCallContext(fcinfo, aggcontext)) + elog(ERROR, array_agg_anyarray_transfn called in non-aggregate context); + + if (PG_ARGISNULL(1)) + elog(ERROR, cannot aggregate null arrays); + + astate = PG_ARGISNULL(0) ? NULL : (ArrayAggAnyArrayState *) PG_GETARG_POINTER(0); + + if (astate == NULL) + { + arr_context = AllocSetContextCreate(aggcontext, + array_agg_anyarray_transfn, + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); + oldcontext = MemoryContextSwitchTo(arr_context); + astate = (ArrayAggAnyArrayState *) palloc(sizeof(ArrayAggAnyArrayState)); + + astate-mcontext = arr_context; + astate-abytes = 0; + astate-aitems = 0; + astate-data = NULL; + astate-nullbitmap = NULL; + astate-nitems = 0; + astate-narray = 0; + astate-element_type = arg_elemtype; + get_typlenbyvalalign(arg_elemtype, + astate-typlen, + astate-typbyval, + astate-typalign); + } + else + { + oldcontext = MemoryContextSwitchTo(astate-mcontext); + Assert(astate-element_type == arg_elemtype); + } + + arg = PG_GETARG_ARRAYTYPE_P(1); + + ndims = ARR_NDIM(arg); + dims = ARR_DIMS(arg); + lbs = ARR_LBOUND(arg); + data = ARR_DATA_PTR(arg); + nitems = ArrayGetNItems(ndims, dims); + + ndatabytes = ARR_SIZE(arg) - ARR_DATA_OFFSET(arg); + + if (astate-data == NULL) + { + if (ndims + 1 MAXDIM) + ereport(ERROR, + (errcode
[HACKERS] Function array_agg(array)
Greetings, While looking for easier items in PostgreSQL Wiki's Todo List (this will be my 3rd patch), i found this TODO: Add a built-in array_agg(anyarray) or similar, that can aggregate 1-dimensional arrays into a 2-dimensional array. I've stumbled by this lack of array_agg(anyarray) sometimes ago in my work, so i decided to explore this. Currently, if we try array_agg(anyarray), PostgreSQL behaves like this: # select array_agg('{1,2}'::int[]); ERROR: could not find array type for data type integer[] Reading implementation of array_agg, it looks like the array_agg function is generic, and can process any input. The error comes from PostgreSQL not finding array type for int[] (_int4 in pg_proc). In PostgreSQL, any array is multidimensional, array type for any array is the same: - the type of {1,2} is int[] - {{1,2}, {3,4}} is int[] - {{{1},{2}, {3} ,{4}}} is still int[] So, can't we just set the typarray of array types to its self oid? (patch attached). So far: - the array_agg is working and returning correct types: backend select array_agg('{1,2}'::int[]); 1: array_agg(typeid = 1007, len = -1, typmod = -1, byval = f) 1: array_agg = {{1,2}}(typeid = 1007, len = -1, typmod = -1, byval = f) select array_agg('{''a'',''b''}'::varchar[]); 1: array_agg(typeid = 1015, len = -1, typmod = -1, byval = f) 1: array_agg = {{'a','b'}}(typeid = 1015, len = -1, typmod = -1, byval = f) - Regression tests passed except for the pg_type sanity check while checking typelem relation with typarray: SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype, p2.typelem, p2.typlen FROM pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid) WHERE p1.typarray 0 AND (p2.oid IS NULL OR p2.typelem p1.oid OR p2.typlen -1); ! oid |basetype| arraytype| typelem | typlen ! --+++-+ ! 143 | _xml | _xml | 142 | -1 ! 199 | _json | _json | 114 | -1 ! 629 | _line | _line | 628 | -1 ! 719 | _circle| _circle| 718 | -1 ... (cut) Aside from the sanity check complaints, I don't see any problems in the resulting array operations. So, back to the question: Can't we just set the typarray of array types to its self oid? Regards, -- Ali Akbar diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h index b7d9256..7934874 100644 --- a/src/include/catalog/pg_type.h +++ b/src/include/catalog/pg_type.h @@ -357,8 +357,8 @@ DATA(insert OID = 114 ( json PGNSP PGUID -1 f b U f t \054 0 0 199 json_in j DATA(insert OID = 142 ( xml PGNSP PGUID -1 f b U f t \054 0 0 143 xml_in xml_out xml_recv xml_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ )); DESCR(XML content); #define XMLOID 142 -DATA(insert OID = 143 ( _xml PGNSP PGUID -1 f b A f t \054 0 142 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ )); -DATA(insert OID = 199 ( _json PGNSP PGUID -1 f b A f t \054 0 114 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ )); +DATA(insert OID = 143 ( _xml PGNSP PGUID -1 f b A f t \054 0 142 143 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ )); +DATA(insert OID = 199 ( _json PGNSP PGUID -1 f b A f t \054 0 114 199 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ )); DATA(insert OID = 194 ( pg_node_tree PGNSP PGUID -1 f b S f t \054 0 0 0 pg_node_tree_in pg_node_tree_out pg_node_tree_recv pg_node_tree_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ )); DESCR(string representing an internal node tree); @@ -395,7 +395,7 @@ DESCR(geometric polygon '(pt1,...)'); DATA(insert OID = 628 ( line PGNSP PGUID 24 f b G f t \054 0 701 629 line_in line_out line_recv line_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ )); DESCR(geometric line); #define LINEOID 628 -DATA(insert OID = 629 ( _line PGNSP PGUID -1 f b A f t \054 0 628 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ )); +DATA(insert OID = 629 ( _line PGNSP PGUID -1 f b A f t \054 0 628 629 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ )); /* OIDS 700 - 799 */ @@ -421,11 +421,11 @@ DESCR(); DATA(insert OID = 718 ( circlePGNSP PGUID 24 f b G f t \054 0 0 719 circle_in circle_out circle_recv circle_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ )); DESCR(geometric circle '(center,radius)'); #define CIRCLEOID 718 -DATA(insert OID = 719 ( _circle PGNSP PGUID -1 f b A f t \054 0 718 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ )); +DATA(insert OID = 719 ( _circle PGNSP PGUID -1 f b A f t \054 0 718 719
Re: [HACKERS] Add generate_series(numeric, numeric)
Thanks for the review. Attached the formatted patch according to your suggestion. - numeric datatype is large, but there are limitations. According to doc, the limit is: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point. How can we check if the next step overflows? As a comparison, in int.c, generate_series_step_int4 checks if its overflows, and stop the next call by setting step to 0. Should we do that? Yes we should. how can we check the overflow after add_var? (in int.c, the code checks for integer calculation overflow, that wraps the result to negative value) in numeric.sql regression test, i've added this query: select (i / (10::numeric ^ 131071))::numeric(1,0) from generate_series(-9*(10::numeric ^ 131071), 9*(10::numeric ^ 131071), (10::numeric ^ 131071)) as a(i); Because the doc notes that the maximum numeric digit before decimal point is 131072, i hope this query covers the overflow case (in the last value it will generate, if we add 9 x 10^13071 with 10^13071, add_var will overflows). But in my tests, that isn't the case. The code works without any error and returns the correct rows: numeric - -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 (19 rows) Regards, -- Ali Akbar *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 14074,14081 AND tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one --- 14074,14081 tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one *** *** 14084,14091 AND row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter --- 14084,14091 row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c *** *** 28,33 --- 28,34 #include access/hash.h #include catalog/pg_type.h + #include funcapi.h #include libpq/pqformat.h #include miscadmin.h #include nodes/nodeFuncs.h *** *** 261,266 typedef struct NumericVar --- 262,279 /* -- + * Data for generate series + * -- + */ + typedef struct + { + NumericVar current; + NumericVar finish; + NumericVar step; + } generate_series_numeric_fctx; + + + /* -- * Some preinitialized constants * -- */ *** *** 1221,1226 numeric_floor(PG_FUNCTION_ARGS) --- 1234,1346 PG_RETURN_NUMERIC(res); } + + /* + * generate_series_numeric() - + * + * Generate series of numeric. + */ + Datum + generate_series_numeric(PG_FUNCTION_ARGS) + { + return generate_series_step_numeric(fcinfo); + } + + Datum + generate_series_step_numeric(PG_FUNCTION_ARGS) + { + generate_series_numeric_fctx *fctx; + FuncCallContext *funcctx; + MemoryContext oldcontext; + + if (SRF_IS_FIRSTCALL()) + { + Numeric start_num = PG_GETARG_NUMERIC(0); + Numeric finish_num = PG_GETARG_NUMERIC(1); + NumericVar steploc = const_one; + + /* Handle NaN in start finish */ + if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(start and finish cannot be NaN))); + + /* see if we
Re: [HACKERS] Add generate_series(numeric, numeric)
+ select * from generate_series(0.1::numeric, 10.0::numeric, 0.1::numeric); + generate_series + - + 0.1 ... + 10.0 + (100 rows) Unless there is a good reason, can you please keep individual test output fewer than 100 lines? I think the 41-line result is an overkill as well. This just bloats the repository size unnecessarily. Okay. In this revision i cut the tests' result except the last one (the one that tests values just before numeric overflow). Also, I see that this patch was added to the 2014-10 commitfest and then deleted again (by user apaan). Why? User apaan is me. When i added to the commitfest, the patch is listed there by me (apaan). I only added some bits from original patch by Платон Малюгин that was revised further by Michael Paquier. So i think they should have the credits for this patch, not me. In this situation, what should i do? Thanks for the review Manti! Regards, -- Ali Akbar *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 14074,14081 AND tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one --- 14074,14081 tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one *** *** 14084,14091 AND row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter --- 14084,14091 row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c *** *** 28,33 --- 28,34 #include access/hash.h #include catalog/pg_type.h + #include funcapi.h #include libpq/pqformat.h #include miscadmin.h #include nodes/nodeFuncs.h *** *** 261,266 typedef struct NumericVar --- 262,279 /* -- + * Data for generate series + * -- + */ + typedef struct + { + NumericVar current; + NumericVar finish; + NumericVar step; + } generate_series_numeric_fctx; + + + /* -- * Some preinitialized constants * -- */ *** *** 1221,1226 numeric_floor(PG_FUNCTION_ARGS) --- 1234,1346 PG_RETURN_NUMERIC(res); } + + /* + * generate_series_numeric() - + * + * Generate series of numeric. + */ + Datum + generate_series_numeric(PG_FUNCTION_ARGS) + { + return generate_series_step_numeric(fcinfo); + } + + Datum + generate_series_step_numeric(PG_FUNCTION_ARGS) + { + generate_series_numeric_fctx *fctx; + FuncCallContext *funcctx; + MemoryContext oldcontext; + + if (SRF_IS_FIRSTCALL()) + { + Numeric start_num = PG_GETARG_NUMERIC(0); + Numeric finish_num = PG_GETARG_NUMERIC(1); + NumericVar steploc = const_one; + + /* Handle NaN in start finish */ + if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(start and finish cannot be NaN))); + + /* see if we were given an explicit step size */ + if (PG_NARGS() == 3) + { + Numeric step_num = PG_GETARG_NUMERIC(2); + + if (NUMERIC_IS_NAN(step_num)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(step size cannot be NaN))); + + init_var_from_num(step_num, steploc); + + if (cmp_var(steploc, const_zero) == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(step size cannot equal zero
Re: [HACKERS] Add generate_series(numeric, numeric)
2014-10-06 22:51 GMT+07:00 Marti Raudsepp ma...@juffo.org: That's fine I think, it's just for tracking who made the changes in the CommitFest app. What actually matters is what you write in the Author field, which could contain all 3 names separated by commas. Ok. Added to commitfest: https://commitfest.postgresql.org/action/patch_view?id=1591 the one that tests values just before numeric overflow Actually I don't know if that's too useful. I think you should add a test case that causes an error to be thrown. Actually i added the test case because in the code, when adding step into current for the last value, i expected it to overflow: /* increment current in preparation for next iteration */ add_var(fctx-current, fctx-step, fctx-current); where in the last calculation, current is 9 * 10^131071. Plus 10^131071, it will be 10^131072, which i expected to overflow numeric type (in the doc, numeric's range is up to 131072 digits before the decimal point). In attached patch, i narrowed the test case to produce smaller result. Also, I noticed that there are a few trailing spaces in the patch that should be removed: +generate_series_numeric(PG_FUNCTION_ARGS) ... + if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num)) ... + if (PG_NARGS() == 3) ... + if (NUMERIC_IS_NAN(step_num)) Ah, didn't see it. Thanks. Fixed in this patch. Regards, -- Ali Akbar *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 14074,14081 AND tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one --- 14074,14081 tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one *** *** 14084,14091 AND row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter --- 14084,14091 row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c *** *** 28,33 --- 28,34 #include access/hash.h #include catalog/pg_type.h + #include funcapi.h #include libpq/pqformat.h #include miscadmin.h #include nodes/nodeFuncs.h *** *** 261,266 typedef struct NumericVar --- 262,279 /* -- + * Data for generate series + * -- + */ + typedef struct + { + NumericVar current; + NumericVar finish; + NumericVar step; + } generate_series_numeric_fctx; + + + /* -- * Some preinitialized constants * -- */ *** *** 1221,1226 numeric_floor(PG_FUNCTION_ARGS) --- 1234,1346 PG_RETURN_NUMERIC(res); } + + /* + * generate_series_numeric() - + * + * Generate series of numeric. + */ + Datum + generate_series_numeric(PG_FUNCTION_ARGS) + { + return generate_series_step_numeric(fcinfo); + } + + Datum + generate_series_step_numeric(PG_FUNCTION_ARGS) + { + generate_series_numeric_fctx *fctx; + FuncCallContext *funcctx; + MemoryContext oldcontext; + + if (SRF_IS_FIRSTCALL()) + { + Numeric start_num = PG_GETARG_NUMERIC(0); + Numeric finish_num = PG_GETARG_NUMERIC(1); + NumericVar steploc = const_one; + + /* Handle NaN in start finish */ + if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg(start and finish cannot be NaN))); + + /* see if we were
Re: [HACKERS] Last Commitfest patches waiting review
2014-10-03 23:14 GMT+07:00 Heikki Linnakangas hlinnakan...@vmware.com: Fix xpath() to return namespace definitions (fixes the issue with nested or repeated xpath()) Peter, you've done some XML stuff before; could you have a look at this too? I am the author of the patch. I've sent Peter off-the-list review request email as you had suggested before, but he didn't respond. How can i help to ease the review? The last patch is re-implementation, as per Tom Lane's findings about xmlNodeCopy's behavior if there's not enough memory. It turns out that the reimplementation is not as simple as before (because reimplement some of xmlNodeCopy code must be reimplemented here). Reviewing the patch myself, i've found some code formatting problems. Will fix and post in the patch's thread. Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
While reviewing the patch myself, i spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in first versions, because of the small modification, context patch format obfucates the changes. After reimplementation this isn't the case anymore) Thanks, Ali Akbar *** a/src/backend/utils/adt/xml.c --- b/src/backend/utils/adt/xml.c *** *** 141,149 static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, --- 141,151 pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur, ! PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, *** *** 3594,3620 SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, #ifdef USE_LIBXML /* * Convert XML node to text (dump subtree in case of element, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; buf = xmlBufferCreate(); PG_TRY(); { xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { xmlBufferFree(buf); PG_RE_THROW(); } --- 3596,3706 #ifdef USE_LIBXML + /* check ns definition of node and its childrens. If any one of ns is + * not defined in node and it's children, but in the node's parent, + * copy the definition to node. + */ + static void + xml_checkandcopyns(xmlNodePtr root, + PgXmlErrorContext *xmlerrcxt, + xmlNodePtr node, + xmlNsPtr lastns_before) + { + xmlNsPtr ns = NULL; + xmlNsPtr cur_ns; + xmlNodePtr cur; + + if (node-ns != NULL) + { + /* check in new nses */ + cur_ns = lastns_before == NULL ? node-nsDef : lastns_before-next; + while (cur_ns != NULL) + { + if (cur_ns-href != NULL) + { + if (((cur_ns-prefix == NULL) (node-ns-prefix == NULL)) || + ((cur_ns-prefix != NULL) (node-ns-prefix != NULL) + xmlStrEqual(cur_ns-prefix, node-ns-prefix))) + { + ns = cur_ns; + break; + } + } + cur_ns = cur_ns-next; + } + if (ns == NULL) /* not in new nses */ + { + ns = xmlSearchNs(NULL, node-parent, node-ns-prefix); + + if (ns != NULL) + { + ns = xmlNewNs(root, ns-href, ns-prefix); + + if (ns == NULL xmlerrcxt-err_occurred) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not allocate xmlNs); + } + } + } + /* check and copy ns for children recursively */ + cur = node-children; + while (cur != NULL) + { + xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before); + cur = cur-next; + } + } + /* * Convert XML node to text (dump subtree in case of element, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNsPtr lastns_before; + xmlNsPtr ns; + xmlNsPtr next; buf = xmlBufferCreate(); + PG_TRY(); { + lastns_before = cur-nsDef; + if (lastns_before != NULL) + { + while (lastns_before-next != NULL) + lastns_before = lastns_before-next; + } + xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before); + xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); + + /* delete and free new nses */ + ns = lastns_before == NULL ? cur-nsDef : lastns_before-next; + while (ns != NULL) + { + next = ns-next; + xmlFree(ns); + ns = next; + } + + if (lastns_before == NULL) + cur-nsDef = NULL; + else + lastns_before-next = NULL; } PG_CATCH(); { + /* new namespaces will be freed while free-ing the node, so we + * won't free it here + */ xmlBufferFree(buf); PG_RE_THROW(); } *** *** 3660,3666 xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate) { int result = 0; Datum datum; --- 3746,3753 */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj
Re: [HACKERS] Add generate_series(numeric, numeric)
Hi Oops, it seems that I have been too hasty here. With a fresh mind I looked at my own patch again and found two bugs: - Incorrect calculation of each step's value, making stuff crash, it is necessary to switch to the context of the function to perform operations on a temporary variable first - i think you can use the fctx-current variable without temporary variable (there's comment in the add_var function: Full version of add functionality on variable level (handling signs). result might point to one of the operands too without danger.). But you _must_ switch the context first because add_var will allocate new array for the data and freeing the old one. - numeric can be NaN. We must reject it as first, finish and last parameter. - numeric datatype is large, but there are limitations. According to doc, the limit is: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point. How can we check if the next step overflows? As a comparison, in int.c, generate_series_step_int4 checks if its overflows, and stop the next call by setting step to 0. Should we do that? ~ will try to fix the patch later -- Ali Akbar
Re: [HACKERS] Add generate_series(numeric, numeric)
2014-10-05 15:21 GMT+07:00 Ali Akbar the.ap...@gmail.com: Hi Oops, it seems that I have been too hasty here. With a fresh mind I looked at my own patch again and found two bugs: - Incorrect calculation of each step's value, making stuff crash, it is necessary to switch to the context of the function to perform operations on a temporary variable first - i think you can use the fctx-current variable without temporary variable (there's comment in the add_var function: Full version of add functionality on variable level (handling signs). result might point to one of the operands too without danger.). But you _must_ switch the context first because add_var will allocate new array for the data and freeing the old one. - numeric can be NaN. We must reject it as first, finish and last parameter. - numeric datatype is large, but there are limitations. According to doc, the limit is: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point. How can we check if the next step overflows? As a comparison, in int.c, generate_series_step_int4 checks if its overflows, and stop the next call by setting step to 0. Should we do that? ~ will try to fix the patch later attached the patch. Not checking if it overflows, but testing it with 9 * 10 ^ 131072 works (not resulting in an error). Other modifications: - doc update - regresssion tests - while testing regression test, opr_sanity checks that the generate_series_numeric function is used twice (once for 2 parameter and once for the 3 parameter function), so i changed the name to generate_series_step_numeric and created new function generate_series_numeric that calls generate_series_step_numeric -- Ali Akbar *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 14074,14081 AND tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one --- 14074,14081 tbody row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of one *** *** 14084,14091 AND row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type or typebigint/type/entry ! entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter --- 14084,14091 row entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry ! entrytypeint/type, typebigint/type or typenumeric/type/entry ! entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry entry Generate a series of values, from parameterstart/parameter to parameterstop/parameter with a step size of parameterstep/parameter *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c *** *** 28,33 --- 28,34 #include access/hash.h #include catalog/pg_type.h + #include funcapi.h #include libpq/pqformat.h #include miscadmin.h #include nodes/nodeFuncs.h *** *** 261,266 typedef struct NumericVar --- 262,279 /* -- + * Data for generate series + * -- + */ + typedef struct + { + NumericVar current; + NumericVar finish; + NumericVar step; + } generate_series_numeric_fctx; + + + /* -- * Some preinitialized constants * -- */ *** *** 1221,1226 numeric_floor(PG_FUNCTION_ARGS) --- 1234,1346 PG_RETURN_NUMERIC(res); } + + /* + * generate_series_numeric() - + * + * Generate series of numeric. + */ + Datum + generate_series_numeric(PG_FUNCTION_ARGS) { + return generate_series_step_numeric(fcinfo); + } + + Datum + generate_series_step_numeric(PG_FUNCTION_ARGS) + { + generate_series_numeric_fctx *fctx; + FuncCallContext *funcctx; + MemoryContext oldcontext; + + if (SRF_IS_FIRSTCALL()) + { + Numeric start_num = PG_GETARG_NUMERIC(0); + Numeric finish_num = PG_GETARG_NUMERIC(1); + NumericVar
Re: [HACKERS] Add generate_series(numeric, numeric)
Also, Платон Малюгин, can you add this patch to postgresql commitfest ( http://commitfest.postgresql.org)? -- Ali Akbar
Re: [HACKERS] [PATCH] empty xml values
While looking into this report http://www.postgresql.org/message-id/cf48ccfb.65a9d%tim.k...@gmail.com I noticed that we don't accept empty values as xml content values, even though this should apparently be allowed by the spec. Attached is a patch to fix it (which needs updates to xml_1.out, which I omit here for simplicity). The patch works, albeit must be applied with --ignore-whitespace. Attached the patch + xml_1.out updates. I'm marking this ready for commit -- Ali Akbar diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 422be69..7abe215 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -1400,11 +1400,14 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, doc-encoding = xmlStrdup((const xmlChar *) UTF-8); doc-standalone = standalone; - res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, - utf8string + count, NULL); - if (res_code != 0 || xmlerrcxt-err_occurred) -xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_CONTENT, - invalid XML content); + if (*(utf8string + count)) + { +res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0, + utf8string + count, NULL); +if (res_code != 0 || xmlerrcxt-err_occurred) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_CONTENT, +invalid XML content); + } } } PG_CATCH(); diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 382f9df..6e6c673 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -194,6 +194,18 @@ SELECT xmlelement(name foo, xmlattributes( as funny, xml 'ba/r' as fun foo funny=lt;gt;amp;quot;' funnier=blt;a/gt;r/ (1 row) +SELECT xmlparse(content ''); + xmlparse +-- + +(1 row) + +SELECT xmlparse(content ' '); + xmlparse +-- + +(1 row) + SELECT xmlparse(content 'abc'); xmlparse -- @@ -251,6 +263,22 @@ SELECT xmlparse(content 'nosuchprefix:tag/'); nosuchprefix:tag/ (1 row) +SELECT xmlparse(document ''); +ERROR: invalid XML document +DETAIL: line 1: switching encoding : no input + +^ +line 1: Document is empty + +^ +line 1: Start tag expected, '' not found + +^ +SELECT xmlparse(document ' '); +ERROR: invalid XML document +DETAIL: line 1: Start tag expected, '' not found + + ^ SELECT xmlparse(document 'abc'); ERROR: invalid XML document DETAIL: line 1: Start tag expected, '' not found diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out index a34d1f4..b0e0067 100644 --- a/src/test/regress/expected/xml_1.out +++ b/src/test/regress/expected/xml_1.out @@ -164,6 +164,14 @@ SELECT xmlelement(name foo, xmlattributes( as funny, xml 'ba/r' as fun ERROR: unsupported XML feature DETAIL: This functionality requires the server to be built with libxml support. HINT: You need to rebuild PostgreSQL using --with-libxml. +SELECT xmlparse(content ''); +ERROR: unsupported XML feature +DETAIL: This functionality requires the server to be built with libxml support. +HINT: You need to rebuild PostgreSQL using --with-libxml. +SELECT xmlparse(content ' '); +ERROR: unsupported XML feature +DETAIL: This functionality requires the server to be built with libxml support. +HINT: You need to rebuild PostgreSQL using --with-libxml. SELECT xmlparse(content 'abc'); ERROR: unsupported XML feature DETAIL: This functionality requires the server to be built with libxml support. @@ -196,6 +204,14 @@ SELECT xmlparse(content 'nosuchprefix:tag/'); ERROR: unsupported XML feature DETAIL: This functionality requires the server to be built with libxml support. HINT: You need to rebuild PostgreSQL using --with-libxml. +SELECT xmlparse(document ''); +ERROR: unsupported XML feature +DETAIL: This functionality requires the server to be built with libxml support. +HINT: You need to rebuild PostgreSQL using --with-libxml. +SELECT xmlparse(document ' '); +ERROR: unsupported XML feature +DETAIL: This functionality requires the server to be built with libxml support. +HINT: You need to rebuild PostgreSQL using --with-libxml. SELECT xmlparse(document 'abc'); ERROR: unsupported XML feature DETAIL: This functionality requires the server to be built with libxml support. diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql index 90d4d67..922ab7a 100644 --- a/src/test/regress/sql/xml.sql +++ b/src/test/regress/sql/xml.sql @@ -60,6 +60,8 @@ SELECT xmlelement(name foo, xmlattributes('infinity'::timestamp as bar)); SELECT xmlelement(name foo, xmlattributes( as funny, xml 'ba/r' as funnier)); +SELECT xmlparse(content ''); +SELECT xmlparse(content ' '); SELECT xmlparse(content 'abc'); SELECT xmlparse(content 'abcx/abc'); SELECT xmlparse(content 'invalidentity/invalidentity'); @@ -69,6 +71,8 @@ SELECT xmlparse(content 'relativens xmlns=''relative''/'); SELECT xmlparse(content
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Greetings, Because of the memory bug in xmlCopyNode, this is a new patch with different method. In this patch, instead of using xmlCopyNode to bring the namespace back, we added the required namespaces to the node before dumping the node to string, and cleaning it up afterwards (because the same node could be dumped again in next xpath result). Thanks, Ali Akbar 2014-07-11 15:36 GMT+07:00 Ali Akbar the.ap...@gmail.com: Greetings, Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport). From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion. But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this but in libxml2 code, is this patch is still acceptable? Thanks, Ali Akbar -- Ali Akbar diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 422be69..f34c87e 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -141,9 +141,11 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); -static text *xml_xmlnodetoxmltype(xmlNodePtr cur); +static text *xml_xmlnodetoxmltype(xmlNodePtr cur, + PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, - ArrayBuildState **astate); + ArrayBuildState **astate, + PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, @@ -3590,27 +3592,109 @@ SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, #ifdef USE_LIBXML +/* check ns definition of node and its childrens. If any one of ns is + * not defined in node and it's children, but in the node's parent, + * copy the definition to node. + */ +static void +xml_checkandcopyns(xmlNodePtr root, + PgXmlErrorContext *xmlerrcxt, + xmlNodePtr node, + xmlNsPtr lastns_before) +{ + xmlNsPtr ns = NULL; + xmlNsPtr cur_ns; + xmlNodePtr cur; + + if (node-ns != NULL) + { + /* check in new nses */ + cur_ns = lastns_before == NULL ? node-nsDef : lastns_before-next; + while (cur_ns != NULL) + { + if (cur_ns-href != NULL) + { +if (((cur_ns-prefix == NULL) (node-ns-prefix == NULL)) || + ((cur_ns-prefix != NULL) (node-ns-prefix != NULL) + xmlStrEqual(cur_ns-prefix, node-ns-prefix))) +{ + ns = cur_ns; + break; +} + } + cur_ns = cur_ns-next; + } + if (ns == NULL) /* not in new nses */ + { + ns = xmlSearchNs(NULL, node-parent, node-ns-prefix); + + if (ns != NULL) { +ns = xmlNewNs(root, ns-href, ns-prefix); + +if (ns == NULL xmlerrcxt-err_occurred) { + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not allocate xmlNs); +} + } + } + } + /* check and copy ns for children recursively */ + cur = node-children; + while (cur != NULL) { + xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before); + cur = cur-next; + } +} + /* * Convert XML node to text (dump subtree in case of element, * return value otherwise) */ static text * -xml_xmlnodetoxmltype(xmlNodePtr cur) +xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNsPtr lastns_before; + xmlNsPtr ns; + xmlNsPtr next; buf = xmlBufferCreate(); + PG_TRY(); { + lastns_before = cur-nsDef; + if (lastns_before != NULL) { +while (lastns_before-next != NULL) { + lastns_before = lastns_before-next; +} + } + xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before); + xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); + + /* delete and free new nses */ + ns = lastns_before == NULL ? cur-nsDef : lastns_before-next; + while (ns != NULL) { +next = ns-next; +xmlFree(ns); +ns = next; + } + if (lastns_before == NULL) { +cur-nsDef = NULL; + } else { +lastns_before-next = NULL; + } } PG_CATCH(); { + /* new namespaces will be freed while free-ing the node, so we + * won't free it here + */ xmlBufferFree(buf); PG_RE_THROW(); } @@ -3656,7 +3740,8 @@ xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, - ArrayBuildState **astate) + ArrayBuildState
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Greetings, Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport). From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion. But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this but in libxml2 code, is this patch is still acceptable? Thanks, Ali Akbar *** a/src/backend/utils/adt/xml.c --- b/src/backend/utils/adt/xml.c *** *** 141,149 static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, --- 141,151 pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur, ! PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, *** *** 3595,3620 SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; buf = xmlBufferCreate(); PG_TRY(); { ! xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); xmlBufferFree(buf); } else --- 3597,3636 * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNodePtr cur_copy; buf = xmlBufferCreate(); + + /* the result of xmlNodeDump won't contain namespace definitions + * from parent nodes, but xmlCopyNode duplicates a node along + * with its required namespace definitions. + */ + cur_copy = xmlCopyNode(cur, 1); + + if (cur_copy == NULL) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not serialize xml); + PG_TRY(); { ! xmlNodeDump(buf, NULL, cur_copy, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { + xmlFreeNode(cur_copy); xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); + xmlFreeNode(cur_copy); xmlBufferFree(buf); } else *** *** 3656,3662 xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate) { int result = 0; Datum datum; --- 3672,3679 */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt) { int result = 0; Datum datum; *** *** 3678,3684 xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, for (i = 0; i result; i++) { ! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i])); *astate = accumArrayResult(*astate, datum, false, XMLOID, CurrentMemoryContext); --- 3695,3702 for (i = 0; i result; i++) { ! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i], ! xmlerrcxt)); *astate = accumArrayResult(*astate, datum, false, XMLOID, CurrentMemoryContext); *** *** 3882,3890 xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, * Extract the results as requested. */ if (res_nitems != NULL) ! *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate); else ! (void) xml_xpathobjtoxmlarray(xpathobj, astate); } PG_CATCH(); { --- 3900,3908 * Extract the results as requested. */ if (res_nitems != NULL) ! *res_nitems
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
I don't know enough about XML/XPATH to know if this is a good idea or not, Actually currently because of the namespace problem, xpath() returns wrong result (even worse: invalid xml!). So this patch corrects the behavior of xpath() to the correct one. but I did go read the documentation of xmlCopyNode(), and I notice it says the second argument is extended: if 1 do a recursive copy (properties, namespaces and children when applicable) if 2 copy properties and namespaces (when applicable) Do we really want it to copy children? Perhaps the value should be 2? (I have no idea, I'm just raising the question.) If we put 1 as the parameter, the resulting Node will link to the existing children. In this case, the namespace problem for the children might be still exists. If any of the children uses different namespace(s) than the parent, the namespace definition will not be copied correctly. Just to be safe, in the patch 1 passed as the second argument. Ideally, we can traverse the Node object and generate XML accordingly, but it is a lot of work, and a lot of duplicating libxml's code. Maybe we should push this upstream to libxml? I also notice that it says Returns: a new #xmlNodePtr, or NULL in case of error. and the patch is failing to cover the error case. Most likely, that would represent out-of-memory, so it definitely seems to be something we should account for. Ah, overlooked that. Skimming through libxml2's source, it looks like there are some other cases beside out-of-memory. Will update the patch according to these cases. Thanks for the review. -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen a...@2ndquadrant.com: Thanks for the patch, and welcome to Postgres development. I can confirm that it works fine. I have attached here a very slightly tweaked version of the patch (removed trailing whitespace, and changed some comment text). I'm marking this ready for committer. Thanks for the review. Hope i will be able to contribute a little here and there in the future. -- Ali Akbar
Re: [HACKERS] pivot aggregation with a patched intarray
2014-06-05 17:18 GMT+07:00 Marc Mamin m.ma...@intershop.de: I'm thinking about adding a final function to my aggregate that would replace zero values will nulls, hence transforming the intarray into a standard int[], possibly with nullbitmap and a lowerbound that can be 1. This will probably degrade the performance considerably, but may reduce the size of the end result for spare data and not too small integers... Performances should greatly depend on the data distribution and order as they influence the number of palloc. My first tests shown as well better and poorer results. My target is not to get better performances at the first place, but to get a pivot structure in an early aggregation stage. Usually for pivot, i use crosstab function from tablefunc (http://www.postgresql.org/docs/9.4/static/tablefunc.html#AEN158550). If your patch doesn't perform better, it's more easier to just use crosstab. For storing it efficiently, the result can be transformed into array manually. PS: as Michael Paquier said above, its better if you could send the patch in the .patch file format (see: https://wiki.postgresql.org/wiki/Working_with_GIT). -- Ali Akbar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix xpath() to return namespace definitions
2014-06-05 2:37 GMT+07:00 Robert Haas robertmh...@gmail.com: Please add your patch here so we don't forget about it: https://commitfest.postgresql.org/action/commitfest_view/open Added: https://commitfest.postgresql.org/action/patch_view?id=1461 Please see also: https://wiki.postgresql.org/wiki/Submitting_a_Patch Thanks, i've read it. As for the suggestion there: Start out by reviewing a patch or responding to email on the lists. Even if it is unrelated to what you're doing, i'll look around other posts in this list. For back versions, i think because this patch changes xpath() behavior, we will only apply this to future versions. The old behavior is wrong (according to XPath standard) for not including namespaces, but maybe there are some application that depends on the old behavior. Thanks! -- Ali Akbar
Re: [HACKERS] pivot aggregation with a patched intarray
2014-06-01 20:48 GMT+07:00 Marc Mamin m.ma...@intershop.de: On Sat, May 31, 2014 at 12:31 AM, Marc Mamin m.ma...@intershop.de wrote: I have patched intarray with 3 additional functions in order to count[distinct] event IDs into arrays, whereas the array position correspond to the integer values. (mimic column oriented storage) I didn't look at the feature itself, but here are some comments about the format of the patch: - Be careful the newlines on the file you posted use ¥r¥n, which is purely Windows stuff... This will generate unnecessary diffs with the source code I don't mean to suggests this directly as a patch, I'm first interested to see if there are some interest for such an aggregation type. From what i see, the icount_to_array is complementary to standard count() aggregates, but it produces array. If the values are not sparse, i think the performance and memory/storage benefit you mentioned will be true. But if the values are sparse, there will be many 0's, how it will perform? I'm interested to benchmark it with some use cases, to confirm the performance benefits of it. -- Ali Akbar -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fix xpath() to return namespace definitions
Hi all, While developing some XML processing queries, i stumbled on an old bug mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or repeated xpath() that apparently mess up namespaces. Source of the bug is that libxml2's xmlNodeDump won't output XML namespace definitions that declared in the node's parents. As per https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858, the behavior is intentional. This patch uses function xmlCopyNode that copies a node, including its namespace definitions as required (without unused namespace in the node or its children). When the copy dumped, the resulting XML is complete with its namespaces. Calling xmlCopyNode will need additional memory to execute, but reimplementing its routine to handle namespace definition will introduce much complexity to the code. Note: This is my very first postgresql patch. -- Ali Akbar diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 422be69..93e335c 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur) if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNodePtr cur_copy; buf = xmlBufferCreate(); + + /* the result of xmlNodeDump won't contain namespace definitions, + * but libxml2 has xmlCopyNode that duplicates a node, along + * with its required namespace definitions. + */ + cur_copy = xmlCopyNode(cur, 1); PG_TRY(); { - xmlNodeDump(buf, NULL, cur, 0, 1); + xmlNodeDump(buf, NULL, cur_copy, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { + xmlFreeNode(cur_copy); xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); + xmlFreeNode(cur_copy); xmlBufferFree(buf); } else diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 382f9df..a6d26f7 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', 'local:data xmlns:local=http://127.0.0.1;loc {1,2} (1 row) +SELECT xpath('//loc:piece', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]); + xpath + + {local:piece xmlns:local=\http://127.0.0.1\; id=\1\number one/local:piece,local:piece xmlns:local=\http://127.0.0.1\; id=\2\/} +(1 row) + SELECT xpath('//b', 'aone btwo/b three betc/b/a'); xpath - diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out index a34d1f4..c7bcf91 100644 --- a/src/test/regress/expected/xml_1.out +++ b/src/test/regress/expected/xml_1.out @@ -498,6 +498,12 @@ LINE 1: SELECT xpath('//loc:piece/@id', 'local:data xmlns:local=ht... ^ DETAIL: This functionality requires the server to be built with libxml support. HINT: You need to rebuild PostgreSQL using --with-libxml. +SELECT xpath('//loc:piece', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]); +ERROR: unsupported XML feature +LINE 1: SELECT xpath('//loc:piece', 'local:data xmlns:local=http:/... +^ +DETAIL: This functionality requires the server to be built with libxml support. +HINT: You need to rebuild PostgreSQL using --with-libxml. SELECT xpath('//b', 'aone btwo/b three betc/b/a'); ERROR: unsupported XML feature LINE 1: SELECT xpath('//b', 'aone btwo/b three betc/b/a'... diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql index 90d4d67..241a5d6 100644 --- a/src/test/regress/sql/xml.sql +++ b/src/test/regress/sql/xml.sql @@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest; SELECT xpath('', '!-- error --'); SELECT xpath('//text()', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data'); SELECT xpath('//loc:piece/@id', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]); +SELECT xpath('//loc:piece', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]); SELECT xpath('//b', 'aone btwo/b three betc/b/a'); SELECT xpath('//text()', 'rootlt;/root'); SELECT xpath('//@value', 'root value=lt;/'); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers