from:"Ali Akbar"

Re: [HACKERS] Bug in comparison of empty jsonb arrays to scalars

2016-11-09 Thread Ali Akbar

2016-11-10 13:54 GMT+07:00 Michael Paquier <michael.paqu...@gmail.com>:

> On Thu, Nov 10, 2016 at 7:37 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Given that nobody actually cares what that sort order is, I think that
> > having to jump through hoops in pg_upgrade in order to fix it is not a
> > great tradeoff.  I suggest changing the documentation to match the code.
>

Don't you in this case think we should match sort order in javascript?


> Yes, definitely.
> =# create table json_data (a jsonb);
> CREATE TABLE
> =# INSERT INTO json_data values ('{}'::jsonb), ('[]'::jsonb),
> ('null'::jsonb), ('true'::jsonb), ('1'::jsonb), ('""'::jsonb);
> INSERT 0 6
> =# SELECT * FROM json_data ORDER BY 1 DESC;
>   a
> --
>  {}
>  true
>  1
>  ""
>  null
>  []
> (6 rows)
> So that's object > boolean > integer > string > NULL > array.
>

> a = [{}, [], null, true, 1, '""']
[ {}, [], null, true, 1, '""' ]
> a.sort()
[ [], '""', 1, {}, null, true ]
> a.reverse()
[ true, null, {}, 1, '""', [] ]

So in this case it's boolean > NULL > Object > integer > string > array
(tried in Chromium 53, Firefox 49 and Node v6.9.1)

When I tried to search for the ECMA Standard for this behavior, i found
this: http://blog.rodneyrehm.de/archives/14-Sorting-Were-Doing-It-Wrong.html.
There are problems about automatic conversion in javascript, like this:

> a = [{}, [], null, true, 1, 'someotherstring']
[ {}, [], null, true, 1, 'someotherstring' ]
> a.sort().reverse()
[ true, 'someotherstring', null, {}, 1, [] ]


versus this:

> a = [{}, [], null, true, 1, 'SomeOtherString']
[ {}, [], null, true, 1, 'SomeOtherString' ]
> a.sort().reverse()
[ true, null, {}, 'SomeOtherString', 1, [] ]


and this:

> a = [{}, [], null, true, 1, '2']
[ {}, [], null, true, 1, '2' ]
> a.sort().reverse()
[ true, null, {}, '2', 1, [] ]


So we can't replicate javascript sort order without emulating those.

Regards,
Ali Akbar

Re: [HACKERS] Postgres service stops when I kill client backend on Windows

2015-10-10 Thread Ali Akbar

Greetings,

2015-10-11 0:18 GMT+07:00 Pavel Stehule <pavel.steh...@gmail.com>:

>
> 2015-10-10 18:04 GMT+02:00 Dmitry Vasilyev <d.vasil...@postgrespro.ru>:
>
>>
>> On Сб, 2015-10-10 at 10:55 -0500, Tom Lane wrote:
>> > Dmitry Vasilyev <d.vasil...@postgrespro.ru> writes:
>> > > I have written, what service stopped. This action is repeatable.
>> > > You can run command 'psql -c "do $$ unpack p,1x8 $$ language
>> > > plperlu;"'
>> > > and after this windows service will stop.
>> >
>>
>
> so it is expected behave. After any unexpected client fails, the server is
> restarted
>

I can confirm this too. In linux (i use Fedora 22), this is what happens
when a server is killed:

=== 1. before:
$ sudo systemctl status postgresql.service
postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled)
   Active: active (running) since Jum 2015-10-09 16:25:43 WIB; 1 day 14h ago
  Process: 778 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p
${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 747 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA}
(code=exited, status=0/SUCCESS)
 Main PID: 783 (postgres)
   CGroup: /system.slice/postgresql.service
   ├─  783 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
   ├─  812 postgres: logger process
   ├─  821 postgres: checkpointer process
   ├─  822 postgres: writer process
   ├─  823 postgres: wal writer process
   ├─  824 postgres: autovacuum launcher process
   ├─  825 postgres: stats collector process
   └─17181 postgres: postgres test [local] idle


=== 2. killing and attempt to reconnect:
$ sudo kill 17181

test=# select 1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.


=== 3. service status after:
$ sudo systemctl status postgresql.service
postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled)
   Active: active (running) since Jum 2015-10-09 16:25:43 WIB; 1 day 14h ago
  Process: 778 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p
${PGPORT} -w -t 300 (code=exited, status=0/SUCCESS)
  Process: 747 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA}
(code=exited, status=0/SUCCESS)
 Main PID: 783 (postgres)
   CGroup: /system.slice/postgresql.service
   ├─  783 /usr/bin/postgres -D /var/lib/pgsql/data -p 5432
   ├─  812 postgres: logger process
   ├─  821 postgres: checkpointer process
   ├─  822 postgres: writer process
   ├─  823 postgres: wal writer process
   ├─  824 postgres: autovacuum launcher process
   ├─  825 postgres: stats collector process
   └─17422 postgres: postgres test [local] idle

===

The service status is still active (running), and new process 17422 handles
the client.


But this is what happens in Windows (win 7 32 bit, postgres 9.4):

=== 1. before:
C:\Windows\system32>sc queryex postgresql-9.4

SERVICE_NAME: postgresql-9.4
TYPE   : 10  WIN32_OWN_PROCESS
STATE  : 4  RUNNING
(STOPPABLE, PAUSABLE, ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE: 0  (0x0)
SERVICE_EXIT_CODE  : 0  (0x0)
CHECKPOINT : 0x0
WAIT_HINT  : 0x0
PID: 3716
FLAGS  :



=== 2. killing & attempt to reconnect:
postgres=# select pg_backend_pid();
 pg_backend_pid

   2080
(1 row)

C:\Windows\system32>taskkill /F /PID 2080
SUCCESS: The process with PID 2080 has been terminated.

postgres=# select 1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

=== 3. service status after:
C:\Windows\system32>sc query postgresql-9.4

SERVICE_NAME: postgresql-9.4
TYPE   : 10  WIN32_OWN_PROCESS
STATE  : 1  STOPPED
WIN32_EXIT_CODE: 0  (0x0)
SERVICE_EXIT_CODE  : 0  (0x0)
CHECKPOINT : 0x0
WAIT_HINT  : 0x0

===

The client cannot reconnect. The service is dead. This is nasty, because
any client can exploit some segfault bug like the one in perl Dmitry
mentoined upthread, and the postgresql service is down.

Note: killing the server process with pg_terminate_backend isn't causing
this behavior to happen. The client reconnects normally, and the service is
still running.

Regards,
Ali Akbar

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Ali Akbar

2015-01-20 18:17 GMT+07:00 Ali Akbar the.ap...@gmail.com:

 2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com:

 The current patch, which I am evaluating for commit, does away with
 per-group memory contexts (it uses a common context for all groups), and
 reduces the initial array allocation from 64 to 8 (but preserves
 doubling behavior).


 Jeff  Tomas, spotted this comment in v8 patch:
 @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
  /*
   * makeArrayResult - produce 1-D final result of accumArrayResult
   *
 + * If the array build state was initialized with a separate memory
 context,
 + * this also frees all the memory (by deleting the subcontext). If a
 parent
 + * context was used directly, the memory may be freed by an explicit
 pfree()
 + * call (unless it's meant to be freed by destroying the parent context).
 + *
   * astate is working state (must not be NULL)
   * rcontext is where to construct result
   */

 Simple pfree(astate) call is not enough to free the memory. If it's scalar
 accumulation (initArrayResult), the user must pfree(astate-dvalues) and
 pfree(astate-dnulls) before astate. If it's array accumulation,
 pfree(astate-data) and pfree(astate-nullbitmap), with both can be null if
 no array accumulated and some other cases. If its any (scalar or array)
 accumulation, it's more complicated.

 I suggest it's simpler to just force the API user to destroy the parent
 context instead. So the comment become like this:
 @@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
  /*
   * makeArrayResult - produce 1-D final result of accumArrayResult
   *
 + * If the array build state was initialized with a separate memory
 context,
 + * this also frees all the memory (by deleting the subcontext). If a
 parent
 + * context was used directly, the memory is meant to be freed by
 destroying
 + * the parent context.
 + *
   * astate is working state (must not be NULL)
   * rcontext is where to construct result
   */


Sorry, there is another comment of makeMdArrayResult, i suggest also
changing it like this:
@@ -4738,6 +4764,12 @@ makeArrayResult(ArrayBuildState *astate,
  * beware: no check that specified dimensions match the number of values
  * accumulated.
  *
+ * beware: if the astate was not initialized within a separate memory
+ * context (i.e. using subcontext=true when calling initArrayResult),
+ * using release=true is illegal as it releases the whole context,
+ * and that may include other memory still used elsewhere (instead use
+ * release=false and release the memory with the parent context later)
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-20 Thread Ali Akbar

2015-01-20 14:22 GMT+07:00 Jeff Davis pg...@j-davis.com:

 The current patch, which I am evaluating for commit, does away with
 per-group memory contexts (it uses a common context for all groups), and
 reduces the initial array allocation from 64 to 8 (but preserves
 doubling behavior).


Jeff  Tomas, spotted this comment in v8 patch:
@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory may be freed by an explicit
pfree()
+ * call (unless it's meant to be freed by destroying the parent context).
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

Simple pfree(astate) call is not enough to free the memory. If it's scalar
accumulation (initArrayResult), the user must pfree(astate-dvalues) and
pfree(astate-dnulls) before astate. If it's array accumulation,
pfree(astate-data) and pfree(astate-nullbitmap), with both can be null if
no array accumulated and some other cases. If its any (scalar or array)
accumulation, it's more complicated.

I suggest it's simpler to just force the API user to destroy the parent
context instead. So the comment become like this:
@@ -4713,6 +4733,11 @@ accumArrayResult(ArrayBuildState *astate,
 /*
  * makeArrayResult - produce 1-D final result of accumArrayResult
  *
+ * If the array build state was initialized with a separate memory context,
+ * this also frees all the memory (by deleting the subcontext). If a parent
+ * context was used directly, the memory is meant to be freed by destroying
+ * the parent context.
+ *
  * astate is working state (must not be NULL)
  * rcontext is where to construct result
  */

Regards,
-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-18 Thread Ali Akbar

2015-01-18 10:44 GMT+07:00 Peter Eisentraut pete...@gmx.net:

 On 1/7/15 8:51 PM, Ali Akbar wrote:
  So now +1 for back-patching this.

 Done.  Was a bit of work to get it into all the old versions.


Wow. Thanks.

Btw, for bug-fix patches like this, should the patch creator (me) also
create patches for back branches?

Regards,
-- 
Ali Akbar

Re: [HACKERS] Add generate_series(numeric, numeric)

2015-01-13 Thread Ali Akbar

2014-12-18 19:35 GMT+07:00 Fujii Masao masao.fu...@gmail.com:

 On Mon, Dec 15, 2014 at 2:38 PM, Andrew Gierth
 and...@tao11.riddles.org.uk wrote:
  I was thinking something like this, added just after that para:
 
  warning
   para
While the actual arguments to the function remain unchanged between
calls, if you detoast the argument values (which is normally done
transparently by the
functionPG_GETARG_replaceablexxx/replaceable/function
 macro)
in the transient context then the detoasted copies will be freed on
each cycle. Accordingly, if you keep references to such values in
your structfielduser_fctx/, you must either copy them into the
structfieldmulti_call_memory_ctx/ after detoasting, or ensure
that you detoast the values only in that context.
   /para
  /warning

 I'm OK with this.


Wrapping the doc changes in a patch. Will add to next commitfest so it
won't be lost.

-- 
Ali Akbar
*** a/doc/src/sgml/xfunc.sgml
--- b/doc/src/sgml/xfunc.sgml
***
*** 2986,2991  SRF_RETURN_DONE(funcctx)
--- 2986,3005 
   structfieldmulti_call_memory_ctx/ while doing the first-call setup.
  /para
  
+ warning
+  para
+   While the actual arguments to the function remain unchanged between
+   calls, if you detoast the argument values (which is normally done
+   transparently by the
+   functionPG_GETARG_replaceablexxx/replaceable/function macro)
+   in the transient context then the detoasted copies will be freed on
+   each cycle. Accordingly, if you keep references to such values in
+   your structfielduser_fctx/, you must either copy them into the
+   structfieldmulti_call_memory_ctx/ after detoasting, or ensure
+   that you detoast the values only in that context.
+  /para
+ /warning
+ 
  para
   A complete pseudo-code example looks like the following:
  programlisting

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2015-01-11 Thread Ali Akbar

  Or else we implement what you suggest below (more comments below):
 
  Thinking about the 'release' flag a bit more - maybe we could do
  this
  instead:
 
  if (release  astate-private_cxt)
  MemoryContextDelete(astate-mcontext);
  else if (release)
  {
  pfree(astate-dvalues);
  pfree(astate-dnulls);
  pfree(astate);
  }
 
  i.e. either destroy the whole context if possible, and just free
 the
  memory when using a shared memory context. But I'm afraid this
 would
  penalize the shared memory context, because that's intended for
  cases where all the build states coexist in parallel and then at
 some
  point are all converted into a result and thrown away. Adding
 pfree()
  calls is no improvement here, and just wastes cycles.
 
 
  As per Tom's comment, i'm using parent memory context instead of
  shared memory context below.
 
  In the future, if some code writer decided to use subcontext=false,
  to save memory in cases where there are many array accumulation, and
  the parent memory context is long-living, current code can cause
  memory leak. So i think we should implement your suggestion
  (pfreeing astate), and warn the implication in the API comment. The
  API user must choose between release=true, wasting cycles but
  preventing memory leak, or managing memory from the parent memory
  context.

 I'm wondering whether this is necessary after fixing makeArrayResult to
 use the privat_cxt flag. It's still possible to call makeMdArrayResult
 directly (with the wrong 'release' value).

 Another option might be to get rid of the 'release' flag altogether, and
 just use the 'private_cxt' - I'm not aware of a code using release=false
 with private_cxt=true (e.g. to build the same array twice from the same
 astate). But maybe there's such code, and another downside is that it'd
 break the existing API.

  In one possible use case, for efficiency maybe the caller will
  create a special parent memory context for all array accumulation.
  Then uses makeArrayResult* with release=false, and in the end
  releasing the parent memory context once for all.

 Yeah, although I'd much rather not break the existing code at all. That
 is - my goal is not to make it slower unless absolutely necessary (and
 in that case the code may be fixed per your suggestion). But I'm not
 convinced it's worth it.


OK. Do you think we need to note this in the comments? Something like this:
If using subcontext=false, the caller must be careful about memory usage,
because makeArrayResult* will not free the memory used.



 But I think it makes sense to move the error handling into
 initArrayResultArr(), including the get_element_type() call, and remove
 the element_type from the signature. This means initArrayResultAny()
 will call the get_element_type() twice, but I guess that's negligible.
 And everyone who calls initArrayResultArr() will get the error handling
 for free.

 Patch v7 attached, implementing those two changes, i.e.

   * makeMdArrayResult(..., astate-private_cxt)
   * move error handling into initArrayResultArr()
   * remove element_type from initArrayResultArr() signature


Reviewing the v7 patch:
- applies cleanly to current master. patch format, whitespace, etc is good
- make check runs without error
- performance  memory usage still consistent

If you think we don't have to add the comment (see above), i'll mark this
as ready for committer

Regards,
-- 
Ali Akbar

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2014-12-29 Thread Ali Akbar

2014-12-29 14:38 GMT+07:00 Jeff Davis pg...@j-davis.com:

 Just jumping into this patch now. Do we think this is worth changing the
 signature of functions in array.h, which might be used from a lot of
 third-party code? We might want to provide new functions to avoid a
 breaking change.


V6 patch from Tomas only change initArrayResult* functions. initArrayResult
is new API in 9.5 (commit bac2739), with old API still works as-is.

-- 
Ali Akbar

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2014-12-21 Thread Ali Akbar


 Another positive benefit is that this won't break the code unless it
 uses the new API. This is a problem especially with external code (e.g.
 extensions), but the new API (initArray*) is not part of 9.4 so there's
 no such code. So that's nice.

 The one annoying thing is that this makes the API slighly unbalanced.
 With the new API you can use a shared memory context, which with the old
 one (not using the initArray* methods) you can't.

 But I'm OK with that, and it makes the patch smaller (15kB - 11kB).


Yes, with this API, we can backpatch this patch to 9.4 (or below) if we
need it there.

I think this API is a good compromise of old API and new API. Ideally if we
can migrate all code to new API (all code must call initArrayResult* before
accumArrayResult*), we can remove parameter MemoryContext rcontext from
accumArrayResult. Currently, the code isn't using the rcontext for anything
except for old API calls (in first call to accumArrayResult).

2014-12-21 20:38 GMT+07:00 Tomas Vondra t...@fuzzy.cz:

 On 21.12.2014 02:54, Alvaro Herrera wrote:
  Tomas Vondra wrote:
  Attached is v5 of the patch, fixing an error with releasing a shared
  memory context (invalid flag values in a few calls).
 
  The functions that gain a new argument should get their comment updated,
  to explain what the new argument is for.

 Right. I've added a short description of the 'subcontext' parameter to
 all three variations of the initArray* function, and a more thorough
 explanation to initArrayResult().



With this API, i think we should make it clear if we call initArrayResult
with subcontext=false, we can't call makeArrayResult, but we must use
makeMdArrayResult directly.

Or better, we can modify makeArrayResult to release according to
astate-private_cxt:

 @@ -4742,7 +4742,7 @@ makeArrayResult(ArrayBuildState *astate,
 dims[0] = astate-nelems;
 lbs[0] = 1;

 -   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext, true);
 +   return makeMdArrayResult(astate, ndims, dims, lbs, rcontext,
 astate-private_cxt);




Or else we implement what you suggest below (more comments below):

Thinking about the 'release' flag a bit more - maybe we could do this
 instead:

 if (release  astate-private_cxt)
 MemoryContextDelete(astate-mcontext);
 else if (release)
 {
 pfree(astate-dvalues);
 pfree(astate-dnulls);
 pfree(astate);
 }

 i.e. either destroy the whole context if possible, and just free the
 memory when using a shared memory context. But I'm afraid this would
 penalize the shared memory context, because that's intended for cases
 where all the build states coexist in parallel and then at some point
 are all converted into a result and thrown away. Adding pfree() calls is
 no improvement here, and just wastes cycles.


As per Tom's comment, i'm using parent memory context instead of shared
memory context below.

In the future, if some code writer decided to use subcontext=false, to save
memory in cases where there are many array accumulation, and the parent
memory context is long-living, current code can cause memory leak. So i
think we should implement your suggestion (pfreeing astate), and warn the
implication in the API comment. The API user must choose between
release=true, wasting cycles but preventing memory leak, or managing memory
from the parent memory context.

In one possible use case, for efficiency maybe the caller will create a
special parent memory context for all array accumulation. Then uses
makeArrayResult* with release=false, and in the end releasing the parent
memory context once for all.

Regards,
-- 
Ali Akbar

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2014-12-20 Thread Ali Akbar

2014-12-16 11:01 GMT+07:00 Ali Akbar the.ap...@gmail.com:



 2014-12-16 10:47 GMT+07:00 Ali Akbar the.ap...@gmail.com:


 2014-12-16 6:27 GMT+07:00 Tomas Vondra t...@fuzzy.cz:
 Just fast-viewing the patch.

 The patch is not implementing the checking for not creating new context
 in initArrayResultArr. I think we should implement it also there for
 consistency (and preventing future problems).



Testing the performance with your query, looks promising: speedup is
between 12% ~ 15%.

Because i'm using 32-bit systems, setting work_mem to 1024GB failed:

 ERROR:  1073741824 is outside the valid range for parameter work_mem (64
 .. 2097151)
 STATEMENT:  SET work_mem = '1024GB';
 psql:/media/truecrypt1/oss/postgresql/postgresql/../patches/array-agg.sql:20:
 ERROR:  1073741824 is outside the valid range for parameter work_mem (64
 .. 2097151)


Maybe because of that, in the large groups a test, the speedup is awesome:

 master: 16,819 ms

with patch: 1,720 ms

Looks like with master, postgres resort to disk, but with the patch it fits
in memory.

Note: I hasn't tested the large dataset.

As expected, testing array_agg(anyarray), the performance is still the
same, because the subcontext hasn't implemented there (test script modified
from Tomas', attached).

I implemented the subcontext checking in initArrayResultArr by changing the
v3 patch like this:

 +++ b/src/backend/utils/adt/arrayfuncs.c
 @@ -4797,10 +4797,11 @@ initArrayResultArr(Oid array_type, Oid
 element_type, MemoryContext rcontext,
bool subcontext)
  {
 ArrayBuildStateArr *astate;
 -   MemoryContext arr_context;
 +   MemoryContext arr_context = rcontext;   /* by default use the parent
 ctx */

 /* Make a temporary context to hold all the junk */
 -   arr_context = AllocSetContextCreate(rcontext,
 +   if (subcontext)
 +   arr_context = AllocSetContextCreate(rcontext,
 accumArrayResultArr,
 ALLOCSET_DEFAULT_MINSIZE,
 ALLOCSET_DEFAULT_INITSIZE,


Testing the performance, it got the 12%~15% speedup. Good. (patch attached)


Looking at the modification in accumArrayResult* functions, i don't really
 comfortable with:

1. Code that calls accumArrayResult* after explicitly calling
initArrayResult* must always passing subcontext, but it has no effect.
2. All existing codes that calls accumArrayResult must be changed.

 Just an idea: why don't we minimize the change in API like this:

1. Adding parameter bool subcontext, only in initArrayResult*
functions but not in accumArrayResult*
2. Code that want to not creating subcontext must calls
initArrayResult* explicitly.

 Other codes that calls directly to accumArrayResult can only be changed in
 the call to makeArrayResult* (with release=true parameter). In places that
 we don't want to create subcontext (as in array_agg_transfn), modify it to
 use initArrayResult* before calling accumArrayResult*.

 What do you think?



As per your concern about calling initArrayResult* with subcontext=false,
while makeArrayResult* with release=true:

   Also, it seems that using 'subcontext=false' and then 'release=true'
   would be a bug. Maybe it would be appropriate to store the
   'subcontext' value into the ArrayBuildState and then throw an error
   if makeArrayResult* is called with (release=true  subcontext=false).


Yes, i think we should do that to minimize unexpected coding errors. In
makeArrayResult*, i think its better to not throwing an error, but using
assertions:

 Assert(release == false || astate-subcontext == true);


Regards,
-- 
Ali Akbar


array-agg-anyarray.sql
Description: application/sql
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 18ae318..48e66bf 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -1271,13 +1271,13 @@ dblink_get_connections(PG_FUNCTION_ARGS)
 			/* stash away current value */
 			astate = accumArrayResult(astate,
 	  CStringGetTextDatum(hentry-name),
-	  false, TEXTOID, CurrentMemoryContext);
+	  false, TEXTOID, CurrentMemoryContext, true);
 		}
 	}
 
 	if (astate)
 		PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
-			  CurrentMemoryContext));
+			  CurrentMemoryContext, true));
 	else
 		PG_RETURN_NULL();
 }
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c16b38e..bd5eb32 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -671,7 +671,7 @@ transformRelOptions(Datum oldOptions, List *defList, char *namspace,
 /* No match, so keep old option */
 astate = accumArrayResult(astate, oldoptions[i],
 		  false, TEXTOID,
-		  CurrentMemoryContext);
+		  CurrentMemoryContext, true);
 			}
 		}
 	}
@@ -758,12 +758,12 @@ transformRelOptions(Datum oldOptions, List *defList, char *namspace,
 
 			astate

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2014-12-15 Thread Ali Akbar

2014-12-16 6:27 GMT+07:00 Tomas Vondra t...@fuzzy.cz:

 On 15.12.2014 22:35, Jeff Janes wrote:
  On Sat, Nov 29, 2014 at 8:57 AM, Tomas Vondra t...@fuzzy.cz
  mailto:t...@fuzzy.cz wrote:
 
  Hi,
 
  Attached is v2 of the patch lowering array_agg memory requirements.
  Hopefully it addresses the issues issues mentioned by TL in this
 thread
  (not handling some of the callers appropriately etc.).
 
 
  Hi Tomas,
 
  When configured --with-libxml I get compilation errors:
 
  xml.c: In function 'xml_xpathobjtoxmlarray':
  xml.c:3684: error: too few arguments to function 'accumArrayResult'
  xml.c:3721: error: too few arguments to function 'accumArrayResult'
  xml.c: In function 'xpath':
  xml.c:3933: error: too few arguments to function 'initArrayResult'
  xml.c:3936: error: too few arguments to function 'makeArrayResult'
 
  And when configured --with-perl, I get:
 
  plperl.c: In function 'array_to_datum_internal':
  plperl.c:1196: error: too few arguments to function 'accumArrayResult'
  plperl.c: In function 'plperl_array_to_datum':
  plperl.c:1223: error: too few arguments to function 'initArrayResult'
 
  Cheers,

 Thanks, attached is a version that fixes this.


Just fast-viewing the patch.

The patch is not implementing the checking for not creating new context in
initArrayResultArr. I think we should implement it also there for
consistency (and preventing future problems).

Regards,
-- 
Ali Akbar

Re: [HACKERS] PATCH: decreasing memory needlessly consumed by array_agg

2014-12-15 Thread Ali Akbar

2014-12-16 10:47 GMT+07:00 Ali Akbar the.ap...@gmail.com:


 2014-12-16 6:27 GMT+07:00 Tomas Vondra t...@fuzzy.cz:

 On 15.12.2014 22:35, Jeff Janes wrote:
  On Sat, Nov 29, 2014 at 8:57 AM, Tomas Vondra t...@fuzzy.cz
  mailto:t...@fuzzy.cz wrote:
 
  Hi,
 
  Attached is v2 of the patch lowering array_agg memory requirements.
  Hopefully it addresses the issues issues mentioned by TL in this
 thread
  (not handling some of the callers appropriately etc.).
 
 
  Hi Tomas,
 
  When configured --with-libxml I get compilation errors:
 
  xml.c: In function 'xml_xpathobjtoxmlarray':
  xml.c:3684: error: too few arguments to function 'accumArrayResult'
  xml.c:3721: error: too few arguments to function 'accumArrayResult'
  xml.c: In function 'xpath':
  xml.c:3933: error: too few arguments to function 'initArrayResult'
  xml.c:3936: error: too few arguments to function 'makeArrayResult'
 
  And when configured --with-perl, I get:
 
  plperl.c: In function 'array_to_datum_internal':
  plperl.c:1196: error: too few arguments to function 'accumArrayResult'
  plperl.c: In function 'plperl_array_to_datum':
  plperl.c:1223: error: too few arguments to function 'initArrayResult'
 
  Cheers,

 Thanks, attached is a version that fixes this.


 Just fast-viewing the patch.

 The patch is not implementing the checking for not creating new context in
 initArrayResultArr. I think we should implement it also there for
 consistency (and preventing future problems).


Looking at the modification in accumArrayResult* functions, i don't really
comfortable with:

   1. Code that calls accumArrayResult* after explicitly calling
   initArrayResult* must always passing subcontext, but it has no effect.
   2. All existing codes that calls accumArrayResult must be changed.

Just an idea: why don't we minimize the change in API like this:

   1. Adding parameter bool subcontext, only in initArrayResult* functions
   but not in accumArrayResult*
   2. Code that want to not creating subcontext must calls initArrayResult*
   explicitly.

Other codes that calls directly to accumArrayResult can only be changed in
the call to makeArrayResult* (with release=true parameter). In places that
we don't want to create subcontext (as in array_agg_transfn), modify it to
use initArrayResult* before calling accumArrayResult*.

What do you think?

Regards,
--
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar


 I ran a test using postgres-US.fo built in the PostgreSQL source tree,
 which is 38 MB, and ran

 select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
 'http://www.w3.org/1999/XSL/Format']])) from data;

 (Table contains one row only.)

 The timings were basically indistinguishable between the three code
 versions.

 I'll try to reproduce your test.  How big is your file?  Do you have a
 link to the actual file?  Could you share your load script?


I use this xml sample:
http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml

Basically i loaded the xml to table u 100 times. Load script attached.

Regards,
-- 
Ali Akbar


load_test.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar

2014-12-14 22:18 GMT+07:00 Ali Akbar the.ap...@gmail.com:


 I ran a test using postgres-US.fo built in the PostgreSQL source tree,
 which is 38 MB, and ran

 select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
 'http://www.w3.org/1999/XSL/Format']])) from data;

 (Table contains one row only.)

 The timings were basically indistinguishable between the three code
 versions.

 I'll try to reproduce your test.  How big is your file?  Do you have a
 link to the actual file?  Could you share your load script?


 I use this xml sample:
 http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml

 Basically i loaded the xml to table u 100 times. Load script attached.


Peter, while reviewing the better performing patch myself, now i think the
patch needs more work to be committed. The structuring of the method will
be confusing in the long term. I think i'll restructure the patch in the
next commitfest.

So i propose to break the patch:
1. We apply the current patch which uses xmlNodeCopy, so that the
long-standing bug will be fixed in postgres.
2. I'll work with the performance enhancement in the next commitfest.

Maybe for (2), the current better-performing patch can be viewed as PoC of
the expected performance.

Regards,
-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar

2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
  Peter, while reviewing the better performing patch myself, now i think
 the
  patch needs more work to be committed. The structuring of the method
 will be
  confusing in the long term. I think I'll restructure the patch in the
 next
  commitfest.
  So i propose to break the patch:
  1. We apply the current patch which uses xmlNodeCopy, so that the
  long-standing bug will be fixed in postgres.
  2. I'll work with the performance enhancement in the next commitfest.
 
  Maybe for (2), the current better-performing patch can be viewed as PoC
 of
  the expected performance.

 Ali, are you currently working on that? Would you mind re-creating new
 entries in the commit fest app for the new set of patches that you are
 planning to do?
 For now I am switching this patch as returned with feedback.
 Thanks,


What i mean, the last patch (v7 patch) as it is is enough to fix the bug
(nested xpath namespace problem). I think the performance regression is
still acceptable (in my case it's ~20%), because the bug is severe.
Currently, xpath can return invalid xml because the namespace isn't
included in the output!

What i'll be working is the v4 patch, because it turns out the v4 patch has
better performance (~10%, but Peter's test shows it isn't the case). But,
the problem is the v4 patch is organized wrongly, and hacks around the
libxml's xml node structure (duplicating the namespace on the existing
structure). I'll work on that, but it will affects the performance benefit.

So what i propose is, we close the longstanding bug in this comitfest with
the v7 patch. I'll work on improving the performance, without compromising
good code structure. If the result is good, i'll submit the patch.

Thanks

Regards,
-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar

2014-12-15 11:02 GMT+07:00 Ali Akbar the.ap...@gmail.com:

 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
  Peter, while reviewing the better performing patch myself, now i think
 the
  patch needs more work to be committed. The structuring of the method
 will be
  confusing in the long term. I think I'll restructure the patch in the
 next
  commitfest.
  So i propose to break the patch:
  1. We apply the current patch which uses xmlNodeCopy, so that the
  long-standing bug will be fixed in postgres.
  2. I'll work with the performance enhancement in the next commitfest.
 
  Maybe for (2), the current better-performing patch can be viewed as PoC
 of
  the expected performance.

 Ali, are you currently working on that? Would you mind re-creating new
 entries in the commit fest app for the new set of patches that you are
 planning to do?
 For now I am switching this patch as returned with feedback.
 Thanks,


 What i mean, the last patch (v7 patch) as it is is enough to fix the bug
 (nested xpath namespace problem). I think the performance regression is
 still acceptable (in my case it's ~20%), because the bug is severe.
 Currently, xpath can return invalid xml because the namespace isn't
 included in the output!


Sorry, typo here. What i mean isn't invalid xml, but incomplete xml.
Hence the next call to xpath with the previous result as its input will
become a problem because the namespace will not match.



 What i'll be working is the v4 patch, because it turns out the v4 patch
 has better performance (~10%, but Peter's test shows it isn't the case).
 But, the problem is the v4 patch is organized wrongly, and hacks around the
 libxml's xml node structure (duplicating the namespace on the existing
 structure). I'll work on that, but it will affects the performance benefit.

 So what i propose is, we close the longstanding bug in this comitfest with
 the v7 patch. I'll work on improving the performance, without compromising
 good code structure. If the result is good, i'll submit the patch.

 Thanks


Regards,
-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar

2014-12-15 11:06 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:

 On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar the.ap...@gmail.com wrote:
  2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com:
 
  On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote:
   Peter, while reviewing the better performing patch myself, now i think
   the
   patch needs more work to be committed. The structuring of the method
   will be
   confusing in the long term. I think I'll restructure the patch in the
   next
   commitfest.
   So i propose to break the patch:
   1. We apply the current patch which uses xmlNodeCopy, so that the
   long-standing bug will be fixed in postgres.
   2. I'll work with the performance enhancement in the next commitfest.
  
   Maybe for (2), the current better-performing patch can be viewed as
 PoC
   of
   the expected performance.
 
  Ali, are you currently working on that? Would you mind re-creating new
  entries in the commit fest app for the new set of patches that you are
  planning to do?
  For now I am switching this patch as returned with feedback.
  Thanks,
 
 
  What i mean, the last patch (v7 patch) as it is is enough to fix the bug
  (nested xpath namespace problem). I think the performance regression is
  still acceptable (in my case it's ~20%), because the bug is severe.
  Currently, xpath can return invalid xml because the namespace isn't
 included
  in the output!
 
  What i'll be working is the v4 patch, because it turns out the v4 patch
 has
  better performance (~10%, but Peter's test shows it isn't the case). But,
  the problem is the v4 patch is organized wrongly, and hacks around the
  libxml's xml node structure (duplicating the namespace on the existing
  structure). I'll work on that, but it will affects the performance
 benefit.
 
  So what i propose is, we close the longstanding bug in this comitfest
 with
  the v7 patch. I'll work on improving the performance, without
 compromising
  good code structure. If the result is good, i'll submit the patch.
 OK. Could you then move this patch to the new CF with Needs Review

with Peter as reviewer then? He seems to be looking at it anyway
 seeing the update from 12/11.


OK. Moved to the new CF.

Regards,
-- 
Ali Akbar

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-12-14 Thread Ali Akbar

2014-12-15 10:25 GMT+07:00 Andrew Gierth and...@tao11.riddles.org.uk:

  Fujii == Fujii Masao masao.fu...@gmail.com writes:

  Fujii Pushed.

 Bug found:

 regression=# select count(*) from generate_series(1::numeric,10) v,
 generate_series(1,v) w;
  count
 ---
  0
 (1 row)

 regression=# select count(*) from generate_series(1::numeric,10) v,
 generate_series(1,v+0) w;
  count
 ---
 55
 (1 row)

 The error is in the use of PG_GETARG_NUMERIC and init_var_from_num
 when setting up the multi-call state; init_var_from_num points at the
 original num's digits rather than copying them, but start_num and
 stop_num have just been (potentially) detoasted in the per-call
 context, in which case the storage will have been freed by the next
 call.


Oops.

Obviously this could also be fixed by not detoasting the input until
 after switching to the multi-call context, but it looks to me like
 that would be unnecessarily complex.

 Suggested patch attached.


Thanks


 (Is it also worth putting an explicit warning about this kind of thing
 in the SRF docs?)


 I think yes, it will be good. The alternative is restructuring this
paragraph in the SRF docs:

The memory context that is current when the SRF is called is a transient
 context that will be cleared between calls. This means that you do not need
 to call pfree on everything you allocated using palloc; it will go away
 anyway. However, if you want to allocate any data structures to live across
 calls, you need to put them somewhere else. The memory context referenced
 by multi_call_memory_ctx is a suitable location for any data that needs
 to survive until the SRF is finished running. In most cases, this means
 that you should switch into multi_call_memory_ctx while doing the
 first-call setup.


The important part However, if you want to allocate any data structures to
live across calls, you need to put them somewhere else. is buried in the
docs.

But i think explicit warning is more noticeable for new developer like me.

Regards,
-- 
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-11-25 Thread Ali Akbar

2014-11-26 0:38 GMT+07:00 Tom Lane t...@sss.pgh.pa.us:

 Pavel Stehule pavel.steh...@gmail.com writes:
  2014-10-27 11:20 GMT+01:00 Ali Akbar the.ap...@gmail.com:
  [ array_agg_anyarray-13.patch ]

  This patch is ready for commit

 I've committed this after some significant modifications.

 I did not like the API chosen, specifically the business about callers
 needing to deal with both accumArrayResult and accumMdArray, because that
 seemed pretty messy and error-prone.


Thanks!

When in the reviewing process, we tried to implement in existing API, and
it was messy, so the last patch is with two API. We didn't think what you
eventually did. 3 API: existing for scalar, a new API for array, _and_ a
new API for both. Great!!

Just curious, in accumArrayResultArr, while enlarging array and
nullbitmaps, why it's implemented with:

astate-abytes = Max(astate-abytes * 2,
  astate-nbytes + ndatabytes);


and

astate-aitems = Max(astate-aitems * 2, newnitems);


won't it be more consistent if it's implemented just like in the first
allocation?:

while (astate-aitems = newnitems)
 astate-aitems *= 2;


Anyway, thanks for the big modifications. I learned a lot from that.

Regards,
-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-05 Thread Ali Akbar

2014-11-04 22:16 GMT+07:00 Peter Eisentraut pete...@gmx.net:


 On 10/6/14 10:24 PM, Ali Akbar wrote:
  While reviewing the patch myself, i spotted some formatting problems in
  the code. Fixed in this v5 patch.
 
  Also, this patch uses context patch format (in first versions, because
  of the small modification, context patch format obfucates the changes.
  After reimplementation this isn't the case anymore)

 I think the problem this patch is addressing is real, and your approach
 is sound, but I'd ask you to go back to the xmlCopyNode() version, and
 try to add a test case for why the second argument = 1 is necessary.  I
 don't see any other problems.


 OK. Because upstream code is fixed in current version, i'll revert to the
 previous version. Test case added to regression test. With =1 argument, the
 result is correct:
 local:piece xmlns:local=\http://127.0.0.1\; xmlns=\http://127.0.0.2\;
 id=\1\
internalnumber one/internal
internal2/
  /local:piece

 without the argument, the result is not correct, all children will be
 lost. Because of that, the other regression test will fail too because the
 children is not copied:
 *** 584,593 

   -- Text XPath expressions evaluation
   SELECT xpath('/value', data) FROM xmltest;
 ! xpath
 ! --
 !  {valueone/value}
 !  {valuetwo/value}
   (2 rows)

   SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
 --- 584,593 

   -- Text XPath expressions evaluation
   SELECT xpath('/value', data) FROM xmltest;
 !xpath
 ! 
 !  {value/}
 !  {value/}
   (2 rows)

   SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
 ***
 ... cut

 updated patch attached.


I noticed somewhat big performance regression if the xml is large (i use
PRODML Product Volume sample from energistics.org):
* Without patch (tested 3 times):
select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
---
 flow
+
 kindgas
lift/kind+
...
Time: 84,012 ms
Time: 85,683 ms
Time: 88,766 ms


* With latest v6 patch (notice the correct result with namespace
definition):

select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
---
 flow xmlns=http://www.prodml.org/schemas/1series;
+
...
Time: 108,912 ms
Time: 108,267 ms
Time: 114,848 ms


It's 23% performance regression.

* Just curious, i'm also testing v5 patch performance (notice the namespace
in the result):
select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
---
 flow xmlns=http://www.prodml.org/schemas/1series;
+
 kindgas
lift/kind+
Time: 92,552 ms
Time: 97,440 ms
Time: 99,309 ms

The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is
much more cleaner than v5patch, should we consider the performance benefit?

Anyway, thanks for the review! :)

Regards,
--
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-27 Thread Ali Akbar

2014-10-27 9:11 GMT+07:00 Ali Akbar the.ap...@gmail.com:


 2014-10-27 1:38 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 My idea is using new ArrayBuilder optimized for building multidimensional
 arrays with own State type. I think so casting to ArrayBuildState is base
 of our problems, so I don't would to do. Code in array_agg_* is simple,
 little bit more complex code is in nodeSubplan.c. Some schematic changes
 are in attachments.


 Thanks! The structure looks clear, and thanks for the example on
 nodeSubplan.c. I will restructure the v10 of the patch to this structure.


Patch attached.

Regards,
-- 
Ali Akbar
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..f59738a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry
  row
   entry
indexterm
+primaryarray_agg/primary
+   /indexterm
+   functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+  /entry
+  entry
+   any
+  /entry
+  entry
+   the same array type as input type
+  /entry
+  entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+ /row
+
+ row
+  entry
+   indexterm
 primaryaverage/primary
/indexterm
indexterm
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2f0680f..8c182a4 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  array
 ---
  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+
+SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+ array
+---
+ {{1},{2},{3},{4},{5}}
 (1 row)
 /programlisting
The subquery must return a single column. The resulting
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 401bad4..eb4de3b 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -231,7 +231,10 @@ ExecScanSubPlan(SubPlanState *node,
 	bool		found = false;	/* TRUE if got at least one subplan tuple */
 	ListCell   *pvar;
 	ListCell   *l;
+	bool		use_md_array_builder;
+	Oid			md_array_element_type;
 	ArrayBuildState *astate = NULL;
+	MdArrayBuildState *mdastate = NULL;
 
 	/*
 	 * MULTIEXPR subplans, when executed, just return NULL; but first we
@@ -366,8 +369,25 @@ ExecScanSubPlan(SubPlanState *node,
 			/* stash away current value */
 			Assert(subplan-firstColType == tdesc-attrs[0]-atttypid);
 			dvalue = slot_getattr(slot, 1, disnull);
-			astate = accumArrayResult(astate, dvalue, disnull,
-	  subplan-firstColType, oldcontext);
+			/*
+			 * use a fast array multidimensional builder when input is a array
+			 * only check on first iteration. On subsequent, use the cached values
+			 */
+			if (astate == NULL  mdastate == NULL)
+			{
+md_array_element_type = get_element_type(subplan-firstColType);
+use_md_array_builder = OidIsValid(md_array_element_type);
+			}
+
+			if (use_md_array_builder)
+mdastate = accumMdArray(mdastate,
+		disnull? NULL :
+ DatumGetArrayTypeP(dvalue),
+		disnull, md_array_element_type,
+		oldcontext);
+			else
+astate = accumArrayResult(astate, dvalue, disnull,
+		  subplan-firstColType, oldcontext);
 			/* keep scanning subplan to collect all values */
 			continue;
 		}
@@ -439,6 +459,8 @@ ExecScanSubPlan(SubPlanState *node,
 		/* We return the result in the caller's context */
 		if (astate != NULL)
 			result = makeArrayResult(astate, oldcontext);
+		else if (mdastate != NULL)
+			result = makeMdArray(mdastate, oldcontext);
 		else
 			result = PointerGetDatum(construct_empty_array(subplan-firstColType));
 	}
@@ -951,7 +973,10 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 	ListCell   *pvar;
 	ListCell   *l;
 	bool		found = false;
+	bool		use_md_array_builder;
+	Oid			md_array_element_type;
 	ArrayBuildState *astate = NULL;
+	MdArrayBuildState *mdastate = NULL;
 
 	if (subLinkType == ANY_SUBLINK ||
 		subLinkType == ALL_SUBLINK)
@@ -1018,8 +1043,25 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 			/* stash away current value */
 			Assert(subplan-firstColType == tdesc-attrs[0]-atttypid);
 			dvalue = slot_getattr(slot, 1, disnull);
-			astate = accumArrayResult(astate, dvalue, disnull,
-	  subplan-firstColType, oldcontext);
+			/*
+			 * use a fast array multidimensional builder when input is a array
+			 * only check on first iteration. On subsequent, use the cached values
+			 */
+			if (astate == NULL  mdastate == NULL)
+			{
+md_array_element_type = get_element_type(subplan-firstColType);
+use_md_array_builder = OidIsValid

Re: [HACKERS] Function array_agg(array)

2014-10-27 Thread Ali Akbar

2014-10-27 16:15 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 I did some minor changes in code

 * move tests of old or new builder style for array sublink out of main
 cycles
 * some API simplification of new builder - we should not to create
 identical API, mainly it has no sense


minor changes in the patch:
* remove array_agg_finalfn_internal declaration in top of array_userfuncs.c
* fix comment of makeMdArray
* fix return of makeMdArray
* remove unnecesary changes to array_agg_finalfn

Regards,
--
Ali Akbar
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..f59738a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry
  row
   entry
indexterm
+primaryarray_agg/primary
+   /indexterm
+   functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+  /entry
+  entry
+   any
+  /entry
+  entry
+   the same array type as input type
+  /entry
+  entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+ /row
+
+ row
+  entry
+   indexterm
 primaryaverage/primary
/indexterm
indexterm
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2f0680f..8c182a4 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  array
 ---
  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+
+SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+ array
+---
+ {{1},{2},{3},{4},{5}}
 (1 row)
 /programlisting
The subquery must return a single column. The resulting
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 401bad4..21b8de1 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -231,7 +231,10 @@ ExecScanSubPlan(SubPlanState *node,
 	bool		found = false;	/* TRUE if got at least one subplan tuple */
 	ListCell   *pvar;
 	ListCell   *l;
+	bool		use_md_array_builder = false;
+	Oid			md_array_element_type = InvalidOid;
 	ArrayBuildState *astate = NULL;
+	MdArrayBuildState *mdastate = NULL;
 
 	/*
 	 * MULTIEXPR subplans, when executed, just return NULL; but first we
@@ -260,6 +263,16 @@ ExecScanSubPlan(SubPlanState *node,
 	}
 
 	/*
+	 * use a fast array multidimensional builder when input is a array
+	 * only check on first iteration. On subsequent, use the cached values
+	 */
+	if (subLinkType == ARRAY_SUBLINK)
+	{
+		md_array_element_type = get_element_type(subplan-firstColType);
+		use_md_array_builder = OidIsValid(md_array_element_type);
+	}
+
+	/*
 	 * We are probably in a short-lived expression-evaluation context. Switch
 	 * to the per-query context for manipulating the child plan's chgParam,
 	 * calling ExecProcNode on it, etc.
@@ -366,8 +379,16 @@ ExecScanSubPlan(SubPlanState *node,
 			/* stash away current value */
 			Assert(subplan-firstColType == tdesc-attrs[0]-atttypid);
 			dvalue = slot_getattr(slot, 1, disnull);
-			astate = accumArrayResult(astate, dvalue, disnull,
-	  subplan-firstColType, oldcontext);
+
+			if (use_md_array_builder)
+mdastate = accumMdArray(mdastate,
+		disnull? NULL :
+ DatumGetArrayTypeP(dvalue),
+		disnull, md_array_element_type,
+		oldcontext);
+			else
+astate = accumArrayResult(astate, dvalue, disnull,
+		  subplan-firstColType, oldcontext);
 			/* keep scanning subplan to collect all values */
 			continue;
 		}
@@ -439,6 +460,8 @@ ExecScanSubPlan(SubPlanState *node,
 		/* We return the result in the caller's context */
 		if (astate != NULL)
 			result = makeArrayResult(astate, oldcontext);
+		else if (mdastate != NULL)
+			result = PointerGetDatum(makeMdArray(mdastate, oldcontext, true));
 		else
 			result = PointerGetDatum(construct_empty_array(subplan-firstColType));
 	}
@@ -951,7 +974,10 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 	ListCell   *pvar;
 	ListCell   *l;
 	bool		found = false;
+	bool		use_md_array_builder = false;
+	Oid			md_array_element_type = InvalidOid;
 	ArrayBuildState *astate = NULL;
+	MdArrayBuildState *mdastate = NULL;
 
 	if (subLinkType == ANY_SUBLINK ||
 		subLinkType == ALL_SUBLINK)
@@ -960,6 +986,16 @@ ExecSetParamPlan(SubPlanState *node, ExprContext *econtext)
 		elog(ERROR, CTE subplans should not be executed via ExecSetParamPlan);
 
 	/*
+	 * use a fast array multidimensional builder when input is a array
+	 * only check on first iteration. On subsequent, use the cached values
+	 */
+	if (subLinkType == ARRAY_SUBLINK)
+	{
+		md_array_element_type = get_element_type(subplan-firstColType);
+		use_md_array_builder = OidIsValid

Re: [HACKERS] Function array_agg(array)

2014-10-27 Thread Ali Akbar


 super

 I tested last version and I have not any objections.

 1. We would to have this feature - it is long time number of our ToDo List

 2. Proposal and design of multidimensional aggregation is clean and nobody
 has objection here.

 3. There is zero impact on current implementation. From performance
 reasons it uses new array optimized aggregator - 30% faster for this
 purpose than current (scalar) array aggregator

 4. Code is clean and respect PostgreSQL coding rules

 5. There are documentation and necessary regress tests

 6. Patching and compilation is clean without warnings.

 This patch is ready for commit

 Thank you for patch


Thank you for the thorough review process.

Regards,
-- 
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-26 Thread Ali Akbar

2014-10-27 1:38 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 My idea is using new ArrayBuilder optimized for building multidimensional
 arrays with own State type. I think so casting to ArrayBuildState is base
 of our problems, so I don't would to do. Code in array_agg_* is simple,
 little bit more complex code is in nodeSubplan.c. Some schematic changes
 are in attachments.


Thanks! The structure looks clear, and thanks for the example on
nodeSubplan.c. I will restructure the v10 of the patch to this structure.

Regards,
--
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-25 Thread Ali Akbar

2014-10-25 10:29 GMT+07:00 Ali Akbar the.ap...@gmail.com:

 I fixed small issue in regress tests and I enhanced tests for varlena
 types and null values.

 Thanks.

 it is about 15% faster than original implementation.

 15% faster than array_agg(scalar)? I haven't verify the performance, but
 because the internal array data and null bitmap is copied as-is, that will
 be faster.

 2014-10-25 1:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi Ali


 I checked a code. I am thinking so code organization is not good.
 accumArrayResult is too long now. makeMdArrayResult will not work, when
 arrays was joined (it is not consistent now). I don't like a usage of
 state-is_array_accum in array_userfunc.c -- it is signal of wrong
 wrapping.

 Yes, i was thinking the same. Attached WIP patch to reorganizate the code.
 makeMdArrayResult works now, with supplied arguments act as override from
 default values calculated from ArrayBuildStateArray.

 In array_userfunc.c, because we don't want to override ndims, dims and
 lbs, i copied array_agg_finalfn and only change the call to
 makeMdArrayResult (we don't uses makeArrayResult because we want to set
 release to false). Another alternative is to create new
 makeArrayResult-like function that has parameter bool release.


adding makeArrayResult1 (do you have better name for this?), that accepts
bool release parameter. array_agg_finalfn becomes more clean, and no
duplicate code in array_agg_anyarray_finalfn.


 next question: there is function array_append(anyarray, anyelement). Isn't
 time to define array_append(anyarray, anyarray) now?


 There is array_cat(anyarray, anyarray):

 /*-
  * array_cat :
  * concatenate two nD arrays to form an nD array, or
  * push an (n-1)D array onto the end of an nD array

  *
  */


Regards,
-- 
Ali Akbar
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..f59738a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry
  row
   entry
indexterm
+primaryarray_agg/primary
+   /indexterm
+   functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+  /entry
+  entry
+   any
+  /entry
+  entry
+   the same array type as input type
+  /entry
+  entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+ /row
+
+ row
+  entry
+   indexterm
 primaryaverage/primary
/indexterm
indexterm
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2f0680f..8c182a4 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  array
 ---
  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+
+SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+ array
+---
+ {{1},{2},{3},{4},{5}}
 (1 row)
 /programlisting
The subquery must return a single column. The resulting
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 41e973b..0261fcb 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -108,12 +108,16 @@ exprType(const Node *expr)
 	type = exprType((Node *) tent-expr);
 	if (sublink-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-			format_type_be(exprType((Node *) tent-expr);
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+format_type_be(exprType((Node *) tent-expr);
+		}
 	}
 }
 else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
@@ -139,12 +143,16 @@ exprType(const Node *expr)
 	type = subplan-firstColType;
 	if (subplan-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-	format_type_be(subplan-firstColType;
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR

Re: [HACKERS] Function array_agg(array)

2014-10-25 Thread Ali Akbar


  makeArrayResult1 - I have no better name now

 I found one next minor detail.

 you reuse a array_agg_transfn function. Inside is a message
 array_agg_transfn called in non-aggregate context. It is not correct for
 array_agg_anyarray_transfn


Fixed.


 probably specification dim and lbs in array_agg_finalfn is useless, when
 you expect a natural result. A automatic similar to
 array_agg_anyarray_finalfn should work there too.

 makeMdArrayResultArray and appendArrayDatum  should be marked as static


ok, marked all of that as static.


 I am thinking so change of makeArrayResult is wrong. It is developed for
 1D array. When we expect somewhere ND result now, then we should to use
 makeMdArrayResult there. So makeArrayResult should to return 1D array in
 all cases. Else a difference between makeArrayResult and makeMdArrayResult
 hasn't big change and we have a problem with naming.


I'm thinking it like this:
- if we want to accumulate array normally, use accumArrayResult and
makeArrayResult. If we accumulate scalar the result will be 1D array. if we
accumulate array, the resulting dimension is incremented by 1.
- if we want, somehow to affect the normal behavior, and change the
dimensions, use makeMdArrayResult.

Searching through the postgres' code, other than internal use in
arrayfuncs.c, makeMdArrayResult is used only
in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl
array to postgres array.

So if somehow we will accumulate array other than in array_agg, i think the
most natural way is using accumArrayResult and then makeArrayResult.

CMIIW

Regards,
--
Ali Akbar
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..f59738a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry
  row
   entry
indexterm
+primaryarray_agg/primary
+   /indexterm
+   functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+  /entry
+  entry
+   any
+  /entry
+  entry
+   the same array type as input type
+  /entry
+  entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+ /row
+
+ row
+  entry
+   indexterm
 primaryaverage/primary
/indexterm
indexterm
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2f0680f..8c182a4 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  array
 ---
  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+
+SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+ array
+---
+ {{1},{2},{3},{4},{5}}
 (1 row)
 /programlisting
The subquery must return a single column. The resulting
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 41e973b..0261fcb 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -108,12 +108,16 @@ exprType(const Node *expr)
 	type = exprType((Node *) tent-expr);
 	if (sublink-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-			format_type_be(exprType((Node *) tent-expr);
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+format_type_be(exprType((Node *) tent-expr);
+		}
 	}
 }
 else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
@@ -139,12 +143,16 @@ exprType(const Node *expr)
 	type = subplan-firstColType;
 	if (subplan-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-	format_type_be(subplan-firstColType;
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+		format_type_be(subplan-firstColType;
+		}
 	}
 }
 else if (subplan-subLinkType == MULTIEXPR_SUBLINK)
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3e7dc85

Re: [HACKERS] Function array_agg(array)

2014-10-25 Thread Ali Akbar

2014-10-25 15:43 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:



 2014-10-25 10:16 GMT+02:00 Ali Akbar the.ap...@gmail.com:

  makeArrayResult1 - I have no better name now

 I found one next minor detail.

 you reuse a array_agg_transfn function. Inside is a message
 array_agg_transfn called in non-aggregate context. It is not correct for
 array_agg_anyarray_transfn


 Fixed.


 probably specification dim and lbs in array_agg_finalfn is useless, when
 you expect a natural result. A automatic similar to
 array_agg_anyarray_finalfn should work there too.

 makeMdArrayResultArray and appendArrayDatum  should be marked as
 static


 ok, marked all of that as static.


 I am thinking so change of makeArrayResult is wrong. It is developed for
 1D array. When we expect somewhere ND result now, then we should to use
 makeMdArrayResult there. So makeArrayResult should to return 1D array in
 all cases. Else a difference between makeArrayResult and makeMdArrayResult
 hasn't big change and we have a problem with naming.


 I'm thinking it like this:
 - if we want to accumulate array normally, use accumArrayResult and
 makeArrayResult. If we accumulate scalar the result will be 1D array. if we
 accumulate array, the resulting dimension is incremented by 1.
 - if we want, somehow to affect the normal behavior, and change the
 dimensions, use makeMdArrayResult.

 Searching through the postgres' code, other than internal use in
 arrayfuncs.c, makeMdArrayResult is used only
 in src/pl/plperl/plperl.c:plperl_array_to_datum, while converting perl
 array to postgres array.

 So if somehow we will accumulate array other than in array_agg, i think
 the most natural way is using accumArrayResult and then makeArrayResult.


 ok, there is more variants and I can't to decide. But I am not satisfied
 with this API. We do some wrong in structure. makeMdArrayResult is now
 ugly.


One approach that i can think is we cleanly separate the structures and
API. We don't touch existing ArrayBuildState, accumArrayResult,
makeArrayResult and makeMdArrayResult, and we create new functions:
ArrayBuildStateArray, accumArrayResultArray, makeArrayResultArray and
makeArrayResultArrayCustom.

But if we do that, the array(subselect) implementation will not be as
simple as current patch. We must also change all the ARRAY_SUBLINK-related
accumArrayResult call.

Regarding current patch implementation, the specific typedefs are declared
as:
+typedef struct ArrayBuildStateScalar
+{
+ ArrayBuildState astate;
...
+} ArrayBuildStateArray;

so it necessities rather ugly code like this:
+ result-elemtype = astate-astate.element_type;

Can we use C11 feature of unnamed struct like this? :
+typedef struct ArrayBuildStateScalar
+{
+ ArrayBuildState;
...
+} ArrayBuildStateArray;

so the code can be a little less ugly by doing it like this:
+ result-elemtype = astate-element_type;

I don't know whether all currently supported compilers implements this
feature..


Can you suggest a better structure for this?

If we can't settle on a better structure, i think i'll reimplement
array_agg without the performance improvement, using deconstruct_array and
unchanged accumArrayResult  makeMdArrayResult.

Thanks,
-- 
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-25 Thread Ali Akbar


 you can check it? We can test, how performance lost we get. As second
 benefit we can get numbers for introduction new optimized array builder


array_agg(anyarray) with deconstruct_array, unchanged accumArrayResult and
makeMdArrayResult:

 INSERT 0 1
 Time: 852,527 ms
 INSERT 0 1
 Time: 844,275 ms
 INSERT 0 1
 Time: 858,855 ms
 INSERT 0 1
 Time: 861,072 ms
 INSERT 0 1
 Time: 952,006 ms
 INSERT 0 1
 Time: 953,918 ms
 INSERT 0 1
 Time: 926,945 ms
 INSERT 0 1
 Time: 923,692 ms
 INSERT 0 1
 Time: 940,916 ms
 INSERT 0 1
 Time: 948,700 ms
 INSERT 0 1
 Time: 933,333 ms
 INSERT 0 1
 Time: 948,869 ms
 INSERT 0 1
 Time: 847,113 ms
 INSERT 0 1
 Time: 908,572 ms



Total: 12776.83

Avg: 912,63


with last patch (v10):

 INSERT 0 1
 Time: 643,339 ms
 INSERT 0 1
 Time: 608,010 ms
 INSERT 0 1
 Time: 610,465 ms
 INSERT 0 1
 Time: 613,931 ms
 INSERT 0 1
 Time: 616,466 ms
 INSERT 0 1
 Time: 634,754 ms
 INSERT 0 1
 Time: 683,566 ms
 INSERT 0 1
 Time: 656,665 ms
 INSERT 0 1
 Time: 630,096 ms
 INSERT 0 1
 Time: 607,564 ms
 INSERT 0 1
 Time: 610,353 ms
 INSERT 0 1
 Time: 626,816 ms
 INSERT 0 1
 Time: 610,450 ms
 INSERT 0 1
 Time: 614,342 ms



Total: 8842,7
 Avg: 631,6


It's 30% faster (i tried varlena element - text). I tried several times and
it's consistent in +/- 30%.


quick  dirty non-optimized patch and the test script attached.

Regards,
--
Ali Akbar
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..f59738a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry
  row
   entry
indexterm
+primaryarray_agg/primary
+   /indexterm
+   functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+  /entry
+  entry
+   any
+  /entry
+  entry
+   the same array type as input type
+  /entry
+  entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+ /row
+
+ row
+  entry
+   indexterm
 primaryaverage/primary
/indexterm
indexterm
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2f0680f..8c182a4 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  array
 ---
  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+
+SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+ array
+---
+ {{1},{2},{3},{4},{5}}
 (1 row)
 /programlisting
The subquery must return a single column. The resulting
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 41e973b..0261fcb 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -108,12 +108,16 @@ exprType(const Node *expr)
 	type = exprType((Node *) tent-expr);
 	if (sublink-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-			format_type_be(exprType((Node *) tent-expr);
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+format_type_be(exprType((Node *) tent-expr);
+		}
 	}
 }
 else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
@@ -139,12 +143,16 @@ exprType(const Node *expr)
 	type = subplan-firstColType;
 	if (subplan-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-	format_type_be(subplan-firstColType;
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+		format_type_be(subplan-firstColType;
+		}
 	}
 }
 else if (subplan-subLinkType == MULTIEXPR_SUBLINK)
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 3e7dc85..8fc8b49 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -668,10 +668,16 @@ build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
 
 		Assert(!te-resjunk);
 		Assert(testexpr == NULL

Re: [HACKERS] Function array_agg(array)

2014-10-24 Thread Ali Akbar

Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments to design:


 1. patch doesn't hold documentation and regress tests, please append
 it.


i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for
array(subselect) defined?


 2. this functionality (multidimensional aggregation) can be interesting
 more times, so maybe some interface like array builder should be 
 preferred.

 We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
 haven't we?

 Actually array_agg(anyarray) can be implemented by using
 accumArrayResult and makeMdArrayResult, but in the process we will
 deconstruct the array data and null bit-flags into ArrayBuildState-dvalues
 and dnulls. And we will reconstruct it through makeMdArrayResult. I think
 this way it's not efficient, so i decided to reimplement it and memcpy the
 array data and null flags as-is.


 aha, so isn't better to fix a performance for accumArrayResult ?


 Ok, i'll go this route. While reading the code of array(subselect) as you
 pointed below, i think the easiest way to implement support for both
 array_agg(anyarray) and array(subselect) is to change accumArrayResult so
 it operates both with scalar datum(s) and array datums, with performance
 optimization for the latter.


implemented it by modifying ArrayBuildState to ArrayBuildStateArray and
ArrayBuildStateScalar (do you have any idea for better struct naming?)


 In other places, i think it's clearer if we just use accumArrayResult and
 makeArrayResult/makeMdArrayResult as API for building (multidimensional)
 arrays.


 3. array_agg was consistent with array(subselect), so it should be
 fixed too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


 I'm pretty new in postgresql development. Can you point out where is
 array(subselect) implemented?


 where you can start?

 postgres=# explain select array(select a from test);
 ERROR:  42704: could not find array type for data type integer[]
 LOCATION:  exprType, nodeFuncs.c:116

 look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
 postgresql sources this keyword


 attention: probably we don't would to allow arrays everywhere.


I've changed the places where i think it's appropriate.


 4. why you use a magic constant (64) there?

 + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
 + astate-aitems = 64 * nitems;

 + astate-nullbitmap = (bits8 *)
 + repalloc(astate-nullbitmap, (astate-aitems + 7) /
 8);


 just follow the arbitrary size choosen in accumArrayResult
 (arrayfuncs.c):
 astate-alen = 64; /* arbitrary starting array size */

 it can be any number not too small and too big. Too small, and we will
 realloc shortly. Too big, we will end up wasting memory.


 you can try to alloc 1KB instead as start -- it is used more times in
 Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

 You take this value from accumArrayResult, but it is targeted for
 shorted scalars - you should to expect so any array will be much larger.


this patch allocates 1KB (1024 bytes) if the ndatabytes is  512bytes. If
it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 *
number of items in array.

Regards,
--
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 12046,12051  NULL baz/literallayout(3 rows)/entry
--- 12046,12067 
   row
entry
 indexterm
+ primaryarray_agg/primary
+/indexterm
+functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+   /entry
+   entry
+any
+   /entry
+   entry
+the same array type as input type
+   /entry
+   entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+  /row
+ 
+  row
+   entry
+indexterm
  primaryaverage/primary
 /indexterm
 indexterm
*** a/src/backend/nodes/nodeFuncs.c
--- b/src/backend/nodes/nodeFuncs.c
***
*** 108,119  exprType(const Node *expr)
  	type = exprType((Node *) tent-expr);
  	if (sublink-subLinkType == ARRAY_SUBLINK)
  	{
! 		type = get_array_type(type);
! 		if (!OidIsValid(type))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_UNDEFINED_OBJECT),
! 	 errmsg(could not find array type for data type %s,
! 			format_type_be(exprType((Node *) tent-expr);
  	}
  }
  else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
--- 108,123 
  	type = exprType((Node *) tent-expr);
  	if (sublink-subLinkType == ARRAY_SUBLINK)
  	{
! 		if (!OidIsValid

Re: [HACKERS] Function array_agg(array)

2014-10-24 Thread Ali Akbar

2014-10-24 15:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 it looks well

 doc:
 http://www.postgresql.org/docs/9.4/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
 it should be fixed too

 Regards

 Pavel


doc updated with additional example for array(subselect). patch attached.

Regards,
--
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 12046,12051  NULL baz/literallayout(3 rows)/entry
--- 12046,12067 
   row
entry
 indexterm
+ primaryarray_agg/primary
+/indexterm
+functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+   /entry
+   entry
+any
+   /entry
+   entry
+the same array type as input type
+   /entry
+   entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+  /row
+ 
+  row
+   entry
+indexterm
  primaryaverage/primary
 /indexterm
 indexterm
*** a/doc/src/sgml/syntax.sgml
--- b/doc/src/sgml/syntax.sgml
***
*** 2238,2243  SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
--- 2238,2248 
   array
  ---
   {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+ 
+ SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+  array
+ ---
+  {{1},{2},{3},{4},{5}}
  (1 row)
  /programlisting
 The subquery must return a single column. The resulting
*** a/src/backend/nodes/nodeFuncs.c
--- b/src/backend/nodes/nodeFuncs.c
***
*** 108,119  exprType(const Node *expr)
  	type = exprType((Node *) tent-expr);
  	if (sublink-subLinkType == ARRAY_SUBLINK)
  	{
! 		type = get_array_type(type);
! 		if (!OidIsValid(type))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_UNDEFINED_OBJECT),
! 	 errmsg(could not find array type for data type %s,
! 			format_type_be(exprType((Node *) tent-expr);
  	}
  }
  else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
--- 108,123 
  	type = exprType((Node *) tent-expr);
  	if (sublink-subLinkType == ARRAY_SUBLINK)
  	{
! 		if (!OidIsValid(get_element_type(type)))
! 		{
! 			/* not array, so check for its array type */
! 			type = get_array_type(type);
! 			if (!OidIsValid(type))
! ereport(ERROR,
! 		(errcode(ERRCODE_UNDEFINED_OBJECT),
! 		 errmsg(could not find array type for data type %s,
! format_type_be(exprType((Node *) tent-expr);
! 		}
  	}
  }
  else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
***
*** 139,150  exprType(const Node *expr)
  	type = subplan-firstColType;
  	if (subplan-subLinkType == ARRAY_SUBLINK)
  	{
! 		type = get_array_type(type);
! 		if (!OidIsValid(type))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_UNDEFINED_OBJECT),
! 	 errmsg(could not find array type for data type %s,
! 	format_type_be(subplan-firstColType;
  	}
  }
  else if (subplan-subLinkType == MULTIEXPR_SUBLINK)
--- 143,158 
  	type = subplan-firstColType;
  	if (subplan-subLinkType == ARRAY_SUBLINK)
  	{
! 		if (!OidIsValid(get_element_type(type)))
! 		{
! 			/* not array, so check for its array type */
! 			type = get_array_type(type);
! 			if (!OidIsValid(type))
! ereport(ERROR,
! 		(errcode(ERRCODE_UNDEFINED_OBJECT),
! 		 errmsg(could not find array type for data type %s,
! 		format_type_be(subplan-firstColType;
! 		}
  	}
  }
  else if (subplan-subLinkType == MULTIEXPR_SUBLINK)
*** a/src/backend/optimizer/plan/subselect.c
--- b/src/backend/optimizer/plan/subselect.c
***
*** 668,677  build_subplan(PlannerInfo *root, Plan *plan, PlannerInfo *subroot,
  
  		Assert(!te-resjunk);
  		Assert(testexpr == NULL);
! 		arraytype = get_array_type(exprType((Node *) te-expr));
! 		if (!OidIsValid(arraytype))
! 			elog(ERROR, could not find array type for datatype %s,
!  format_type_be(exprType((Node *) te-expr)));
  		prm = generate_new_param(root,
   arraytype,
   exprTypmod((Node *) te-expr),
--- 668,683 
  
  		Assert(!te-resjunk);
  		Assert(testexpr == NULL);
! 
! 		arraytype = exprType((Node *) te-expr);
! 		if (!OidIsValid(get_element_type(arraytype)))
! 		{
! 			/* not array, so get the array type */
! 			arraytype = get_array_type(exprType((Node *) te-expr));
! 			if (!OidIsValid(arraytype))
! elog(ERROR, could not find array type for datatype %s,
! 	 format_type_be(exprType((Node *) te-expr)));
! 		}
  		prm = generate_new_param(root,
   arraytype,
   exprTypmod((Node *) te-expr),
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src

Re: [HACKERS] Function array_agg(array)

2014-10-24 Thread Ali Akbar

2014-10-24 16:26 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 some in last patch is wrong, I cannot to compile it:

 arrayfuncs.c: In function ‘accumArrayResult’:
 arrayfuncs.c:4603:9: error: ‘ArrayBuildState’ has no member named ‘alen’
astate-alen = 64;  /* arbitrary starting array size */
  ^
 arrayfuncs.c:4604:9: error: ‘ArrayBuildState’ has no member named ‘dvalues’
astate-dvalues = (Datum *) palloc(astate-alen * sizeof(Datum));
  ^
 arrayfuncs.c:4604:44: error: ‘ArrayBuildState’ has no member named ‘alen’
astate-dvalues = (Datum *) palloc(astate-alen * sizeof(Datum));
 ^
 arrayfuncs.c:4605:9: error: ‘ArrayBuildState’ has no member named ‘dnulls’
astate-dnulls = (bool *) palloc(astate-alen * sizeof(bool));
  ^
 arrayfuncs.c:4605:42: error: ‘ArrayBuildState’ has no member named ‘alen’
astate-dnulls = (bool *) palloc(astate-alen * sizeof(bool));
   ^
 arrayfuncs.c:4606:9: error: ‘ArrayBuildState’ has no member named ‘nelems’
astate-nelems = 0;
  ^
 arrayfuncs.c:4618:13: error: ‘ArrayBuildState’ has no member named ‘nelems’
if (astate-nelems = astate-alen)
  ^
 arrayfuncs.c:4618:31: error: ‘ArrayBuildState’ has no member named ‘alen’
if (astate-nelems = astate-alen)
^
 arrayfuncs.c:4620:10: error: ‘ArrayBuildState’ has no member named ‘alen’
 astate-alen *= 2;


Sorry,  correct patch attached.

This patch is in patience format (git --patience ..). In previous patches,
i use context format (git --patience ... | filterdiff --format=context),
but it turns out that some modification is lost.

-- 
Ali Akbar
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7e5bcd9..f59738a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12046,6 +12046,22 @@ NULL baz/literallayout(3 rows)/entry
  row
   entry
indexterm
+primaryarray_agg/primary
+   /indexterm
+   functionarray_agg(replaceable class=parameteranyarray/replaceable)/function
+  /entry
+  entry
+   any
+  /entry
+  entry
+   the same array type as input type
+  /entry
+  entryinput arrays, aggregated into higher-order multidimesional array. Rejects NULL and empty array as input./entry
+ /row
+
+ row
+  entry
+   indexterm
 primaryaverage/primary
/indexterm
indexterm
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 2f0680f..8c182a4 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -2238,6 +2238,11 @@ SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
  array
 ---
  {2011,1954,1948,1952,1951,1244,1950,2005,1949,1953,2006,31,2412,2413}
+
+SELECT ARRAY(SELECT array(select i) FROM generate_series(1,5) a(i));
+ array
+---
+ {{1},{2},{3},{4},{5}}
 (1 row)
 /programlisting
The subquery must return a single column. The resulting
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 41e973b..0261fcb 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -108,12 +108,16 @@ exprType(const Node *expr)
 	type = exprType((Node *) tent-expr);
 	if (sublink-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-			format_type_be(exprType((Node *) tent-expr);
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+format_type_be(exprType((Node *) tent-expr);
+		}
 	}
 }
 else if (sublink-subLinkType == MULTIEXPR_SUBLINK)
@@ -139,12 +143,16 @@ exprType(const Node *expr)
 	type = subplan-firstColType;
 	if (subplan-subLinkType == ARRAY_SUBLINK)
 	{
-		type = get_array_type(type);
-		if (!OidIsValid(type))
-			ereport(ERROR,
-	(errcode(ERRCODE_UNDEFINED_OBJECT),
-	 errmsg(could not find array type for data type %s,
-	format_type_be(subplan-firstColType;
+		if (!OidIsValid(get_element_type(type)))
+		{
+			/* not array, so check for its array type */
+			type = get_array_type(type);
+			if (!OidIsValid(type))
+ereport(ERROR,
+		(errcode(ERRCODE_UNDEFINED_OBJECT),
+		 errmsg(could not find array type for data type %s,
+		format_type_be(subplan-firstColType;
+		}
 	}
 }
 else if (subplan-subLinkType

Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Ali Akbar

Thanks for the review

2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments to design:


 1. patch doesn't hold documentation and regress tests, please append it.

OK, i'll add the documentation and regression test


 2. this functionality (multidimensional aggregation) can be interesting
 more times, so maybe some interface like array builder should be preferred.

We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
haven't we?

Actually array_agg(anyarray) can be implemented by using accumArrayResult
and makeMdArrayResult, but in the process we will deconstruct the array
data and null bit-flags into ArrayBuildState-dvalues and dnulls. And we
will reconstruct it through makeMdArrayResult. I think this way it's not
efficient, so i decided to reimplement it and memcpy the array data and
null flags as-is.

In other places, i think it's clearer if we just use accumArrayResult and
makeArrayResult/makeMdArrayResult as API for building (multidimensional)
arrays.


 3. array_agg was consistent with array(subselect), so it should be fixed
 too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


I'm pretty new in postgresql development. Can you point out where is
array(subselect) implemented?



 4. why you use a magic constant (64) there?

 + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
 + astate-aitems = 64 * nitems;

 + astate-nullbitmap = (bits8 *)
 + repalloc(astate-nullbitmap, (astate-aitems + 7) / 8);


just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
astate-alen = 64; /* arbitrary starting array size */

it can be any number not too small and too big. Too small, and we will
realloc shortly. Too big, we will end up wasting memory.

Regards,
-- 
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-22 Thread Ali Akbar

2014-10-22 22:48 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 2014-10-22 16:58 GMT+02:00 Ali Akbar the.ap...@gmail.com:

 Thanks for the review

 2014-10-22 20:51 GMT+07:00 Pavel Stehule pavel.steh...@gmail.com:

 I agree with your proposal. I have a few comments to design:


 1. patch doesn't hold documentation and regress tests, please append it.

 OK, i'll add the documentation and regression test


 2. this functionality (multidimensional aggregation) can be interesting
 more times, so maybe some interface like array builder should be preferred.

 We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
 haven't we?

 Actually array_agg(anyarray) can be implemented by using accumArrayResult
 and makeMdArrayResult, but in the process we will deconstruct the array
 data and null bit-flags into ArrayBuildState-dvalues and dnulls. And we
 will reconstruct it through makeMdArrayResult. I think this way it's not
 efficient, so i decided to reimplement it and memcpy the array data and
 null flags as-is.


 aha, so isn't better to fix a performance for accumArrayResult ?


Ok, i'll go this route. While reading the code of array(subselect) as you
pointed below, i think the easiest way to implement support for both
array_agg(anyarray) and array(subselect) is to change accumArrayResult so
it operates both with scalar datum(s) and array datums, with performance
optimization for the latter.

In other places, i think it's clearer if we just use accumArrayResult and
 makeArrayResult/makeMdArrayResult as API for building (multidimensional)
 arrays.


 3. array_agg was consistent with array(subselect), so it should be fixed
 too

 postgres=# select array_agg(a) from test;
array_agg
 ---
  {{1,2,3,4},{1,2,3,4}}
 (1 row)

 postgres=# select array(select a from test);
 ERROR:  could not find array type for data type integer[]


 I'm pretty new in postgresql development. Can you point out where is
 array(subselect) implemented?


 where you can start?

 postgres=# explain select array(select a from test);
 ERROR:  42704: could not find array type for data type integer[]
 LOCATION:  exprType, nodeFuncs.c:116

 look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
 postgresql sources this keyword


Found it. Thanks.

On a side note in postgres array type for integer[] is still integer[], but
in pg_type, integer[] has no array type. I propose we consider to change it
so array type of anyarray is itself (not in this patch, of course, because
it is a big change). Consider the following code in nodeFuncs.c:109

if (sublink-subLinkType == ARRAY_SUBLINK)
{
type = get_array_type(type);
if (!OidIsValid(type))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_OBJECT),
 errmsg(could not find array type for data type %s,
format_type_be(exprType((Node *) tent-expr);
}

to implement array(subselect) you pointed above, we must specially checks
for array types. Quick search on get_array_type returns 10 places.

I'll think more about this later. For this patch, i'll go without changes
in pg_type.h.

4. why you use a magic constant (64) there?

 + astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
 + astate-aitems = 64 * nitems;

 + astate-nullbitmap = (bits8 *)
 + repalloc(astate-nullbitmap, (astate-aitems + 7) / 8);


 just follow the arbitrary size choosen in accumArrayResult (arrayfuncs.c):
 astate-alen = 64; /* arbitrary starting array size */

 it can be any number not too small and too big. Too small, and we will
 realloc shortly. Too big, we will end up wasting memory.


 you can try to alloc 1KB instead as start -- it is used more times in
 Postgres. Then a overhead is max 1KB per agg call - what is acceptable.

 You take this value from accumArrayResult, but it is targeted for shorted
 scalars - you should to expect so any array will be much larger.


Ok.


Regards,
--
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-19 Thread Ali Akbar


 So, is there any idea how we will handle NULL and empty array in
 array_agg(anyarray)?
 I propose we just reject those input because the output will make no sense:
 - array_agg(NULL::int[]) -- the result will be indistinguished from
 array_agg of NULL ints.
 - array_agg('{}'::int[]) -- how we determine the dimension of the result?
 is it 0? Or the result will be just an empty array {} ?


This updated patch rejects NULL and {} arrays as noted above.

Regards,
-- 
Ali Akbar
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***
*** 16,21 
--- 16,51 
  #include utils/builtins.h
  #include utils/lsyscache.h
  
+ #include utils/memutils.h
+ 
+ /*-
+  * ArrayAggAnyArrayState:
+  *		aggregate state for array_agg(anyarray)
+  *-
+  */
+ typedef struct
+ {
+ 	MemoryContext mcontext;		/* where all the temp stuff is kept */
+ 	char	   *data;			/* array of accumulated data */
+ 	bits8	   *nullbitmap;		/* bitmap of is-null flags for data */
+ 
+ 	int			abytes;			/* allocated length of above arrays */
+ 	int			aitems;			/* allocated length of above arrays */
+ 	int			nbytes;			/* number of used bytes in above arrays */
+ 	int			nitems;			/* number of elements in above arrays */
+ 	int			narray;			/* number of array accumulated */
+ 	Oid			element_type;	/* data type of the Datums */
+ 	int16		typlen;			/* needed info about datatype */
+ 	bool		typbyval;
+ 	char		typalign;
+ 
+ 	int			ndims;			/* element dimensions */
+ 	int		   *dims;
+ 	int		   *lbs;
+ 
+ 	bool		hasnull;		/* any element has null */
+ } ArrayAggAnyArrayState;
+ 
  
  /*-
   * array_push :
***
*** 544,546  array_agg_finalfn(PG_FUNCTION_ARGS)
--- 574,821 
  
  	PG_RETURN_DATUM(result);
  }
+ 
+ /*
+  * ARRAY_AGG(anyarray) aggregate function
+  */
+ Datum
+ array_agg_anyarray_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext aggcontext,
+ arr_context,
+ oldcontext;
+ 	ArrayAggAnyArrayState *astate;
+ 
+ 	Oid			arg_typeid = get_fn_expr_argtype(fcinfo-flinfo, 1);
+ 	Oid			arg_elemtype = get_element_type(arg_typeid);
+ 	ArrayType  *arg;
+ 	int		   *dims,
+ 			   *lbs,
+ ndims,
+ nitems,
+ ndatabytes;
+ 	char	   *data;
+ 
+ 	int			i;
+ 
+ 	if (arg_elemtype == InvalidOid)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(could not determine input data type)));
+ 
+ 	if (!AggCheckCallContext(fcinfo, aggcontext))
+ 		elog(ERROR, array_agg_anyarray_transfn called in non-aggregate context);
+ 
+ 	if (PG_ARGISNULL(1))
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(cannot aggregate null arrays)));
+ 
+ 	astate = PG_ARGISNULL(0) ? NULL : (ArrayAggAnyArrayState *) PG_GETARG_POINTER(0);
+ 
+ 	if (astate == NULL)
+ 	{
+ 		arr_context = AllocSetContextCreate(aggcontext,
+ 			array_agg_anyarray_transfn,
+ 			ALLOCSET_DEFAULT_MINSIZE,
+ 			ALLOCSET_DEFAULT_INITSIZE,
+ 			ALLOCSET_DEFAULT_MAXSIZE);
+ 		oldcontext = MemoryContextSwitchTo(arr_context);
+ 		astate = (ArrayAggAnyArrayState *) palloc(sizeof(ArrayAggAnyArrayState));
+ 
+ 		astate-mcontext = arr_context;
+ 		astate-abytes = 0;
+ 		astate-aitems = 0;
+ 		astate-data = NULL;
+ 		astate-nullbitmap = NULL;
+ 		astate-nitems = 0;
+ 		astate-narray = 0;
+ 		astate-element_type = arg_elemtype;
+ 		get_typlenbyvalalign(arg_elemtype,
+ 			 astate-typlen,
+ 			 astate-typbyval,
+ 			 astate-typalign);
+ 	}
+ 	else
+ 	{
+ 		oldcontext = MemoryContextSwitchTo(astate-mcontext);
+ 		Assert(astate-element_type == arg_elemtype);
+ 	}
+ 
+ 	arg = PG_GETARG_ARRAYTYPE_P(1);
+ 
+ 	ndims = ARR_NDIM(arg);
+ 	dims = ARR_DIMS(arg);
+ 	lbs = ARR_LBOUND(arg);
+ 	data = ARR_DATA_PTR(arg);
+ 	nitems = ArrayGetNItems(ndims, dims);
+ 
+ 	ndatabytes = ARR_SIZE(arg) - ARR_DATA_OFFSET(arg);
+ 
+ 	if (astate-data == NULL)
+ 	{
+ 		if (ndims == 0)
+ 			ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(cannot aggregate empty arrays)));
+ 
+ 		if (ndims + 1  MAXDIM)
+ 			ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+  errmsg(number of array dimensions (%d) exceeds the maximum allowed (%d),
+ 		ndims + 1, MAXDIM)));
+ 
+ 		astate-ndims = ndims;
+ 		astate-dims = (int *) palloc(ndims * sizeof(int));
+ 		astate-lbs = (int *) palloc(ndims * sizeof(int));
+ 		memcpy(astate-dims, dims, ndims * sizeof(int));
+ 		memcpy(astate-lbs, lbs, ndims * sizeof(int));
+ 
+ 		astate-abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
+ 		astate-aitems = 64 * nitems;
+ 		astate-data = (char *) palloc(astate-abytes);
+ 
+ 		memcpy(astate-data, data, ndatabytes);
+ 		astate-nbytes = ndatabytes;
+ 		astate-nitems = nitems;
+ 
+ 		if (ARR_HASNULL(arg))
+ 		{
+ 			astate-hasnull = true;
+ 			astate

Re: [HACKERS] Allow format 0000-0000-0000 in postgresql MAC parser

2014-10-17 Thread Ali Akbar

 It has been registered now
 (https://commitfest.postgresql.org/action/patch_view?id=1585). I've got
 an updated version of the patch with the documentation fix.


Looks like the patch is all good. I'm marking as ready for commiter.

On a side note, i'm noticing from http://en.wikipedia.org/wiki/MAC_address,
that there is three numbering namespace for MAC: MAC-48, EUI-48 and EUI-64.
The last one is 64 bits long (8 bytes). Currently PostgreSQL's macaddr is
only 6 bytes long. Should we change it to 8 bytes (not in this patch, of
course)?

Regards,
-- 
Ali Akbar

Re: [HACKERS] proposal: plpgsql - Assert statement

2014-10-15 Thread Ali Akbar

2014-09-30 10:04 GMT+07:00 Jim Nasby j...@nasby.net:

 On 9/17/14, 7:40 PM, Jan Wieck wrote:

 Exactly. Doing something like

  ASSERT (select count(*) from foo
  where fk not in (select pk from bar)) = 0;

 is a perfectly fine, arbitrary boolean expression. It will probably work
 well in a development environment too. And I am very sure that it will not
 scale well once that code gets deployed. And I know how DBAs react to the
 guaranteed following performance problem. They will disable ALL assert ...
 or was there some sort of assert class system proposed that I missed?


Actually, compared with for example Java or C, in production systems,
usually all asserts are disabled for performance (in java removed by JIT,
in C we define NDEBUG).


  We're also putting too much weight on the term assert here. C-style
 asserts are generally not nearly as useful in a database as general
 sanity-checking or error handling, especially if you're trying to use the
 database to enforce data sanity.


+1.
without any query capability, assert will become much less useful. If we
cannot query in assert, we will code:

-- perform some query
ASSERT WHEN some_check_on_query_result;

.. and disabling the query in production system will become another trouble.

My wish-list for asserts is:

 - Works at a SQL level
 - Unique/clear way to identify asserts (so you're not guessing where the
 assert came from)

+1


 - Allows for over-riding individual asserts (so if you need to do
 something you're not supposed to do you still have the protection of all
 other asserts)
 - Less verbose than IF THEN RAISE END IF

+1

-- 
Ali Akbar

Re: [HACKERS] Function array_agg(array)

2014-10-12 Thread Ali Akbar

2014-10-11 22:28 GMT+07:00 Tom Lane t...@sss.pgh.pa.us:

 Seems dangerous as heck; certainly it would have side-effects far more
 wide-ranging than just making this particular function work.

 A safer answer is to split array_agg into two functions,
 array_agg(anynonarray) - anyarray
 array_agg(anyarray) - anyarray

 I rather imagine you should do that anyway, because I really doubt
 that this hack is operating quite as intended.  I suspect you are
 producing arrays containing arrays as elements, not true 2-D arrays.
 That's not a direction we want to go in I think; certainly there are
 no other operations that produce such things.


Thanks for the review. Yes, it looks like the patch produced array as the
elements. So, all array operations behaves wierdly.

In this quick  dirty patch, I am trying to implement the
array_agg(anyarray), introducing two new functions:
- array_agg_anyarray_transfn
- array_agg_anyarray_finalfn

At first, i want to use accumArrayResult and makeMdArrayResult, but it's
complicated to work with multi-dimensional arrays with those two functions.
So i combined array_cat with those function.

Currently, it cannot handle NULL arrays:
backend select array_agg(a) from (values(null::int[])) a(a);
 1: array_agg(typeid = 1007, len = -1, typmod = -1, byval = f)

ERROR:  cannot aggregate null arrays

Regards,
-- 
Ali Akbar
*** a/src/backend/utils/adt/array_userfuncs.c
--- b/src/backend/utils/adt/array_userfuncs.c
***
*** 16,21 
--- 16,51 
  #include utils/builtins.h
  #include utils/lsyscache.h
  
+ #include utils/memutils.h
+ 
+ /*-
+  * ArrayAggAnyArrayState:
+  *		aggregate state for array_agg(anyarray)
+  *-
+  */
+ typedef struct
+ {
+ 	MemoryContext mcontext;		/* where all the temp stuff is kept */
+ 	char	   *data;			/* array of accumulated data */
+ 	bits8	   *nullbitmap;		/* bitmap of is-null flags for data */
+ 
+ 	int			abytes;			/* allocated length of above arrays */
+ 	int			aitems;			/* allocated length of above arrays */
+ 	int			nbytes;			/* number of used bytes in above arrays */
+ 	int			nitems;			/* number of elements in above arrays */
+ 	int			narray;			/* number of array accumulated */
+ 	Oid			element_type;	/* data type of the Datums */
+ 	int16		typlen;			/* needed info about datatype */
+ 	bool		typbyval;
+ 	char		typalign;
+ 
+ 	int			ndims;			/* element dimensions */
+ 	int		   *dims;
+ 	int		   *lbs;
+ 
+ 	bool		hasnull;		/* any element has null */
+ } ArrayAggAnyArrayState;
+ 
  
  /*-
   * array_push :
***
*** 544,546  array_agg_finalfn(PG_FUNCTION_ARGS)
--- 574,814 
  
  	PG_RETURN_DATUM(result);
  }
+ 
+ /*
+  * ARRAY_AGG(anyarray) aggregate function
+  */
+ Datum
+ array_agg_anyarray_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext aggcontext,
+ arr_context,
+ oldcontext;
+ 	ArrayAggAnyArrayState *astate;
+ 
+ 	Oid			arg_typeid = get_fn_expr_argtype(fcinfo-flinfo, 1);
+ 	Oid			arg_elemtype = get_element_type(arg_typeid);
+ 	ArrayType  *arg;
+ 	int		   *dims,
+ 			   *lbs,
+ ndims,
+ nitems,
+ ndatabytes;
+ 	char	   *data;
+ 
+ 	int			i;
+ 
+ 	if (arg_elemtype == InvalidOid)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(could not determine input data type)));
+ 
+ 	if (!AggCheckCallContext(fcinfo, aggcontext))
+ 		elog(ERROR, array_agg_anyarray_transfn called in non-aggregate context);
+ 
+ 	if (PG_ARGISNULL(1))
+ 		elog(ERROR, cannot aggregate null arrays);
+ 
+ 	astate = PG_ARGISNULL(0) ? NULL : (ArrayAggAnyArrayState *) PG_GETARG_POINTER(0);
+ 
+ 	if (astate == NULL)
+ 	{
+ 		arr_context = AllocSetContextCreate(aggcontext,
+ 			array_agg_anyarray_transfn,
+ 			ALLOCSET_DEFAULT_MINSIZE,
+ 			ALLOCSET_DEFAULT_INITSIZE,
+ 			ALLOCSET_DEFAULT_MAXSIZE);
+ 		oldcontext = MemoryContextSwitchTo(arr_context);
+ 		astate = (ArrayAggAnyArrayState *) palloc(sizeof(ArrayAggAnyArrayState));
+ 
+ 		astate-mcontext = arr_context;
+ 		astate-abytes = 0;
+ 		astate-aitems = 0;
+ 		astate-data = NULL;
+ 		astate-nullbitmap = NULL;
+ 		astate-nitems = 0;
+ 		astate-narray = 0;
+ 		astate-element_type = arg_elemtype;
+ 		get_typlenbyvalalign(arg_elemtype,
+ 			 astate-typlen,
+ 			 astate-typbyval,
+ 			 astate-typalign);
+ 	}
+ 	else
+ 	{
+ 		oldcontext = MemoryContextSwitchTo(astate-mcontext);
+ 		Assert(astate-element_type == arg_elemtype);
+ 	}
+ 
+ 	arg = PG_GETARG_ARRAYTYPE_P(1);
+ 
+ 	ndims = ARR_NDIM(arg);
+ 	dims = ARR_DIMS(arg);
+ 	lbs = ARR_LBOUND(arg);
+ 	data = ARR_DATA_PTR(arg);
+ 	nitems = ArrayGetNItems(ndims, dims);
+ 
+ 	ndatabytes = ARR_SIZE(arg) - ARR_DATA_OFFSET(arg);
+ 
+ 	if (astate-data == NULL)
+ 	{
+ 		if (ndims + 1  MAXDIM)
+ 			ereport(ERROR,
+ (errcode

[HACKERS] Function array_agg(array)

2014-10-11 Thread Ali Akbar

Greetings,

While looking for easier items in PostgreSQL Wiki's Todo List (this will be
my 3rd patch), i found this TODO:

Add a built-in array_agg(anyarray) or similar, that can aggregate
 1-dimensional arrays into a 2-dimensional array.


I've stumbled by this lack of array_agg(anyarray) sometimes ago in my work,
so i decided to explore this.

Currently, if we try array_agg(anyarray), PostgreSQL behaves like this:

# select array_agg('{1,2}'::int[]);
ERROR:  could not find array type for data type integer[]

Reading implementation of array_agg, it looks like the array_agg function
is generic, and can process any input. The error comes from PostgreSQL not
finding array type for int[] (_int4 in pg_proc).

In PostgreSQL, any array is multidimensional, array type for any array is
the same:
- the type of {1,2} is int[]
- {{1,2}, {3,4}} is int[]
- {{{1},{2}, {3} ,{4}}} is still int[]

So, can't we just set the typarray of array types to its self oid? (patch
attached). So far:
- the array_agg is working and returning correct types:

backend select array_agg('{1,2}'::int[]);
 1: array_agg(typeid = 1007, len = -1, typmod = -1, byval = f)

 1: array_agg = {{1,2}}(typeid = 1007, len = -1, typmod = -1,
byval = f)


select array_agg('{''a'',''b''}'::varchar[]);
 1: array_agg(typeid = 1015, len = -1, typmod = -1, byval = f)

 1: array_agg = {{'a','b'}}(typeid = 1015, len = -1, typmod =
-1, byval = f)



- Regression tests passed except for the pg_type sanity check while
checking typelem relation with typarray:

SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
   p2.typelem, p2.typlen
FROM   pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid)
WHERE  p1.typarray  0 AND
   (p2.oid IS NULL OR p2.typelem  p1.oid OR p2.typlen  -1);
!  oid  |basetype|   arraytype| typelem | typlen
! --+++-+
!   143 | _xml   | _xml   | 142 | -1
!   199 | _json  | _json  | 114 | -1
!   629 | _line  | _line  | 628 | -1
!   719 | _circle| _circle| 718 | -1
... (cut)


Aside from the sanity check complaints, I don't see any problems in the
resulting array operations.

So, back to the question: Can't we just set the typarray of array types to
its self oid?

Regards,
-- 
Ali Akbar
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index b7d9256..7934874 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -357,8 +357,8 @@ DATA(insert OID = 114 ( json		   PGNSP PGUID -1 f b U f t \054 0 0 199 json_in j
 DATA(insert OID = 142 ( xml		   PGNSP PGUID -1 f b U f t \054 0 0 143 xml_in xml_out xml_recv xml_send - - - i x f 0 -1 0 0 _null_ _null_ _null_ ));
 DESCR(XML content);
 #define XMLOID 142
-DATA(insert OID = 143 ( _xml	   PGNSP PGUID -1 f b A f t \054 0 142 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
-DATA(insert OID = 199 ( _json	   PGNSP PGUID -1 f b A f t \054 0 114 0 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
+DATA(insert OID = 143 ( _xml	   PGNSP PGUID -1 f b A f t \054 0 142 143 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
+DATA(insert OID = 199 ( _json	   PGNSP PGUID -1 f b A f t \054 0 114 199 array_in array_out array_recv array_send - - array_typanalyze i x f 0 -1 0 0 _null_ _null_ _null_ ));
 
 DATA(insert OID = 194 ( pg_node_tree	PGNSP PGUID -1 f b S f t \054 0 0 0 pg_node_tree_in pg_node_tree_out pg_node_tree_recv pg_node_tree_send - - - i x f 0 -1 0 100 _null_ _null_ _null_ ));
 DESCR(string representing an internal node tree);
@@ -395,7 +395,7 @@ DESCR(geometric polygon '(pt1,...)');
 DATA(insert OID = 628 (  line	   PGNSP PGUID 24 f b G f t \054 0 701 629 line_in line_out line_recv line_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
 DESCR(geometric line);
 #define LINEOID			628
-DATA(insert OID = 629 (  _line	   PGNSP PGUID	-1 f b A f t \054 0 628 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
+DATA(insert OID = 629 (  _line	   PGNSP PGUID	-1 f b A f t \054 0 628 629 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
 
 /* OIDS 700 - 799 */
 
@@ -421,11 +421,11 @@ DESCR();
 DATA(insert OID = 718 (  circlePGNSP PGUID	24 f b G f t \054 0 0 719 circle_in circle_out circle_recv circle_send - - - d p f 0 -1 0 0 _null_ _null_ _null_ ));
 DESCR(geometric circle '(center,radius)');
 #define CIRCLEOID		718
-DATA(insert OID = 719 (  _circle   PGNSP PGUID	-1 f b A f t \054 0  718 0 array_in array_out array_recv array_send - - array_typanalyze d x f 0 -1 0 0 _null_ _null_ _null_ ));
+DATA(insert OID = 719 (  _circle   PGNSP PGUID	-1 f b A f t \054 0  718 719

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-10-06 Thread Ali Akbar

Thanks for the review. Attached the formatted patch according to your
suggestion.

- numeric datatype is large, but there are limitations. According to doc,
 the limit is: up to 131072 digits before the decimal point; up to 16383
 digits after the decimal point. How can we check if the next step
 overflows? As a comparison, in int.c, generate_series_step_int4 checks if
 its overflows, and stop the next call by setting step to 0. Should we do
 that?

 Yes we should.


how can we check the overflow after add_var?
(in int.c, the code checks for integer calculation overflow, that wraps the
result to negative value)

in numeric.sql regression test, i've added this query:
select (i / (10::numeric ^ 131071))::numeric(1,0)
from generate_series(-9*(10::numeric ^ 131071),
9*(10::numeric ^ 131071),
(10::numeric ^ 131071))
  as a(i);

Because the doc notes that the maximum numeric digit before decimal point
is 131072, i hope this query covers the overflow case (in the last value it
will generate, if we add 9 x 10^13071 with 10^13071, add_var will
overflows). But in my tests, that isn't the case. The code works without
any error and returns the correct rows:

 numeric
-
  -9
  -8
  -7
  -6
  -5
  -4
  -3
  -2
  -1
   0
   1
   2
   3
   4
   5
   6
   7
   8
   9
(19 rows)



Regards,
-- 
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 14074,14081  AND
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
--- 14074,14081 
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
***
*** 14084,14091  AND
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
--- 14084,14091 
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
***
*** 28,33 
--- 28,34 
  
  #include access/hash.h
  #include catalog/pg_type.h
+ #include funcapi.h
  #include libpq/pqformat.h
  #include miscadmin.h
  #include nodes/nodeFuncs.h
***
*** 261,266  typedef struct NumericVar
--- 262,279 
  
  
  /* --
+  * Data for generate series
+  * --
+  */
+ typedef struct
+ {
+ 	NumericVar	current;
+ 	NumericVar	finish;
+ 	NumericVar	step;
+ } generate_series_numeric_fctx;
+ 
+ 
+ /* --
   * Some preinitialized constants
   * --
   */
***
*** 1221,1226  numeric_floor(PG_FUNCTION_ARGS)
--- 1234,1346 
  	PG_RETURN_NUMERIC(res);
  }
  
+ 
+ /*
+  * generate_series_numeric() -
+  *
+  *  Generate series of numeric.
+  */
+ Datum
+ generate_series_numeric(PG_FUNCTION_ARGS) 
+ {
+ 	return generate_series_step_numeric(fcinfo);
+ }
+ 
+ Datum
+ generate_series_step_numeric(PG_FUNCTION_ARGS)
+ {
+ 	generate_series_numeric_fctx *fctx;
+ 	FuncCallContext *funcctx;
+ 	MemoryContext oldcontext;
+ 
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		Numeric		start_num = PG_GETARG_NUMERIC(0);
+ 		Numeric		finish_num = PG_GETARG_NUMERIC(1);
+ 		NumericVar	steploc = const_one;
+ 
+ 		/* Handle NaN in start  finish */
+ 		if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num)) 
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 	 errmsg(start and finish cannot be NaN)));
+ 
+ 		/* see if we

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-10-06 Thread Ali Akbar

 + select * from generate_series(0.1::numeric, 10.0::numeric, 0.1::numeric);
 +  generate_series
 + -
 +  0.1
 ...
 + 10.0
 + (100 rows)

 Unless there is a good reason, can you please keep individual test
 output fewer than 100 lines? I think the 41-line result is an overkill
 as well. This just bloats the repository size unnecessarily.


Okay. In this revision i cut the tests' result except the last one (the one
that tests values just before numeric overflow).


 Also, I see that this patch was added to the 2014-10 commitfest and
 then deleted again (by user apaan). Why?


User apaan is me. When i added to the commitfest, the patch is listed there
by me (apaan). I only added some bits from original patch by Платон Малюгин
that was revised further by Michael Paquier. So i think they should have
the credits for this patch, not me.
In this situation, what should i do?

Thanks for the review Manti!

Regards,
-- 
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 14074,14081  AND
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
--- 14074,14081 
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
***
*** 14084,14091  AND
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
--- 14084,14091 
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
***
*** 28,33 
--- 28,34 
  
  #include access/hash.h
  #include catalog/pg_type.h
+ #include funcapi.h
  #include libpq/pqformat.h
  #include miscadmin.h
  #include nodes/nodeFuncs.h
***
*** 261,266  typedef struct NumericVar
--- 262,279 
  
  
  /* --
+  * Data for generate series
+  * --
+  */
+ typedef struct
+ {
+ 	NumericVar	current;
+ 	NumericVar	finish;
+ 	NumericVar	step;
+ } generate_series_numeric_fctx;
+ 
+ 
+ /* --
   * Some preinitialized constants
   * --
   */
***
*** 1221,1226  numeric_floor(PG_FUNCTION_ARGS)
--- 1234,1346 
  	PG_RETURN_NUMERIC(res);
  }
  
+ 
+ /*
+  * generate_series_numeric() -
+  *
+  *  Generate series of numeric.
+  */
+ Datum
+ generate_series_numeric(PG_FUNCTION_ARGS) 
+ {
+ 	return generate_series_step_numeric(fcinfo);
+ }
+ 
+ Datum
+ generate_series_step_numeric(PG_FUNCTION_ARGS)
+ {
+ 	generate_series_numeric_fctx *fctx;
+ 	FuncCallContext *funcctx;
+ 	MemoryContext oldcontext;
+ 
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		Numeric		start_num = PG_GETARG_NUMERIC(0);
+ 		Numeric		finish_num = PG_GETARG_NUMERIC(1);
+ 		NumericVar	steploc = const_one;
+ 
+ 		/* Handle NaN in start  finish */
+ 		if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num)) 
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 	 errmsg(start and finish cannot be NaN)));
+ 
+ 		/* see if we were given an explicit step size */
+ 		if (PG_NARGS() == 3) 
+ 		{
+ 			Numeric	step_num = PG_GETARG_NUMERIC(2);
+ 
+ 			if (NUMERIC_IS_NAN(step_num)) 
+ ereport(ERROR,
+ 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 		 errmsg(step size cannot be NaN)));
+ 
+ 			init_var_from_num(step_num, steploc);
+ 
+ 			if (cmp_var(steploc, const_zero) == 0)
+ ereport(ERROR,
+ 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 		 errmsg(step size cannot equal zero

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-10-06 Thread Ali Akbar

2014-10-06 22:51 GMT+07:00 Marti Raudsepp ma...@juffo.org:

 That's fine I think, it's just for tracking who made the changes in
 the CommitFest app. What actually matters is what you write in the
 Author field, which could contain all 3 names separated by commas.


Ok. Added to commitfest:
https://commitfest.postgresql.org/action/patch_view?id=1591


  the one that tests values just before numeric overflow

 Actually I don't know if that's too useful. I think you should add a
 test case that causes an error to be thrown.


Actually i added the test case because in the code, when adding step into
current for the last value, i expected it to overflow:

/* increment current in preparation for next iteration */
add_var(fctx-current, fctx-step, fctx-current);

where in the last calculation, current is 9 * 10^131071. Plus 10^131071, it
will be 10^131072, which i expected to overflow numeric type (in the doc,
numeric's range is up to 131072 digits before the decimal point).

In attached patch, i narrowed the test case to produce smaller result.

Also, I noticed that there are a few trailing spaces in the patch that
 should be removed:

 +generate_series_numeric(PG_FUNCTION_ARGS)
 ...
 +   if (NUMERIC_IS_NAN(start_num) ||
 NUMERIC_IS_NAN(finish_num))
 ...
 +   if (PG_NARGS() == 3)
 ...
 +   if (NUMERIC_IS_NAN(step_num))


Ah, didn't see it. Thanks. Fixed in this patch.


Regards,
-- 
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 14074,14081  AND
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
--- 14074,14081 
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
***
*** 14084,14091  AND
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
--- 14084,14091 
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
***
*** 28,33 
--- 28,34 
  
  #include access/hash.h
  #include catalog/pg_type.h
+ #include funcapi.h
  #include libpq/pqformat.h
  #include miscadmin.h
  #include nodes/nodeFuncs.h
***
*** 261,266  typedef struct NumericVar
--- 262,279 
  
  
  /* --
+  * Data for generate series
+  * --
+  */
+ typedef struct
+ {
+ 	NumericVar	current;
+ 	NumericVar	finish;
+ 	NumericVar	step;
+ } generate_series_numeric_fctx;
+ 
+ 
+ /* --
   * Some preinitialized constants
   * --
   */
***
*** 1221,1226  numeric_floor(PG_FUNCTION_ARGS)
--- 1234,1346 
  	PG_RETURN_NUMERIC(res);
  }
  
+ 
+ /*
+  * generate_series_numeric() -
+  *
+  *  Generate series of numeric.
+  */
+ Datum
+ generate_series_numeric(PG_FUNCTION_ARGS)
+ {
+ 	return generate_series_step_numeric(fcinfo);
+ }
+ 
+ Datum
+ generate_series_step_numeric(PG_FUNCTION_ARGS)
+ {
+ 	generate_series_numeric_fctx *fctx;
+ 	FuncCallContext *funcctx;
+ 	MemoryContext oldcontext;
+ 
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		Numeric		start_num = PG_GETARG_NUMERIC(0);
+ 		Numeric		finish_num = PG_GETARG_NUMERIC(1);
+ 		NumericVar	steploc = const_one;
+ 
+ 		/* Handle NaN in start  finish */
+ 		if (NUMERIC_IS_NAN(start_num) || NUMERIC_IS_NAN(finish_num))
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 	 errmsg(start and finish cannot be NaN)));
+ 
+ 		/* see if we were

Re: [HACKERS] Last Commitfest patches waiting review

2014-10-06 Thread Ali Akbar

2014-10-03 23:14 GMT+07:00 Heikki Linnakangas hlinnakan...@vmware.com:

 Fix xpath() to return namespace definitions (fixes the issue with nested
 or repeated xpath())

   Peter, you've done some XML stuff before; could you have a look at this
 too?


I am the author of the patch. I've sent Peter off-the-list review request
email as you had suggested before, but he didn't respond.

How can i help to ease the review? The last patch is re-implementation, as
per Tom Lane's findings about xmlNodeCopy's behavior if there's not enough
memory. It turns out that the reimplementation is not as simple as before
(because reimplement some of xmlNodeCopy code must be reimplemented here).

Reviewing the patch myself, i've  found some code formatting problems. Will
fix and post in the patch's thread.

Regards,
-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-10-06 Thread Ali Akbar

While reviewing the patch myself, i spotted some formatting problems in the
code. Fixed in this v5 patch.

Also, this patch uses context patch format (in first versions, because of
the small modification, context patch format obfucates the changes. After
reimplementation this isn't the case anymore)

Thanks,
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3594,3620  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
  
  #ifdef USE_LIBXML
  
  /*
   * Convert XML node to text (dump subtree in case of element,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur-type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
  			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
--- 3596,3706 
  
  #ifdef USE_LIBXML
  
+ /* check ns definition of node and its childrens. If any one of ns is
+  * not defined in node and it's children, but in the node's parent,
+  * copy the definition to node.
+  */
+ static void
+ xml_checkandcopyns(xmlNodePtr root,
+    PgXmlErrorContext *xmlerrcxt,
+    xmlNodePtr node,
+    xmlNsPtr lastns_before)
+ {
+ 	xmlNsPtr ns = NULL;
+ 	xmlNsPtr cur_ns;
+ 	xmlNodePtr cur;
+ 
+ 	if (node-ns != NULL)
+ 	{
+ 		/* check in new nses */
+ 		cur_ns = lastns_before == NULL ? node-nsDef : lastns_before-next;
+ 		while (cur_ns != NULL)
+ 		{
+ 			if (cur_ns-href != NULL)
+ 			{
+ if (((cur_ns-prefix == NULL)  (node-ns-prefix == NULL)) ||
+ 	((cur_ns-prefix != NULL)  (node-ns-prefix != NULL) 
+ 	 xmlStrEqual(cur_ns-prefix, node-ns-prefix)))
+ {
+ 	ns = cur_ns;
+ 	break;
+ }
+ 			}
+ 			cur_ns = cur_ns-next;
+ 		}
+ 		if (ns == NULL) /* not in new nses */
+ 		{
+ 			ns = xmlSearchNs(NULL, node-parent, node-ns-prefix);
+ 
+ 			if (ns != NULL)
+ 			{
+ ns = xmlNewNs(root, ns-href, ns-prefix);
+ 
+ if (ns == NULL  xmlerrcxt-err_occurred)
+ 	xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		could not allocate xmlNs);
+ 			}
+ 		}
+ 	}
+ 	/* check and copy ns for children recursively */
+ 	cur = node-children;
+ 	while (cur != NULL)
+ 	{
+ 		xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before);
+ 		cur = cur-next;
+ 	}
+ }
+ 
  /*
   * Convert XML node to text (dump subtree in case of element,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
  {
  	xmltype*result;
  
  	if (cur-type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
+ 		xmlNsPtr lastns_before;
+ 		xmlNsPtr ns;
+ 		xmlNsPtr next;
  
  		buf = xmlBufferCreate();
+ 
  		PG_TRY();
  		{
+ 			lastns_before = cur-nsDef;
+ 			if (lastns_before != NULL)
+ 			{
+ while (lastns_before-next != NULL)
+ 	lastns_before = lastns_before-next;
+ 			}
+ 			xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before);
+ 
  			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
+ 
+ 			/* delete and free new nses */
+ 			ns = lastns_before == NULL ? cur-nsDef : lastns_before-next;
+ 			while (ns != NULL)
+ 			{
+ next = ns-next;
+ xmlFree(ns);
+ ns = next;
+ 			}
+ 
+ 			if (lastns_before == NULL)
+ cur-nsDef = NULL;
+ 			else
+ lastns_before-next = NULL;
  		}
  		PG_CATCH();
  		{
+ 			/* new namespaces will be freed while free-ing the node, so we
+ 			 * won't free it here
+ 			 */
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
***
*** 3660,3666  xml_xmlnodetoxmltype(xmlNodePtr cur)
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate)
  {
  	int			result = 0;
  	Datum		datum;
--- 3746,3753 
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-10-05 Thread Ali Akbar

Hi
Oops, it seems that I have been too hasty here. With a fresh mind I looked
at my own patch again and found two bugs:


 - Incorrect calculation of each step's value, making stuff crash, it is
 necessary to switch to the context of the function to perform operations on
 a temporary variable first


- i think you can use the fctx-current variable without temporary variable
(there's comment in the add_var function: Full version of add functionality
on variable level (handling signs). result might point to one of the
operands too without danger.). But you _must_ switch the context first
because add_var will allocate new array for the data and freeing the old
one.

- numeric can be NaN. We must reject it as first, finish and last parameter.
- numeric datatype is large, but there are limitations. According to doc,
the limit is: up to 131072 digits before the decimal point; up to 16383
digits after the decimal point. How can we check if the next step
overflows? As a comparison, in int.c, generate_series_step_int4 checks if
its overflows, and stop the next call by setting step to 0. Should we do
that?

~ will try to fix the patch later

-- 
Ali Akbar

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-10-05 Thread Ali Akbar

2014-10-05 15:21 GMT+07:00 Ali Akbar the.ap...@gmail.com:

 Hi
 Oops, it seems that I have been too hasty here. With a fresh mind I looked
 at my own patch again and found two bugs:


 - Incorrect calculation of each step's value, making stuff crash, it is
 necessary to switch to the context of the function to perform operations on
 a temporary variable first


 - i think you can use the fctx-current variable without temporary
 variable (there's comment in the add_var function: Full version of add
 functionality on variable level (handling signs). result might point to one
 of the operands too without danger.). But you _must_ switch the context
 first because add_var will allocate new array for the data and freeing the
 old one.

 - numeric can be NaN. We must reject it as first, finish and last
 parameter.
 - numeric datatype is large, but there are limitations. According to doc,
 the limit is: up to 131072 digits before the decimal point; up to 16383
 digits after the decimal point. How can we check if the next step
 overflows? As a comparison, in int.c, generate_series_step_int4 checks if
 its overflows, and stop the next call by setting step to 0. Should we do
 that?

 ~ will try to fix the patch later

attached the patch. Not checking if it overflows, but testing it with 9 *
10 ^ 131072 works (not resulting in an error). Other modifications:
- doc update
- regresssion tests
- while testing regression test, opr_sanity checks that the
generate_series_numeric function is used twice (once for 2 parameter and
once for the 3 parameter function), so i changed the name to
generate_series_step_numeric and created new function
generate_series_numeric that calls generate_series_step_numeric

-- 
Ali Akbar
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 14074,14081  AND
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
--- 14074,14081 
  tbody
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type, or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of one
***
*** 14084,14091  AND
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type or typebigint/type/entry
!   entrytypesetof int/type or typesetof bigint/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
--- 14084,14091 
  
   row
entryliteralfunctiongenerate_series(parameterstart/parameter, parameterstop/parameter, parameterstep/parameter)/function/literal/entry
!   entrytypeint/type, typebigint/type or typenumeric/type/entry
!   entrytypesetof int/type, typesetof bigint/type or typesetof numeric/type (same as argument type)/entry
entry
 Generate a series of values, from parameterstart/parameter to parameterstop/parameter
 with a step size of parameterstep/parameter
*** a/src/backend/utils/adt/numeric.c
--- b/src/backend/utils/adt/numeric.c
***
*** 28,33 
--- 28,34 
  
  #include access/hash.h
  #include catalog/pg_type.h
+ #include funcapi.h
  #include libpq/pqformat.h
  #include miscadmin.h
  #include nodes/nodeFuncs.h
***
*** 261,266  typedef struct NumericVar
--- 262,279 
  
  
  /* --
+  * Data for generate series
+  * --
+  */
+ typedef struct
+ {
+ 	NumericVar	current;
+ 	NumericVar	finish;
+ 	NumericVar	step;
+ } generate_series_numeric_fctx;
+ 
+ 
+ /* --
   * Some preinitialized constants
   * --
   */
***
*** 1221,1226  numeric_floor(PG_FUNCTION_ARGS)
--- 1234,1346 
  	PG_RETURN_NUMERIC(res);
  }
  
+ 
+ /*
+  * generate_series_numeric() -
+  *
+  *  Generate series of numeric.
+  */
+ Datum
+ generate_series_numeric(PG_FUNCTION_ARGS) {
+ 	return generate_series_step_numeric(fcinfo);
+ }
+ 
+ Datum
+ generate_series_step_numeric(PG_FUNCTION_ARGS)
+ {
+ 	generate_series_numeric_fctx *fctx;
+ 	FuncCallContext *funcctx;
+ 	MemoryContext oldcontext;
+ 
+ 	if (SRF_IS_FIRSTCALL())
+ 	{
+ 		Numeric		start_num = PG_GETARG_NUMERIC(0);
+ 		Numeric		finish_num = PG_GETARG_NUMERIC(1);
+ 		NumericVar

Re: [HACKERS] Add generate_series(numeric, numeric)

2014-10-05 Thread Ali Akbar

Also, Платон Малюгин, can you add this patch to postgresql commitfest (
http://commitfest.postgresql.org)?

--
Ali Akbar

Re: [HACKERS] [PATCH] empty xml values

2014-08-30 Thread Ali Akbar


 While looking into this report
 http://www.postgresql.org/message-id/cf48ccfb.65a9d%tim.k...@gmail.com I
 noticed that we don't accept empty values as xml content values, even
 though this should apparently be allowed by the spec.  Attached is a
 patch to fix it (which needs updates to xml_1.out, which I omit here for
 simplicity).


The patch works, albeit must be applied with --ignore-whitespace. Attached
the patch + xml_1.out updates.

I'm marking this ready for commit
-- 
Ali Akbar
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..7abe215 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -1400,11 +1400,14 @@ xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
 			doc-encoding = xmlStrdup((const xmlChar *) UTF-8);
 			doc-standalone = standalone;
 
-			res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
-   utf8string + count, NULL);
-			if (res_code != 0 || xmlerrcxt-err_occurred)
-xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_CONTENT,
-			invalid XML content);
+			if (*(utf8string + count))
+			{
+res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
+	   utf8string + count, NULL);
+if (res_code != 0 || xmlerrcxt-err_occurred)
+	xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_CONTENT,
+invalid XML content);
+			}
 		}
 	}
 	PG_CATCH();
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..6e6c673 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -194,6 +194,18 @@ SELECT xmlelement(name foo, xmlattributes( as funny, xml 'ba/r' as fun
  foo funny=lt;gt;amp;quot;' funnier=blt;a/gt;r/
 (1 row)
 
+SELECT xmlparse(content '');
+ xmlparse 
+--
+ 
+(1 row)
+
+SELECT xmlparse(content '  ');
+ xmlparse 
+--
+   
+(1 row)
+
 SELECT xmlparse(content 'abc');
  xmlparse 
 --
@@ -251,6 +263,22 @@ SELECT xmlparse(content 'nosuchprefix:tag/');
  nosuchprefix:tag/
 (1 row)
 
+SELECT xmlparse(document '');
+ERROR:  invalid XML document
+DETAIL:  line 1: switching encoding : no input
+
+^
+line 1: Document is empty
+
+^
+line 1: Start tag expected, '' not found
+
+^
+SELECT xmlparse(document '   ');
+ERROR:  invalid XML document
+DETAIL:  line 1: Start tag expected, '' not found
+   
+   ^
 SELECT xmlparse(document 'abc');
 ERROR:  invalid XML document
 DETAIL:  line 1: Start tag expected, '' not found
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..b0e0067 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -164,6 +164,14 @@ SELECT xmlelement(name foo, xmlattributes( as funny, xml 'ba/r' as fun
 ERROR:  unsupported XML feature
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xmlparse(content '');
+ERROR:  unsupported XML feature
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xmlparse(content '  ');
+ERROR:  unsupported XML feature
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SELECT xmlparse(content 'abc');
 ERROR:  unsupported XML feature
 DETAIL:  This functionality requires the server to be built with libxml support.
@@ -196,6 +204,14 @@ SELECT xmlparse(content 'nosuchprefix:tag/');
 ERROR:  unsupported XML feature
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xmlparse(document '');
+ERROR:  unsupported XML feature
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xmlparse(document '   ');
+ERROR:  unsupported XML feature
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SELECT xmlparse(document 'abc');
 ERROR:  unsupported XML feature
 DETAIL:  This functionality requires the server to be built with libxml support.
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..922ab7a 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -60,6 +60,8 @@ SELECT xmlelement(name foo, xmlattributes('infinity'::timestamp as bar));
 SELECT xmlelement(name foo, xmlattributes( as funny, xml 'ba/r' as funnier));
 
 
+SELECT xmlparse(content '');
+SELECT xmlparse(content '  ');
 SELECT xmlparse(content 'abc');
 SELECT xmlparse(content 'abcx/abc');
 SELECT xmlparse(content 'invalidentity/invalidentity');
@@ -69,6 +71,8 @@ SELECT xmlparse(content 'relativens xmlns=''relative''/');
 SELECT xmlparse(content

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-08-29 Thread Ali Akbar

Greetings,

Because of the memory bug in xmlCopyNode, this is a new patch with
different method. In this patch, instead of using xmlCopyNode to bring the
namespace back, we added the required namespaces to the  node before
dumping the node to string, and cleaning it up afterwards (because the same
node could be dumped again in next xpath result).

Thanks,
Ali Akbar


2014-07-11 15:36 GMT+07:00 Ali Akbar the.ap...@gmail.com:

 Greetings,

 Attached modified patch to handle xmlCopyNode returning NULL. The patch is
 larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
 is needed for calling xml_ereport).

 From libxml2 source, it turns out that the other cases that xmlCopyNode
 will return NULL will not happen. So in this patch i assume that the only
 case is memory exhaustion.

 But i found some bug in libxml2's code, because we call xmlCopyNode with 1
 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
 recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
 check the return of xmlStaticCopyNode whether it's NULL. So if someday the
 recursion returns NULL (maybe because of memory exhaustion), it will
 SEGFAULT.

 Knowing this but in libxml2 code, is this patch is still acceptable?

 Thanks,
 Ali Akbar




-- 
Ali Akbar
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..f34c87e 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -141,9 +141,11 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
 			   pg_enc encoding, int standalone);
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
 		  bool preserve_whitespace, int encoding);
-static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
+static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
+	   PgXmlErrorContext *xmlerrcxt);
 static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
-	   ArrayBuildState **astate);
+	   ArrayBuildState **astate,
+	   PgXmlErrorContext *xmlerrcxt);
 #endif   /* USE_LIBXML */
 
 static StringInfo query_to_xml_internal(const char *query, char *tablename,
@@ -3590,27 +3592,109 @@ SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
 
 #ifdef USE_LIBXML
 
+/* check ns definition of node and its childrens. If any one of ns is
+ * not defined in node and it's children, but in the node's parent,
+ * copy the definition to node.
+ */
+static void
+xml_checkandcopyns(xmlNodePtr root,
+   PgXmlErrorContext *xmlerrcxt,
+   xmlNodePtr node,
+   xmlNsPtr lastns_before)
+{
+	xmlNsPtr ns = NULL;
+	xmlNsPtr cur_ns;
+	xmlNodePtr cur;
+
+	if (node-ns != NULL)
+	{
+		/* check in new nses */
+		cur_ns = lastns_before == NULL ? node-nsDef : lastns_before-next;
+		while (cur_ns != NULL)
+		{
+			if (cur_ns-href != NULL)
+			{
+if (((cur_ns-prefix == NULL)  (node-ns-prefix == NULL)) ||
+	((cur_ns-prefix != NULL)  (node-ns-prefix != NULL) 
+	 xmlStrEqual(cur_ns-prefix, node-ns-prefix)))
+{
+	ns = cur_ns;
+	break;
+}
+			}
+			cur_ns = cur_ns-next;
+		}
+		if (ns == NULL) /* not in new nses */
+		{
+			ns = xmlSearchNs(NULL, node-parent, node-ns-prefix);
+
+			if (ns != NULL) {
+ns = xmlNewNs(root, ns-href, ns-prefix);
+
+if (ns == NULL  xmlerrcxt-err_occurred) {
+	xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+		could not allocate xmlNs);
+}
+			}
+		}
+	}
+	/* check and copy ns for children recursively */
+	cur = node-children;
+	while (cur != NULL) {
+		xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before);
+		cur = cur-next;
+	}
+}
+
 /*
  * Convert XML node to text (dump subtree in case of element,
  * return value otherwise)
  */
 static text *
-xml_xmlnodetoxmltype(xmlNodePtr cur)
+xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
 	xmltype*result;
 
 	if (cur-type == XML_ELEMENT_NODE)
 	{
 		xmlBufferPtr buf;
+		xmlNsPtr lastns_before;
+		xmlNsPtr ns;
+		xmlNsPtr next;
 
 		buf = xmlBufferCreate();
+
 		PG_TRY();
 		{
+			lastns_before = cur-nsDef;
+			if (lastns_before != NULL) {
+while (lastns_before-next != NULL) {
+	lastns_before = lastns_before-next;
+}
+			}
+			xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before);
+
 			xmlNodeDump(buf, NULL, cur, 0, 1);
 			result = xmlBuffer_to_xmltype(buf);
+
+			/* delete and free new nses */
+			ns = lastns_before == NULL ? cur-nsDef : lastns_before-next;
+			while (ns != NULL) {
+next = ns-next;
+xmlFree(ns);
+ns = next;
+			}
+			if (lastns_before == NULL) {
+cur-nsDef = NULL;
+			} else {
+lastns_before-next = NULL;
+			}
 		}
 		PG_CATCH();
 		{
+			/* new namespaces will be freed while free-ing the node, so we
+			 * won't free it here
+			 */
 			xmlBufferFree(buf);
 			PG_RE_THROW();
 		}
@@ -3656,7 +3740,8 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
  */
 static int
 xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
-	   ArrayBuildState **astate)
+	   ArrayBuildState

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-07-11 Thread Ali Akbar

Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).

From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3595,3620  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur-type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
  		xmlBufferFree(buf);
  	}
  	else
--- 3597,3636 
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
  {
  	xmltype*result;
  
  	if (cur-type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
+ 		xmlNodePtr cur_copy;
  
  		buf = xmlBufferCreate();
+ 
+ 		/* the result of xmlNodeDump won't contain namespace definitions
+ 		 * from parent nodes, but xmlCopyNode duplicates a node along
+ 		 * with its required namespace definitions.
+ 		 */
+ 		cur_copy = xmlCopyNode(cur, 1);
+ 
+ 		if (cur_copy == NULL)
+ 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		could not serialize xml);
+ 
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
+ 			xmlFreeNode(cur_copy);
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
+ 		xmlFreeNode(cur_copy);
  		xmlBufferFree(buf);
  	}
  	else
***
*** 3656,3662  xml_xmlnodetoxmltype(xmlNodePtr cur)
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate)
  {
  	int			result = 0;
  	Datum		datum;
--- 3672,3679 
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt)
  {
  	int			result = 0;
  	Datum		datum;
***
*** 3678,3684  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
  
  	for (i = 0; i  result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i]));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
--- 3695,3702 
  
  	for (i = 0; i  result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i],
! 	 xmlerrcxt));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
***
*** 3882,3890  xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
  		 * Extract the results as requested.
  		 */
  		if (res_nitems != NULL)
! 			*res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate);
  		else
! 			(void) xml_xpathobjtoxmlarray(xpathobj, astate);
  	}
  	PG_CATCH();
  	{
--- 3900,3908 
  		 * Extract the results as requested.
  		 */
  		if (res_nitems != NULL)
! 			*res_nitems

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-07-08 Thread Ali Akbar

 I don't know enough about XML/XPATH to know if this is a good idea or not,


Actually currently because of the namespace problem, xpath() returns wrong
result (even worse: invalid xml!). So this patch corrects the behavior of
xpath() to the correct one.


 but I did go read the documentation of xmlCopyNode(), and I notice it says
 the second argument is

 extended: if 1 do a recursive copy (properties, namespaces and children
 when applicable) if 2 copy properties and namespaces (when applicable)

 Do we really want it to copy children?  Perhaps the value should be 2?
 (I have no idea, I'm just raising the question.)


If we put 1 as the parameter, the resulting Node will link to the existing
children. In this case, the namespace problem for the children might be
still exists. If any of the children uses different namespace(s) than the
parent, the namespace definition will not be copied correctly.

Just to be safe, in the patch 1 passed as the second argument.

Ideally, we can traverse the Node object and generate XML accordingly, but
it is a lot of work, and a lot of duplicating libxml's code. Maybe we
should push this upstream to libxml?

I also notice that it says

 Returns: a new #xmlNodePtr, or NULL in case of error.

 and the patch is failing to cover the error case.  Most likely, that
 would represent out-of-memory, so it definitely seems to be something
 we should account for.


Ah, overlooked that.

Skimming through libxml2's source, it looks like there are some other cases
beside out-of-memory. Will update the patch according to these cases.

Thanks for the review.

-- 
Ali Akbar

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-06-22 Thread Ali Akbar

2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen a...@2ndquadrant.com:

 Thanks for the patch, and welcome to Postgres development.



 I can confirm that it works fine. I have attached here a very slightly
 tweaked version of the patch (removed trailing whitespace, and changed
 some comment text).

 I'm marking this ready for committer.


Thanks for the review. Hope i will be able to contribute a little here and
there in the future.

-- 
Ali Akbar

Re: [HACKERS] pivot aggregation with a patched intarray

2014-06-05 Thread Ali Akbar

2014-06-05 17:18 GMT+07:00 Marc Mamin m.ma...@intershop.de:
 I'm thinking about adding a final function to my aggregate that would replace 
 zero values will nulls,
 hence transforming the intarray into a standard int[], possibly with 
 nullbitmap and a lowerbound that can be  1.
 This will probably degrade the performance considerably, but may reduce the 
 size of the end result for spare data and not too small integers...
 Performances should greatly depend on the data distribution and order as they 
 influence the number of palloc.
 My first tests shown as well better and poorer results.

 My target is not to get better performances at the first place, but to get a 
 pivot structure in an early aggregation stage.

Usually for pivot, i use crosstab function from tablefunc
(http://www.postgresql.org/docs/9.4/static/tablefunc.html#AEN158550).
If your patch doesn't perform better, it's more easier to just use
crosstab. For storing it efficiently, the result can be transformed
into array manually.

PS: as Michael Paquier said above, its better if you could send the
patch in the .patch file format (see:
https://wiki.postgresql.org/wiki/Working_with_GIT).

-- 
Ali Akbar


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix xpath() to return namespace definitions

2014-06-04 Thread Ali Akbar

2014-06-05 2:37 GMT+07:00 Robert Haas robertmh...@gmail.com:

 Please add your patch here so we don't forget about it:

 https://commitfest.postgresql.org/action/commitfest_view/open


Added: https://commitfest.postgresql.org/action/patch_view?id=1461


 Please see also:

 https://wiki.postgresql.org/wiki/Submitting_a_Patch


Thanks, i've read it.
As for the suggestion there: Start out by reviewing a patch or responding
to email on the lists. Even if it is unrelated to what you're doing, i'll
look around other posts in this list.

For back versions, i think because this patch changes xpath() behavior, we
will only apply this to future versions. The old behavior is wrong
(according to XPath standard) for not including namespaces, but maybe there
are some application that depends on the old behavior.

Thanks!
--
Ali Akbar

Re: [HACKERS] pivot aggregation with a patched intarray

2014-06-04 Thread Ali Akbar

2014-06-01 20:48 GMT+07:00 Marc Mamin m.ma...@intershop.de:

 On Sat, May 31, 2014 at 12:31 AM, Marc Mamin m.ma...@intershop.de wrote:
  I have patched intarray with 3 additional functions in order to 
  count[distinct] event IDs
  into arrays, whereas the array position correspond to the integer values. 
  (mimic column oriented storage)
 
 I didn't look at the feature itself, but here are some comments about
 the format of the patch:
 - Be careful the newlines on the file you posted use ¥r¥n, which is
 purely Windows stuff... This will generate unnecessary diffs with the
 source code
 I don't mean to suggests this directly as a patch,
 I'm first interested to see if there are some interest
 for such an aggregation type.

From what i see, the icount_to_array is complementary to standard
count() aggregates, but it produces array. If the values are not
sparse, i think the performance and memory/storage benefit you
mentioned will be true. But if the values are sparse, there will be
many 0's, how it will perform?

I'm interested to benchmark it with some use cases, to confirm the
performance benefits of it.

--
Ali Akbar


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Fix xpath() to return namespace definitions

2014-05-30 Thread Ali Akbar

Hi all,

While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.

Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.

This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.

Note: This is my very first postgresql patch.

-- 
Ali Akbar
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..93e335c 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
 	if (cur-type == XML_ELEMENT_NODE)
 	{
 		xmlBufferPtr buf;
+		xmlNodePtr cur_copy;
 
 		buf = xmlBufferCreate();
+
+		/* the result of xmlNodeDump won't contain namespace definitions, 
+		 * but libxml2 has xmlCopyNode that duplicates a node, along
+		 * with its required namespace definitions. 
+		 */
+		cur_copy = xmlCopyNode(cur, 1);
 		PG_TRY();
 		{
-			xmlNodeDump(buf, NULL, cur, 0, 1);
+			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
 			result = xmlBuffer_to_xmltype(buf);
 		}
 		PG_CATCH();
 		{
+			xmlFreeNode(cur_copy);
 			xmlBufferFree(buf);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
+		xmlFreeNode(cur_copy);
 		xmlBufferFree(buf);
 	}
 	else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..a6d26f7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', 'local:data xmlns:local=http://127.0.0.1;loc
  {1,2}
 (1 row)
 
+SELECT xpath('//loc:piece', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ xpath  
+
+ {local:piece xmlns:local=\http://127.0.0.1\; id=\1\number one/local:piece,local:piece xmlns:local=\http://127.0.0.1\; id=\2\/}
+(1 row)
+
 SELECT xpath('//b', 'aone btwo/b three betc/b/a');
   xpath  
 -
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..c7bcf91 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
@@ -498,6 +498,12 @@ LINE 1: SELECT xpath('//loc:piece/@id', 'local:data xmlns:local=ht...
 ^
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', 'local:data xmlns:local=http:/...
+^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SELECT xpath('//b', 'aone btwo/b three betc/b/a');
 ERROR:  unsupported XML feature
 LINE 1: SELECT xpath('//b', 'aone btwo/b three betc/b/a'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..241a5d6 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
 SELECT xpath('', '!-- error --');
 SELECT xpath('//text()', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data');
 SELECT xpath('//loc:piece/@id', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', 'local:data xmlns:local=http://127.0.0.1;local:piece id=1number one/local:piecelocal:piece id=2 //local:data', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
 SELECT xpath('//b', 'aone btwo/b three betc/b/a');
 SELECT xpath('//text()', 'rootlt;/root');
 SELECT xpath('//@value', 'root value=lt;/');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

55 matches

Mail list logo