Re: [HACKERS] hash partitioning based on v10Beta2

2017-08-28 Thread yang...@highgo.com
On Sat, Aug 26, 2017 at 10:10 AM, yang...@highgo.com 
wrote:

> Hi all,
>
> Now we have had the range / list partition, but hash partitioning is not
> implemented yet.
> Attached is a POC patch based on the v10Beta2 to add the
> hash partitioning feature.
> Although we will need more discussions about the syntax
> and other specifications before going ahead the project,
> but I think this runnable code might help to discuss
> what and how we implement this.
>
>
FYI, there is already an existing commitfest entry for this project.

https://commitfest.postgresql.org/14/1059/




Re: [HACKERS] [POC] hash partitioning

2017-08-28 Thread yang...@highgo.com
Hello

Looking at your hash partitioning syntax, I implemented hash partitioning in a 
more concise way: there is no need to specify the number of sub-tables in 
advance, and partitions can be added dynamically.

Description

The hash partition implementation builds on the existing range / list 
partitioning and uses similar syntax.

To create a partitioned table, use:

CREATE TABLE h (id int) PARTITION BY HASH(id);

The partitioning key currently supports only a single column. I think the 
partition key could support multiple columns; that may be difficult to handle 
at query time, but it is not impossible.

A partition can be created as below:

 CREATE TABLE h1 PARTITION OF h;
 CREATE TABLE h2 PARTITION OF h;
 CREATE TABLE h3 PARTITION OF h;
 
The FOR VALUES clause cannot be used; the partition bound is calculated 
automatically as a partition index (a single integer value).

An inserted record is stored in the partition whose index equals

    DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0],
        TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts  /* number of partitions */

In the above example, this is

    DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0],
        TYPECACHE_HASH_PROC)->hash_proc, id)) % 3

postgres=# insert into h select generate_series(1,20);
INSERT 0 20
postgres=# select tableoid::regclass,* from h;
 tableoid | id
----------+----
 h1   |  3
 h1   |  5
 h1   | 17
 h1   | 19
 h2   |  2
 h2   |  6
 h2   |  7
 h2   | 11
 h2   | 12
 h2   | 14
 h2   | 15
 h2   | 18
 h2   | 20
 h3   |  1
 h3   |  4
 h3   |  8
 h3   |  9
 h3   | 10
 h3   | 13
 h3   | 16
(20 rows)

Partitions can be added dynamically. When a new partition is created, the 
number of partitions changes, so the computed target partition for a given 
value changes as well, and the same key must not end up in different 
partitions. Therefore, when you create a new partition, the existing data is 
re-hashed and re-inserted into its new target partitions.

postgres=# create table h4 partition of h;
CREATE TABLE
postgres=# select tableoid::regclass,* from h;
 tableoid | id
----------+----
 h1   |  5
 h1   | 17
 h1   | 19
 h1   |  6
 h1   | 12
 h1   |  8
 h1   | 13
 h2   | 11
 h2   | 14
 h3   |  1
 h3   |  9
 h3   |  2
 h3   | 15
 h4   |  3
 h4   |  7
 h4   | 18
 h4   | 20
 h4   |  4
 h4   | 10
 h4   | 16
(20 rows)

When querying the data, the hash partition uses the same algorithm as 
insertion, filtering out the partitions that do not need to be scanned.

postgres=# explain analyze select * from h where id = 1;
 QUERY PLAN 


 Append  (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 
loops=1)
   ->  Seq Scan on h3  (cost=0.00..41.88 rows=13 width=4) (actual 
time=0.013..0.016 rows=1 loops=1)
 Filter: (id = 1)
 Rows Removed by Filter: 3
 Planning time: 0.346 ms
 Execution time: 0.061 ms
(6 rows)

postgres=# explain analyze select * from h where id in (1,5);;
 QUERY PLAN 


 Append  (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 
loops=1)
   ->  Seq Scan on h1  (cost=0.00..41.88 rows=26 width=4) (actual 
time=0.015..0.018 rows=1 loops=1)
 Filter: (id = ANY ('{1,5}'::integer[]))
 Rows Removed by Filter: 6
   ->  Seq Scan on h3  (cost=0.00..41.88 rows=26 width=4) (actual 
time=0.005..0.007 rows=1 loops=1)
 Filter: (id = ANY ('{1,5}'::integer[]))
 Rows Removed by Filter: 3
 Planning time: 0.720 ms
 Execution time: 0.074 ms
(9 rows)

postgres=# explain analyze select * from h where id = 1 or id = 5;;
 QUERY PLAN 


 Append  (cost=0.00..96.50 rows=50 width=4) (actual time=0.017..0.078 rows=2 
loops=1)
   ->  Seq Scan on h1  (cost=0.00..48.25 rows=25 width=4) (actual 
time=0.015..0.019 rows=1 loops=1)
 Filter: ((id = 1) OR (id = 5))
 Rows Removed by Filter: 6
   ->  Seq Scan on h3  (cost=0.00..48.25 rows=25 width=4) (actual 
time=0.005..0.010 rows=1 loops=1)
 Filter: ((id = 1) OR (id = 5))
 Rows Removed by Filter: 3
 Planning time: 0.396 ms
 Execution time: 0.139 ms
(9 rows)

DETACH / ATTACH / DROP of a partition is not supported yet.

Best regards,
young


yonj1e.github.io
yang...@highgo.com


[HACKERS] hash partitioning based on v10Beta2

2017-08-26 Thread yang...@highgo.com
Hi all,

We now have range / list partitioning, but hash partitioning is not 
implemented yet.
Attached is a POC patch based on v10Beta2 that adds a hash partitioning 
feature.
Although we will need more discussion about the syntax and other 
specifications before going ahead with the project, I think this runnable 
code might help us discuss what to implement and how.


Re: [HACKERS] Memory leak

2017-07-29 Thread Fan Yang
You are right. When I added that code, it really gave off a strong bad smell.
It's not worth the effort.

Thanks for your reply and suggestion.

--
Sincerely
Fan Yang


[HACKERS] Memory leak

2017-07-29 Thread fan yang
;
+escaped = escape_quotes(lc_ctype);
+bki_lines = replace_token(bki_lines, "LC_CTYPE", escaped);
+free(escaped);

 /*
  * Pass correct LC_xxx environment to bootstrap.
@@ -1391,13 +1411,16 @@ setup_auth(FILE *cmdfd)
 "REVOKE ALL on pg_authid FROM public;\n\n",
 NULL
 };
+char*escaped;

 for (line = pg_authid_setup; *line != NULL; line++)
 PG_CMD_PUTS(*line);

 if (superuser_password)
+{
+escaped = escape_quotes(superuser_password);
 PG_CMD_PRINTF2("ALTER USER \"%s\" WITH PASSWORD E'%s';\n\n",
-   username, escape_quotes(superuser_password));
+   username, escaped);
+free(escaped);
+}
 }
 }

 /*
@@ -1582,14 +1605,18 @@ setup_sysviews(FILE *cmdfd)
 static void
 setup_description(FILE *cmdfd)
 {
+char   *escaped;
+
 PG_CMD_PUTS("CREATE TEMP TABLE tmp_pg_description ( "
 "objoid oid, "
 "classname name, "
 "objsubid int4, "
 "description text) WITHOUT OIDS;\n\n");

+escaped = escape_quotes(desc_file);
 PG_CMD_PRINTF1("COPY tmp_pg_description FROM E'%s';\n\n",
-   escape_quotes(desc_file));
+   escaped);
+free(escaped);

 PG_CMD_PUTS("INSERT INTO pg_description "
 " SELECT t.objoid, c.oid, t.objsubid, t.description "
@@ -1601,8 +1628,10 @@ setup_description(FILE *cmdfd)
 " classname name, "
 " description text) WITHOUT OIDS;\n\n");

+escaped = escape_quotes(shdesc_file);
 PG_CMD_PRINTF1("COPY tmp_pg_shdescription FROM E'%s';\n\n",
-   escape_quotes(shdesc_file));
+   escaped);
+free(escaped);

 PG_CMD_PUTS("INSERT INTO pg_shdescription "
 " SELECT t.objoid, c.oid, t.description "
@@ -1846,9 +1875,13 @@ setup_privileges(FILE *cmdfd)
 "srvacl IS NOT NULL;",
 NULL
 };
+char*escaped;

+escaped = escape_quotes(username);
 priv_lines = replace_token(privileges_setup, "$POSTGRES_SUPERUSERNAME",
-   escape_quotes(username));
+   escaped);
+free(escaped);
+
 for (line = priv_lines; *line != NULL; line++)
 PG_CMD_PUTS(*line);
 }
@@ -1889,6 +1922,7 @@ setup_schema(FILE *cmdfd)
 {
 char  **line;
 char  **lines;
+char   *escaped;

 lines = readfile(info_schema_file);

@@ -1905,11 +1939,13 @@ setup_schema(FILE *cmdfd)
"  WHERE implementation_info_name = 'DBMS
VERSION';\n\n",
infoversion);

+escaped = escape_quotes(features_file);
 PG_CMD_PRINTF1("COPY information_schema.sql_features "
"  (feature_id, feature_name, sub_feature_id, "
"  sub_feature_name, is_supported, comments) "
" FROM E'%s';\n\n",
-   escape_quotes(features_file));
+   escaped);
+free(escaped);
 }

 /*




I hope this patch fixes the problem.
--
Yang Fan


[HACKERS] `array_position...()` causes SIGSEGV

2016-12-08 Thread Junseok Yang
Hello hackers,

I hit a SIGSEGV when using `array_position()` with record-type
arguments, so I've written a patch that corrects the problem. It seems
that `array_position...()` sets the wrong memory context for the cached
function (in this case `record_eq()`) that is used to find a matching
element.

The problem is reproducable with the following query.

SELECT array_position(ids, (1, 1))
FROM (VALUES (ARRAY[(0, 0)]), (ARRAY[(1, 1)])) AS _(ids);
From 8868ae0050ec382bc1bae2b993742eb2a40bbb14 Mon Sep 17 00:00:00 2001
From: Junseok Yang 
Date: Thu, 8 Dec 2016 18:25:21 -0800
Subject: [PATCH] Fix memory context bugs in `array_position...()`

When `array_position()` is called with record type arguments, it uses
`record_eq()` to find a matching element. Before calling `record_eq()`,
it stores the `FmgrInfo` of `record_eq()` as extra data in its own
`FmgrInfo`, which lives in `ecxt_per_query_memory`. However, it sets the
context for `record_eq()`'s extra data to `CurrentMemoryContext`, which
is `ecxt_per_tuple_memory`, and `record_eq()` then allocates its own
extra data in that context. As a result, if `array_position()` is
called on more than one tuple in a query, the backend dies with SIGSEGV,
because the extra data for `record_eq()` was already freed after the
first tuple was processed.

`array_positions()` has the same issue.
---
 src/backend/utils/adt/array_userfuncs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 8d6fa41..9eb678a 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -795,7 +795,8 @@ array_position_common(FunctionCallInfo fcinfo)
  	   format_type_be(element_type))));
 
 		my_extra->element_type = element_type;
-		fmgr_info(typentry->eq_opr_finfo.fn_oid, &my_extra->proc);
+		fmgr_info_cxt(typentry->eq_opr_finfo.fn_oid, &my_extra->proc,
+	  fcinfo->flinfo->fn_mcxt);
 	}
 
 	/* Examine each array element until we find a match. */
@@ -933,7 +934,8 @@ array_positions(PG_FUNCTION_ARGS)
  	   format_type_be(element_type))));
 
 		my_extra->element_type = element_type;
-		fmgr_info(typentry->eq_opr_finfo.fn_oid, &my_extra->proc);
+		fmgr_info_cxt(typentry->eq_opr_finfo.fn_oid, &my_extra->proc,
+	  fcinfo->flinfo->fn_mcxt);
 	}
 
 	/*
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] "pg_upgrade" cannot write to log file pg_upgrade_internal.log

2015-12-15 Thread Yang, Leo
I ran "pg_upgrade -xx" from PostgreSQL 8.3 to PostgreSQL 9.4.5 on Windows 
Server 2008. It fails with:

cannot write to log file pg_upgrade_internal.log
Failure, exiting.

The upgrade seems to succeed on Windows 7, so it appears to be a permissions 
problem. Reading the source code for information about restricting privileges, 
I found the 9.4.2 release change "Run pg_upgrade and pg_resetxlog with 
restricted privileges on Windows, so that they don't fail when run by an 
administrator" (http://www.postgresql.org/docs/9.4/static/release-9-4-2.html). 
Why was this done, and how can I solve this problem? I hope you can help me.

Best Regards,
Leo



Re: [HACKERS] CREATE POLICY and RETURNING

2015-10-05 Thread Zhaomo Yang
Stephen,

I tried out your patch for applying SELECT policies to DELETE/UPDATE a
bit. It is consistent with the GRANT system, so it looks really good.
I'll test it more thoroughly later.

Also, I guess we don't need to worry about the syntax of "restrictive
policies" you mentioned upthread, since SELECT policies are essentially
restrictive now. Since that work has already been done, I'm wondering if
I can take on the task of allowing policies to reference both the 'old'
and 'new' versions of the row. I understand that this feature won't be
considered for 9.5, but I'd like to implement it and hopefully get it
incorporated into 9.6.

Thanks,
Zhaomo


On Wed, Sep 23, 2015 at 11:54 AM, Stephen Frost  wrote:
> * Zhaomo Yang (zmp...@gmail.com) wrote:
>> > Just a side-note, but your mail client doesn't seem to get the quoting
>> > quite right sometimes, which can be confusing.  Not sure if there's
>> > anything you can do about it but wanted to let you know in case there
>> > is.
>>
>> Sorry about this. From now on I'll use the plain text mode for msgs I
>> send to the mailing list.
>> Please let me know if this happens also in this email.
>
> Looks like this one has all of the quoting correct- thanks!
>
>> > Regarding this, specifically, we'd need to first decide on what the
>> > syntax/grammar should be.
>>
>> I'll think about it. Also, thanks for the pointers.
>
> Sure, no problem.
>
>> > Right, and we addressed the concerns with RETURNING.  Regarding the
>> > non-RETURNING case, The same concerns about blind updates and deletes
>> > already exist with the GRANT permission system; it's not anything new.
>>
>> I think they are different. In the current GRANT permission system,
>> one can do blind updates but he
>> cannot refer to any existing values in either the expressions or the
>> condition if he doesn't have
>> SELECT privilege on the table (or the columns), thus the tricks like
>> divide-by-zero cannot be used and a malicious
>> user cannot get information out of blind updates.
>
> Ok, I see what you're getting at with that and I believe it'll be a
> pretty straight-forward change, thanks to Dean's recent rework.  I'll
> take a look at making that happens.
>
> Thanks!
>
> Stephen




Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-23 Thread Zhaomo Yang
Stephen,

> Just a side-note, but your mail client doesn't seem to get the quoting
> quite right sometimes, which can be confusing.  Not sure if there's
> anything you can do about it but wanted to let you know in case there
> is.

Sorry about this. From now on I'll use the plain text mode for msgs I
send to the mailing list.
Please let me know if this happens also in this email.

> Regarding this, specifically, we'd need to first decide on what the
> syntax/grammar should be.

I'll think about it. Also, thanks for the pointers.

> Right, and we addressed the concerns with RETURNING.  Regarding the
> non-RETURNING case, The same concerns about blind updates and deletes
> already exist with the GRANT permission system; it's not anything new.

I think they are different. In the current GRANT permission system,
one can do blind updates but he
cannot refer to any existing values in either the expressions or the
condition if he doesn't have
SELECT privilege on the table (or the columns), thus the tricks like
divide-by-zero cannot be used and a malicious
user cannot get information out of blind updates.

Thanks,
Zhaomo




Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-23 Thread Zhaomo Yang
Stephen,

> It'd be great if others who are interested can help define the grammar
> changes necessary and perhaps even help with the code aspect of it.

I'd like to help on both. Can you elaborate a little bit more, especially
on the code aspect?

> I don't buy that argument.

It is agreed that blind updates and deletes with a RETURNING clause are
dangerous. It is quite similar here.
Instead of using
   BEGIN
   UPDATE-or-DELETE-with-RETURNING
   ROLLBACK
as a substitute for SELECT, a malicious user can do a binary search with a
trick like divide-by-zero
to figure out rows he is not allowed to access. Of course, this is not as
serious as RETURNING, but it is still quite convenient for attackers.

Thanks,
Zhaomo


Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-16 Thread Zhaomo Yang
Stephen,

> I agree that if we force a single visibility policy for all commands
> then we wouldn't need the USING clauses for UPDATE and DELETE, but we
> would certainly need *some* policy for DELETE to prevent users from
> being able to delete records that they aren't supposed to be allowed to.
> Therefore, we'd just be replacing the USING policy with a 'WITH CHECK'
> policy, no?

If we force a single visibility policy (the SELECT policy), then we will
need a command-specific policy for each of UPDATE/DELETE/INSERT. A
command-specific policy may be a writing policy (as for INSERT), a
reading policy (as for DELETE), or a hybrid policy (as for UPDATE).

For DELETE we can either combine the visibility policy (the SELECT policy)
with the DELETE policy using AND and then scan the table, or just
attach the DELETE policy to the WHERE clause after the visibility
policy has been enforced. I don't see why we need to replace the USING
policy with a WITH CHECK one.

BTW, what is the fundamental difference between a USING predicate and
a WITH CHECK predicate? Is it the phase in which they are applied (read
or write)? Or is it how they handle violations (nothing-happens vs.
error-out)?

> Removing the existing ability to control the visibility on a
> per-command basis is pretty clearly a reduction in the overall
> flexibility of the system without a clear gain to me.

I think there is a clear gain: security.

One interesting issue related to this discussion is how violations are
handled. Currently, read violations fail silently (the nothing-happens
result) while write violations raise errors (the throw-error result).

In the paper "Extending Query Rewriting Techniques for Fine-Grained
Access Control" [1], Rizvi et al. added row-level access control to
DBMSes with an interesting syntax: GRANT-WHERE, i.e. a WHERE predicate
added to the SQL GRANT statement. Besides the interesting syntax, they
brought up two possible models of handling violations. One is the
"nothing-happens" model (they call it the Truman's-world model) and the
other is the "error-out" model (the Non-Truman model). The authors
discuss the pros and cons of both: the "nothing-happens" model is more
secure since it leaks less information, but a user may be surprised by
the results; the "error-out" model leaks information but may be more
convenient when a user is debugging his queries. I'm curious about our
community's take on this issue.


Thanks,
Zhaomo

[1] http://avid.cs.umass.edu/courses/645/s2006/645-paper5.pdf




Re: [HACKERS] CREATE POLICY and RETURNING

2015-09-11 Thread Zhaomo Yang
> I really don't like the approach you're suggesting above where an 'OR'
> inside of such a clause could mean that users can arbitrarily change any
> existing row without any further check on that row and I have a hard time
> seeing the use-case which justifies the additional complexity and user
> confusion.


I admit that I gave some bad examples in the previous email, and it is
fair to say that this (being able to have something like NEW.value > 10
OR OLD.id = 1) is not an advantage of what I proposed, at least until I
can come up with real-world examples.

> So there would also be a SELECT policy anyway, which is just like the
> existing UPDATE USING policy is today and what you're really asking for
> is the ability to have the WITH CHECK policy reference both the OLD and
> NEW records.

Yes. Then we won't need any USING clauses for UPDATE/DELETE. For
UPDATE/DELETE, we only need
one predicate which can reference both OLD and NEW.

> I might be able to get behind supporting that, but I'm not
> terribly excited about it and you've not provided any real use-cases for
> it that I've seen


I think that there are two major advantages:

1)
As many folks have pointed out in this and other threads, this will make
information leakage less likely. Currently, a permissive USING clause
for UPDATE/DELETE can give an attacker a chance to read rows he is not
allowed to SELECT. Even without leaky functions, an attacker can easily
figure out the rows by doing a binary search with tricks like division
by zero.

2)
This proposal allows a user to reference both the OLD and NEW records in
the same clause, for example NEW.id = OLD.id, or NEW.value <= OLD.value + 10.
I think this should be useful, since users may often need to check the
new value against the old one.


> it still doesn't really change anything regarding
> RETURNING any differently than the earlier suggestions did about having
> the SELECT policy applied to all commands.


No, it doesn't. I proposed it here because there are some related
discussions (applying SELECT policy to other commands).

Thanks,
Zhaomo

On Tue, Aug 25, 2015 at 8:17 AM, Stephen Frost  wrote:

> Zhaomo,
>
> * Zhaomo Yang (zmp...@gmail.com) wrote:
> > > If no NEW or OLD is used, what happens?  Or would you have
> > > to always specify OLD/NEW for UPDATE, and then what about for the other
> > > policies, and the FOR ALL policies?
> >
> > I should be clearer with references to OLD/NEW. SELECT Predicates cannot
> > reference any of them.
> > INSERT predicates cannot refer to OLD and DELETE predicates cannot refer
> to
> > NEW. Basically,
> > for INSERT/UPDATE/DELETE, we specify predicates the same way as we do for
> > triggers' WHEN condition.
> >
> > As for FOR ALL, I think we will abandon it if we apply SELECT policy to
> > other commands, since SELECT predicate
> > will be the new universally applicable read policy, which makes the FOR
> ALL
> > USING clause much less useful. Of course users may need to specify
> separate
> > predicates for different commands, but I think it is fine. How often do
> > users want the same predicate for all the commands?
>
> I can certainly see use-cases where you'd want to apply the same policy
> to all new records, regardless of how they're being added, and further,
> the use-case where you want the same policy for records which are
> visible and those which are added.  In fact, I'd expect that to be one
> of the most common use-cases as it maps directly to a set of rows which
> are owned by one user, where that user can see/modify/delete their own
> records but not impact other users.
>
> So, I don't think it would be odd at all for users to want the same
> predicate for all of the commands.
>
> > > This could be accomplished with "USING (bar > 1)" and "WITH CHECK (foo
> >
> > > 1)", no?
> > > Your sentence above that "USING and WITH CHECK are combined by AND"
> > > isn't correct either- they're independent and are therefore really
> OR'd.
> > > If they were AND'd then the new record would have to pass both USING
> and
> > > WITH CHECK policies.
> >
> > No, it is impossible with the current implementation.
> >
> > CREATE TABLE test {
> >  id int,
> >  v1 int,
> >  v2 int
> > };
> >
> > Suppose that the user wants an update policy which is OLD.v1 > 10 OR
> NEW.v2
> > < 10.
> > As you suggested, we use the following policy
> >
> > CREATE update_p ON test
> > FOR UPDATE TO test_user
> > USING v1 > 10
> > WITH CHECK v2 < 10;
> >
> > (1) Assum

Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-15 Thread Zhaomo Yang
Stephen,

> If no NEW or OLD is used, what happens?  Or would you have
> to always specify OLD/NEW for UPDATE, and then what about for the other
> policies, and the FOR ALL policies?


I should be clearer about references to OLD/NEW. SELECT predicates cannot
reference either of them. INSERT predicates cannot refer to OLD, and
DELETE predicates cannot refer to NEW. Basically, for
INSERT/UPDATE/DELETE, we specify predicates the same way as we do for
triggers' WHEN condition.

As for FOR ALL, I think we will abandon it if we apply SELECT policy to
other commands, since SELECT predicate
will be the new universally applicable read policy, which makes the FOR ALL
USING clause much less useful. Of course users may need to specify separate
predicates for different commands, but I think it is fine. How often do
users want the same predicate for all the commands?

> This could be accomplished with "USING (bar > 1)" and "WITH CHECK (foo >
> 1)", no?
> Your sentence above that "USING and WITH CHECK are combined by AND"
> isn't correct either- they're independent and are therefore really OR'd.
> If they were AND'd then the new record would have to pass both USING and
> WITH CHECK policies.


No, it is impossible with the current implementation.

CREATE TABLE test (
  id int,
  v1 int,
  v2 int
);

Suppose that the user wants an update policy which is OLD.v1 > 10 OR NEW.v2
< 10.
As you suggested, we use the following policy

CREATE POLICY update_p ON test
FOR UPDATE TO test_user
USING (v1 > 10)
WITH CHECK (v2 < 10);

(1) Assume there is only one row in the table:

 id | v1 | v2
  1 | 11 | 20

Now we execute UPDATE test SET v2 = 100.
This query should be allowed by the policy, and the only row should be
updated since v1's old value > 10; instead it triggers an error, because
the new row violates the WITH CHECK clause.

(2) Again assume there is only one row in the table:

 id | v1 | v2
  1 |  9 | 20

Now we execute UPDATE test SET v2 = 7.
This query should be allowed by the policy, and the only row should be
updated since v2's new value < 10; instead nothing is updated, because the
only row is filtered out by the USING clause before the update happens.

This is why I said USING and WITH CHECK are combined by AND. In order to
update a row, the row first needs to be visible, meaning it needs to pass
the USING check; then it needs to pass the WITH CHECK.
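Concretely, scenario (1) plays out like this against the table and policy
above (an illustrative sketch, not captured output; it assumes RLS has
been enabled on the table and test_user has been granted UPDATE):

```sql
-- Assumption: ALTER TABLE test ENABLE ROW LEVEL SECURITY has been run
-- and test_user has UPDATE privilege on the table.
INSERT INTO test VALUES (1, 11, 20);

SET ROLE test_user;
UPDATE test SET v2 = 100;
-- The old row passes USING (v1 > 10), so it is visible to the UPDATE,
-- but the new row fails WITH CHECK (v2 < 10), so the command errors out:
-- the two clauses act as an AND, never as an OR.
```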

> Further, I'm not sure that I see how this would work in a case where you
> have the SELECT policy (which clearly could only refer to OLD) applied
> first, as you suggest?


We use the SELECT policy to filter the table when we scan it (just like how
we use the USING clause now). The predicate of UPDATE will be checked later
(probably similarly to how we handle a trigger's WHEN clause, which can
also reference OLD and NEW).

Thanks,
Zhaomo


Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Zhaomo Yang
In case you missed the link to the previous discussion at the bottom,
http://www.postgresql.org/message-id/CAHGQGwEqWD=ynqe+zojbpoxywt3xlk52-v_q9s+xofckjd5...@mail.gmail.com


Re: [HACKERS] CREATE POLICY and RETURNING

2015-08-10 Thread Zhaomo Yang
Hi,

This thread has a pretty thorough discussion of the pros and cons of
applying the SELECT policy to other commands. Besides what has been
mentioned, I think there is another potential pro: we can enable references
to the pseudorelations OLD and NEW in predicates. Currently, for UPDATE,
references to the target table in the USING clause are actually references
to OLD, and references in the WITH CHECK clause are references to NEW.
Logically, USING and WITH CHECK are combined by AND, so we cannot have
predicates like

  foo(NEW) > 1 OR bar(OLD) > 1   (predicates referencing OLD and NEW
                                  combined by an operation other than AND)
  NEW.id <> OLD.id               (both referenced in the same expression)

If we apply the SELECT policy to other commands, we only need one predicate
each for INSERT/UPDATE/DELETE. That predicate can reference OLD and NEW,
just like predicates for triggers and rules. For UPDATE and DELETE, the
SELECT predicate will be applied first (when the target table is scanned)
to ensure no information leakage, and the command's own predicate will be
applied later. This doesn't change much for INSERT and DELETE, but it gives
users more flexibility when they set predicates for UPDATE.
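As an illustration, a policy combining the two example predicates above
could then be written directly; the syntax below is purely hypothetical,
and the table, role, and functions are placeholders:

```sql
-- Hypothetical syntax (not implemented): one UPDATE predicate that can
-- reference both pseudorelations, like a trigger's WHEN condition.
CREATE POLICY update_p ON test
FOR UPDATE TO test_user
USING (foo(NEW) > 1 OR bar(OLD) > 1);
```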

Thanks,
Zhaomo





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread YANG

Hello,

I've performed some tests on PG-Strom according to the wiki, but it seems
that queries run slower on the GPU than on the CPU. Can someone shed some
light on what's wrong with my settings? My setup was a Quadro K620 + CUDA
7.0 (for Ubuntu 14.10) + Ubuntu 15.04, and the results were:

with pg_strom
=

explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;


  QUERY PLAN
---
 Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual 
time=18792.236..18792.236 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108) 
(actual time=4249.656..18792.074 rows=77 loops=1)
 Bulkload: On (density: 100.00%)
 Reduction: NoGroup
 Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double 
precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < 
'10'::double precision)
 ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32 
rows=1060 width=0) (actual time=139.399..18499.246 rows=1000 loops=1)
 Planning time: 0.262 ms
 Execution time: 19268.650 ms
(8 rows)



explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

QUERY PLAN
--
 HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual 
time=11311.568..11311.572 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) 
(actual time=3304.727..11310.021 rows=2307 loops=1)
 Bulkload: On (density: 100.00%)
 Reduction: Local + Global
 ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=1060 
width=12) (actual time=923.417..2661.113 rows=1000 loops=1)
   Bulkload: On (density: 100.00%)
   Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = 
aid), nrows_ratio: 1.
   ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 
rows=1060 width=16) (actual time=6.980..871.431 rows=1000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) 
(actual time=0.204..7.309 rows=4 loops=1)
 Planning time: 47.834 ms
 Execution time: 11355.103 ms
(12 rows)


without pg_strom


test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + 
(y-12.8)^2) < 10;
   
QUERY PLAN

 Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual 
time=3880.379..3880.379 rows=1 loops=1)
   ->  Seq Scan on t0  (cost=0.00..417859.65 rows=353 width=0) (actual 
time=0.075..3859.200 rows=314063 loops=1)
 Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double 
precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < 
'10'::double precision)
 Rows Removed by Filter: 9685937
 Planning time: 0.411 ms
 Execution time: 3880.445 ms
(6 rows)

t=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
  QUERY PLAN
--
 HashAggregate  (cost=431593.73..431594.05 rows=26 width=12) (actual 
time=4960.810..4960.812 rows=26 loops=1)
   Group Key: t0.cat
   ->  Hash Join  (cost=1234.00..381593.43 rows=1060 width=12) (actual 
time=20.859..3367.510 rows=1000 loops=1)
 Hash Cond: (t0.aid = t1.aid)
 ->  Seq Scan on t0  (cost=0.00..242858.60 rows=1060 width=16) 
(actual time=0.021..895.908 rows=1000 loops=1)
 ->  Hash  (cost=734.00..734.00 rows=4 width=4) (actual 
time=20.567..20.567 rows=4 loops=1)
   Buckets: 65536  Batches: 1  Memory Usage: 1919kB
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) 
(actual time=0.017..11.013 rows=4 loops=1)
 Planning time: 0.567 ms
 Execution time: 4961.029 ms
(10 rows)



Here is the details how I installed pg_strom,

1. download postgresql 9.5alpha1 and compile it with

./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
make -j8 all
make install

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin

make
make install


4. create a d

Re: [HACKERS] Implementation of global temporary tables?

2015-07-20 Thread Zhaomo Yang
>
> Just to be clear, the idea of a global temp table is that the table def
> is available to all users, but the data is private to each session?


The table def is visible to all sessions and persistent, but the data is
private to each session and temporary.

Thanks,
Zhaomo


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Zhaomo Yang
> Sounds fine in general. I'm a bit curious to know what are the locking
> implications of vivifying the table on access.

The locking implications depend on how we interpret the existing commands
in the context of global temp tables, and I think we should discuss and
agree on the behavior of each command with global temp tables. In general,
though, I think we can follow these rules:

If the command operates on the global temp table's metadata, for example an
ALTER TABLE command, then we lock the global copy at the same level as we
would a regular table.

If the command operates on the global temp table's data (which is actually
stored in the session copy), for example a DML command, then the global
copy is locked at AccessShareLock level to prevent concurrent modifications
to the global temp table's definition from other sessions.

Thanks,
Zhaomo

On Wed, Jul 15, 2015 at 4:26 AM, Andrew Dunstan  wrote:

>
> On 07/15/2015 07:58 AM, Simon Riggs wrote:
>
>
>> For me the design summary is this
>>
>> * CREATE GLOBAL TEMP TABLE creates catalog entries like a normal table
>> but with different relkind
>> * When we see a request to INSERT, DEL, UPD, SEL from the temp table, if
>> it does not exist we create it as a TEMP table of the same name, using the
>> Global's pg_class entry as a template
>>
>> That meets the SQL Standard and doesn't contain any visibility problems
>> or need for new internals.
>>
>> The purpose of this feature is to automatically create a temp table with
>> the same definition whenever needed. The discussion of "bloat" is just
>> wrong. We create exactly the same amount of bloat as if we had typed CREATE
>> TEMP TABLE. Optimising temp table entries in the catalog is another,
>> separate patch, if we care.
>>
>>
>>
> Sounds fine in general. I'm a bit curious to know what are the locking
> implications of vivifying the table on access.
>
> cheers
>
> andrew
>
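The design summarized in the quoted mail might look like this in use;
CREATE GLOBAL TEMP TABLE is proposed syntax, not an existing command, and
the table here is illustrative:

```sql
-- Proposed: catalog entries are created once, with a different relkind.
CREATE GLOBAL TEMP TABLE session_cache (k text PRIMARY KEY, v text);

-- Later, in any session, the first INSERT/DEL/UPD/SEL against the table
-- would implicitly create a session-local temp table, using the global
-- pg_class entry as a template, i.e. the effect of:
--   CREATE TEMP TABLE session_cache (k text PRIMARY KEY, v text);
INSERT INTO session_cache VALUES ('user', 'alice');
```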


Re: [HACKERS] Implementation of global temporary tables?

2015-07-15 Thread Zhaomo Yang
> there is another question - what is the effect of ALTER TABLE of a global
> temp table on instances of this table in active sessions?

As I said, we need to first agree on the behaviors of the existing
commands. I can think of two options for ALTER TABLE: 1) only allow ALTER
TABLE when there are no other active sessions (this is how Oracle deals
with it); 2) handle it as if session copies inherit from the global copy,
and ALTER TABLE executes on the global copy.

Thanks,
Zhaomo


Re: [HACKERS] Implementation of global temporary tables?

2015-07-09 Thread Zhaomo Yang
>  I am not sure, if it is not useless work.

I don't understand why an implementation taking approach 2.a would be
useless. As I said, its performance will be no worse than current temp
tables and it will provide a lot of convenience to users who need to create
temp tables in every session.

Thanks,
Zhaomo

On Tue, Jul 7, 2015 at 11:53 PM, Pavel Stehule 
wrote:

> Hi
>
>
> 2015-07-08 9:08 GMT+02:00 Zhaomo Yang :
>
>> >  more global temp tables are little bit comfortable for developers,
>> I'd like to emphasize this point. This feature does much more than saving
>> a developer from issuing a CREATE TEMP TABLE statement in every session.
>> Here are two common use cases and I'm sure there are more.
>>
>> (1)
>> Imagine in a web application scenario, a developer wants to cache some
>> session information in a temp table. What's more, he also wants to specify
>> some rules which reference the session information. Without this feature,
>> the rules will be removed at the end of every session since they depend on
>> a temporary object. Global temp tables will allow the developer to define
>> the temp table and the rules once.
>>
>> (2)
>> The second case is mentioned by Tom Lane back in 2010 in a thread about
>> global temp tables.
>> (http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us)
>> "The context that I've seen it come up in is that people don't want to
>> clutter their functions with
>>  create-it-if-it-doesn't-exist logic, which you have to have given the
>> current behavior of temp tables."
>>
>> >  2.a - using on demand created temp tables - most simple solution, but
>> >  doesn't help with catalogue bloating
>>
>> I've read the thread and people disapprove this approach because of the
>> potential catalog bloat. However, I'd like to champion it. Indeed, this
>> approach may have a bloat issue. But for users who need global temp
>> tables, they now have to create a new temp table in every session, which
>> means they already have the bloat problem and presumably they have some
>> policy to deal with it. In other words, implementing global temp tables by
>> this approach gives users the same performance, plus the convenience the
>> feature brings.
>>
>> The root question here is "whether having the unoptimized feature is
>> better than having no feature at all". Actually, there was a very similar
>> discussion back in 2009 on global temp tables. Let me borrow Kevin
>> Grittner's and Tom Lane's arguments here.
>>
>> Kevin Grittner's argument:
>>
>> http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov
>> "... If you're saying we can implement the standard's global temporary
>> tables in a way that performs better than current temporary tables, that's
>> cool.  That would be a nice "bonus" in addition to the application
>> programmer convenience and having another tick-mark on the standards
>> compliance charts.  Do you think that's feasible?  If not, the feature
>> would be useful to some with the same performance that temporary tables
>> currently provide."
>>
>> Tom Lane's arguments:
>>
>> http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us
>> "I'm all for eliminating catalog overheads, if we can find a way to do
>> that.  I don't think that you get to veto implementation of the feature
>> until we can find a way to optimize it better.  The question is not about
>> whether having the optimization would be better than not having it --- it's
>> about whether having the unoptimized feature is better than having no
>> feature at all (which means people have to implement the same behavior by
>> hand, and they'll *still* not get the optimization)."
>>
>> There have been several threads here discussing global temp tables since
>> 2007. Quite a few ideas aimed to avoid the bloat issue by not storing the
>> metadata of the session copy in the catalog. However, it seems that none of
>> them has been implemented, or even has a feasible design. So why don't we
>> implement it in an unoptimized way first?
>>
>
> I am not sure, if it is not useless work.
>
> Now, I am thinking so best implementation of global temp tables is
> enhancing unlogged tables to have local content. All local data can be
> saved in session memory. Usually it is less than 2KB with statistic, and
> you don't need to store it in catalogue. When anybody is working

Re: [HACKERS] Implementation of global temporary tables?

2015-07-08 Thread Zhaomo Yang
>  more global temp tables are little bit comfortable for developers,
I'd like to emphasize this point. This feature does much more than saving a
developer from issuing a CREATE TEMP TABLE statement in every session. Here
are two common use cases and I'm sure there are more.

(1)
Imagine in a web application scenario, a developer wants to cache some
session information in a temp table. What's more, he also wants to specify
some rules which reference the session information. Without this feature,
the rules will be removed at the end of every session since they depend on
a temporary object. Global temp tables will allow the developer to define
the temp table and the rules once.

(2)
The second case is mentioned by Tom Lane back in 2010 in a thread about
global temp tables.
(http://www.postgresql.org/message-id/9319.1272130...@sss.pgh.pa.us)
"The context that I've seen it come up in is that people don't want to
clutter their functions with
 create-it-if-it-doesn't-exist logic, which you have to have given the
current behavior of temp tables."

>  2.a - using on demand created temp tables - most simple solution, but
>  doesn't help with catalogue bloating

I've read the thread, and people disapprove of this approach because of the
potential catalog bloat. However, I'd like to champion it. Indeed, this
approach may have a bloat issue. But for users who need global temp
tables, they now have to create a new temp table in every session, which
means they already have the bloat problem and presumably they have some
policy to deal with it. In other words, implementing global temp tables by
this approach gives users the same performance, plus the convenience the
feature brings.

The root question here is "whether having the unoptimized feature is better
than having no feature at all". Actually, there was a very similar
discussion back in 2009 on global temp tables. Let me borrow Kevin
Grittner's and Tom Lane's arguments here.

Kevin Grittner's argument:

http://www.postgresql.org/message-id/49f82aea.ee98.002...@wicourts.gov
"... If you're saying we can implement the standard's global temporary
tables in a way that performs better than current temporary tables, that's
cool.  That would be a nice "bonus" in addition to the application
programmer convenience and having another tick-mark on the standards
compliance charts.  Do you think that's feasible?  If not, the feature
would be useful to some with the same performance that temporary tables
currently provide."

Tom Lane's arguments:

http://www.postgresql.org/message-id/24110.1241035...@sss.pgh.pa.us
"I'm all for eliminating catalog overheads, if we can find a way to do
that.  I don't think that you get to veto implementation of the feature
until we can find a way to optimize it better.  The question is not about
whether having the optimization would be better than not having it --- it's
about whether having the unoptimized feature is better than having no
feature at all (which means people have to implement the same behavior by
hand, and they'll *still* not get the optimization)."

There have been several threads here discussing global temp tables since
2007. Quite a few ideas aimed to avoid the bloat issue by not storing the
metadata of the session copy in the catalog. However, it seems that none of
them has been implemented, or even has a feasible design. So why don't we
implement it in an unoptimized way first?

>  Is there still interest about this feature?
I'm very interested in this feature. I'm thinking about one implementation
which is similar to Pavel's 2009 proposal (
http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com).
Here are the major ideas of my design:

(1)
Creating the cross-session persistent schema as a regular table and
creating session-private temp tables when a session first accesses it.

(2)
For DML queries, the global temp table is overloaded by its session copy
after the relation is opened by an OID or a RangeVar. For DDL queries,
which copy is used depends on whether the query needs to access the data or
the metadata of the global temp table.

There are more differences between this design and Pavel's 2009 proposal,
and I'd like to send a detailed proposal to the mailing list, but first I
want to know whether the community would accept a global temp table
implementation that provides the same performance as current temp tables
do.

Thanks,
Zhaomo


Re: [HACKERS] A mechanism securing web applications in DBMS

2014-09-20 Thread Zhaomo Yang
Stephen,

> Yes- but that's pretty trivially done, given that you've stipulated that
> a single connection DB connection must be used from authentication until
> de-authentication.  All that is needed is an additional column in the
> auth table which is populated with a pseudo-random value which is
> guaranteed to be unique and constant for the duration of the
> authenticated time- and the database backend PID is perfect for that.
> The auth function can call the pg_backend_pid() function directly and
> then the policies can include a 'pid = pg_backend_pid()' as part of the
> join to the auth table.  The auth function can also complain loudly if
> an entry in the pid table is found with the current PID during auth (and
> similar- the de-auth function can complain if an entry with the current
> PID is *not* found).  This would eliminate the need for the on-connect
> triggers, I believe (though those are interesting for other reasons..).


You are right. Using an unlogged table is a good idea. I'll try it out.
Thanks for your advice!

>  It'd be very interesting to see this done with the unlogged table,
> security definer functions, and the row-level policies patch which we're
> working on.  I'd further suggest that the application also use multiple
> roles which are set noinherit and 'set role' based on the operation
> which it's currently being used for- this would add another level of
> protection.  Using stored procedures (for more than just the auth and
> de-auth functions as suggested here) can also be a good idea.


Currently the auth functions are security definer functions. I'm going to
try to create a patch using an unlogged table + RLS and put it online
(e.g. on this mailing list) so that people can try it.
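A minimal sketch of the unlogged-table approach described in the quoted
mail, assuming the row-level policy syntax from the RLS patch; every
table, column, and policy name here is illustrative:

```sql
-- One shared unlogged table records the authenticated application-level
-- user for each backend, keyed by backend PID.
CREATE UNLOGGED TABLE web_auth (
    pid      int  PRIMARY KEY,
    app_user text NOT NULL
);

-- The SECURITY DEFINER auth function would register the session, e.g.:
--   INSERT INTO web_auth VALUES (pg_backend_pid(), verified_user);
-- and could complain loudly if a row with this PID already exists.

-- Policies on application tables then join to web_auth via the PID:
CREATE POLICY user_rows ON documents
    USING (owner = (SELECT app_user FROM web_auth
                    WHERE pid = pg_backend_pid()));
```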

Thanks,
Zhaomo

On Sat, Sep 13, 2014 at 4:00 PM, Stephen Frost  wrote:
> Zhaomo,
>
> * Zhaomo Yang (zhy...@cs.ucsd.edu) wrote:
>> > Have you considered just using a regular, but unlogged, table?  That
>> > would also avoid any risk that the application manages to drop or shadow
>> > the temp table somehow with a "fake" table that changes who is currently
>> > authenticated, and avoids having to figure out how to deal with the temp
>> > table vanishing due to the connections going away.
>>
>> So then all the currently logged in users will be stored in the same
>> table, which means we also need to make sure that the correct row in
>> that table is used when the row-level security policy refers to the
>> current application-level user.
>
> Yes- but that's pretty trivially done, given that you've stipulated that
> a single connection DB connection must be used from authentication until
> de-authentication.  All that is needed is an additional column in the
> auth table which is populated with a pseudo-random value which is
> guaranteed to be unique and constant for the duration of the
> authenticated time- and the database backend PID is perfect for that.
> The auth function can call the pg_backend_pid() function directly and
> then the policies can include a 'pid = pg_backend_pid()' as part of the
> join to the auth table.  The auth function can also complain loudly if
> an entry in the pid table is found with the current PID during auth (and
> similar- the de-auth function can complain if an entry with the current
> PID is *not* found).  This would eliminate the need for the on-connect
> triggers, I believe (though those are interesting for other reasons..).
>
>> Let me send you a copy of our paper in a separate email which is a
>> thorough description of the mechanism (including background, threat
>> model, how it works, etc), which should give you an better idea on
>> every aspect of the mechanism. Please do not distribute it because it
>> has been accepted for publication. Note that the implementation we
>> show in the paper is just a prototype (we made the changes so that we
>> could implement it quickly). Our goal always is to integrate our
>> mechanism into open source DBMS's like PG and MySQL cleanly.
>
> It'd be very interesting to see this done with the unlogged table,
> security definer functions, and the row-level policies patch which we're
> working on.  I'd further suggest that the application also use multiple
> roles which are set noinherit and 'set role' based on the operation
> which it's currently being used for- this would add another level of
> protection.  Using stored procedures (for more than just the auth and
> de-auth functions as suggested here) can also be a good idea.
>
> Thanks,
>
> Stephen




Re: [HACKERS] A mechanism securing web applications in DBMS

2014-09-06 Thread Zhaomo Yang
Stephen,

> As an FYI- we generally prefer inline responses rather than top-posting on 
> the PostgreSQL mailing lists.  Thanks.

Sorry for that.

> > - Try to make our mechanism as simple as possible.
> > Web application developers have all kinds of backgrounds. If the
> > security mechanism is too alien to them, they wouldn't use it.
> I'm surprised to hear this and a suggestion to used stored procedures in
> the same email- SPs are generally considered 'foreign' to the web
> developers that I've talked to. :)  That said, I'll grant that there are
> generally two camps: those who expect a database to only have BerkleyDB
> level key/value capabilities, and those who know what they're doing and
> what relational databases and SQL are all about.  The latter (and clear
> minority) group will take advantage of these capabilites, certainly,
> regardless of how they are expressed and are likely already comfortable
> using stored procedures and database-level roles.

I am surprised to hear this too. :) We haven't talked to many web
developers yet, and this is one of the things we need to do in the
future.

The goal of this mechanism is to add another layer of protection inside
the DBMS so that even if the application server is compromised, the users'
data is still protected*. This requires the DBMS to be able to
authenticate application-level users (to know which application-level user
it is communicating with). That is, we need to move the authentication
logic for application-level users into the DBMS. For this purpose, using
stored procedures (or something similar) is a must. I think even if a
security mechanism is designed to be easy to use, it will still require
some expertise.

* this mechanism can't help if the attackers control the app server
completely and the users are not aware of that and keep using the app.
In that case the attackers will be able to collect all the credentials
of the users who log in while they are in charge.

> If a temp table is being used then dynamic SQL may be required and therefore
> a plpgsql function will be involved to handle looking up the current user,
> as you won't be using PG roles.

This is why I'd like to have global temp tables in PG. With them we can
probably avoid the dynamic SQL.

> > (3) CREATE AUTHENTICATION FUNCTION
> > In our mechanism, we ask web application developers provide an
> > authentication function which normally takes user id and password as
> > inputs and returns a row containing all the identifiers (attributes)
> > of the corresponding application-level user. Let us call the place
> > storing the current application-level user's identifiers as
> > "identifier store".
> I would *strongly* advocate *against* passing the password to the
> database in any (non-hashed) form.  You are much better off using a
> one-way hash as early as possible in the stack (ideally, in whatever
> system initially receives the password on the server side) and then
> comparing that one-way hash.  Of course, passwords in general are not
> considered secure and one-time passwords, hardware tokens, or PIV /
> HSPD12 / CAC cards with client-side certificates.

You are absolutely right. I should've explained it better. I just
wanted to show how authentication works and skipped all the hashing
part.

Thanks,
Zhaomo

On Fri, Sep 5, 2014 at 5:52 PM, Stephen Frost  wrote:
> Zhaomo,
>
>   As an FYI- we generally prefer inline responses rather than
>   top-posting on the PostgreSQL mailing lists.  Thanks.
>
> * Zhaomo Yang (zhy...@cs.ucsd.edu) wrote:
>> (1) Two philosophies important to our design
>> - Try to not force web application developers to make too many changes
>> to their apps if they wa.
>
> That's certainly fair.
>
>> - Try to make our mechanism as simple as possible.
>> Web application developers have all kinds of backgrounds. If the
>> security mechanism is too alien to them, they wouldn't use it.
>
> I'm surprised to hear this and a suggestion to used stored procedures in
> the same email- SPs are generally considered 'foreign' to the web
> developers that I've talked to. :)  That said, I'll grant that there are
> generally two camps: those who expect a database to only have BerkleyDB
> level key/value capabilities, and those who know what they're doing and
> what relational databases and SQL are all about.  The latter (and clear
> minority) group will take advantage of these capabilites, certainly,
> regardless of how they are expressed and are likely already comfortable
> using stored procedures and database-level roles.
>
>> (2) Why we need to cache application-level users' identifiers
>> We want to differentiate applic

Re: [HACKERS] A mechanism securing web applications in DBMS

2014-09-05 Thread Zhaomo Yang
Stephen,

There are several things I want to explain:

(1) Two philosophies important to our design
- Try to not force web application developers to make too many changes
to their apps if they wa.
- Try to make our mechanism as simple as possible.
Web application developers have all kinds of backgrounds. If the
security mechanism is too alien to them, they wouldn't use it.

(2) Why we need to cache application-level users' identifiers
We want to differentiate application-level users in DBMS, but not by
creating a DB user (or role in PG's terminology ) for every
application-level user, otherwise there will be all sorts of problems
when the number of application-level users is greater than a threshold
(e.g. catalog, as you mentioned). Instead, we still use one DB user
representing all the application-level users, just as how web apps
work now. Besides the identifiers (attributes) of a application-level
user are stored in some private place of the corresponding session
(e.g. temp table) when the application-level user authenticates so
that the DBMS can differentiate application-level users. (Connection
pooling should be fine as long as an application session doesn't
return its connection until it finishes. )

Normally, a web application authenticates an application-level user by
making a SELECT query, with the user-provided user id and password, on the
password table to see if there is a match (of course, this is an
oversimplified version of how authentication works). Using our mechanism,
the web application instead calls the authentication function, which does
the SELECT on the table first and stores the identifiers of that
application-level user somewhere if a match is found. The identifiers of
the current application-level user are then referenced by the policies so
that fine-grained access control can be enforced.

(3) CREATE AUTHENTICATION FUNCTION
In our mechanism, we ask web application developers provide an
authentication function which normally takes user id and password as
inputs and returns a row containing all the identifiers (attributes)
of the corresponding application-level user. Let us call the place
storing the current application-level user's identifiers as
"identifier store".

For example,

This is an authentication function provided by a app developer.

CREATE AUTHENTICATION FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
...
$$ LANGUAGE plpgsql SECURITY DEFINER;

Under the hood, two regular functions will be defined:

-- the same function with a different name
CREATE FUNCTION _auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
  -- copy the function body from the CREATE AUTHENTICATION FUNCTION above
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- the function which is actually called in the server code
CREATE FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
  -- clear the identifier store
  -- execute function _auth and insert the result into the identifier store
  -- return the row in the identifier store
$$ LANGUAGE plpgsql SECURITY DEFINER;
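To make the outline concrete, the generated wrapper might expand to
something like the following; this expansion is my own sketch, assuming
the identifier store is a per-session temp table named current_app_user:

```sql
CREATE FUNCTION auth (user_id TEXT, password TEXT)
RETURNS table(uid BIGINT, permissions TEXT[]) AS $$
BEGIN
  -- the identifier store: one private temp table per session
  CREATE TEMP TABLE IF NOT EXISTS current_app_user
      (uid BIGINT, permissions TEXT[]);
  DELETE FROM current_app_user;                -- clear the identifier store
  INSERT INTO current_app_user
      SELECT * FROM _auth(user_id, password);  -- run the real check
  RETURN QUERY SELECT * FROM current_app_user; -- return the stored row
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
```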

The whole point of this CREATE AUTHENTICATION FUNCTION syntax is to
reduce developers' work.  By giving developers very specific
instructions on how to write an authentication function, we hope they
would find it easy to write one. Admittedly, however, what CREATE
AUTHENTICATION FUNCTION does can be achieved by CREATE FUNCTION.

Please let me know if you have any other questions.

Zhaomo

On Thu, Sep 4, 2014 at 6:53 PM, Stephen Frost  wrote:
> Zhaomo,
>
> * Zhaomo Yang (zhy...@cs.ucsd.edu) wrote:
>> I am a graduate student from UC San Diego. My adviser, Dr. Kirill
>> Levchenko, and I have been working on a web/DB security project for
>> the last few months. Since fine-grained access control in DBMS is part
>> of our project and the PostgreSQL community is also working on it now,
>> we would like to exchange some ideas on the issue with you.
>
> Fantastic!  Very interested.
>
>> 1. Background
>> Nowadays people entrust more and more sensitive data to web
>> applications, while security vulnerabilities are common in these
>> applications. The normal structure of web applications consists of
>> three tiers: client-side code, server-side code and a database. In
>> server-side code a database user representing all the
>> application-level users interacts with the database with full
>> privileges. Since currently database built-in access control is too
>> coarse, no access control mechanism is used in most of web
>> applications. This situation is not ideal since a malicious
>> application user can tamper with other users’ data by exploiting
>> vulnerabilities in the application, and if the application is
>> completely under the malicious user’s control so is the database.
>

[HACKERS] A mechanism securing web applications in DBMS

2014-09-04 Thread Zhaomo Yang
Collaboration.

We are looking forward to joining forces with PostgreSQL community on
this problem. More specifically, we have the following suggestions
which we hope the community would consider:

1) Using GRANT-WHERE syntax for the row-level access control. If for
any reason it is impossible, we hope at least a clause will be added
to the “ALTER TABLE … SET ROW SECURITY” syntax such that a recipient
can be specified;

2) Restarting Global Temporary Table project;

3) Accepting “CREATE AUTHENTICATION FUNCTION” as a variant of “CREATE
FUNCTION” command.

We’d like to discuss the mechanisms above with the community further,
and if any of the mechanisms are accepted by the community, we also
would like to take part in implementing it. Please feel free to
contact us if you have any questions.


Zhaomo Yang  zhy...@ucsd.edu
Kirill Levchenko   klevc...@ucsd.edu


References:
[1] S. Chaudhuri, T. Dutta, and S. Sudarashan. Fine Grained
Authorization Through Predicated Grants. In Proceedings of the 23rd
IEEE International Conference on Data Engineering (ICDE), pages
1174–1183, April 2007.

[2] idea: global temp tables
http://www.postgresql.org/message-id/162867790904271344s1ec96d90j6cde295fdcc78...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] questions about ON SELECT rules

2013-01-22 Thread Zhaomo Yang
Hi folks,

I'm working on a project where I need to run some commands/functions when a
SELECT is executed on certain tables. I looked at the rule system and
found the following paragraph in the documentation:

  "Currently, there can be only one action in an ON SELECT rule, and it
must be an unconditional SELECT action that is INSTEAD. This
  restriction was required to make rules safe enough to open them for
ordinary users, and it restricts ON SELECT rules to act like views."

These restrictions make it impossible to implement what I need using the
rule system. I'm thinking about tweaking how the rule statement is executed
to lift some of the restrictions (more actions, other types of actions,
etc.), but I'm not sure what impact this would have on the whole system. Why
are rules not safe when these restrictions are removed?
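For context, the restriction means an ON SELECT rule can only ever be the internal `_RETURN` rule that turns a relation into a view. A common workaround for running a side effect on read is to wrap the SELECT in a set-returning function and query that instead; a sketch with hypothetical table names:

```sql
-- Sketch only; mytab and access_log are hypothetical.
CREATE FUNCTION audited_select() RETURNS SETOF mytab AS $$
BEGIN
  INSERT INTO access_log VALUES (now(), current_user);  -- the extra action
  RETURN QUERY SELECT * FROM mytab;                     -- the actual read
END;
$$ LANGUAGE plpgsql;

-- callers then use:  SELECT * FROM audited_select();
```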

Thanks,
Zhaomo


[HACKERS] GSoC 2011 Project: Enhancing Foreign-data Wrapper (FDW) functionality for PostgreSQL

2011-04-26 Thread Zheng Yang
Hi Hackers!

This is Zheng Yang from National University of Singapore. 

I'm very glad to know that my proposal, "Enhancing Foreign-data Wrapper (FDW) 
functionality for PostgreSQL",  has been accepted 
by this year's Google Summer of Code!  
Thank you guys for the valuable comments and suggestions for my proposal 
(Especially thanks to my mentor Dave Page).


Cheers!
Zheng Yang


Re: [HACKERS] Use array in a dynamic statement

2009-03-05 Thread Sophie Yang

I don't see how the problem can be solved by merging the function logic into
the calling function. The int[][] array is supplied by the user, and I still
have the issue of passing it into a dynamic statement, no matter in which
function, unless I serialize it to a string and deserialize it back to int[][].


- Original Message 
From: Robert Haas 
To: Sophie Yang 
Cc: Pavel Stehule ; pgsql-hackers@postgresql.org
Sent: Thursday, March 5, 2009 6:51:48 PM
Subject: Re: [HACKERS] Use array in a dynamic statement

On Thu, Mar 5, 2009 at 8:32 PM, Sophie Yang  wrote:
>
> I was wondering why USING clause is not supported in pl/pgsql dynamic 
> statement. Serialization is the option I tried to avoid, but it seems there 
> is no better approach available.
>
> Just to say a few more about the usage of my function. In dag_tree_1, (rid, 
> rtid) is the primary key, which identifies a node in a tree structure. The 
> idx field is a kind of Dewey index. For example:
> rid    rtid  idx
> 1123   1     .0006.0033
> 3231   1     .0006
> 786    6     .0007.8853
> 80923  2     .0007.8853.2382
>
> The function takes in a list of (rid, rtid) pairs (nids) and sorts them by
> the length of the Dewey index, which is equivalent to sorting the nodes by
> their tree depth. That's what I try to achieve. Maybe someone has a
> different idea to implement the function?

Well the function is pretty simple.  Maybe you could merge the logic
encapsulated by the function into the containing query?

...Robert



  



Re: [HACKERS] Use array in a dynamic statement

2009-03-05 Thread Sophie Yang

I was wondering why the USING clause is not supported in PL/pgSQL dynamic
statements. Serialization is the option I tried to avoid, but it seems there
is no better approach available.

Just to say a few more about the usage of my function. In dag_tree_1, (rid, 
rtid) is the primary key, which identifies a node in a tree structure. The idx 
field is a kind of Dewey index. For example:
rid    rtid  idx
1123   1     .0006.0033
3231   1     .0006
786    6     .0007.8853
80923  2     .0007.8853.2382

The function takes in a list of (rid, rtid) pairs (nids) and sorts them by the
length of the Dewey index, which is equivalent to sorting the nodes by their
tree depth. That's what I try to achieve. Maybe someone has a different idea to
implement the function?

Thanks,
Sophie
  

- Original Message 
From: Pavel Stehule 
To: Sophie Yang 
Cc: pgsql-hackers@postgresql.org
Sent: Thursday, March 5, 2009 12:06:24 AM
Subject: Re: [HACKERS] Use array in a dynamic statement

Hello

you can't use parameters inside a literal; they have no meaning there
(up to 8.3; 8.4 will support USING).

You have to serialize to a string and quote it.

Something like:

CREATE OR REPLACE FUNCTION foo(int[])
RETURNS SETOF int AS $$
DECLARE r record;
BEGIN
  FOR r IN EXECUTE
  'SELECT (' || quote_literal($1::text) ||
'::int[])[i] AS x
  FROM generate_series(1, array_upper(' ||
quote_literal($1::text) || '::int[],1)) g(i)'
  LOOP
RETURN NEXT r.x;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM foo(ARRAY[1,2,3]);


CREATE OR REPLACE FUNCTION foo84(int[])
RETURNS SETOF int AS $$
DECLARE r record;
BEGIN
  FOR r IN EXECUTE
  'SELECT $1[i] AS x FROM generate_series(1,
array_upper($1,1)) g(i)' USING $1
  LOOP
RETURN NEXT r.x;
  END LOOP;
  RETURN;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM foo84(ARRAY[1,2,3]);

regards
Pavel Stehule

2009/3/5 Sophie Yang :
>
> Hi,
>
> I am trying to implement a PL/PgSQL function as following:
> CREATE OR REPLACE FUNCTION sort_by_d_idx (nids INT[][], tbl_name VARCHAR)
> RETURNS varchar[]
> AS $$
> DECLARE
>  result varchar[];
> BEGIN
>
>  EXECUTE 'SELECT ARRAY(SELECT t.idx FROM generate_series(array_lower($1,1), 
> array_upper($1,1)) AS s(i), '
>||tbl_name||' t WHERE $1[s.i][1] = t.rid and $1[s.i][2] = t.rtid ORDER 
> BY length(t.idx))'
>  INTO result;
>
>  RETURN result;
> END;
> $$ LANGUAGE plpgsql;
>
> I got an error "ERROR: there is no parameter $1" when I test the function 
> with:
> select sort_by_d_idx('{{8148, 1}, {8149, 1}, {300, 2}, {8163, 1}, {8170, 
> 1}}'::int[][], 'd_tree_1');
>
> The error is understandable, but my question is how to supply the int[][] 
> array into the dynamic SQL?
>
> To help understand the dynamic statement, the structure of d_tree_1 is (rid, 
> rtid, idx). The PK is (rid, rtid) pair.
>
> If the tbl_name is fixed, the following function works well:
> CREATE OR REPLACE FUNCTION sort_by_d_idx2 (nids INT[][])
> RETURNS varchar[]
> LANGUAGE SQL
> AS $$
> SELECT ARRAY(
>SELECT t.idx
>FROM
>generate_series(array_lower($1,1), array_upper($1,1)) AS s(i), 
> d_tree_1 t
>WHERE $1[s.i][1] = t.rid and $1[s.i][2] = t.rtid
>ORDER BY length(t.idx)
> );
> $$;
>
> Unfortunately, the tbl_name is determined at query time.
>
> Please help.
>
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



  



[HACKERS] Use array in a dynamic statement

2009-03-04 Thread Sophie Yang

Hi,

I am trying to implement a PL/pgSQL function as follows:
CREATE OR REPLACE FUNCTION sort_by_d_idx (nids INT[][], tbl_name VARCHAR)
RETURNS varchar[]
AS $$
DECLARE
  result varchar[];
BEGIN

  EXECUTE 'SELECT ARRAY(SELECT t.idx FROM generate_series(array_lower($1,1), 
array_upper($1,1)) AS s(i), '
||tbl_name||' t WHERE $1[s.i][1] = t.rid and $1[s.i][2] = t.rtid ORDER 
BY length(t.idx))'
  INTO result;

  RETURN result;
END;
$$ LANGUAGE plpgsql;

I got an error "ERROR: there is no parameter $1" when I test the function with:
select sort_by_d_idx('{{8148, 1}, {8149, 1}, {300, 2}, {8163, 1}, {8170, 
1}}'::int[][], 'd_tree_1');

The error is understandable, but my question is how to pass the int[][] array
into the dynamic SQL?

To help understand the dynamic statement, the structure of d_tree_1 is (rid, 
rtid, idx). The PK is (rid, rtid) pair. 

If the tbl_name is fixed, the following function works well:
CREATE OR REPLACE FUNCTION sort_by_d_idx2 (nids INT[][])
RETURNS varchar[]
LANGUAGE SQL
AS $$
SELECT ARRAY(
SELECT t.idx
FROM
generate_series(array_lower($1,1), array_upper($1,1)) AS s(i), d_tree_1 
t
WHERE $1[s.i][1] = t.rid and $1[s.i][2] = t.rtid
ORDER BY length(t.idx)
);
$$;

Unfortunately, the tbl_name is determined at query time. 

Please help.
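For what it's worth, once 8.4's EXECUTE ... USING is available, both requirements can be combined: the array is passed as a real parameter, and only the table name is spliced into the string, quoted with quote_ident(). A sketch:

```sql
-- Sketch; requires PostgreSQL 8.4+ for EXECUTE ... USING.
CREATE OR REPLACE FUNCTION sort_by_d_idx (nids INT[][], tbl_name VARCHAR)
RETURNS varchar[] AS $$
DECLARE
  result varchar[];
BEGIN
  EXECUTE 'SELECT ARRAY(SELECT t.idx'
       || ' FROM generate_series(array_lower($1,1), array_upper($1,1)) AS s(i), '
       || quote_ident(tbl_name) || ' t'            -- identifier safely quoted
       || ' WHERE $1[s.i][1] = t.rid AND $1[s.i][2] = t.rtid'
       || ' ORDER BY length(t.idx))'
  INTO result
  USING nids;                                      -- array passed as $1
  RETURN result;
END;
$$ LANGUAGE plpgsql;
```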



  



Re: [HACKERS] Question on adding new indexes to Postgresql

2003-09-25 Thread Zhen Yang
Thanks. Currently I am also looking at different implementations
of an index to see which is the most suitable to implement. For an R+ tree
(a variant of the R tree), I have found an example implementation in
C. Does anyone know of an R+ tree implementation done with GiST
(generalized search tree), since a GiST structure is also available
in Postgresql?

Thanks again.

Zhen Yang


--- Tom Lane <[EMAIL PROTECTED]> wrote:
> Zhen Yang <[EMAIL PROTECTED]> writes:
> > What are the C interface functions in Postgresql to add indexes?
> 
> You will need to study the source code for the existing index types.
> There is not much documentation other than the sources for the index
> access method APIs.
> 
>   regards, tom lane
> 
>
>
> > From: [EMAIL PROTECTED]
> > Hello, I am posting to ask for some help and suggestions.
> > 
> > For a project I am doing in university, I need to add indexing
> > mechanisms to the Postgresql database (R+ tree, bitmap indexes).
> > Version of Postgresql is 7.3.3. I would like to ask how I can
> > integrate the structures and algorithms for the indexes into
> > the Postgresql DB. What are the C interface functions in Postgresql
> > to add indexes? Do I need to handle things like memory buffer
> > management, disk storage management when implementing the indexes?
> > Thanks in advance.
> > 
> > 
> > Zhen Yang
> > 

__
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] Question on adding new indexes to Postgresql

2003-09-23 Thread Zhen Yang

Hello, I am posting to ask for some help and suggestions.

For a project I am doing in university, I need to add indexing
mechanisms to the Postgresql database (R+ tree, bitmap indexes).
Version of Postgresql is 7.3.3. I would like to ask how I can
integrate the structures and algorithms for the indexes into
the Postgresql DB. What are the C interface functions in Postgresql
to add indexes? Do I need to handle things like memory buffer
management, disk storage management when implementing the indexes?
Thanks in advance.


Sincerely,
Zhen Yang

