Re: [HACKERS] Substituting Checksum Algorithm (was: Enabling Checksums)

2013-04-29 Thread Simon Riggs
On 28 April 2013 13:22, Ants Aasma  wrote:

> I have updated the base patch. This is supposed to go under the
> cflags-vector patch that Jeff posted yesterday.


I've committed your patch, with two changes:
* one comment extended
* adding the .h file from Jeff's last main patch

Could Ants and Jeff please review the commit and confirm that it is what
we wanted? Thanks.

I'm about to light up the build farm with a trial commit of the
compiler instructions stuff.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-29 Thread Ashutosh Bapat
On Mon, Apr 29, 2013 at 7:31 PM, Tom Lane  wrote:

> Ashutosh Bapat  writes:
> > Is there any reason why we want to check the functional dependencies
> > at the time of parsing and not after rewrite? Obviously, by doing so,
> > we will allow creation of certain views which will start throwing
> > errors after the underlying table changes the primary key. Is it
> > mandatory that we throw "functional dependency" related errors at the
> > time of creation of views?
>
> From a usability standpoint, I would think so.  And really the only
> excuse for the functional-dependency feature to exist at all is
> usability; it adds nothing you couldn't do without it.
>
> If we wanted to do something like this, I think the clean way to do it
> would be to invent a notion of unique/not-null/pkey constraints on
> views, so that the creator of a view could declaratively say that he
> wants such a property exposed.  That is, the example would become
> something like
>
> create table t1 (id int primary key, ... other stuff ...);
> create view v1 as select * from t1;
> alter view v1 add primary key(id);
> create view v2 as select * from v1 group by id;
>
> The pkey constraint on v1 is just a catalog entry with a dependency on
> t1's pkey constraint; there's no actual index there.  But now, v2 can
> be built with a dependency on v1's pkey, not t1's, and the action-at-
> a-distance problem goes away.  For example, a "CREATE OR REPLACE v1"
> command could check that the new view definition still provides
> something for v1's pkey to depend on, and throw error or not without any
> need to examine the contents of other views.  Dropping various elements
> of this schema would work unsurprisingly, too.
>
> This would, of course, require a significant chunk of new code, and
> personally I do not think the feature would be worth it.  But it
> would be a clean and usable design.
>
>
Yes, this looks like a better design. But I do not see much interest in
it, so if I spend time here, there is a high chance it would go to waste.

Would it be useful to extend the primary-key grouping functionality to
subqueries? For example:

CREATE TEMP TABLE products (product_id int, name text, price numeric);
CREATE TEMP TABLE sales (product_id int, units int);
ALTER TABLE products ADD PRIMARY KEY (product_id);

SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
FROM (SELECT * FROM products) p
LEFT JOIN (SELECT * FROM sales) s USING (product_id)
GROUP BY product_id;

This query gives an error (p.name should be part of the GROUP BY clause),
but functionally it is grouping by the primary key. Is there a better way
to use the functional dependency for grouping?
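
For comparison, a form that works today, as a minimal sketch: the parser
recognizes the functional dependency when it can see the base table's
primary key directly, so grouping on products itself is accepted:

SELECT p.product_id, p.name, (sum(s.units) * p.price) AS sales
FROM products p
LEFT JOIN sales s USING (product_id)
GROUP BY p.product_id;

Here p.name and p.price are functionally dependent on the grouping
column p.product_id, so they need not be listed in the GROUP BY clause.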


> regards, tom lane
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Jaime Casanova
On Mon, Apr 29, 2013 at 10:51 PM, Peter Eisentraut  wrote:
> On Sun, 2013-04-28 at 22:55 -0700, Atri Sharma wrote:
>> If we add support for weighted graphs, we can probably add support for
>> some common graph algorithms, such as Dijkstra's algorithm, the
>> Bellman-Ford algorithm, an MST-construction algorithm, and network
>> flow algorithms.
>
> You might find that pgRouting already does much of this.
>

Actually, I was going to suggest that Atri take a look at that.
pgRouting is currently developing v2.0, which will rewrite some
parts (including some of the algorithms).

Maybe this datatype would fit better as part of pgRouting.

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566 Cell: +593 987171157




Re: [HACKERS] The missing pg_get_*def functions

2013-04-29 Thread Joel Jacobson
On Tue, Apr 30, 2013 at 12:58 AM, Tom Lane  wrote:
> Noah Misch  writes:
>> Note also that minor releases can readily fix bugs in C-language functions,
>> but we have no infrastructure to update sql-language functions after initdb.
>> That flexibility is unfortunate to lose, particularly for something that
>> pg_dump depends on.
>
> That alone would probably be sufficient reason why we would never allow
> pg_dump to depend on any such thing (not that I see a compelling
> argument for it to do so anyway...).

It would be better to find a way to update sql-language functions in
minor upgrades, instead of shutting that door entirely for all future
implementation ideas involving sql-language functions in the
pg_catalog.

> The long and the short of it here is that there isn't any very good
> reason to migrate any of the existing pg_dump-side functionality into
> server-side functions, and especially not server-side functions that
> aren't in C.  One of the things that we frequently recommend when doing
> upgrades is that you do the dump with the newer version's pg_dump, so
> as to get the benefits of any bug fixes that are in it.  The more
> dump functionality is on the server side, the less opportunity we have
> to repair things that way.

There are two very good reasons in particular for doing exactly that:

a) Make it possible to develop tools which need to create (and/or drop)
schema objects, such as a schema migration tool capable of migrating
between different revisions of a schema. I started working on such a
project two years ago, but realised most of the actual code I wrote
defined how to create/drop objects, and in what order based on
pg_depend. I thought such code would be better placed in core, so I put
the project on hold awaiting complete coverage of the pg_get_*def
functions; they never happened, so here we are two years later. This is
the project I'm talking about: https://github.com/gluefinance/pov

b) Reduce the amount of hard-coded SQL queries in pg_dump.c. This won't
help for previous versions, but perhaps the SQL queries supporting older
versions could at least be hidden in a separately included file, so you
don't have to see them. That would make sense, since the SQL for old
versions won't change and won't need the same amount of active
development as the code for the latest version. It makes no sense to
keep the SQL for all versions in the same place.

> It may be that the functions Joel proposes are worth having for other
> tools to use, but I'm not in favor of making pg_dump use them.

I don't agree, as I don't think it is a very hard problem to deal with
updates of sql-language functions in the pg_catalog schema. If we fix
that first, then this and a whole range of problems, wherever the
solution involves taking input from pg_catalog and producing output
based on it, become easy to solve by writing simple sql-language
functions with direct access to pg_catalog, instead of C functions with
inline hard-coded SQL.




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Peter Eisentraut
On Sun, 2013-04-28 at 22:55 -0700, Atri Sharma wrote:
> If we add support for weighted graphs, we can probably add support for
> some common graph algorithms, such as Dijkstra's algorithm, the
> Bellman-Ford algorithm, an MST-construction algorithm, and network
> flow algorithms.

You might find that pgRouting already does much of this.





Re: [HACKERS] The missing pg_get_*def functions

2013-04-29 Thread Joel Jacobson
On Tue, Apr 30, 2013 at 12:46 AM, Noah Misch  wrote:
> Note that while the sql procedural language is fair game, plpgsql currently is
> not.  We install it by default, but the DBA is free to drop it.

Right. The "sql" procedural language was what I had in mind.

> Those existing functions give a mostly-SnapshotNow picture of their objects,
> but an sql-language implementation would give a normally-snapshotted picture.

I assume "normally" is better than "mostly"?

> That status quo is perhaps more an implementation accident than a designed
> behavior.  Before proliferating functions like this, we should pick a snapshot
> policy and stick to it.  See the block comment at the top of pg_dump.c.

I didn't think there would be any reason to migrate the existing
functions from C to SQL, but this snapshot problem seems like a good
motive to do it. If they were all written in SQL, the snapshot
problem would be solved, right?

> Note also that minor releases can readily fix bugs in C-language functions,
> but we have no infrastructure to update sql-language functions after initdb.
> That flexibility is unfortunate to lose, particularly for something that
> pg_dump depends on.  Now, the right thing is probably to design a mechanism
> for applying simple catalog updates in concert with a minor release.  In the
> mean time, its absence puts the sql PL at a nontrivial disadvantage here.

What do you mean by "infrastructure"? Isn't it as simple as CREATE
OR REPLACE FUNCTION? As long as the interfaces of the pg_get_*def
functions don't change, I cannot see how simply replacing the existing
functions in a minor-release upgrade could do any harm.




Re: [HACKERS] Back branches vs. gcc 4.8.0

2013-04-29 Thread Peter Eisentraut
On Sat, 2013-04-06 at 12:59 -0400, Tom Lane wrote:
> The reason I'm thinking it's a good idea is that it would expose any
> remaining places where we have nominally var-length arrays embedded in
> larger structs.  Now that I've seen the failures with gcc 4.8.0, I'm
> quite worried that there might be some more declarations like that
> which we've not identified yet, but that by chance aren't causing
> obvious failures today. 

Here is a rough patch that replaces almost all occurrences of
something[1] in a struct with FLEXIBLE_ARRAY_MEMBER.  It crashes left
and right (because of sizeof issues, probably), but at least so far the
compiler hasn't complained about any flexible-array members not at the
end of the struct, which is what it did last time.  So far, then, the
answer to your concern appears to be no.

Completing this patch will be quite a bit more debugging work.  Some
kind of electric fence for palloc would be helpful.

diff --git a/contrib/cube/cubedata.h b/contrib/cube/cubedata.h
index fd0c26a..01e0818 100644
--- a/contrib/cube/cubedata.h
+++ b/contrib/cube/cubedata.h
@@ -6,7 +6,7 @@ typedef struct NDBOX
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	unsigned int dim;
-	double		x[1];
+	double		x[FLEXIBLE_ARRAY_MEMBER];
 } NDBOX;
 
 #define DatumGetNDBOX(x)	((NDBOX*)DatumGetPointer(x))
diff --git a/contrib/intarray/_int.h b/contrib/intarray/_int.h
index 56aa23c..c35e862 100644
--- a/contrib/intarray/_int.h
+++ b/contrib/intarray/_int.h
@@ -72,7 +72,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		flag;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } GISTTYPE;
 
 #define ALLISTRUE		0x04
@@ -132,7 +132,7 @@ typedef struct QUERYTYPE
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;			/* number of ITEMs */
-	ITEM		items[1];		/* variable length array */
+	ITEM		items[FLEXIBLE_ARRAY_MEMBER];
 } QUERYTYPE;
 
 #define HDRSIZEQT	offsetof(QUERYTYPE, items)
diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h
index 563970a..bf25f4c 100644
--- a/contrib/ltree/ltree.h
+++ b/contrib/ltree/ltree.h
@@ -9,7 +9,7 @@
 typedef struct
 {
 	uint16		len;
-	char		name[1];
+	char		name[FLEXIBLE_ARRAY_MEMBER];
 } ltree_level;
 
 #define LEVEL_HDRSIZE	(offsetof(ltree_level,name))
@@ -19,7 +19,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	uint16		numlevel;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } ltree;
 
 #define LTREE_HDRSIZE	MAXALIGN( offsetof(ltree, data) )
@@ -33,7 +33,7 @@ typedef struct
 	int32		val;
 	uint16		len;
 	uint8		flag;
-	char		name[1];
+	char		name[FLEXIBLE_ARRAY_MEMBER];
 } lquery_variant;
 
 #define LVAR_HDRSIZE   MAXALIGN(offsetof(lquery_variant, name))
@@ -50,7 +50,7 @@ typedef struct
 	uint16		numvar;
 	uint16		low;
 	uint16		high;
-	char		variants[1];
+	char		variants[FLEXIBLE_ARRAY_MEMBER];
 } lquery_level;
 
 #define LQL_HDRSIZE MAXALIGN( offsetof(lquery_level,variants) )
@@ -71,7 +71,7 @@ typedef struct
 	uint16		numlevel;
 	uint16		firstgood;
 	uint16		flag;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } lquery;
 
 #define LQUERY_HDRSIZE	 MAXALIGN( offsetof(lquery, data) )
@@ -106,7 +106,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	int32		size;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } ltxtquery;
 
 #define HDRSIZEQT		MAXALIGN(VARHDRSZ + sizeof(int32))
@@ -205,7 +205,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	uint32		flag;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } ltree_gist;
 
 #define LTG_ONENODE 0x01
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index ed649b8..f030558 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -63,7 +63,7 @@ typedef struct
 {
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	uint8		flag;
-	char		data[1];
+	char		data[FLEXIBLE_ARRAY_MEMBER];
 } TRGM;
 
 #define TRGMHDRSIZE		  (VARHDRSZ + sizeof(uint8))
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index f48c0bc..4323184 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -1066,7 +1066,7 @@ static bool MatchNamedCall(HeapTuple proctup, int nargs, List *argnames,
 		 */
 		effective_nargs = Max(pronargs, nargs);
 		newResult = (FuncCandidateList)
-			palloc(sizeof(struct _FuncCandidateList) - sizeof(Oid)
+			palloc(offsetof(struct _FuncCandidateList, args)
    + effective_nargs * sizeof(Oid));
 		newResult->pathpos = pathpos;
 		newResult->oid = HeapTupleGetOid(proctup);
@@ -1580,7 +1580,7 @@ static bool MatchNamedCall(HeapTuple proctup, int nargs, List *argnames,
 	 * separate palloc for each operator, but profiling revealed that the
 	 * pallocs used an unreasonably large fraction of parsing time.
 	 */
-#define SPACE_PER_OP MAXALIGN(sizeof(struct _FuncCandidateList) + sizeof(Oid))
+#define SPACE_PER_OP MAXAL

Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Robert Haas
On Mon, Apr 29, 2013 at 9:44 PM, Stephen Frost  wrote:
> * Kevin Grittner (kgri...@ymail.com) wrote:
>> If they modified the heap files that way while the server was
>> running, the results would be somewhat unpredictable.  If they did
>> it while the server was stopped, starting the server and attempting
>> to access the matview would generate:
>
> Right, the point being that they could (ab)use it as a flag to trigger
> something to happen.  I'd also be worried about failure cases where
> files appear to be zero-length.

If you assume that people are going to modify files while the backend
is running, nothing we do anywhere is safe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_ctl non-idempotent behavior change

2013-04-29 Thread Peter Eisentraut
On Sat, 2013-04-27 at 14:24 -0400, Tom Lane wrote:
> Yeah, this is not tolerable.  We could think about improving the logic
> to have a stronger check on whether the old server is really there or
> not (ie it should be doing something more like pg_ping and less like
> just checking if the pidfile is there).  But given how close we are to
> beta, maybe the best thing is to revert that change for now and put it
> back on the to-think-about-for-9.4 list.  Peter?

Reverted.





Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
> If they modified the heap files that way while the server was
> running, the results would be somewhat unpredictable.  If they did
> it while the server was stopped, starting the server and attempting
> to access the matview would generate:

Right, the point being that they could (ab)use it as a flag to trigger
something to happen.  I'd also be worried about failure cases where
files appear to be zero-length.

> > Or we end up wanting to have that file be non-zero and considered
> > 'empty' later, but we don't want pg_upgrade running around
> > touching all of the existing files out there?
> 
> I didn't follow this one; could you restate it, please?

Down the road, we might decide that we shouldn't have any zero-length
files (perhaps due to checksums..?), yet we would have to special-case
these matviews and figure out a way to deal with them during pg_upgrade.

Thanks,

Stephen




Re: [HACKERS] The missing pg_get_*def functions

2013-04-29 Thread Robert Haas
On Mon, Apr 29, 2013 at 7:58 PM, Tom Lane  wrote:
> The long and the short of it here is that there isn't any very good
> reason to migrate any of the existing pg_dump-side functionality into
> server-side functions, 

There are a number of good reasons to do just that, which have been
previously discussed.

> and especially not server-side functions that
> aren't in C.

+1.

> One of the things that we frequently recommend when doing
> upgrades is that you do the dump with the newer version's pg_dump, so
> as to get the benefits of any bug fixes that are in it.  The more
> dump functionality is on the server side, the less opportunity we have
> to repair things that way.

But why wouldn't we be able to fix the version in the server, if it
turns out to be buggy?  I suppose we wouldn't fix bugs discovered
after EOL, but I'm not sure that's a sufficient objection.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Robert Haas
On Mon, Apr 29, 2013 at 3:34 PM, Kevin Grittner  wrote:
> The hysteria and FUD about using the currently-supported technique
> for unlogged tables to implement unlogged matviews are very
> discouraging.  And the notion that we would release a matview
> feature which allowed false results (in the form of treating a
> never-populated matview as a legal empty matview) is truly scary to
> me.

I think that labeling other people's concerns as hysteria and FUD is
not something which will advance the debate.  I might be
misunderstanding the nature of Tom's concern, but I believe the
concern is not so much that whatever bugs exist now can't be fixed,
but that the use of this mechanism might leave us vulnerable to a
steady stream of future bugs or make it hard to do things which, in
the future, we may want to do.

To take one not-particularly-hypothetical example of that, I was
recently discussing with someone, probably Noah, the idea of adding a
command to pre-extend a relation to a certain number of blocks (and
prevent vacuum from truncating those extra blocks away again).  If
you're counting on the number of blocks in the relation to mean
something other than the number of blocks in the relation, there's a
potential conflict there.   Now maybe the fact that you've defined 0
to mean non-scannable and 1+ to mean scannable means there's no real
problem with that particular idea... but suppose you'd defined it the
other way around.  I don't think it's purely unreasonable to think
that this could conflict with future developments in other areas.

The basic problem here is that the way unlogged tables work is kind of
a crock.  I didn't fully realize that at the time I implemented it,
but it's boxed us in a little in terms of trying to extend that
feature; for example, we still don't have a way to convert logged
relations to unlogged relations, or vice versa, and we're not going
to grow a way without some kind of reworking of that mechanism.  This
is another aspect of that problem.  To solve these problems cleanly,
we need either a relation metapage, or some more state in pg_class.
Short of that, this isn't a scalable solution.  Even if you think it's
workable to coax one more bit of state out of the file length, what
will you do when you need a second bit (or say another 32 bits)?

> If we can't tell the difference between those two things, I
> don't think we should be releasing the feature.

So, speaking of hysteria...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Gavin Flower

On 30/04/13 12:33, Любен Каравелов wrote:

- Quote from Christopher Browne (cbbro...@gmail.com), on 29.04.2013 at 23:18 -

The one place where I *could* see a special type having a contribution
is for there to be a data type that can contain an arbitrary number of
links.  That means you have one tuple per node, and, instead of
needing a tuple for each link between nodes, you have one attribute
indicating *all* the links.  (And "interesting" is for that one
attribute to be usable for foreign key purposes.)  That has a hard
time scaling in cases where nodes are over-connected, which is,
broadly speaking, an acceptable sort of scenario.
...


Hello,

From the start of the discussion I was trying to understand what this
graph data type should be... I could not grasp it.

With current Postgres, in the simplest case we could do something like:

create table node (
   node_id serial primary key,
   ...
);
create table edge(
   from_node integer references node,
   to_node   integer[] -- each element references node
);

With the addition of foreign key constraints on array elements (which I
understand is work in progress), we could guarantee the referential
integrity of the graph - I really hope it will be ready for 9.4.

Without foreign key constraints on array elements, if we have to
guarantee the integrity we could do:

create table edge(
   from_node integer references node,
   to_node   integer references node,
   weight    real,
   ...
);

What is missing are some algorithms. I have personally implemented some
algorithms using recursive queries and it is doable (most of my
experience was with Oracle, but lately I have put in production some
PostgreSQL schemas with recursive queries that are working just fine).
For large-scale eigenvalue factorisation (think PageRank), this
sparse-matrix form is a reasonable data organisation (though I have
doubts that the database is the best place to run this job).

Best regards
luben


For directed graphs, where each node has a set of lines each with their 
own weight, I would suggest:


create table node
(
    id          int primary key,
    to_node_id  integer[],  -- each element references node(id)
    weight      float[],    -- weight of each line
    ...
);

So if nodeA has a line going to nodeB, you can't go from nodeB back to
nodeA unless nodeB has nodeA as a destination for one of its lines. It
would be best to ensure that the arrays have the same length!
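
One way to enforce that equal-length requirement, as a sketch (the
coalesce calls handle empty arrays, for which array_length() returns
NULL):

ALTER TABLE node
    ADD CONSTRAINT to_node_weight_match
    CHECK (coalesce(array_length(to_node_id, 1), 0)
         = coalesce(array_length(weight, 1), 0));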


This way lines can also have asymmetric weights.

Though it would be better to have something like:

create table node
(
    id     int primary key,
    edges  directed_line[],
    ...
);

Here a 'directed_line' instance would be an object in its own right,
with functions to extract the start/end nodes along with the weight,
and to add/remove (destination_node, weight) pairs. The system could
internally store the start node just once for an array of
directed_lines (though something other than an array might be more
appropriate).
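
As a rough sketch of that idea with today's tools, directed_line could
be a plain composite type (the accessor functions suggested above are
omitted, the type would of course need to be created before the table,
and the elided extra columns of node are assumed away):

CREATE TYPE directed_line AS (
    to_node  int,
    weight   float8
);

-- with the node table above: node 1 gets lines to nodes 2 and 3
INSERT INTO node VALUES (1, ARRAY[ROW(2, 0.5), ROW(3, 1.25)]::directed_line[]);

This also shows how weights can be asymmetric: the reverse lines, if
any, live in the other node's array with their own weights.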



Cheers,
Gavin




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Любен Каравелов

- Quote from Christopher Browne (cbbro...@gmail.com), on 29.04.2013 at 23:18 -
>
> The one place where I *could* see a special type having a contribution 
> is for there to be a data type that can contain an arbitrary number of
> links.  That means you have one tuple per node, and, instead of 
> needing a tuple for each link between nodes, you have one attribute 
> indicating *all* the links.  (And "interesting" is for that one 
> attribute to be usable for foreign key purposes.)  That has a hard
> time scaling in cases where nodes are over-connected, which is,
> broadly speaking, an acceptable sort of scenario.
> ...


Hello,

From the start of the discussion I was trying to understand what this
graph data type should be... I could not grasp it.

With current Postgres, in the simplest case we could do something like:

create table node (
  node_id serial primary key,
  ...
);
create table edge(
  from_node integer references node,
  to_node   integer[] -- each element references node
);

With the addition of foreign key constraints on array elements (which I
understand is work in progress), we could guarantee the referential
integrity of the graph - I really hope it will be ready for 9.4.

Without foreign key constraints on array elements, if we have to
guarantee the integrity we could do:

create table edge(
  from_node integer references node,
  to_node   integer references node,
  weight    real,
  ...
);

What is missing are some algorithms. I have personally implemented some
algorithms using recursive queries and it is doable (most of my
experience was with Oracle, but lately I have put in production some
PostgreSQL schemas with recursive queries that are working just fine).
For large-scale eigenvalue factorisation (think PageRank), this
sparse-matrix form is a reasonable data organisation (though I have
doubts that the database is the best place to run this job).
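
As an illustration of the recursive-query approach, a reachability
query over the second edge table might look like this (a sketch; it
finds every node reachable from node 1, and UNION rather than UNION ALL
keeps it terminating on cyclic graphs):

WITH RECURSIVE reachable(node_id) AS (
    SELECT to_node FROM edge WHERE from_node = 1
    UNION
    SELECT e.to_node
    FROM edge e
    JOIN reachable r ON e.from_node = r.node_id
)
SELECT node_id FROM reachable;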

Best regards
luben







Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Josh Berkus

> Hm.  I defended that restriction earlier, but it now occurs to me to
> wonder if it doesn't create a dump/reload sequencing hazard.  I don't
> recall that pg_dump is aware of any particular constraints on the order
> in which it dumps privilege-grant commands.  If it gets this right,
> that's mostly luck, I suspect.

For that matter, it raises a serious practical obstacle to implementing
schema-specific default privs by script, if you have to first check
whether the user in question has create privs ... something we don't
make at all easy to do.

For 9.4, I'm going to argue that the ALTER DEFAULT PRIVs feature has
completely failed in its goal to make database permissions easier to
manage.  Expect more detail on that after beta.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] The missing pg_get_*def functions

2013-04-29 Thread Tom Lane
Noah Misch  writes:
> Note also that minor releases can readily fix bugs in C-language functions,
> but we have no infrastructure to update sql-language functions after initdb.
> That flexibility is unfortunate to lose, particularly for something that
> pg_dump depends on.

That alone would probably be sufficient reason why we would never allow
pg_dump to depend on any such thing (not that I see a compelling
argument for it to do so anyway...).

The long and the short of it here is that there isn't any very good
reason to migrate any of the existing pg_dump-side functionality into
server-side functions, and especially not server-side functions that
aren't in C.  One of the things that we frequently recommend when doing
upgrades is that you do the dump with the newer version's pg_dump, so
as to get the benefits of any bug fixes that are in it.  The more
dump functionality is on the server side, the less opportunity we have
to repair things that way.

It may be that the functions Joel proposes are worth having for other
tools to use, but I'm not in favor of making pg_dump use them.

regards, tom lane




Re: [HACKERS] The missing pg_get_*def functions

2013-04-29 Thread Noah Misch
On Mon, Apr 29, 2013 at 10:16:22PM +0100, Joel Jacobson wrote:
> Existing:
> 
> pg_get_constraintdef
> pg_get_indexdef
> pg_get_viewdef
> pg_get_triggerdef
> pg_get_functiondef
> pg_get_ruledef
> 
> Missing:
> 
> pg_get_ts_templatedef
> pg_get_ts_parserdef
> pg_get_ts_configdef
> pg_get_ts_dictdef
> pg_get_databasedef
> pg_get_namespacedef
> pg_get_languagedef
> pg_get_conversiondef
> pg_get_castdef
> pg_get_amprocdef
> pg_get_operatordef
> pg_get_amopdef
> pg_get_opfamilydef
> pg_get_opclassdef
> pg_get_tabledef
> pg_get_sequencedef
> pg_get_typedef
> pg_get_attrdef
> 
> I'm planning to implement these. Is there any reason why the remaining
> functions cannot be written as SQL functions instead of C? The input data
> to the C-functions in pg_dump.c comes from pg_catalog anyway, so I thought
> it would be a lot easier just to construct the definitions in SQL where you
> have convenient access to the pg_catalog.

Note that while the sql procedural language is fair game, plpgsql currently is
not.  We install it by default, but the DBA is free to drop it.

Those existing functions give a mostly-SnapshotNow picture of their objects,
but an sql-language implementation would give a normally-snapshotted picture.
That status quo is perhaps more an implementation accident than a designed
behavior.  Before proliferating functions like this, we should pick a snapshot
policy and stick to it.  See the block comment at the top of pg_dump.c.

Note also that minor releases can readily fix bugs in C-language functions,
but we have no infrastructure to update sql-language functions after initdb.
That flexibility is unfortunate to lose, particularly for something that
pg_dump depends on.  Now, the right thing is probably to design a mechanism
for applying simple catalog updates in concert with a minor release.  In the
mean time, its absence puts the sql PL at a nontrivial disadvantage here.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Tom Lane
Noah Misch  writes:
> The particular restriction at hand, namely that a role have CREATE rights on a
> schema before assigning role-specific default privileges, seems like needless
> paternalism.  It would be akin to forbidding ALTER ROLE ... PASSWORD on a
> NOLOGIN role.  I'd support removing it when such a proposal arrives.

Hm.  I defended that restriction earlier, but it now occurs to me to
wonder if it doesn't create a dump/reload sequencing hazard.  I don't
recall that pg_dump is aware of any particular constraints on the order
in which it dumps privilege-grant commands.  If it gets this right,
that's mostly luck, I suspect.

> If
> anything, require that the user executing the ALTER DEFAULT PRIVILEGES, not
> the subject of the command, has CREATE rights on the schema.

That would be just as dangerous from this angle.

regards, tom lane




Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Noah Misch
On Mon, Apr 29, 2013 at 01:25:47PM -0400, Tom Lane wrote:
> Josh Berkus  writes:
> > On 04/29/2013 09:59 AM, Tom Lane wrote:
> >> As I pointed out to you last night, it does already say that.
> >> I think the problem here is that we're just throwing a generic
> >> permissions failure rather than identifying the particular permission
> >> needed.
> 
> > Yeah, a better error message would help a lot.  My first thought was
> > "WTF?  I'm the superuser, whaddya mean, 'permission denied'"?
> 
> Right.  I wonder if there's any good reason why we shouldn't extend
> aclerror() to, in all cases, add a DETAIL line along the lines of
> 
>   ERROR:  permission denied for schema web
>   DETAIL:  This operation requires role X to have privilege Y.
> 
> Is there any scenario where this'd be exposing too much info?

Can't think of one.  Seems safe and helpful.

The particular restriction at hand, namely that a role have CREATE rights on a
schema before assigning role-specific default privileges, seems like needless
paternalism.  It would be akin to forbidding ALTER ROLE ... PASSWORD on a
NOLOGIN role.  I'd support removing it when such a proposal arrives.  If
anything, require that the user executing the ALTER DEFAULT PRIVILEGES, not
the subject of the command, has CREATE rights on the schema.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] [PATCH] add --throttle option to pgbench [patch 2]

2013-04-29 Thread Fabien COELHO



It does seem to me that we should Poissonize the throttle time, then
subtract the average overhead, rather than Poissonizing the difference.


After thinking again about Jeff's point and failing to sleep, I think that 
doing exactly that is better because:

 - it is "right"
 - the code is simpler and shorter
 - my stuck-transaction-sequence issue is not that big an issue anyway

Here is a patch to schedule transactions along Poisson-distributed events.
This patch replaces my previous proposal.

Note that there is no reference to the current time after the stochastic
process is initiated. This is necessary, and means that if transactions
lag behind the throttle at some point, they will try to catch up later.
That is neither a good nor a bad thing; mostly a feature.
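
For reference, the delay computed in the patch below makes the
inter-arrival times exponentially distributed, which is what yields a
Poisson event stream. With T the average throttle delay and u drawn
uniformly from the 100 values {0.01, 0.02, ..., 1.00}, the code
computes

    wait = -T * ln(u)

so the expected wait is approximately T, and the largest possible
multiplier is -ln(0.01), about 4.6 (a sketch of the math, derived from
the code shown).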


--
Fabien

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index bc01f07..0142ed0 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * whether clients are throttled to a given rate, expressed as a delay in us.
+ * 0, the default means no throttling.
+ */
+int64		throttle = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -204,6 +210,8 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	int64		trigger;		/* previous/next throttling (us) */
+	bool		throttled;  /* whether current transaction was throttled */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -361,6 +369,9 @@ usage(void)
 		   "  -S   perform SELECT-only transactions\n"
 	 "  -t NUM   number of transactions each client runs (default: 10)\n"
 		   "  -T NUM   duration of benchmark test in seconds\n"
+		   "  -H SPEC, --throttle SPEC\n"
+		   "   delay in second to throttle each client\n"
+		   "   sample specs: 0.025 40tps 25ms 25000us\n"
 		   "  -v   vacuum all four standard tables before tests\n"
 		   "\nCommon options:\n"
 		   "  -d print debugging output\n"
@@ -1027,7 +1038,7 @@ top:
 			}
 		}
 
-		if (commands[st->state]->type == SQL_COMMAND)
+		if (!st->throttled && commands[st->state]->type == SQL_COMMAND)
 		{
 			/*
 			 * Read and discard the query result; note this is not included in
@@ -1049,26 +1060,54 @@ top:
 			discard_response(st);
 		}
 
+		/* some stuff done at the end */
 		if (commands[st->state + 1] == NULL)
 		{
-			if (is_connect)
+			/* disconnect if required and needed */
+			if (is_connect && st->con)
 			{
 PQfinish(st->con);
 st->con = NULL;
 			}
 
-			++st->cnt;
-			if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
-return clientDone(st, true);	/* exit success */
+			/* update transaction counter once, and possibly end */
+			if (!st->throttled)
+			{
+++st->cnt;
+if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+	return clientDone(st, true);	/* exit success */
+			}
+
+			/* handle throttling once, as the last post-transaction stuff */
+			if (throttle && !st->throttled)
+			{
+/* compute delay to approximate a Poisson distribution
+ * 100 => 13.8 .. 0 multiplier
+ * if transactions are too slow or a given wait shorter than
+ * a transaction, the next transaction will start right away.
+ */
+int64 wait = (int64)
+	throttle * -log(getrand(thread, 1, 100)/100.0);
+st->trigger += wait;
+st->sleeping = 1;
+st->until = st->trigger;
+st->throttled = true;
+if (debug)
+	fprintf(stderr, "client %d throttling %d us\n",
+			st->id, (int) wait);
+return true;
+			}
 		}
 
 		/* increment state counter */
 		st->state++;
 		if (commands[st->state] == NULL)
 		{
+			/* reset */
 			st->state = 0;
 			st->use_file = (int) getrand(thread, 0, num_files - 1);
 			commands = sql_files[st->use_file];
+			st->throttled = false;
 		}
 	}
 
@@ -2086,6 +2125,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_argument, NULL, 5},
+		{"throttle", required_argument, NULL, 'H'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -2152,7 +2192,7 @@ main(int argc, char **argv)
 	state = (CState *) pg_malloc(sizeof(CState));
 	memset(state, 0, sizeof(CState));
 
-	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:", long_options, &optindex)) != -1)
+	while ((c = getopt_long(argc, argv, "ih:nvp:dqSNc:j:Crs:t:T:U:lf:D:F:M:H:", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
@@ -2307,6 +2347,26 @@ main(int argc, char **argv)
 	exit(1);
 }
 break;
+			case 'H':
+			{
+/* get a double from the beginning of option value */
+double throttle_value = atof(optarg);
+if (throttle_value <= 0.0)
+{
+	fprintf(stderr, "invalid throttle 

Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Josh Berkus
Tom,

I'm also seeing that the DEFAULT PRIVs I get seem to be dependent on my
login role, not my current role.  That is, if role "dba" has default
privs set, and role "josh", which is a member of "dba", does not, and
"josh" does "set role dba" before creating some tables, those tables do
NOT get dba's default privs.  At least, that's what happens in my testing.

Is that expected?  If so, that should certainly be documented.
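
A minimal reproduction of the scenario described (a sketch; the role
and table names are illustrative):

-- as a superuser:
CREATE ROLE dba NOLOGIN;
CREATE ROLE josh LOGIN IN ROLE dba;
ALTER DEFAULT PRIVILEGES FOR ROLE dba GRANT SELECT ON TABLES TO PUBLIC;

-- in a session logged in as josh:
SET ROLE dba;
CREATE TABLE t (i int);  -- reportedly does NOT pick up dba's default privs
\dp t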

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Josh Berkus

>> what happens when an admin figures out that they can 'hack' the
>> system to do what they want by truncating that file?

That can't possibly be a serious objection.  DBAs who mess with DB files
are on their own, and should expect unpredictable behavior.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] The missing pg_get_*def functions

2013-04-29 Thread Joel Jacobson
Existing:

pg_get_constraintdef
pg_get_indexdef
pg_get_viewdef
pg_get_triggerdef
pg_get_functiondef
pg_get_ruledef

Missing:

pg_get_ts_templatedef
pg_get_ts_parserdef
pg_get_ts_configdef
pg_get_ts_dictdef
pg_get_databasedef
pg_get_namespacedef
pg_get_languagedef
pg_get_conversiondef
pg_get_castdef
pg_get_amprocdef
pg_get_operatordef
pg_get_amopdef
pg_get_opfamilydef
pg_get_opclassdef
pg_get_tabledef
pg_get_sequencedef
pg_get_typedef
pg_get_attrdef

I'm planning to implement these. Is there any reason why the remaining
functions cannot be written as SQL functions instead of C? The input data
to the C functions in pg_dump.c comes from pg_catalog anyway, so I thought
it would be a lot easier just to construct the definitions in SQL, where
you have convenient access to pg_catalog.
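
To illustrate, one of the missing functions might be sketched as an
sql-language function along these lines (hypothetical: the function
name matches the list above, but the exact output format is up for
discussion):

CREATE OR REPLACE FUNCTION pg_get_namespacedef(oid) RETURNS text
LANGUAGE sql STABLE AS $$
    SELECT 'CREATE SCHEMA ' || quote_ident(n.nspname)
        || ' AUTHORIZATION ' || quote_ident(r.rolname) || ';'
    FROM pg_catalog.pg_namespace n
    JOIN pg_catalog.pg_roles r ON r.oid = n.nspowner
    WHERE n.oid = $1
$$;

-- e.g.:
-- SELECT pg_get_namespacedef(oid) FROM pg_namespace WHERE nspname = 'public';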


Re: [HACKERS] Proposal to add --single-row to psql

2013-04-29 Thread Jim Nasby

On 4/28/13 7:50 AM, Craig Ringer wrote:

I find it frustrating that I've never seen an @paraccel email address here and 
that few of the other vendors of highly customised Pg offshoots are 
contributing back. It's almost enough to make me like the GPL.


FWIW, I think there's a pretty large barrier to these folks contributing back. 
Would the community really want to add a bunch of hooks to support something 
like Redshift? Or Greenplum? Etc., etc.? Most of these guys have to change
significant amounts of PG code, so much so that it's actually hard for them to 
stay current (which is why most of them just fork).

I do think this is a shame, but I'm not sure of any good way to fix it.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Kevin Grittner
Stephen Frost  wrote:

> what happens when an admin figures out that they can 'hack' the
> system to do what they want by truncating that file?

If they modified the heap files that way while the server was
running, the results would be somewhat unpredictable.  If they did
it while the server was stopped, starting the server and attempting
to access the matview would generate:

ERROR:  materialized view "matview_name" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

> Or we end up wanting to have that file be non-zero and considered
> 'empty' later, but we don't want pg_upgrade running around
> touching all of the existing files out there?

I didn't follow this one; could you restate it, please?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Analyzing bug 8049

2013-04-29 Thread Jim Nasby

On 4/28/13 7:00 PM, Tom Lane wrote:

Thoughts?  Anybody know of a counterexample to the idea that no plug-ins
call query_planner()?


I would assume that anyone writing anything that calls such a low-level
function reads -hackers regularly and could easily handle whatever
changes to their code the new callback parameter required. We would
obviously want to send notice about this ahead of time, in addition to
sticking something in the release notes.

That said, I'd be rather surprised if anything actually calls planner functions 
directly.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Christopher Browne
On Mon, Apr 29, 2013 at 10:50 AM, Merlin Moncure  wrote:

> On Mon, Apr 29, 2013 at 9:25 AM, Atri Sharma  wrote:
> >>
> >> This is an interesting idea.  Historically I've always decomposed
> >> graphs into relational structures because that's the only practical
> >> way to query them.  Graphs currently cannot be transported out of
> >> the database via JSON, so one of the areas to focus your research on
> >> will be how the client will consume the data.  libpqtypes is one way
> >> to do it, but that will really restrict your audience, so you'll
> >> probably need a rich set of functions to present the internal data
> >> (just like hstore).
> >
> > I completely agree. Initially, I was thinking of exposing the data to
> > user via HStore. But now, after Robert's suggestions, I think it will
> > be better to have an alternate representation. JSON seems to be an
> > excellent idea for that.
>
> I don't agree with this; JSON is not really designed to store graphs.
> You will probably need a customized internal representation, just like
> hstore, that expresses a graph like structure.
>
> This is not a trivial project.
>

Not trivial, indeed.

I see there being two directions where a data type goes.

1.  We created JSON and XML types as ways of storing data that has a robust
validation system.

They're still, in a sense, just "plain old text", but it's "plain old text"
that the user can be certain satisfies the respective rules for
representations.

2.  Some types support special operations to allow the data to be queried
in novel ways.

That's NOT the case, at this point, for JSON or XML.

But it certainly IS the case for Jeff Davis' "range types", which expose
access to some new sorts of data validation and indexing.

It is true for the inet type, which behaves rather differently from our
other types.

It is true for the tsearch indexes, that enable interesting random access
within some large "blobs" of stored data.

I'm not sure quite what we *would* want as the merits of graph-related
types.

I suspect that the best answer is NOT one where a graph is represented as a
value in a table; that has the implication that modifying The Graph
requires altering a single tuple, and that seems likely to become a
horrible bottleneck.  I'm suspicious that using HSTORE leads down that
path, where you'll have a database that has a table with just one tuple in
it, and where it's nearly impossible to alter that tuple.

I'm having a hard time thinking about what it looks like to have a graph as
table except to effectively compose the graph as a set of nodes, one tuple
per node, and I'm not sure that a new data type has anything to contribute
to that.

The one place where I *could* see a special type having a contribution is
for there to be a data type that can contain an arbitrary number of links.
That means you have one tuple per node, and, instead of needing a tuple for
each link between nodes, you have one attribute indicating *all* the
links.  (And "interesting" is for that one attribute to be usable for
foreign key purposes.)  That has a hard time scaling in cases where nodes
are over-connected, which is, broadly speaking, an acceptable sort of
scenario.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
> Many people weighed in on the need to differentiate between an
> empty matview and one which has not been populated.  Many have also
> weighed in on the benefits of unlogged matviews.

Sure, I think I did that- we should be able to distinguish between those
two cases and unlogged matviews are great.

> > I see absolutely no reason to change my opinion on this.  Keeping
> > relisscannable state in the form of is-the-file-of-nonzero-size
> > is something we WILL regret
> 
> ... or change in a subsequent major release.  I see no reason why
> using this technique now make it harder to do something else later.
> Could you elaborate on the technical challenges you see to doing
> so?

I've not followed this all that closely, to be honest, but I tend to
side with Tom on this, making PG depend on the file size for an
attribute of a relation is a bad idea.  For one thing, what happens when
an admin figures out that they can 'hack' the system to do what they
want by truncating that file?  Or we end up wanting to have that file be
non-zero and considered 'empty' later, but we don't want pg_upgrade
running around touching all of the existing files out there?

> I would guess that about half the use-cases for materialized views
> will stay with tables in spite of the added hassle, if they have to
> degrade performance by adding logging where they currently have
> none.

Life's tough.  There's quite a few things that I wish had made it into
9.3 which didn't.  One might also point out that a lot of individuals
may be, understandably, cautious about this new feature and not use it
in the first major rev it's released in anyway (I'm talking about
matviews in general here...).  Basically, I don't believe we should be
acting like we've promised unlogged-matviews will be in 9.3 just because
it's been committed.

> > The way forward to unlogged matviews that behave the way Kevin
> > wants is to improve crash recovery so that we can update catalog
> > state after completing recovery, which is something there are
> > other reasons to want anyway.
> 
> That would certainly be for the best.

Then let's forgo having this specific feature in 9.3 and implement it
correctly for 9.4.

> The hysteria and FUD about using the currently-supported technique
> for unlogged tables to implement unlogged matviews are very
> discouraging.  

Perhaps I'm all wet here, but it's not obvious to me that what's done
for unlogged tables really equates to what's being done here.

> And the notion that we would release a matview
> feature which allowed false results (in the form of treating a
> never-populated matview as a legal empty matview) is truly scary to
> me.  If we can't tell the difference between those two things, I
> don't think we should be releasing the feature.

I agree that it's important to distinguish the two.  I'm not sure that
I've heard anyone explicitly say otherwise..

Thanks,

Stephen




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Jim Nasby

On 4/29/13 2:20 PM, Florian Pflug wrote:

On Apr 29, 2013, at 21:00, Atri Sharma  wrote:

I think we find workarounds or makeshifts at the moment if we need
to use graphs in our database in Postgres. If we have a datatype
itself, with support for commonly used operations built inside the
type itself, that will greatly simplify users' tasks, and open up a
whole new avenue of applications for us, such as recommender systems,
social network analysis, or anything that can be done with graphs.


Usually, though, you'd be interested in large graphs which include
information for lots of records (e.g., nodes are individual users,
or products, or whatever). A graph datatype is not well suited to
that, because it'd store each graph as a single value, and updating
the graph would mean rewriting that whole value. If you're e.g. doing
social network analysis, and each new edge between two users requires
you to pull the whole graph from disk, update it, and write it back,
you'll probably hit problems once you reach a few hundred users or
so… which really isn't a lot for that kind of application.

I'd love to see more support for those kinds of queries in postgres,
(although WITH RECURSIVE already was a *huge* improvement in this
area!). But storing each graph as a graph type would do isn't the
way forward, IMHO.


My $0.02:

I believe it would be best to largely separate the questions of storage and
access: partly because of Florian's concern that you'd frequently want only one
representation of the whole graph, but also because the actual storage
interface does NOT have to be user-friendly if we have a good access layer. In
particular, if rows had a low overhead, we'd probably just store graphs that 
way. That's obviously not the case in PG, so is there some kind of hybrid 
approach we could use? Perhaps sections of a graph could be stored with one 
piece of MVCC overhead per section?

That's why I think separating access from storage is going to be very 
important; if we do that up-front, we can change the storage latter as we get 
real experience with this.

Second, we should consider how much the access layer should build on WITH
RECURSIVE and the like. Being able to detect specific use patterns of
CTE/WITH RECURSIVE seems like it could add a lot of value, but I also
worry that it's way too magical to be practical.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Misa Simic
On Monday, April 29, 2013, Atri Sharma wrote:

> On Mon, Apr 29, 2013 at 10:12 PM, Misa Simic wrote:
> > Hi Atri,
> >
> > What is an example of a custom internal representation and its JSON
> > representation (though both JSON and HStore represent their values as
> > text)?
> >
> > I also think that the key question is: "what operations would you
> > support on this data type?"
> >
> > Or what kind of problems will it solve? (What can't be solved now, or
> > can now, but the new type would allow a better way...)
> >
> > Thanks,
> >
> > Misa
> >
> >
>
> Hi Misa,
>
> Thanks for thinking it through.
>
> I have not thought about it yet (I was going with the HStore
> representation until now, which I showed in my first mail in
> this thread). I believe that something along these lines could be done:
>
> Entity 1:
>
> Node: Node1
>
> Adjacency list: node2, node3, node4
>
> Entity 2:
>
> Node: Node 2
>
> Adjacency list: node1, node5
>
> Entity 3:
>
> Node: Node 3
>
> Adjacency list: node1, node4
>
> Adjacency list sets:
>
> "Node1"=>"Entity1","Node2"=>"Entity2","Node3"=>"Entity3"
>
> I mentioned the potential operations we could have in a previous
> mail.Specifically,
>
> I can think of the standard tasks, i.e. searching if two nodes are
> connected or not, adding new nodes and edges, traversing the adjacency
> lists of nodes.
>
> If we add support for weighted graphs, we can probably add support for
> some common graph algorithms, such as Dijkstra's algorithm, the
> Bellman-Ford algorithm, an MST-construction algorithm, and network
> flow algorithms.
>
> The main idea is to allow the user to work with graphs pretty easily, and
> allow the user to use the data present in his database to make graphs
> and then process them.
>
> I think we find workarounds or makeshifts at the moment if we need
> to use graphs in our database in Postgres. If we have a datatype
> itself, with support for commonly used operations built inside the
> type itself, that will greatly simplify users' tasks, and open up a
> whole new avenue of applications for us, such as recommender systems,
> social network analysis, or anything that can be done with graphs.
>
>
  Hm...

Have you considered maybe ltree datatype?

To me, everything described sounds solvable in pure SQL (plus the ltree
datatype as a materialised path, to help with indexes and performance
and to avoid recursive queries all the time...)
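
For example, the materialised-path idea might look like this (a sketch,
assuming the ltree extension is installed; note that ltree handles
hierarchical, tree-shaped data rather than arbitrary graphs):

CREATE EXTENSION ltree;

CREATE TABLE tree_node (
    id    serial PRIMARY KEY,
    path  ltree NOT NULL   -- e.g. 'root.a.b'
);

CREATE INDEX tree_node_path_idx ON tree_node USING gist (path);

-- all descendants of root.a, with no recursive query:
SELECT * FROM tree_node WHERE path <@ 'root.a';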

Though it would be nice to see something new that would simplify these tasks...

Cheers,

Misa


Re: [HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Jim Nasby

On 4/29/13 1:08 PM, Fabien COELHO wrote:



While I don't understand the part about his laptop battery, I think that
there is a good use case for this.  If you are looking at latency
distributions or spikes, you probably want to see what they are like with a
load which is like the one you expect having, not the load which is the
highest possible.   Although for this use case you would almost surely be
using custom transaction files, not default ones, so I think you could just
use \sleep.  However, I don't know if there is an easy way to dynamically
adjust the sleep value by subtracting off the overhead time and randomizing
it a bit, like is done here.


Indeed, my thoughts :-) Having regular (\sleep n) or uniformly distributed 
(\sleep :random_value) delays is not very realistic, and I would have to 
take some measurements to find the right value for a target load.


+1 to being able to throttle to make latency measurements.

I'm also wondering if it would be useful to be able to set a latency target and 
have something adjust concurrency to see how well you can hit it. Certainly 
feature creep for the proposed patch; I only bring it up because there may be 
enough similarity to consider that use case at this time, even if we don't 
implement it yet.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Remaining beta blockers

2013-04-29 Thread Kevin Grittner
Tom Lane  wrote:

> [ shrug... ]  You and Kevin essentially repeated your claims that
> the current implementation is OK; nobody else weighed in.

Many people weighed in on the need to differentiate between an
empty matview and one which has not been populated.  Many have also
weighed in on the benefits of unlogged matviews.

> I see absolutely no reason to change my opinion on this.  Keeping
> relisscannable state in the form of is-the-file-of-nonzero-size
> is something we WILL regret

... or change in a subsequent major release.  I see no reason why
using this technique now makes it harder to do something else later.
Could you elaborate on the technical challenges you see to doing
so?

> Pollyanna-ish refusal to believe that is not an adequate reason
> for painting ourselves into a corner, especially not for a
> second-order feature like unlogged matviews.

I would guess that about half the use-cases for materialized views
will stay with tables in spite of the added hassle, if they have to
degrade performance by adding logging where they currently have
none.

> The way forward to unlogged matviews that behave the way Kevin
> wants is to improve crash recovery so that we can update catalog
> state after completing recovery, which is something there are
> other reasons to want anyway.

That would certainly be for the best.

> But it's far too late to do that in this cycle.

Yes it is.

> In the meantime I remain convinced that we're better off dropping
> the feature until we have an implementation that's not going to
> bite us in the rear in the future.

I haven't caught up with you on how it will do that.

> I also note that there are acknowledged-even-by-you bugs in the
> current implementation, which no patch has emerged for, so
> *something's* got to be done.  I'm done waiting for something to
> happen, and am going to go fix it in the way I think best.

Are you referring to the case where, if you refresh a matview of
over 8GB and it winds up empty, vacuum can make it look
invalid until the next REFRESH?  This is one of many ways it
could be fixed.

diff --git a/src/backend/commands/vacuumlazy.c 
b/src/backend/commands/vacuumlazy.c
index 8a1ffcf..b950b16 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -230,7 +230,13 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
      *
      * Don't even think about it unless we have a shot at releasing a goodly
      * number of pages.  Otherwise, the time taken isn't worth it.
+     *
+     * Leave a populated materialized view with at least one page.
      */
+    if (onerel->rd_rel->relkind == RELKIND_MATVIEW &&
+        vacrelstats->nonempty_pages == 0)
+        vacrelstats->nonempty_pages = 1;
+
     possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
     if (possibly_freeable > 0 &&
         (possibly_freeable >= REL_TRUNCATE_MINIMUM ||

The hysteria and FUD about using the currently-supported technique
for unlogged tables to implement unlogged matviews are very
discouraging.  And the notion that we would release a matview
feature which allowed false results (in the form of treating a
never-populated matview as a legal empty matview) is truly scary to
me.  If we can't tell the difference between those two things, I
don't think we should be releasing the feature.
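
To make the two states concrete, a minimal illustration (error text
approximate):

  CREATE MATERIALIZED VIEW mv AS SELECT 1 AS x WITH NO DATA;

  SELECT * FROM mv;
  -- ERROR:  materialized view "mv" has not been populated

  REFRESH MATERIALIZED VIEW mv;
  SELECT * FROM mv;  -- a legal, possibly empty, scannable matview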

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Florian Pflug
On Apr 29, 2013, at 21:00, Atri Sharma  wrote:
> I think we find workarounds or makeshift solutions at the moment if we need
> to use graphs in our database in postgres. If we have a datatype
> with support for commonly used operations built inside the
> type itself, that will greatly simplify the user's tasks, and open up a
> whole new avenue of applications for us, such as recommender systems,
> social network analysis, or anything that can be done with graphs.

Usually though, you'd be interested in large graphs which include
information for lots of records (e.g., nodes are individual users,
or products, or whatever). A graph datatype is not well suited for
that, because it'd store each graph as a single value, and updating
the graph would mean rewriting that whole value. If you're e.g. doing
social network analysis, and each new edge between two users requires
you to pull the whole graph from disk, update it, and write it back,
you'll probably hit problems once you reach a few hundred users or 
so… Which really isn't a lot for that kind of application.
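
To illustrate the update amplification: in relational form adding an edge 
touches one row, while a graph datatype rewrites the whole datum (add_edge 
and both tables here are purely made up):

  -- relational form: O(1) per new edge
  INSERT INTO edge (src, dst) VALUES (42, 99);

  -- single-value form: every new edge rewrites the entire graph
  UPDATE social_graph SET g = add_edge(g, 42, 99);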

I'd love to see more support for those kinds of queries in postgres
(although WITH RECURSIVE already was a *huge* improvement in this
area!). But storing each graph as a single value, the way a graph
type would, isn't the way forward, IMHO.

best regards,
Florian Pflug





Re: [HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Fabien COELHO



I'm having a hard time understanding the use-case for this feature.


Here is an example functional use case I had in mind.

Let us say I'm teaching a practice session about administering 
replication. Students have a desktop computer on which they can install 
several instances of postgresql, or possibly use virtual machines. I'd 
like them to set up one server, put it under a continuous load, then create 
a first slave, then a second, and things like that. The thing I do not 
want is the poor desktop and its hard drive to be at maximum speed for the 
whole afternoon while doing the session, making it hard to do anything 
else on the host. So I want something both realistic (the database is 
under a load, the WAL is advancing, let us dump it, base backup it, 
replicate it, monitor it, update it, whatever...), but gentle all the 
same.


Using pgbench with --throttle basically provides the adjustable continuous 
load I need. I understand that this is not at all the intent for which it 
was developed.


Note that I will probably propose another patch to provide a heartbeat 
while things are going on, but I thought that one patch at a time was 
enough.


--
Fabien.




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Atri Sharma
On Mon, Apr 29, 2013 at 10:12 PM, Misa Simic  wrote:
> Hi Atri,
>
> What is an example of custom internal representation and its JSON
> representation (though both JSON and HStore represent their values as text)?
>
> I also think  that the key question is:  "what operations would you support
> on this
> data type?"
>
> Or what kind of problems it will solve? (what can't be solved now - or can
> now - but new type will allow the better way...)
>
> Thanks,
>
> Misa
>
>

Hi Misa,

Thanks for thinking it through.

I have not thought about it yet (I was going with the HStore
representation till now, which I showed in my first mail in
this thread). I believe that something along these lines could be done:

Entity 1:

Node: Node1

Adjacency list: node2, node3, node4

Entity 2:

Node: Node 2

Adjacency list: node1, node5

Entity 3:

Node: Node 3

Adjacency list: node1, node4

Adjacency list sets:

"Node1"=>"Entity1","Node2"=>"Entity2","Node3"=>"Entity3"

I mentioned the potential operations we could have in a previous
mail. Specifically,

I can think of the standard tasks, i.e. searching if two nodes are
connected or not, adding new nodes and edges, traversing the adjacency
lists of nodes.

If we add support for weighted graphs, we can probably add support for
some common graph algorithms, such as Dijkstra's algorithm, the Bellman-
Ford algorithm, an MST-building algorithm, and network flow algorithms.

The main idea is to allow the user to work with graphs pretty easily, and
to use the data present in his database to make graphs
and then process them.

I think we find workarounds or makeshift solutions at the moment if we need
to use graphs in our database in postgres. If we have a datatype
with support for commonly used operations built inside the
type itself, that will greatly simplify the user's tasks, and open up a
whole new avenue of applications for us, such as recommender systems,
social network analysis, or anything that can be done with graphs.
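
As a purely illustrative sketch, that adjacency-list idea could be 
prototyped today with hstore plus a GIN index (all names made up):

  CREATE EXTENSION IF NOT EXISTS hstore;

  CREATE TABLE graph_node (
      node text PRIMARY KEY,
      adj  hstore  -- e.g. '"node2"=>"1", "node3"=>"1"', weights as text
  );
  CREATE INDEX graph_node_adj_gin ON graph_node USING gin (adj);

  -- nodes whose adjacency list mentions node2
  SELECT node FROM graph_node WHERE adj ? 'node2';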

Regards,

Atri
--
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fixing statistics problem related to vacuum truncation termination

2013-04-29 Thread Kevin Grittner
Kevin Grittner  wrote:

> After reviewing the threads and thinking about the various
> historical behaviors, I'm suggesting that we apply the attached,
> back-patched to 9.0, to fix the unintended changes from historical
> behavior related to lack of statistics generation in some cases
> where statistics were generated before
> b19e4250b45e91c9cbdd18d35ea6391ab5961c8d, and to generate them in
> some cases where they were historically suppressed.

Pushed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Fabien COELHO


Hello Jeff,


While I don't understand the part about his laptop battery, I think that
there is a good use case for this.  If you are looking at latency
distributions or spikes, you probably want to see what they are like with a
load which is like the one you expect having, not the load which is the
highest possible.   Although for this use case you would almost surely be
using custom transaction files, not default ones, so I think you could just
use \sleep.  However, I don't know if there is an easy way to dynamically
adjust the sleep value by subtracting off the overhead time and randomizing
it a bit, like is done here.


Indeed, my thoughts :-) Having regular (\sleep n) or uniformly 
distributed (\sleep :random_value) delays is not very realistic, and I would 
have to take some measurements to find the right value for a target load.



It does seem to me that we should Poissonize the throttle time, then
subtract the average overhead, rather than Poissonizing the difference.


I actually thought about doing it the way you suggested, because it was 
"right". However, I did not do it, because if the Poisson draw yields, 
possibly quite frequently, a time below the transaction time, one ends up 
with an artificial sequence of stuck transactions, as a client cannot start 
the second transaction while the previous one has not finished, and this 
does not seem realistic. To do that really cleanly would require 
distributing the events between clients, i.e. having some kind of 
coordination between clients, which would really be another test 
application. Having an approximation of that seemed good enough for my 
purpose.
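
For reference, the inverse-transform step behind the patch can be sketched 
in SQL (the 100 ms mean is just an example):

  -- exponential inter-arrival delays for a Poisson process:
  -- wait = -ln(U) * mean, with U uniform in (0,1]
  SELECT -ln(1.0 - random()) * 100.0 AS wait_ms
  FROM generate_series(1, 10);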


--
Fabien.




Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Josh Berkus

> Right.  I wonder if there's any good reason why we shouldn't extend
> aclerror() to, in all cases, add a DETAIL line along the lines of
> 
>   ERROR:  permission denied for schema web
>   DETAIL:  This operation requires role X to have privilege Y.
> 
> Is there any scenario where this'd be exposing too much info?

Not that I can think of.  The fact that role X doesn't have create on
schema Y isn't exactly privileged info.  Further, to make any use of
that information, you'd have to be able to SET ROLE X, in which case you
can just test for yourself if X has CREATE permission.
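
For instance, reusing the role and schema from Robert's example in this
thread, the check is one function call:

  SET ROLE bob;
  SELECT has_schema_privilege(current_user, 'we_like_bob', 'CREATE');
  RESET ROLE;

or, without switching roles at all:

  SELECT has_schema_privilege('bob', 'we_like_bob', 'CREATE');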

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Fabien COELHO


Hello Tom,


I'm having a hard time understanding the use-case for this feature.
Surely, if pgbench is throttling its transaction rate, you're going
to just end up measuring the throttle rate.


Indeed, I do not want to measure the tps if I throttle it.

The point is to generate a continuous but not necessarily maximal load, 
and to test other things under such a load, such as possibly cascading 
replication, failover, various dump strategies, whatever.



I wanted that to test the impact of various load levels, and for
functional tests on my laptop which should not drain the battery.


How does causing a test to take longer result in reduced battery drain?


If I test a replication setup on my laptop at maximum load, I can see the 
battery draining in a few seconds by looking at the effect on the time 
left widget. This remark is mostly for functional tests, not for 
performance tests.


If I want to test the maximum load of a setup, obviously I will not do 
that on my laptop, and I will not use --throttle...


--
Fabien.




Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Tom Lane
Josh Berkus  writes:
> On 04/29/2013 09:59 AM, Tom Lane wrote:
>> As I pointed out to you last night, it does already say that.
>> I think the problem here is that we're just throwing a generic
>> permissions failure rather than identifying the particular permission
>> needed.

> Yeah, a better error message would help a lot.  My first thought was
> "WTF?  I'm the superuser, whaddya mean, 'permission denied'"?

Right.  I wonder if there's any good reason why we shouldn't extend
aclerror() to, in all cases, add a DETAIL line along the lines of

ERROR:  permission denied for schema web
DETAIL:  This operation requires role X to have privilege Y.

Is there any scenario where this'd be exposing too much info?

regards, tom lane




Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Josh Berkus
On 04/29/2013 09:59 AM, Tom Lane wrote:
> Josh Berkus  writes:
>> This moves the general brokenness of this feature from a bug to (a) a
>> documentation issue and (b) unusably fussy.  For (a), I think we need
>> the following line in the docs:
> 
>> DEFAULT PRIVILEGES may only be granted to a ROLE which already has
>> CREATE permission on the specified schema.
> 
> As I pointed out to you last night, it does already say that.
> I think the problem here is that we're just throwing a generic
> permissions failure rather than identifying the particular permission
> needed.

Yeah, a better error message would help a lot.  My first thought was
"WTF?  I'm the superuser, whaddya mean, 'permission denied'"?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Tom Lane
Josh Berkus  writes:
> This moves the general brokenness of this feature from a bug to (a) a
> documentation issue and (b) unusably fussy.  For (a), I think we need
> the following line in the docs:

> DEFAULT PRIVILEGES may only be granted to a ROLE which already has
> CREATE permission on the specified schema.

As I pointed out to you last night, it does already say that.
I think the problem here is that we're just throwing a generic
permissions failure rather than identifying the particular permission
needed.

regards, tom lane




Re: [HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Jeff Janes
On Mon, Apr 29, 2013 at 8:27 AM, Tom Lane  wrote:

> Fabien COELHO  writes:
> > Please find attached a small patch to add a throttling capability to
> > pgbench, that is pgbench aims at a given client transaction rate instead
> > of maximizing the load. The throttling relies on Poisson-distributed
> > delays inserted after each transaction.
>
> I'm having a hard time understanding the use-case for this feature.
> Surely, if pgbench is throttling its transaction rate, you're going
> to just end up measuring the throttle rate.
>


While I don't understand the part about his laptop battery, I think that
there is a good use case for this.  If you are looking at latency
distributions or spikes, you probably want to see what they are like with a
load which is like the one you expect to have, not the load which is the
highest possible.   Although for this use case you would almost surely be
using custom transaction files, not default ones, so I think you could just
use \sleep.  However, I don't know if there is an easy way to dynamically
adjust the sleep value by subtracting off the overhead time and randomizing
it a bit, like is done here.
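
For illustration, a custom script with a fixed think time could look like
this (a hypothetical file using the stock pgbench_accounts table):

  \setrandom aid 1 100000
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
  \sleep 100 ms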

It does seem to me that we should Poissonize the throttle time, then
subtract the average overhead, rather than Poissonizing the difference.

Cheers,

Jeff


Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Josh Berkus

> rhaas=# create user bob;
> CREATE ROLE
> rhaas=# create schema we_like_bob;
> CREATE SCHEMA
> rhaas=# alter default privileges for role bob in schema we_like_bob
> grant select on tables to bob;
> ERROR:  permission denied for schema we_like_bob
> rhaas=# grant create on schema we_like_bob to bob;
> GRANT
> rhaas=# alter default privileges for role bob in schema we_like_bob
> grant select on tables to bob;
> ALTER DEFAULT PRIVILEGES

Hmmm.  Must have got something tangled up there; starting over with a
clean database and new users I got it to work.  I'll see if I can
reproduce the issue I'm getting on my production schema.

This moves the general brokenness of this feature from a bug to (a) a
documentation issue and (b) unusably fussy.  For (a), I think we need
the following line in the docs:

DEFAULT PRIVILEGES may only be granted to a ROLE which already has
CREATE permission on the specified schema.

For (b), I'll take it up in the 9.4 dev cycle.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Misa Simic
Hi Merlin,

" Graphs are not currently able to be transported
out of the database currently via JSON"

What does it mean?

(I probably don't understand graphs well - but from my point of view - any
data can be transported out of DB via JSON)
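
For instance, assuming a plain edge(src, dst) table (purely illustrative),
an adjacency list can already be shipped out as JSON:

  SELECT row_to_json(t)
  FROM (SELECT src, array_agg(dst) AS adj
        FROM edge GROUP BY src) t;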

Thanks,

Misa


2013/4/29 Merlin Moncure 

> On Mon, Apr 29, 2013 at 12:55 AM, Atri Sharma  wrote:
> >> It's probably pretty easy to add this, but I think the question is
> >> what would make it better than storing the same representation in a
> >> text field.
> >
> > I completely agree. The main point in making a new datatype would be
> > to add support for operations that are normally done with graphs.
> >
> >
> >>Obviously you get validation that the input is in the
> >> correct format, but you could do that with a CHECK constraint, too, or
> >> otherwise handle it in the application.  So I think the really
> >> interesting question is: what operations would you support on this
> >> data type?
> >
> > I can think of the standard tasks, i.e. searching if two nodes are
> > connected or not, adding new nodes and edges, traversing the adjacency
> > lists of nodes.
> >
> > If we add support for weighted graphs, we can probably add support for
> > some common graph algorithms, such as Dijkstra's algorithm, the Bellman-
> > Ford algorithm, an MST-building algorithm, and network flow algorithms.
> >
> > The main idea is to allow the user to work with graphs pretty easily, and
> > to use the data present in his database to make graphs
> > and then process them.
> >
> >> One of the problems you're likely to run into if you store the whole
> >> graph as a single object is that it may make many of the things you
> >> want to do with it not very efficient.
> >
> > Yes, I agree. On further thought, I believe it would be more of a pain
> > if we stick to representing the whole thing as one. Rather, making
> > multiple components will be more flexible and modular, and allow us to
> > modify different components of the same graph without modifying or
> > interfering with other components of the graph.
> >
> > I will think of a new design. I am still thinking of using HStore to
> > store adjacency lists. This should have good performance for access of
> > lists and similar tasks, IMO.
>
> This is an interesting idea.  Historically I've always decomposed
> graphs into relational structures because that's the only practical
> way to query them.   Graphs are not currently able to be transported
> out of the database via JSON, so one of the areas to focus
> your research on will be how the client will consume the data.
> libpqtypes is one way to do it, but that will really restrict your
> audience, so you'll probably need a rich set of functions to present the
> internal data (just like hstore).
>
> Another area to focus research will be on searchability: how to use
> GIST/GIN indexes to pull data out via an internal query string. An
> overview of the current GIST based type implementations (like ltree)
> couldn't hurt.
>
> merlin
>


Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Misa Simic
Hi Atri,

What is an example of custom internal representation and its JSON
representation (though both JSON and HStore represent their values as text)?

I also think  that the key question is:  "what operations would you support
on this
data type?"

Or what kind of problems it will solve? (what can't be solved now - or can
now - but new type will allow the better way...)

Thanks,

Misa


2013/4/29 Atri Sharma 

> >
> > I don't agree with this; JSON is not really designed to store graphs.
> > You will probably need a customized internal representation, just like
> > hstore, that expresses a graph like structure.
>
> Yes, we will have a custom internal representation. I was thinking of
> ways to export the graph into a user-parsable type, hence JSON.
>
>
>
> Regards,
>
> Atri
>
>
> --
> Regards,
>
> Atri
> l'apprenant
>


Re: [HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Tom Lane
Fabien COELHO  writes:
> Please find attached a small patch to add a throttling capability to 
> pgbench, that is pgbench aims at a given client transaction rate instead 
> of maximizing the load. The throttling relies on Poisson-distributed 
> delays inserted after each transaction.

I'm having a hard time understanding the use-case for this feature.
Surely, if pgbench is throttling its transaction rate, you're going
to just end up measuring the throttle rate.

> I wanted that to test the impact of various load levels, and for 
> functional tests on my laptop which should not drain the battery.

How does causing a test to take longer result in reduced battery drain?
You still need the same number of transactions if you want an honest
test, so it seems to me the machine would have to be on longer and thus
you'd eat *more* battery to get an equivalently trustworthy result.

regards, tom lane




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Atri Sharma
>
> I don't agree with this; JSON is not really designed to store graphs.
> You will probably need a customized internal representation, just like
> hstore, that expresses a graph like structure.

Yes, we will have a custom internal representation. I was thinking of
ways to export the graph into a user-parsable type, hence JSON.



Regards,

Atri


--
Regards,

Atri
l'apprenant




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Merlin Moncure
On Mon, Apr 29, 2013 at 9:25 AM, Atri Sharma  wrote:
>>
>> This is an interesting idea.  Historically I've always decomposed
>> graphs into relational structures because that's the only practical
>> way to query them.   Graphs are not currently able to be transported
>> out of the database via JSON, so one of the areas to focus
>> your research on will be how the client will consume the data.
>> libpqtypes is one way to do it, but that will really restrict your
>> audience, so you'll probably need a rich set of functions to present the
>> internal data (just like hstore).
>
> I completely agree. Initially, I was thinking of exposing the data to
> user via HStore. But now, after Robert's suggestions, I think it will
> be better to have an alternate representation. JSON seems to be an
> excellent idea for that.

I don't agree with this; JSON is not really designed to store graphs.
You will probably need a customized internal representation, just like
hstore, that expresses a graph like structure.

This is not a trivial project.

merlin




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Atri Sharma
>
> This is an interesting idea.  Historically I've always decomposed
> graphs into relational structures because that's the only practical
> way to query them.   Graphs are not currently able to be transported
> out of the database via JSON, so one of the areas to focus
> your research on will be how the client will consume the data.
> libpqtypes is one way to do it, but that will really restrict your
> audience, so you'll probably need a rich set of functions to present the
> internal data (just like hstore).

I completely agree. Initially, I was thinking of exposing the data to
user via HStore. But now, after Robert's suggestions, I think it will
be better to have an alternate representation. JSON seems to be an
excellent idea for that.

I am thinking of having the functions for working on the graph present
inside the data type itself.


> Another area to focus research will be on searchability: how to use
> GIST/GIN indexes to pull data out via an internal query string. An
> overview of the current GIST based type implementations (like ltree)

I will definitely research that. Actually, I have never looked at
GIST/GIN code before, so I have no idea how they can be used here.

Regards,

Atri


--
Regards,

Atri
l'apprenant




Re: [HACKERS] Functional dependencies and GROUP BY - for subqueries

2013-04-29 Thread Tom Lane
Ashutosh Bapat  writes:
> Is there any reason why we want to check the functional dependencies at
> the time of parsing and not after rewrite? Obviously, by doing so, we will
> allow creation of certain views which will start throwing errors after the
> underlying table changes the primary key. Is it mandatory that we throw
> "functional dependency" related errors at the time of creation of views?

From a usability standpoint, I would think so.  And really the only
excuse for the functional-dependency feature to exist at all is
usability; it adds nothing you couldn't do without it.

If we wanted to do something like this, I think the clean way to do it
would be to invent a notion of unique/not-null/pkey constraints on
views, so that the creator of a view could declaratively say that he
wants such a property exposed.  That is, the example would become
something like

create table t1 (id int primary key, ... other stuff ...);
create view v1 as select * from t1;
alter view v1 add primary key(id);
create view v2 as select * from v1 group by id;

The pkey constraint on v1 is just a catalog entry with a dependency on
t1's pkey constraint; there's no actual index there.  But now, v2 can
be built with a dependency on v1's pkey, not t1's, and the action-at-
a-distance problem goes away.  For example, a "CREATE OR REPLACE v1"
command could check that the new view definition still provides
something for v1's pkey to depend on, and throw error or not without any
need to examine the contents of other views.  Dropping various elements
of this schema would work unsurprisingly, too.

This would, of course, require a significant chunk of new code, and
personally I do not think the feature would be worth it.  But it
would be a clean and usable design.

regards, tom lane




Re: [HACKERS] Graph datatype addition

2013-04-29 Thread Merlin Moncure
On Mon, Apr 29, 2013 at 12:55 AM, Atri Sharma  wrote:
>> It's probably pretty easy to add this, but I think the question is
>> what would make it better than storing the same representation in a
>> text field.
>
> I completely agree. The main point in making a new datatype would be
> to add support for operations that are normally done with graphs.
>
>
>>Obviously you get validation that the input is in the
>> correct format, but you could do that with a CHECK constraint, too, or
>> otherwise handle it in the application.  So I think the really
>> interesting question is: what operations would you support on this
>> data type?
>
> I can think of the standard tasks, i.e. searching if two nodes are
> connected or not, adding new nodes and edges, traversing the adjacency
> lists of nodes.
>
> If we add support for weighted graphs, we can probably add support for
> some common graph algorithms, such as Dijkstra's algorithm, the Bellman-
> Ford algorithm, an MST-building algorithm, and network flow algorithms.
>
> The main idea is to allow the user to work with graphs pretty easily, and
> to use the data present in his database to make graphs
> and then process them.
>
>> One of the problems you're likely to run into if you store the whole
>> graph as a single object is that it may make many of the things you
>> want to do with it not very efficient.
>
> Yes, I agree. On further thought, I believe it would be more of a pain
> if we stick to representing the whole thing as one. Rather, making
> multiple components will be more flexible and modular, and allow us to
> modify different components of the same graph without modifying or
> interfering with other components of the graph.
>
> I will think of a new design. I am still thinking of using HStore to
> store adjacency lists. This should have good performance for access of
> lists and similar tasks, IMO.

This is an interesting idea.  Historically I've always decomposed
graphs into relational structures because that's the only practical
way to query them.   Graphs are not currently able to be transported
out of the database via JSON, so one of the areas to focus
your research on will be how the client will consume the data.
libpqtypes is one way to do it, but that will really restrict your
audience, so you'll probably need a rich set of functions to present the
internal data (just like hstore).

Another area to focus research will be on searchability: how to use
GIST/GIN indexes to pull data out via an internal query string. An
overview of the current GIST based type implementations (like ltree)
couldn't hurt.

merlin




[HACKERS] [PATCH] add --throttle option to pgbench

2013-04-29 Thread Fabien COELHO


Hello,

Please find attached a small patch to add a throttling capability to 
pgbench, that is pgbench aims at a given client transaction rate instead 
of maximizing the load. The throttling relies on Poisson-distributed 
delays inserted after each transaction.


I wanted that to test the impact of various load levels, and for 
functional tests on my laptop which should not drain the battery.


 sh> ./pgbench -T 10 -c 2 --throttle 10tps test
 starting vacuum...end.
 transaction type: TPC-B (sort of)
 scaling factor: 1
 query mode: simple
 number of clients: 2
 number of threads: 1
 duration: 10 s
 number of transactions actually processed: 214
 tps = 21.054216 (including connections establishing)
 tps = 21.071253 (excluding connections establishing)

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index bc01f07..c11aa26 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -137,6 +137,12 @@ int			unlogged_tables = 0;
 double		sample_rate = 0.0;
 
 /*
+ * whether clients are throttled to a given rate, expressed as a delay in us.
+ * 0, the default means no throttling.
+ */
+int64		throttle = 0;
+
+/*
  * tablespace selection
  */
 char	   *tablespace = NULL;
@@ -204,6 +210,8 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring transaction latencies */
 	instr_time	stmt_begin;		/* used for measuring statement latencies */
+	instr_time	txn_time;		/* cumulated transaction time for throttling */
+	bool		throttled;  /* whether current transaction was throttled */
 	int			use_file;		/* index in sql_files for this client */
 	bool		prepared[MAX_FILES];
 } CState;
@@ -361,6 +369,9 @@ usage(void)
 		   "  -S   perform SELECT-only transactions\n"
 	 "  -t NUM   number of transactions each client runs (default: 10)\n"
 		   "  -T NUM   duration of benchmark test in seconds\n"
+		   "  -H SPEC, --throttle SPEC\n"
+		   "   delay in second to throttle each client\n"
+		   "   sample specs: 0.025 40tps 25ms 25000us\n"
 		   "  -v   vacuum all four standard tables before tests\n"
 		   "\nCommon options:\n"
 		   "  -d print debugging output\n"
@@ -1027,7 +1038,7 @@ top:
 			}
 		}
 
-		if (commands[st->state]->type == SQL_COMMAND)
+		if (!st->throttled && commands[st->state]->type == SQL_COMMAND)
 		{
 			/*
 			 * Read and discard the query result; note this is not included in
@@ -1049,26 +1060,64 @@ top:
 			discard_response(st);
 		}
 
+		/* some stuff done at the end */
 		if (commands[st->state + 1] == NULL)
 		{
-			if (is_connect)
+			/* disconnect if required and needed */
+			if (is_connect && st->con)
 			{
 				PQfinish(st->con);
 				st->con = NULL;
 			}
 
-			++st->cnt;
-			if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
-				return clientDone(st, true);	/* exit success */
+			/* update transaction counter once, and possibly end */
+			if (!st->throttled)
+			{
+				++st->cnt;
+				if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
+					return clientDone(st, true);	/* exit success */
+			}
+
+			/* handle throttling once, as the last post-transaction stuff */
+			if (throttle && !st->throttled)
+			{
+				instr_time now, run;
+				int64 avg_txn_time;
+				st->throttled = true;
+				INSTR_TIME_SET_CURRENT(now);
+				run = now;
+				INSTR_TIME_SUBTRACT(run, st->txn_begin);
+				INSTR_TIME_ADD(st->txn_time, run);
+				avg_txn_time = INSTR_TIME_GET_MICROSEC(st->txn_time) / st->cnt;
+				if (avg_txn_time < throttle)
+				{
+					/* compute delay to approximate a Poisson distribution
+					 * based on the client's current transaction mean time
+					 * 100 => 13.8 .. 0 multiplier
+					 */
+					int64 wait = (int64)
+						((throttle - avg_txn_time) *
+						 -log(getrand(thread, 1, 100)/100.0));
+					if (debug)
+						fprintf(stderr, "client %d throttling %d us\n",
+								st->id, (int) wait);
+					st->sleeping = 1;
+					st->until = INSTR_TIME_GET_MICROSEC(now) + wait;
+					return true;
+				}
+				/* else WARNING transactions too long, cannot throttle... */
+			}
 		}
 
 		/* increment state counter */
 		st->state++;
 		if (commands[st->state] == NULL)
 		{
+			/* reset */
 			st->state = 0;
 			st->use_file = (int) getrand(thread, 0, num_files - 1);
 			commands = sql_files[st->use_file];
+			st->throttled = false;
 		}
 	}
 
@@ -1087,8 +1136,8 @@ top:
 		INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
 	}
 
-	/* Record transaction start time if logging is enabled */
-	if (logfile && st->state == 0)
+	/* Record transaction start time if logging or throttling is enabled */
+	if ((logfile || throttle) && st->state == 0)
 		INSTR_TIME_SET_CURRENT(st->txn_begin);
 
 	/* Record statement start time if per-command latencies are requested */
@@ -2086,6 +2135,7 @@ main(int argc, char **argv)
 		{"unlogged-tables", no_argument, &unlogged_tables, 1},
 		{"sampling-rate", required_argument, NULL, 4},
 		{"aggregate-interval", required_ar

Re: [HACKERS] ALTER DEFAULT PRIVILEGES FOR ROLE is broken

2013-04-29 Thread Robert Haas
On Sun, Apr 28, 2013 at 9:39 PM, Josh Berkus  wrote:
>> The fine manual notes that the target role has to already have CREATE
>> privileges on the target schema --- maybe that's what's biting you in
>> this case?
>
> Nope.  That was the first thing I thought of.  It really is that the
> target role must *own* the schema.  So clearly a bug.

wfm.

rhaas=# create user bob;
CREATE ROLE
rhaas=# create schema we_like_bob;
CREATE SCHEMA
rhaas=# alter default privileges for role bob in schema we_like_bob
grant select on tables to bob;
ERROR:  permission denied for schema we_like_bob
rhaas=# grant create on schema we_like_bob to bob;
GRANT
rhaas=# alter default privileges for role bob in schema we_like_bob
grant select on tables to bob;
ALTER DEFAULT PRIVILEGES

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

