Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-03-24 Thread Gianni Ciolli
On Thu, Mar 22, 2012 at 03:02:45PM -0400, Tom Lane wrote:
 It's even less clear what the semantics are in multi-key
 cases.  Right offhand I would say that multi-key cases are
 nonsensical and should be forbidden outright, because there is no
 way to figure out which collections of elements of different arrays
 should be considered to be a referencing item.

Currently multi-column keys with more than one EACH column are
unsupported, mainly because it's unclear how they should work (and I
agree that they might not work at all).

 Could we see a specification of what the referencing semantics are
 intended to be, please?

You are right; the discussion has never been put together in a single
place, as it should have been.

Please find below an updated version of the specification, which Marco
and I put together from the discussion on this list, taking into
account the changes made during the review phase. Comments have also
been added to explain why some options are forbidden.

Best regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

---8<--8<--8<--8<--8<--8<--8<--8<--8<---

ON (DELETE | UPDATE) actions for EACH foreign keys
==================================================

------------------+-----------+-----------
                  |    ON     |    ON
      Action      |  DELETE   |  UPDATE
------------------+-----------+-----------
 CASCADE          |    Row    | Forbidden
 SET NULL         |    Row    |    Row
 SET DEFAULT      |    Row    |    Row
 EACH CASCADE     |  Element  |  Element
 EACH SET NULL    |  Element  |  Element
 EACH SET DEFAULT | Forbidden | Forbidden
 NO ACTION        |     -     |     -
 RESTRICT         |     -     |     -
------------------+-----------+-----------

Example 1. Table C references table B via a (non-array) foreign key.

Example 2. The referencing table A is constructed by aggregating
(GROUP BY) table C from Example 1. An EACH foreign key on A references
B, representing the same relationship as the foreign key in Example 1.

Remark 3. Examples 1 and 2 are related, because they represent the
same model; when making choices about a given action in Example 2 we
will consider its relationship with Example 1.

Example 4. Assume that the FK in Example 1 has an ON DELETE CASCADE
action.  Deleting one row from table B will delete all the referencing
rows in table C.  The state we reach after the DELETE is the same as
the one obtained in Example 2 with the ON DELETE EACH CASCADE action
after removing the same row from B.

Example 4 suggests associating the Element behaviour with the ON
DELETE EACH CASCADE action.

The user can choose between two different options for a CASCADE-style
action when a referenced row is deleted; both of them have use cases,
as the following Example 5 shows.

Example 5. If you remove a vertex from a polygon (represented as an
array of vertices), you can either destroy the polygon (ON DELETE
CASCADE) or transform it into a polygon with fewer vertices (ON DELETE
EACH CASCADE).
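
For concreteness, here is a sketch of Example 5 in SQL, assuming the
EACH REFERENCES column-constraint spelling from the current patch (the
grammar is still under review, so the exact syntax may change):

    CREATE TABLE vertex (
        id integer PRIMARY KEY
    );

    CREATE TABLE polygon (
        id       integer PRIMARY KEY,
        -- EACH CASCADE: deleting a row of vertex removes just that
        -- element from the array; plain CASCADE would delete the
        -- whole polygon row instead
        vertices integer[] EACH REFERENCES vertex (id)
                           ON DELETE EACH CASCADE
    );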

ON UPDATE SET NULL serves a different purpose from ON UPDATE EACH SET
NULL; again, both options are provided to the user, essentially as
with ON DELETE CASCADE and ON DELETE EACH CASCADE.

ON (UPDATE | DELETE) EACH SET DEFAULT is forbidden, because table A
does not carry a default value for an array element. In theory the
default value could be retrieved from the referenced table B, but that
would be unusual and in any case different from the corresponding case
of Example 1 with ON (UPDATE | DELETE) SET DEFAULT.

ON UPDATE CASCADE is forbidden because, as far as we can see, the only
meaningful action to propagate updated values is ON UPDATE EACH
CASCADE.



Re: [HACKERS] Weak-memory specific problem in ResetLatch/WaitLatch (follow-up analysis)

2012-03-24 Thread Michael Tautschnig
Hi again,

[...]
 
 However, your example is enough unlike the actual code that the
 conclusion you state following the word "clearly" isn't actually
 clear to me.  According to latch.h, the correct method of using a
 latch is like this:
 
  * for (;;)
  * {
  *     ResetLatch();
  *     if (work to do)
  *         Do Stuff();
  *     WaitLatch();
  * }
 
 Meanwhile, anyone who is creating additional work to do should add the
 work to the queue and then set the latch.

When writing the above statement, including the "clearly", we were
possibly thinking too much of the above usage hint, which uses only
ResetLatch and WaitLatch, and not considering that SetLatch is meant
to be part of "Do Stuff()".

So here, once again, are our version and the more properly translated
one, this time including SetLatch (in line 16).

Our version:

 1 #define WORKERS 2
 2 volatile _Bool latch[WORKERS];
 3 volatile _Bool flag[WORKERS];
 4
 5 void worker(int i)
 6 {
 7   while(!latch[i]);
 8   for(;;)
 9   {
10     assert(!latch[i] || flag[i]);
11     latch[i] = 0;
12     if(flag[i])
13     {
14       flag[i] = 0;
15       flag[(i+1) % WORKERS] = 1;
16       latch[(i+1) % WORKERS] = 1;
17     }
18
19     while(!latch[i]);
20   }
21 }

In PostgreSQL-function terms:

 1 #define WORKERS 2
 2 volatile _Bool latch[WORKERS];
 3 volatile _Bool flag[WORKERS];
 4
 5 void worker(int i)
 6 {
 7   WaitLatch(i);
 8   for(;;)
 9   {
10     assert(!latch[i] || flag[i]);
11     ResetLatch(i);
12     if(flag[i])
13     {
14       flag[i] = 0;
15       flag[(i+1) % WORKERS] = 1;
16       SetLatch((i+1) % WORKERS);
17     }
18
19     WaitLatch(i);
20   }
21 }

 
 So it seems to me that we could potentially fix this by inserting
 barriers at the end of ResetLatch and at the beginning of SetLatch and
 WaitLatch.  Then the latch has to get reset before we check whether
 there's work to do; and we've got to finish checking for work before
 we again try to wait for the latch.  Similarly, any work that was in
 progress before SetLatch was called will be forced to be committed to
 memory before SetLatch does anything else.  Adding that many barriers
 might not be very good for performance but it seems OK from a
 correctness point of view, unless I am missing something, which is
 definitely possible.  I'd appreciate any thoughts you have on this, as
 this is clearly subtle and tricky to get exactly right.
 

So we had suggested the following bugfixes:

 
  In [3] it was suggested to fix the problem by placing a barrier in
  ResetLatch, which corresponds to placing it between lines 11 and 12
  in the code above.  This amounts to placing a barrier between the
  two reads (lines 7/19 and 12, i.e., between WaitLatch and the
  if(flag[1])) of Worker 1.
  
  Placing a sync (i.e., the strongest Power barrier) accordingly
  would, however, still be insufficient for the second problem, as it
  would only fix the reordering of read-read pairs by Worker 1 and
  the store atomicity issue from Worker 0.  But the writes on Worker
  0 could still be reordered (problem number 2).  One possible fix
  consists of placing a sync between the two writes on Worker 0, and
  an address dependency between the two reads on Worker 1.  Clearly,
  however, these are changes that can no longer be hidden behind the
  ResetLatch/WaitLatch interface, but rather must go into the code
  using it.
 
 

Here, the two writes on Worker 0 correspond to lines 15 and 16, and
indeed line 16 is exactly the call to SetLatch. To solve problem 1,
the mp idiom, the following options are possible (in all cases
stronger synchronisation primitives may be used: sync, the strongest
Power barrier, instead of lwsync, or lwsync instead of an address
dependency):

1. An lwsync at the beginning of SetLatch, and an lwsync in ResetLatch
(preferably after the write).
2. An lwsync at the beginning of SetLatch, and an address dependency
in ResetLatch.

To address the second problem, the lb idiom, an address dependency has to be put
either in WaitLatch or SetLatch.

To fix both problems, the performance-wise cheapest option would thus
be placing an address dependency in ResetLatch and an lwsync in
SetLatch. For practical reasons, however, placing an lwsync in both
places (at the beginning of SetLatch and after the write in
ResetLatch) might be preferable: address dependencies may be optimised
away by the C compiler, or require inline assembly in a form not as
easy to factor out as lwsync, plus the interface of [...]
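
To make option 1 concrete, here is a minimal sketch (not the actual
latch.c code; LWSYNC() and the reduced Latch struct are stand-ins
invented for illustration):

    #include <signal.h>

    #if defined(__powerpc__)
    #define LWSYNC() __asm__ __volatile__ ("lwsync" : : : "memory")
    #else
    #define LWSYNC() __sync_synchronize()  /* fall back to a full barrier */
    #endif

    typedef struct Latch
    {
        volatile sig_atomic_t is_set;
        /* ... owner pid, shared flag, etc. ... */
    } Latch;

    void
    SetLatch(volatile Latch *latch)
    {
        LWSYNC();               /* commit the queued work (the flag write,
                                 * line 15) before the latch write (line 16) */
        latch->is_set = 1;
        /* ... signal the owning process to wake it up ... */
    }

    void
    ResetLatch(volatile Latch *latch)
    {
        latch->is_set = 0;
        LWSYNC();               /* order the reset (line 11) before the
                                 * re-check of the work queue (line 12) */
    }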

Re: [HACKERS] Fix PL/Python metadata when there is no result

2012-03-24 Thread Peter Eisentraut
On Wed, 2012-03-07 at 17:14 -0500, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On Wed, 2012-03-07 at 16:49 -0500, Tom Lane wrote:
  Still, it seems rather arbitrary to say that the row count property is
  the thing to test for that purpose and no other is.  Why not return None
  for any property that's not sensible?
 
  Hmm, above you said you were in favor of throwing an error rather than
  returning None?
 
 I said it was a reasonable alternative, not that it was the only one
 we should consider.  The behavior of .nrows() might be accidental,
 but perhaps it is a preferable model to adopt.

It turns out I was mistaken about the .nrows() behavior.  It returns 0
even for utility commands, because the value comes straight from
SPI_processed.  But SPI_processed is a C variable, which can't hold a
"not applicable" value, so that doesn't necessarily mean other
languages can't handle this differently.

After pondering this for several days now, I still think the best
approach is to change .nrows() to return None for utility commands and
have the other metadata functions throw exceptions.  Then the
programming style would be "if there are rows, give me metadata about
them".
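
For instance, in a PL/Python function the style would look like this
(a sketch, assuming the proposed behaviour; colnames() is one of the
metadata functions in question):

    rv = plpy.execute("CREATE TABLE t (a int)")   # utility command
    # under the proposal: rv.nrows() is None, rv.colnames() raises

    rv = plpy.execute("SELECT 1 AS a")
    if rv.nrows() is not None:
        # "if there are rows, give me metadata about them"
        names = rv.colnames()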

The alternative would be to introduce another function, has_rows() or
something, but then how would that be more intuitive than testing
nrows() is not None?





Re: [HACKERS] Gsoc2012 Idea --- Social Network database schema

2012-03-24 Thread Joshua Berkus
Qi,

Yeah, I can see that.  That's a sign that you had a good idea for a project, 
actually: your idea is interesting enough that people want to debate it.  Make 
a proposal on Monday and our potential mentors will help you refine the idea.

----- Original Message -----
 
 
 
 
  Date: Thu, 22 Mar 2012 13:17:01 -0400
  Subject: Re: [HACKERS] Gsoc2012 Idea --- Social Network database
  schema
  From: cbbro...@gmail.com
  To: kevin.gritt...@wicourts.gov
  CC: pgsql-hackers@postgresql.org
  
  On Thu, Mar 22, 2012 at 12:38 PM, Kevin Grittner
  kevin.gritt...@wicourts.gov wrote:
   Tom Lane t...@sss.pgh.pa.us wrote:
   Robert Haas robertmh...@gmail.com writes:
   Well, the standard syntax apparently aims to reduce the number of
   returned rows, which ORDER BY does not.  Maybe you could do it with
   ORDER BY ... LIMIT, but the idea here I think is that we'd like to
   sample the table without reading all of it first, so that seems to
   miss the point.
   
   I think actually the traditional locution is more like
       WHERE random() < constant
   where the constant is the fraction of the table you want.  And
   yeah, the presumption is that you'd like it to not actually read
   every row.  (Though unless the sampling density is quite a bit
   less than 1 row per page, it's not clear how much you're really
   going to win.)
   
   It's all going to depend on the use cases, which I don't think
   I've
   heard described very well yet.
   
   I've had to pick random rows from, for example, a table of
   disbursements to support a financial audit.  In those cases it has
   been the sample size that mattered, and order didn't.  One
   interesting twist there is that for some of these financial audits
   they wanted the probability of a row being selected to be
   proportional to the dollar amount of the disbursement.  I don't
   think you can do this without a first pass across the whole data
   set.
  
  This one was commonly called Dollar Unit Sampling, though the
  terminology has gradually gotten internationalized.
  http://www.dummies.com/how-to/content/how-does-monetary-unit-sampling-work.html
  
  What the article doesn't mention is that some particularly large
  items might wind up covering multiple samples.  In the example,
  they're looking for a sample every $3125 down the list.  If there
  was a single transaction valued at $31,250, that (roughly) covers
  10 of the desired samples.
  
  It isn't possible to do this without scanning across the entire
  table.
  
  If you want repeatability, you probably want to instantiate a copy
  of enough information to indicate the ordering chosen.  That's
  probably something that needs to be captured as part of the work of
  the audit, so not only does it need to involve a pass across the
  data, it probably requires capturing a fair bit of data for
  posterity.
  --
  When confronted by a difficult problem, solve it by reducing it to
  the question, "How would the Lone Ranger handle this?"
 
 
 
 
 
 
 The discussion till now has gone far beyond my understanding.
 Could anyone explain briefly what the idea is at this point?
 The design details are still unfamiliar to me; I can only take the
 time to understand them, if selected, by reading the relevant
 material.
 For now, I'm still curious why Neil's implementation no longer
 works. Postgres has been patched a lot, but the general idea behind
 Neil's implementation should still work, shouldn't it?
 Besides, whether this feature is needed is still not decided. That
 seems to be another hard-to-decide point. Is this topic perhaps not
 yet ready for GSoC? If so, I think I still have time to switch to
 other topics. Any suggestions?
 
 
 Thanks.
 
 Best Regards and Thanks
 Huang Qi Victor
 Computer Science of National University of Singapore



Re: [HACKERS] query cache

2012-03-24 Thread Joshua Berkus
Billy,

 I've done a brief search of the postgresql mail archives, and I've
 noticed a few projects for adding query caches to postgresql, (for
 example, Masanori Yamazaki's query cache proposal for GSOC 2011),

... which was completed, btw.  Take a look at the current release of pgPool.

Are you proposing this for GSOC2012, or is this just a general idea?

 I'm wondering if anyone would be interested in a query cache as a
 backend to postgresql? I've been playing around with the postgresql
 code, and if I'm understanding the code, I believe this is possible.

Well, you'd have to start by demonstrating the benefit of it.  The
advantage of query caches in proxies and clients is well-known: you
can offload some of the work of the database onto other servers, thus
increasing capacity.  Adding a query cache to the database server
would require the query identity recognition of the cache to be far
cheaper (as in 10X cheaper) than planning and running the query, which
seems unlikely at best.

There are a number of proven caching models which PostgreSQL currently does not 
yet implement.  I'd think it would be more profitable to pursue one of those, 
such as:

* parse caching in the client (JDBC has this, but libpq does not).
* shared cached plans between sessions (snapshot issues here could be nasty)
* fully automated materialized views

If you want to do something radical and new, then come up with a way for a 
client to request and then reuse a complete query plan by passing it to the 
server.  That would pave the way for client-side plan caching (and plan 
manipulation) code written in a variety of languages, and thus further 
innovation through creative algorithms and other ideas.

--Josh Berkus






Re: [HACKERS] Fix PL/Python metadata when there is no result

2012-03-24 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Wed, 2012-03-07 at 17:14 -0500, Tom Lane wrote:
 I said it was a reasonable alternative, not that it was the only one
 we should consider.  The behavior of .nrows() might be accidental,
 but perhaps it is a preferable model to adopt.

 After pondering this for several days now I still think the best
 approach is to change .nrows() to return None for utility commands and
 have the other metadata functions throw exceptions.

OK, I don't have strong feelings about it.

regards, tom lane



Re: [HACKERS] query cache

2012-03-24 Thread Tom Lane
Joshua Berkus j...@agliodbs.com writes:
 If you want to do something radical and new, then come up with a way
 for a client to request and then reuse a complete query plan by
 passing it to the server.

[ raised eyebrow ]  That seems like a complete nonstarter on two
different grounds: cache invalidation needs (client won't know if plan
is stale) and security issues (pass broken plan to server, crash
server).  Those problems could be avoided if the client simply has a
token for a plan that's kept on the server side ... but how is that
concept different from a prepared statement?

regards, tom lane



Re: [HACKERS] query cache

2012-03-24 Thread Billy Earney
On Sat, Mar 24, 2012 at 3:22 PM, Joshua Berkus j...@agliodbs.com wrote:

 Billy,

  I've done a brief search of the postgresql mail archives, and I've
  noticed a few projects for adding query caches to postgresql, (for
  example, Masanori Yamazaki's query cache proposal for GSOC 2011),

 ... which was completed, btw.  Take a look at the current release of
 pgPool.

 Are you proposing this for GSOC2012, or is this just a general idea?


just a general idea, but if someone wants to work on it for GSOC2012, I
wouldn't mind giving a helping hand.  I'm not a student, so GSOC probably
doesn't apply to me.


  I'm wondering if anyone would be interested in a query cache as a
  backend to postgresql? I've been playing around with the postgresql
  code, and if I'm understanding the code, I believe this is possible.

 Well, you'd have to start by demonstrating the benefit of it.  The
 advantage of query caches in proxies and clients is well-known: you
 can offload some of the work of the database onto other servers, thus
 increasing capacity.  Adding a query cache to the database server
 would require the query identity recognition of the cache to be far
 cheaper (as in 10X cheaper) than planning and running the query,
 which seems unlikely at best.

I figured I'd create the md5 digest of the sourceText of a query, and
then look that up in a hash; I don't think that will be very
expensive.  I'll have another hash to keep track of which queries are
dependent on which relations, so that when a relation is changed
somehow (and committed), the query is invalidated and removed from the
query hash.
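
For concreteness, a minimal sketch of that lookup against the
backend's dynahash API (names like QueryCacheEntry are invented for
illustration; result storage and invalidation are elided):

    #include "postgres.h"
    #include "libpq/md5.h"
    #include "utils/hsearch.h"

    typedef struct QueryCacheEntry
    {
        char    key[33];    /* hex MD5 of the query source text */
        /* ... cached result, OIDs of the relations it depends on ... */
    } QueryCacheEntry;

    static HTAB *query_cache;

    static void
    query_cache_init(void)
    {
        HASHCTL ctl;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = 33;               /* NUL-terminated hex digest */
        ctl.entrysize = sizeof(QueryCacheEntry);
        query_cache = hash_create("query cache", 128, &ctl, HASH_ELEM);
    }

    static QueryCacheEntry *
    query_cache_lookup(const char *sourceText)
    {
        char    key[33];
        bool    found;

        if (!pg_md5_hash(sourceText, strlen(sourceText), key))
            return NULL;        /* out of memory computing the digest */

        return (QueryCacheEntry *) hash_search(query_cache, key,
                                               HASH_FIND, &found);
    }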


Billy


Re: [HACKERS] query cache

2012-03-24 Thread Tatsuo Ishii
 Well, you'd have to start by demonstrating the benefit of it.  The
 advantage of query caches in proxies and clients is well-known: you
 can offload some of the work of the database onto other servers, thus
 increasing capacity.  Adding a query cache to the database server
 would require the query identity recognition of the cache to be far
 cheaper (as in 10X cheaper) than planning and running the query,
 which seems unlikely at best.

 I figured I'd create the md5 digest of the sourceText of a query, and then
 look that up in a hash.  I don't think that will be very expensive.  I'll
 have another hash to keep track of which queries are dependent on which
 relations, so that when a relation is changed somehow (and committed), the
 query is then invalidated and removed from the query hash.

From the experience of implementing the query cache in pgpool-II, I
have some suggestions:

- A query result cache should not be created if the transaction
  including the SELECT is not committed.

- Since a transaction could have many SELECTs, you need to keep those
  query results somewhere in a temporary storage. You could either
  discard or register them to the query cache storage depending on the
  transaction's fate, either aborted or committed.

- If a SELECT has non-immutable functions, then the query result
  should not be cached.

- If a SELECT uses temporary tables, then the query result should not
  be cached.

- If a SELECT uses unlogged tables, then the query result should not
  be cached, because their data could vanish after crash recovery.  Of
  course this only applies if you plan to use cache storage which does
  not survive a crash.
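
For instance (illustrative SQL for the cases above):

    -- not cacheable: the result depends on a volatile function
    SELECT now();

    -- not cacheable: temporary tables are session-local
    CREATE TEMP TABLE scratch (v int);
    SELECT sum(v) FROM scratch;

    -- cacheable only in crash-safe cache storage: unlogged tables
    -- are truncated after crash recovery
    CREATE UNLOGGED TABLE counters (k text, n bigint);
    SELECT n FROM counters WHERE k = 'hits';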
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
