Re: [HACKERS] pivot aggregation with a patched intarray

2014-06-06 Thread Marc Mamin
 From: Ali Akbar [mailto:the.ap...@gmail.com]
 Sent: Freitag, 6. Juni 2014 03:44
 Subject: Re: [HACKERS] pivot aggregation with a patched intarray
 
 2014-06-05 17:18 GMT+07:00 Marc Mamin m.ma...@intershop.de:
  I'm thinking about adding a final function to my aggregate that would
  replace zero values with nulls, hence transforming the intarray into
  a standard int[], possibly with a null bitmap and a lower bound that can
  be > 1.
  This will probably degrade performance considerably, but may
  reduce the size of the end result for sparse data and not-too-small
  integers...
  Performance should greatly depend on the data distribution and order,
  as they influence the number of pallocs.
  My first tests showed both better and poorer results.
 
  My goal is not better performance in the first place, but
  to get a pivot structure in an early aggregation stage.
 
 Usually for pivots I use the crosstab function from tablefunc
 (http://www.postgresql.org/docs/9.4/static/tablefunc.html#AEN158550).
 If your patch doesn't perform better, it's easier to just use
 crosstab. For storing the result efficiently, it can be transformed
 into an array manually.


Hello,

crosstab is too restrictive for my use case:
it supports only one row_name column, and you have to know the number of
returned columns (categories) in advance.
What I'm doing is combining a count aggregate with a pivot, which is not
the same as what crosstab offers.
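The described final function can be sketched outside of C (a toy Python model of the transformation, not the actual intarray code; `pivot_count` and `finalize` are made-up names):

```python
def pivot_count(categories, size):
    """Accumulate a dense count array, one slot per category id."""
    counts = [0] * size
    for category in categories:
        counts[category] += 1
    return counts

def finalize(counts):
    """Hypothetical final function: trim leading/trailing zeros (yielding a
    lower bound > 1, as in a PostgreSQL int[] with a non-default lower bound)
    and replace remaining zeros with None (SQL NULL, via the null bitmap)."""
    first = next((i for i, c in enumerate(counts) if c), None)
    if first is None:
        return 1, []                       # all zeros -> empty array
    last = max(i for i, c in enumerate(counts) if c)
    body = [c if c else None for c in counts[first:last + 1]]
    return first + 1, body                 # SQL arrays are 1-based
```

For sparse data the trimmed, null-bitmapped form can be much smaller than the dense zero-filled array, which is the trade-off discussed above.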

 
 PS: as Michael Paquier said above, it's better if you could send the
 patch in .patch file format (see:
 https://wiki.postgresql.org/wiki/Working_with_GIT).

I'm first waiting for some positive feedback on the idea itself before
eventually submitting this officially.

best regards, 

Marc Mamin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing join removals for more join types

2014-06-06 Thread David Rowley
On Fri, Jun 6, 2014 at 11:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Noah Misch n...@leadboat.com writes:
  On Thu, Jun 05, 2014 at 02:12:33AM +0200, Andres Freund wrote:
  A bit more crazy, but how about trying to plan joins with an added
  one-time qual that checks the size of the deferred trigger queue? Then
  we wouldn't even need special-case plans.

  That, too, sounds promising to investigate.

 Not terribly.  You can't actually do join removal in such a case, so it's
 not clear to me that there's much win to be had.  The planner would be at
 a loss as to what cost to assign such a construct, either.

 Moreover, what happens if the trigger queue gets some entries after the
 query starts?


In the scripts below I've created a scenario (scenario 1) in which the inner
query, which I've put in a trigger function, sees the referenced
table before the RI triggers execute, so it returns 1 row for the SELECT j2_id
FROM j1 WHERE NOT EXISTS(SELECT 1 FROM j2 WHERE j2_id = j2.id) query. This
works, and I agree it's a problem that needs to be looked at in the patch.

I'm also trying to create the situation you describe, where the RI
trigger queue gets added to during the query. I'm likely doing something
wrong, but I can't see what.

Here are both scripts. I need help with scenario 2 to create the problem you
describe; I can't get my version to give me any stale non-cascaded records.


-- Scenario 1: Outer command causes a foreign key trigger to be queued
-- and this results in a window of time where we have records
-- in the referencing table which don't yet exist in the
-- referenced table.

DROP TABLE IF EXISTS j1;
DROP TABLE IF EXISTS j2;
DROP TABLE IF EXISTS records_violating_fkey;

CREATE TABLE j2 (id INT NOT NULL PRIMARY KEY);
CREATE TABLE j1 (
  id INT PRIMARY KEY,
  j2_id INT NOT NULL REFERENCES j2 (id) MATCH FULL ON DELETE CASCADE ON
UPDATE CASCADE
);

INSERT INTO j2 VALUES(10),(20);
INSERT INTO j1 VALUES(1,10),(2,20);

-- create a table to store records that 'violate' the fkey.
CREATE TABLE records_violating_fkey (j2_id INT NOT NULL);

CREATE OR REPLACE FUNCTION j1_update() RETURNS TRIGGER AS $$
BEGIN
  RAISE NOTICE 'Trigger fired';
  INSERT INTO records_violating_fkey
    SELECT j2_id FROM j1 WHERE NOT EXISTS(SELECT 1 FROM j2 WHERE j2_id = j2.id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER j1_update_trigger BEFORE UPDATE ON j2 FOR EACH ROW EXECUTE
PROCEDURE j1_update();

UPDATE j2 SET id = id+1;

-- returns 1 row.
SELECT * FROM records_violating_fkey;


--
-- Scenario 2: Inner command causes a foreign key trigger to be queued.

DROP TABLE IF EXISTS j1;
DROP TABLE IF EXISTS j2;

CREATE TABLE j2 (id INT NOT NULL PRIMARY KEY);

CREATE TABLE j1 (
  id INT PRIMARY KEY,
  j2_id INT NOT NULL REFERENCES j2 (id) MATCH FULL ON DELETE CASCADE ON
UPDATE CASCADE
);

INSERT INTO j2 VALUES(10),(20);
INSERT INTO j1 VALUES(1,10),(2,20);

CREATE OR REPLACE FUNCTION update_j2(p_id int) RETURNS int AS $$
BEGIN
  RAISE NOTICE 'Updating j2 id = % to %', p_id, p_id + 1;
  UPDATE j2 SET id = id + 1 WHERE id = p_id;
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

-- try and get some records to be returned by causing an update on the
-- record that is not the current record.
SELECT * FROM j1 WHERE NOT EXISTS(SELECT 1 FROM j2 WHERE j2_id = id) AND
update_j2((SELECT MIN(j2_id) FROM j1 ij1 WHERE ij1.j2_id <> j1.j2_id)) = 1;

Regards

David Rowley


[HACKERS] cancelling statement due to user request error occurs but the transaction has committed.

2014-06-06 Thread Naoya Anzai
Hi All,

When log_duration is on (or log_min_duration_statement = 0),
if a transaction that has already committed internally receives a SIGINT
signal, a query cancellation error is output.

For example:
1. A query like TRUNCATE is removing large table files.
2. The session receives a SIGINT signal.
3. A query cancellation error occurs.
4. But the query has already committed.

e.g.)
---
naoya=# \d
   List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | hoge | table | naoya
(1 row)

naoya=# set log_duration=on;
SET
naoya=# select count(*) from hoge;
 count
-------
    10
(1 row)

naoya=# truncate hoge;
Cancel request sent
ERROR:  canceling statement due to user request
naoya=# select count(*) from hoge;
 count
-------
     0
(1 row)
---

This is because the ProcessInterrupts function is called by errfinish (in the
query-duration ereport).

I think this cancellation request must not interrupt an internally committed
transaction, because clients may misunderstand it as the transaction having
been rolled back.

Now,
I tried to fix the problem.

--- postgresql-fe7337f/src/backend/utils/error/elog.c   2014-06-06 
11:57:44.0 +0900
+++ postgresql-fe7337f.new/src/backend/utils/error/elog.c   2014-06-06 
13:10:51.0 +0900
@@ -580,7 +580,8 @@
 * can stop a query emitting tons of notice or warning messages, even if
 * it's in a loop that otherwise fails to check for interrupts.
 */
-   CHECK_FOR_INTERRUPTS();
+   if (IsTransactionState()) 
+   CHECK_FOR_INTERRUPTS();
 }

Thereby,
when ereport (at a non-error level) is called outside a transaction state,
PostgreSQL never calls the ProcessInterrupts function from errfinish.

But I am anxious about changing errfinish, because
errfinish is called in many, many situations.

Could you please confirm it?

Regards,

Naoya

---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: anzai-na...@mxu.nes.nec.co.jp
---





postgresql-fe7337f_elog.patch
Description: postgresql-fe7337f_elog.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposing pg_hibernate

2014-06-06 Thread Gurjeet Singh
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila amit.kapil...@gmail.com wrote:

 Another thing is don't you want to handle SIGQUIT signal in bg saver?

I think bgworker_quickdie registered in StartBackgroundWorker() serves
the purpose just fine.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-06 Thread furuyao
 -Original Message-
   Flush is not performed on every write; it is performed
   collectively, like walreceiver does.
 
  I only glanced at this, but AFAICS you're only flushing at the end of
  every WAL segment. That will result in absolutely horrible performance,
  right?
  Walreceiver does flush more frequently than that. It basically syncs
  every chunk of received WAL...
 My understanding was that completing the write loop meant the received
 WAL was complete, and that walreceiver behaved the same way.
 
 I will check the flush position.
As you say, walreceiver does flush more frequently than that.
However, it seems difficult to apply the same approach,
so I have tried a different one:

1. Set a 100 msec timeout on select().
2. Check whether to flush when select() times out.
3. write() only when flushing.

I think this approach still has problems, but I don't have a good idea how to
solve them.
Can someone please advise me?

Regards,

-- 
Furuya Osamu


pg_receivexlog-add-synchronous-mode-v2.patch
Description: pg_receivexlog-add-synchronous-mode-v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-06-06 Thread Pavel Stehule
Hello

Where are we with this feature?

Are there any barriers to committing the bugfix?

Regards

Pavel


2014-05-18 19:46 GMT+02:00 Sergey Muraviov sergey.k.murav...@gmail.com:

 I found some new bugs and fix them.
 And I had to make many changes.


 2014-05-17 21:31 GMT+04:00 Greg Stark st...@mit.edu:

 Sorry, a couple things still look to not be quite right.

 1) The width of the table when linestyle=old-ascii and border=0 or
 border=1 (and expanded=on and format=wrapped) seems to be off by one.

 2) The hyphens following the RECORD NN are short by one

 I'm surprised the last patch was so big since it sounded like a simple
 off-by-one bug. It looks like you've removed the leading space on the
 border=0 expanded case. I guess that makes sense but we should
 probably stop making significant changes now and just focus on fixing
 the off by one bugs.




 --
 Best regards,
 Sergey Muraviov



Re: [HACKERS] Proposing pg_hibernate

2014-06-06 Thread Gurjeet Singh
On Thu, Jun 5, 2014 at 11:32 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jun 5, 2014 at 5:39 PM, Gurjeet Singh gurj...@singh.im wrote:

  On Tue, Jun 3, 2014 at 5:43 PM, Gurjeet Singh gurj...@singh.im wrote:
 Case 2 also won't cause any buffer restores, because the save-files are
 created only on clean shutdowns; not on a crash or immediate
 shutdown.

 How do you ensure that buffers are saved only on clean shutdown?

Postmaster sends SIGTERM only in smart or fast shutdown requests.

 Buffer saver process itself can crash while saving or restoring
 buffers.

True. That may lead to a partial list of buffers being saved. And the
code in the Reader process tries hard to read only valid data, and punts
at the first sight of data that doesn't make sense or on an ERROR raised
from a Postgres API call.

 IIUC on shutdown request, postmaster will send signal to BG Saver
 and BG Saver will save the buffers and then postmaster will send
 signal to checkpointer to shutdown.  So before writing Checkpoint
 record, BG Saver can crash (it might have saved half the buffers)

Case handled as described above.

 or may BG saver saves buffers, but checkpointer crashes (due to
 power outage or any such thing).

A crash of the checkpointer process seems to be irrelevant to Postgres
Hibernator's workings.

I think you are trying to argue with the wording of my claim save-files
are created only on clean shutdowns; not on a crash or immediate
shutdown, by implying that a crash may occur at any time during and
after the BufferSaver processing. I agree the wording can be improved.
How about

... save-files are created only when Postgres is requested to shutdown
in normal (smart or fast) modes.

Note that I am leaving out the mention of crash.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is it JSQuery?

2014-06-06 Thread Oleg Bartunov
Jsquery is a QUERY language; JsonPath is a language to EXTRACT JSON parts.

On Fri, Jun 6, 2014 at 4:34 AM, David E. Wheeler da...@justatheory.com wrote:
 On Jun 5, 2014, at 5:25 PM, Andrew Dunstan and...@dunslane.net wrote:

 My understanding is that it's meant to be analogous to tsquery.

 At first glance, JsonPath doesn't seem to support AND and OR operators, 
 which would make it rather less expressive than I gather JSQuery is meant to 
 be.

 Yes, but perhaps it could be a superset.

 I guess my real question is: Should it not be based on some existing dialect, 
 preferably something in fairly wide use outside the Postgres community?

 Unless that something is awful, of course.

 David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is it JSQuery?

2014-06-06 Thread David E. Wheeler
On Jun 6, 2014, at 6:54 AM, Oleg Bartunov obartu...@gmail.com wrote:

 Jsquery - is QUERY language, JsonPath - is language to EXTRACT json parts.
 
Sure, but could we not potentially build on its syntax, instead of building a 
new one? I’m not saying we *should*, but if we don’t, I think there should be a 
discussion about why not. For example, I think it would not be a good idea to 
follow [JSONiq](http://www.jsoniq.org/) because who wants to write queries in 
JSON? (Have we learned nothing from XSLT?).

Here’s a (partial) list of existing JSON query languages:

  http://stackoverflow.com/a/7812073/79202

The arguments might be:

* [JSONiq](http://jsoniq.org/): Queries in JSON? Gross!
* [UNQL](http://www.unqlspec.org/): Too similar to SQL
* [JAQL](https://code.google.com/p/jaql/): Too different from SQL
* [JSONPath](http://goessner.net/articles/JsonPath/): Too verbose
* [JSON Query](https://github.com/mmckegg/json-query): Too little there
* [Mongo](http://www.mongodb.org/display/DOCS/Inserting#Inserting-JSON): Gross 
syntax
* [LINQ](http://james.newtonking.com/archive/2008/03/02/json-net-2-0-beta-2): 
Too similar to SQL
* [searchjs](https://github.com/deitch/searchjs): Queries in JSON? Gross!
* [JQuery](http://jquery.org/): It's for HTML, not JSON
* [SpahQL](http://danski.github.io/spahql/): More like XPath
* [ObjectPath](http://adriank.github.io/ObjectPath/): Too verbose
* [JFunk](https://code.google.com/p/jfunk/): XPathy
* [JData](http://jaydata.org): Queries in JavaScript? C’mon.

These are just off-the-cuff evaluations in 10 minutes of looking -- surely not 
all of them are accurate. Some of them maybe *are* useful to emulate. It’s 
definitely worthwhile, IMHO, to evaluate prior art and decide what, if any of 
it, should inspire the JSQuery syntax, and there should be reasons why and why 
not.

I do think that the name should be changed if we don’t follow an existing 
standard, as [JSQuery](https://code.google.com/p/gwtquery/wiki/JsQuery) is 
already a thing.

Best,

David



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-06 Thread Fujii Masao
On Fri, Jun 6, 2014 at 8:05 PM,  furu...@pm.nttdata.co.jp wrote:
 -Original Message-
   Flush is not performed on every write; it is performed
   collectively, like walreceiver does.
 
  I only glanced at this, but AFAICS you're only flushing at the end of
  every WAL segment. That will result in absolutely horrible performance,
  right?
  Walreceiver does flush more frequently than that. It basically syncs
  every chunk of received WAL...

  My understanding was that completing the write loop meant the received
  WAL was complete, and that walreceiver behaved the same way.

  I will check the flush position.
  As you say, walreceiver does flush more frequently than that.

No. IIUC walreceiver does flush *less* frequently than what you implemented
in pg_receivexlog. Your version of pg_receivexlog tries to flush every time
it receives a WAL chunk. OTOH, walreceiver flushes only when there is no
extra WAL chunk in the receive buffer. IOW, after writing a WAL chunk, if
there is another WAL chunk that walreceiver can receive immediately, it
postpones the flush until later.
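The postponement strategy can be modeled in a few lines (a simplified sketch of the behaviour described above, not walreceiver's actual code; each inner list stands for the chunks available in the receive buffer at one moment):

```python
def receive_loop(bursts):
    """Write every received WAL chunk, but flush (fsync) only when the
    receive buffer drains, i.e. no further chunk is immediately available."""
    writes = flushes = 0
    for burst in bursts:
        buffer = list(burst)
        while buffer:
            buffer.pop(0)    # write one WAL chunk to the segment file
            writes += 1
        flushes += 1         # nothing more to receive right now -> flush
    return writes, flushes
```

With three bursts of sizes 3, 1, and 2 this does 6 writes but only 3 flushes, whereas flushing per chunk would fsync 6 times.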

 However, it seems difficult to apply as same way.

Why? ISTM that's not so difficult.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build

2014-06-06 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Wed, May 28, 2014 at 12:29:28PM -0400, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   I think this is caused because the variable is not defined as SOCKET. 
   The attached patch fixes this.  This should prevent the warning.
  
  Surely that's just going to move the errors somewhere else.  The call
  site still expects the argument to be int[].
 
 Ah, yes, you are right.  This is a similar problem I had with libpq
 where PQsocket() had to return an int.
 
 Attached is an updated patch which follows my previous coding of
 checking for PGINVALID_SOCKET, and if not equal, assigns the value to an
 integer handle.  I would also like to rename variable 's' to
 'listen_sock', but that is not in the patch, for clarity reasons.
 
 Should this be held for 9.5?  I think it is only warning removal.  On
 the other hand, portability is what we do during beta testing.

I think this should go in 9.4, but as you say it's only warning removal
so there is probably little point in patching further back (this code
dates back to 9.3.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposing pg_hibernate

2014-06-06 Thread Jim Nasby

On 6/4/14, 8:56 AM, Andres Freund wrote:

On 2014-06-04 09:51:36 -0400, Robert Haas wrote:

On Wed, Jun 4, 2014 at 2:08 AM, Andres Freund and...@2ndquadrant.com  wrote:

 On 2014-06-04 10:24:13 +0530, Amit Kapila wrote:

 In case of recovery, the shared buffers saved by this utility are
 from the previous shutdown, which doesn't seem to be of more use
 than the buffers loaded by recovery.

 
 Why? The server might have been queried if it's a hot standby one?


I think that's essentially the same point Amit is making.  Gurjeet is
arguing for reloading the buffers from the previous shutdown at end of
recovery; IIUC, Amit, you, and I all think this isn't a good idea.

I think I am actually arguing for Gurjeet's position. If the server is
actively being queried (i.e. hot_standby=on and actually used for
queries) it's quite reasonable to expect that shared_buffers has lots of
content that is *not* determined by WAL replay.


Perhaps instead of trying to get data actually into shared buffers it would be 
better to just advise the kernel that we think we're going to need it? ISTM 
it's reasonably fast to pull data from the disk cache into shared buffers.

On a related note, what I really wish for is the ability to restore the disk 
cache after a restart/unmount...
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-06 Thread Tom Lane
I've been looking at the complaint Tim Wilson posted in pgsql-performance
about badly inaccurate reltuples updates coming from VACUUM.  There seem
to be a number of problems leading to that.  The key point is that when
VACUUM has scanned only part of the relation, it assumes that the
live-tuple density in that part of the relation is num_tuples (I'm
speaking of the counter accumulated in lazy_scan_heap) divided by
scanned_pages, and then it tries to extrapolate that information to the
rest of the relation.  Now, the validity of that extrapolation is a bit
questionable given that VACUUM is considering a highly nonrandom subset of
the table's pages, but the real problem is the values are wrong even for
the pages we did look at.  To wit:

* scanned_pages is not reliably the number of pages we scanned, because
somebody thought it would be cute to bump it even for pages we decided
didn't need to be scanned because they contain no freezable tuples.
So we have an increment in scanned_pages, but no corresponding increment
in the tuple count, leading to a density underestimate.  This seems to
only happen in vacuum-for-wraparound cases, but it's still wrong.  We need
to separate the logic about whether we skipped any pages from the
statistical counters.

* num_tuples has very little to do with the number of live tuples, because
it actually counts all nonremovable tuples, including RECENTLY_DEAD,
INSERT_IN_PROGRESS, and DELETE_IN_PROGRESS tuples.  In the case Tim is
complaining about, the VACUUM happens concurrently with a long transaction
that is bulk-updating most tuples in the relation, some of them several
times, so that VACUUM sees multiple images of every tuple (one
INSERT_IN_PROGRESS, the rest DELETE_IN_PROGRESS), and thus arrives at a
horrid overestimate of the number of live tuples.
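The shape of that overestimate is easy to see from the extrapolation arithmetic (a model of the estimate only, not the actual lazy_scan_heap code; the specific numbers are illustrative):

```python
def reltuples_estimate(num_tuples, scanned_pages, total_pages):
    """Extrapolate the tuple density observed in the scanned subset of
    pages to the whole relation, as VACUUM's reltuples update does."""
    density = num_tuples / scanned_pages
    return density * total_pages

# 100 scanned pages with 100 genuinely live tuples each, 1000 pages total.
honest = reltuples_estimate(100 * 100, 100, 1000)

# A concurrent bulk update that touched every tuple twice leaves VACUUM
# counting 3 nonremovable versions per live tuple (one INSERT_IN_PROGRESS
# plus two DELETE_IN_PROGRESS), tripling the apparent density.
inflated = reltuples_estimate(3 * 100 * 100, 100, 1000)
```

The second estimate comes out 3x too high even though the number of live tuples never changed.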

I figured it'd be easy enough to get a better estimate by adding another
counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
assuming that in-progress inserts and deletes will both commit).  I did
that, and found that it helped Tim's test case not at all :-(.  A bit of
sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
whether the transaction has since marked it for deletion:

/*
 * It'd be possible to discern between INSERT/DELETE in progress
 * here by looking at xmax - but that doesn't seem beneficial for
 * the majority of callers and even detrimental for some. We'd
 * rather have callers look at/wait for xmin than xmax. It's
 * always correct to return INSERT_IN_PROGRESS because that's
 * what's happening from the view of other backends.
 */
return HEAPTUPLE_INSERT_IN_PROGRESS;

It did not use to blow this question off: back around 8.3 you got
DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
less laziness + fuzzy thinking here.  Maybe we should have a separate
HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?  Is it *really*
the case that callers other than VACUUM itself are okay with failing
to make this distinction?  I'm dubious: there are very few if any
callers that treat the INSERT and DELETE cases exactly alike.
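The distinction at stake can be captured in a toy classifier (a deliberately simplified model with boolean commit flags, not the real HeapTupleSatisfiesVacuum; the combined result name is the one proposed above, not an existing code):

```python
def toy_htsv(xmin_committed, xmax_set):
    """Toy model: the current code reports INSERT_IN_PROGRESS whenever xmin
    is uncommitted, even if a delete is already pending on the tuple; the
    proposed variant would report a combined state for that case."""
    if not xmin_committed:
        if xmax_set:
            return "INSERT_AND_DELETE_IN_PROGRESS"  # proposed, not current
        return "INSERT_IN_PROGRESS"
    return "DELETE_IN_PROGRESS" if xmax_set else "LIVE"
```

Under this model a counter of LIVE plus INSERT_IN_PROGRESS tuples would stop counting the in-progress-deleted versions that inflate the estimate in Tim's case.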

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is it JSQuery?

2014-06-06 Thread Josh Berkus
On 06/06/2014 09:12 AM, David E. Wheeler wrote:
 On Jun 6, 2014, at 6:54 AM, Oleg Bartunov obartu...@gmail.com wrote:
 
 Jsquery - is QUERY language, JsonPath - is language to EXTRACT json parts.
  
 Sure, but could we not potentially build on its syntax, instead of building a 
 new one? I’m not saying we *should*, but if we don’t, I think there should be 
 a discussion about why not. For example, I think it would not be a good idea 
 to follow [JSONiq](http://www.jsoniq.org/) because who wants to write queries 
 in JSON? (Have we learned nothing from XSLT?).
 
 Here’s a (partial) list of existing JSON query languages:
 
   http://stackoverflow.com/a/7812073/79202
 
 The arguments might be:
 
 * [JSONiq](http://jsoniq.org/): Queries in JSON? Gross!

Also overly complex for the functionality we support.  There's also no
way to make the jsquery strings valid JSON without adding a bunch of
extra text.

 * [UNQL](http://www.unqlspec.org/): Too similar to SQL

... also intended to be a *complete* replacement for SQL, whereas we
just want a syntax to search JSON fields.

 * [JAQL](https://code.google.com/p/jaql/): Too different from SQL
 * [JSONPath](http://goessner.net/articles/JsonPath/): Too verbose

I don't agree with "too verbose", but lacking AND|OR is pretty crippling.

 * [JSON Query](https://github.com/mmckegg/json-query): Too little there
 * [Mongo](http://www.mongodb.org/display/DOCS/Inserting#Inserting-JSON): 
 Gross syntax
 * [LINQ](http://james.newtonking.com/archive/2008/03/02/json-net-2-0-beta-2): 
 Too similar to SQL
 * [searchjs](https://github.com/deitch/searchjs): Queries in JSON? Gross!
 * [JQuery](http://jquery.org/): It's for HTML, not JSON
 * [SpahQL](http://danski.github.io/spahql/): More like XPath
 * [ObjectPath](http://adriank.github.io/ObjectPath/): Too verbose
 * [JFunk](https://code.google.com/p/jfunk/): XPathy
 * [JData](http://jaydata.org): Queries in JavaScript? C’mon.
 
 These are just off-the-cuff evaluations in 10 minutes of looking -- surely 
 not all of them are accurate. Some of them maybe *are* useful to emulate. 
 It’s definitely worthwhile, IMHO, to evaluate prior art and decide what, if 
 any of it, should inspire the JSQuery syntax, and there should be reasons why 
 and why not.

Well, I'd also say that we don't care about syntaxes which are not
already popular.  There's no point in being compatible with something
nobody uses.  How many of the above have any uptake?

Also, the explosion of query languages in this area is not an
encouraging sign for us being able to pick the right one.  UUID-OSSP
anyone?

So the advantage of the current jsquery syntax is that it's similar to
tsquery, which already has some adoption in our userbase.  On the other
hand, I'm not sure how many people actually understand the tsquery
syntax, and jsquery will be different enough to trip people up.

 I do think that the name should be changed if we don’t follow an existing 
 standard, as [JSQuery](https://code.google.com/p/gwtquery/wiki/JsQuery) is 
 already a thing.

I saw that too, but I don't get the impression that Google jsquery is
all that active.   No?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates

2014-06-06 Thread Andres Freund
On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
 I figured it'd be easy enough to get a better estimate by adding another
 counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively
 assuming that in-progress inserts and deletes will both commit).  I did
 that, and found that it helped Tim's test case not at all :-(.  A bit of
 sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
 INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of
 whether the transaction has since marked it for deletion:
 
 /*
  * It'd be possible to discern between INSERT/DELETE in progress
  * here by looking at xmax - but that doesn't seem beneficial for
  * the majority of callers and even detrimental for some. We'd
  * rather have callers look at/wait for xmin than xmax. It's
  * always correct to return INSERT_IN_PROGRESS because that's
  * what's happening from the view of other backends.
  */
 return HEAPTUPLE_INSERT_IN_PROGRESS;

That's only been the case for a couple of days. I really wasn't sure
whether to go that way or to discern the two cases. That changed in the wake
of:
http://www.postgresql.org/message-id/20140530143150.GA11051@localhost

I tried to solicit feedback (e.g. by CCing you :)) but I mostly
failed. Alvaro agreed, on IM, that it's better this way.

 It did not use to blow this question off: back around 8.3 you got
 DELETE_IN_PROGRESS if the tuple had a delete pending.  I think we need
 less laziness + fuzzy thinking here.

My argument for not discerning wasn't that it's hard to do, but that it
might confuse callers more the other way round. E.g. doing a
XactLockTableWait(xmax) might not be sufficient for the tuple being
alive.


 Maybe we should have a separate
 HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code?

Maybe.

 Is it *really*
 the case that callers other than VACUUM itself are okay with failing
 to make this distinction?  I'm dubious: there are very few if any
 callers that treat the INSERT and DELETE cases exactly alike.

I looked through all of them and saw none that'd be problematic. And
some, like predicate.c, where the new behaviour seems to be better. Most
of the ones that care about INSERT/DELETE_IN_PROGRESS wait on xmin/xmax
respectively.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-06 Thread Jim Nasby

On 6/5/14, 9:54 PM, Tom Lane wrote:

Rod Taylor rod.tay...@gmail.com writes:

On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

I'm not entirely convinced that it's worth the extra planning cycles,
though.  Given the small number of complaints to date, it might not
be worth doing this.  Thoughts?



Would this avoid execution of expensive functions in views when their
output is discarded?


Yes, as long as they're not marked volatile and don't return sets.


That would certainly make it useful for us.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is it JSQuery?

2014-06-06 Thread David E. Wheeler
On Jun 6, 2014, at 12:51 PM, Josh Berkus j...@agliodbs.com wrote:

 * [JAQL](https://code.google.com/p/jaql/): Too different from SQL
 * [JSONPath](http://goessner.net/articles/JsonPath/): Too verbose
 
 I don't agree with the too verbose, but lacking AND|OR is pretty crippling.

I had enough people complain about Test::XPath, which tests the structure of 
XML and HTML documents using XPath. They didn't like how verbose XPath was, 
preferring CSS selectors. So I ended up with a patch to support CSS syntax, 
too. CSS-style syntax is part of what people like about JQuery, too.

 Well, I'd also say that we don't care about syntaxes which are not
 already popular.  There's no point in being compatible with something
 nobody uses.  How many of the above have any uptake?

I think there is JQuery, JSONPath, and everything else, really. If we can draw 
some parallels, I think that would be sufficient to make people comfortable.

 I do think that the name should be changed if we don’t follow an existing 
 standard, as [JSQuery](https://code.google.com/p/gwtquery/wiki/JsQuery) is 
 already a thing.
 
 I saw that too, but I don't get the impression that Google jsquery is
 all that active.   No?

It’s Google. You really want to wrangle with their attorneys?

David




signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] Why is it JSQuery?

2014-06-06 Thread Josh Berkus
On 06/06/2014 03:23 PM, David E. Wheeler wrote:
 On Jun 6, 2014, at 12:51 PM, Josh Berkus j...@agliodbs.com wrote:
 Well, I'd also say that we don't care about syntaxes which are not
 already popular.  There's no point in being compatible with something
 nobody uses.  How many of the above have any uptake?
 
 I think there is JQuery, JSONPath, and everything else, really. If we can 
 draw some parallels, I think that would be sufficient to make people 
 comfortable.

Well, then those are the only ones worth considering.

 I do think that the name should be changed if we don’t follow an existing 
 standard, as [JSQuery](https://code.google.com/p/gwtquery/wiki/JsQuery) is 
 already a thing.

 I saw that too, but I don't get the impression that Google jsquery is
 all that active.   No?
 
 It’s Google. You really want to wrangle with their attorneys?

Google is not going to sue us over a minor OSS project which isn't a
commercial product.

The relevant question is: are users liable to confuse our jsquery with
Google jsquery?

Maybe we should call it jsonesque  ;-)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-06-06 Thread Peter Geoghegan
On Thu, Jun 5, 2014 at 5:37 PM, Peter Geoghegan p...@heroku.com wrote:
 One thing that isn't all that obvious about this worst case is that
 it's in general very qsort() friendly, and therefore the startup costs
 (copying) totally dominate. Actually, you're not even sorting -
 you're verifying that the tuples are already exactly in order (a
 questionable optimization we apply at every level).
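
For anyone following along, that pre-check can be sketched like this
(an illustration only, not PostgreSQL's actual qsort() code): a single
linear pass that costs n - 1 comparisons on fully ordered input and
never enters the sort proper, which is why the per-element startup
cost shows up so directly in this case.

```python
import functools

def sort_with_precheck(tuples, cmp):
    """Illustrative sketch (not PostgreSQL's actual code) of the
    pre-sorted check described above."""
    n = len(tuples)
    # One linear pass: n - 1 comparisons when the input is fully ordered.
    if all(cmp(tuples[i], tuples[i + 1]) <= 0 for i in range(n - 1)):
        return list(tuples)  # already in order; the sort proper never runs
    return sorted(tuples, key=functools.cmp_to_key(cmp))
```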

Kevin mentioned at the dev meeting in Ottawa that the Wisconsin courts
have columns whose values all begin with "The State of Wisconsin Vs.".
I thought that this was an interesting case, because it is
representative of reality, which is crucially important to consider
here. I decided to simulate it. In my original test database:

postgres=# create table wisconsin(casen text);
CREATE TABLE
postgres=# insert into wisconsin select 'The State of Wisconsin Vs. '
|| city from cities;
INSERT 0 317102

sort-wisconsin.sql: select * from (select * from wisconsin order by
casen offset 100) d;
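
As an aside, the technique being benchmarked here - pay one strxfrm()
per tuple up front so that every subsequent comparison is a cheap
bytewise one rather than a strcoll() - can be sketched in Python,
whose locale.strxfrm/strcoll wrap the C functions (an illustration of
the idea, not the patch itself):

```python
import functools
import locale

# Collate according to the environment's locale ("" = use LC_COLLATE/LANG).
locale.setlocale(locale.LC_COLLATE, "")

rows = ["The State of Wisconsin Vs. " + c
        for c in ["Ottawa", "Madison", "Austin", "Toronto"]]

# One strxfrm() per row up front; the sort then compares transformed
# keys bytewise instead of paying a strcoll() per comparison.
by_key = sorted(rows, key=locale.strxfrm)

# C guarantees that comparing strxfrm() keys agrees with strcoll():
by_coll = sorted(rows, key=functools.cmp_to_key(locale.strcoll))
assert by_key == by_coll
```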

Master:

pgbench -M prepared -f sort-wisconsin.sql -T 300 -n
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 55
latency average: 5454.545 ms
tps = 0.181191 (including connections establishing)
tps = 0.181191 (excluding connections establishing)

Patch (most recent revision, with ameliorations, HyperLogLog, etc):

pgbench -M prepared -f sort-wisconsin.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 55
latency average: 5454.545 ms
tps = 0.182593 (including connections establishing)
tps = 0.182594 (excluding connections establishing)

Earlier patch (no ameliorations for Heikki's case):

pgbench -M prepared -f sort-wisconsin.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 54
latency average: 5555.556 ms
tps = 0.176914 (including connections establishing)
tps = 0.176915 (excluding connections establishing)

With my most recent revision, the ameliorating measures are effective
enough that with the sortsupport shim and fmgr trampoline avoided, we
still come out ahead even for this case. Great. But you may be
surprised that the regression is so small in the case of the patch
without any ameliorating measures (the original patch). That's because
the data isn't *perfectly* logically/physically correlated here, as in
Heikki's worst case. So, the 317,102 wasted strxfrm() calls are
relatively inexpensive. Consider how cost_sort() models the cost of a
sort when an in memory quicksort is anticipated:

/* We'll use plain quicksort on all the input tuples */
startup_cost += comparison_cost * tuples * LOG2(tuples);

In the case of this quicksort, the planner guesses there'll be 317102
* LOG2(317102) comparisons -- about 5,794,908 comparisons, which
implies over 10 times as many strcoll() calls as wasted strxfrm()
calls. The cost of those strxfrm() calls begins to look insignificant
before n gets too big (at n = 100, it's 100 wasted strxfrm() calls to
about 664 strcoll() calls). Unless, of course, you have a bubble sort
best case where everything is already completely in order, in which
case there'll be a 1:1 ratio between wasted strxfrm() calls and
strcoll() calls. This optimization is something we added to our
qsort(): it appears neither in the original NetBSD implementation nor
in the Bentley/McIlroy paper, nor anywhere else that I'm aware of. I'm
not the only person to regard it with suspicion - Tom has expressed
doubts about it in the past too [1]. Also, note that no
comparison-based sorting algorithm can do better than O(n log n) in
the average case - that's the information-theoretical lower bound on
its average-case speed.
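
To put numbers on that ratio, here is just the arithmetic from the
paragraph above:

```python
from math import log2

n = 317102
comparisons = n * log2(n)         # planner's in-memory quicksort estimate
print(round(comparisons))         # roughly 5.79 million comparisons
print(round(comparisons / n, 1))  # ~18.3 strcoll() calls per wasted strxfrm()

# At small n the ratio is thinner, but still favorable:
print(round(100 * log2(100)))     # ~664 comparisons for 100 wasted strxfrm() calls
```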

To be clear: I'm certainly not saying that we shouldn't fix Heikki's
worst case, and indeed I believe I have, but we should also put this
worst case in perspective.

By the way, I just realized that I failed to fully remove client
overhead (I should have put an extra 0 on the end of my offset for the
city.sql query), which added noise to the city/country/province
tests. Revised figures are as follows (these are better than before):

Master:
======

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
duration: 300 s
number of transactions actually processed: 278
latency average: 1079.137 ms
tps = 0.924358 (including connections establishing)
tps = 0.924362 (excluding connections establishing)

Patch:
=====

pgbench -M prepared -f sort-city.sql -T 300 -n

transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 1
number of