Re: [HACKERS] GSoC 2017

2017-01-10 Thread Atri Sharma
Count me in as a mentor

On 10-Jan-2017 3:24 PM, "Alexander Korotkov" wrote:

> Hi all!
>
> In 2016 the PostgreSQL project didn't get into the GSoC program.  In my
> understanding, the reasons for that are the following.
>
> 1. We did a last-minute submission of our application to GSoC.
> 2. In 2016 the GSoC application form for mentoring organizations was
> changed.  In particular, it required more detailed information about
> possible projects.
>
> As a result we didn't manage to make a good enough application that time,
> and our application was declined.  See [1] and [2] for details.
>
> I think the right way to manage this in 2017 would be to start
> collecting the required information in advance.  According to the GSoC
> 2017 timeline [3], mentoring organizations can submit their applications
> from January 19 to February 9.  Thus, now is a good time to start
> collecting project ideas and make a call for mentors.  Also, we need to
> decide who will be our admin this year.
>
> In sum, we have the following questions:
> 1. What project ideas do we have?
> 2. Who is going to mentor this year?
> 3. Who is going to be project admin this year?
>
> BTW, I'm ready to be a mentor this year.  I'm also open to being an
> admin if needed.
>
> [1] https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com
> [3] https://developers.google.com/open-source/gsoc/timeline
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Atri Sharma
On Tue, Mar 1, 2016 at 9:30 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:52 AM, Tom Lane  wrote:
> > Aleksander Alekseev  writes:
> >> There are applications that create and delete a lot of temporary
> >> tables. Currently PostgreSQL doesn't handle such a use case well.
>
>
> Yeah, I have a really hard time believing this can ever work.  There
> are MANY catalog tables potentially involved here - pg_class,
> pg_attribute, pg_attrdef, pg_description, pg_trigger, ... and loads
> more - and they all can have OID references to each other.  If you
> create a bunch of fake relcache and syscache entries, you're going to
> need to give them OIDs, but where will those OIDs come from?  What
> guarantees that they aren't in use, or won't be used later while your
> temporary object still exists?  I think making this work would make
> parallel query look like a minor feature.
>
>
Fair point; that would mean inventing a whole new OID generation structure.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Atri Sharma
>
>
>
> I think you have no concept how invasive that would be.  Tables not
> represented in the catalogs would be a disaster, because *every single
> part of the backend* would have to be modified to deal with them as
> a distinct code path --- parser, planner, executor, loads and loads
> of utility commands, etc.  I do not think we'd accept that.  Worse yet,
> you'd also break client-side code that expects to see temp tables in
> the catalogs (consider psql \d, for example).
>
>
I might be missing a point here, but I really do not see why we would need
an alternate code path for every part of the backend. I agree that all
utility commands and client-side code would break, but if we abstract out
the syscache API and/or modify only the syscache's underlying access paths,
then would the backend really care about whether the tuple comes from
physical catalogs or in-memory catalogs?



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Atri Sharma
On Tue, Mar 1, 2016 at 8:55 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:

> Hello
>
> There are applications that create and delete a lot of temporary
> tables. Currently PostgreSQL doesn't handle such a use case well.
> Consider the following benchmark/example.
>
>
FWIW, Pavel and I have been spending some time discussing global temporary
tables, and I have been taking a shot at it. This is pretty much in line
with that.

The approach you suggest sounds fine. I am personally a tad concerned about
the extra overhead of the locks and the sanity of concurrency for the
in-memory cache as well. Something I am not too clear about (I need to read
your email again) is cache invalidation. Do all pages stay in memory
always?

Also, are you proposing to change the behaviour of syscaches so that those
pages are not written out to disk, or do you create a new set of caches?

Regards,

Atri


Re: [HACKERS] about google summer of code 2016

2016-02-18 Thread Atri Sharma
On 19 Feb 2016 8:30 am, "Chapman Flack"  wrote:
>
> On 02/18/16 19:35, Amit Langote wrote:
>
> > Apparently, the deadline is: February 20, 2016 at 04:00 (+0900 UTC)
> >
> > https://summerofcode.withgoogle.com/
>
> For anybody finding that web site as anti-navigable as I did, here
> are more direct links to the actual rules, and terms of agreement
> for the various participants:
>
> https://summerofcode.withgoogle.com/rules/
> https://summerofcode.withgoogle.com/terms/org
> https://summerofcode.withgoogle.com/terms/mentor
> https://summerofcode.withgoogle.com/terms/student
>
> Here is a question: does it ever happen that PostgreSQL acts as
> the org for a project that is PostgreSQL-related but isn't
> directly PGDG-led?
>
> ... there are definitely interesting and promising areas for further
> development in PL/Java beyond what I would ever have time to tackle
> solo, and I could easily enjoy mentoring someone through one or
> another of them over a summer, which could also help reinvigorate
> the project and get another developer familiar with it at a
> non-superficial level.  While I could easily see myself mentoring,
> I think it would feel like overkill to apply individually as a
> one-trick 'organization'.
>
> I see that there was a "based on PL/Java" GSoC'12 project, so maybe
> there is some room for non-core ideas under the PostgreSQL ægis?

FWIW, it wasn't a PL/Java-based project per se; it was a JDBC FDW.

I agree, there might be scope for non-core projects, and PL/Java sounds like
a good area.

Regards,

Atri


Re: [HACKERS] 2016-01 Commitfest

2016-01-18 Thread Atri Sharma
On Mon, Jan 18, 2016 at 11:05 PM, Alvaro Herrera wrote:

> Two weeks into the commitfest, things have moved a bit:
>
>  Needs review: 53.
>  Waiting on Author: 20.
>  Ready for Committer: 10.
>  Committed: 16.
> Total: 99.  https://commitfest.postgresql.org/8/
>
> We have two committed patches since last report -- not a lot for a whole
> week.  We've managed to review a few patches though: WoA went from 14 to
> 20.  Patch authors have something to do rather than just twiddle their
> thumbs^W^W^W review others' patches.
>
> Remember: reviewing patches where somebody else has marked themselves as
> reviewer is still welcome.  You could spot things the other person might
> miss.
>
> Committers: There are patches for you to push!  Go pick one up right now!
> (And remember: marking yourself as committer for a patch makes it
> unlikely for anybody else to review/commit it.  If you do not intend to
> review+commit such a patch soon, it's probably better to remove yourself
> as committer.)
>
> Many patches seem stalled without review:
>Table partition + join pushdown
> https://commitfest.postgresql.org/8/325/
>multivariate statistics
> https://commitfest.postgresql.org/8/450/
>Implement failover on libpq level
> https://commitfest.postgresql.org/8/389/
>Unique Joins
> https://commitfest.postgresql.org/8/129/
>Replace buffer manager spinlock with atomic operations
>
> https://commitfest.postgresql.org/8/408/
>
>
I am guilty of ignoring the unique joins patch. I am still up for reviewing
it later this week or over the weekend, but will not stall anybody else from
picking it up.

Apologies!


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma
On Tue, Jan 5, 2016 at 11:09 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Atri Sharma <atri.j...@gmail.com> writes:
> > I fail to see the relevance of which node is getting evaluated (its a
> Plan
> > node BTW) for this question. The concern I had was around using SPI
> inside
> > executor and its fail safety.
>
> The code path executor -> PL function -> SPI certainly works, so
> presumably omitting the intermediate PL function could still work.
> Whether it's a good idea is another question entirely.  I do not
> offhand see a good reason why knowledge of non-system tables should
> exist in the core executor; so what is the need to use SPI?
>
>
Thanks!

This was a weird requirement and I managed to work around it, but I will
keep this hack in mind for future reference.
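
For reference, the working path Tom describes above (executor -> PL function
-> SPI) can be sketched in SQL alone; plpgsql runs its embedded queries
through SPI internally, so calling such a function from a query reaches SPI
from inside the executor without the core executor knowing about the user
table. The table and function names below are hypothetical, purely for
illustration:

```sql
-- Hypothetical non-catalog table and accessor function (illustrative names).
CREATE TABLE app_settings (name text PRIMARY KEY, value text);

CREATE FUNCTION lookup_setting(p_name text) RETURNS text AS $$
DECLARE
    v text;
BEGIN
    -- plpgsql executes this SELECT via SPI under the hood.
    SELECT value INTO v FROM app_settings WHERE name = p_name;
    RETURN v;
END;
$$ LANGUAGE plpgsql STABLE;

-- Evaluated by the executor while processing the target list:
-- SELECT relname, lookup_setting('feature_x') FROM pg_class LIMIT 1;
```

Whether skipping the intermediate PL function and calling SPI directly from
executor code is safe is, as noted above, a separate question.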

Regards,

Atri


Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma
On Mon, Jan 11, 2016 at 10:48 PM, Atri Sharma <atri.j...@gmail.com> wrote:

> Sorry, I missed this email.
>
> I was specifically targeting accessing tables inside Node evaluation hence
> do not want to add new nodes.
>
> Thanks for your inputs!
>
> Regards,
>
> Atri
>
> On Tue, Jan 5, 2016 at 11:43 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
>
>> On 2016/01/05 14:30, Atri Sharma wrote:
>> > On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <
>> langote_amit...@lab.ntt.co.jp>
>> >> On 2016/01/05 3:53, Atri Sharma wrote:
>> >>> I was wary to use SPI inside the executor for node evaluation
>> functions.
>> >>> Does it seem safe?
>> >>
>> >> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
>> >> that you are talking about? I guess you'd know to use ExecProcNode() or
>> >> ExecEvalExpr() for them, respectively.
>> >>
>> > I fail to see the relevance of which node is getting evaluated (its a
>> Plan
>> > node BTW) for this question. The concern I had was around using SPI
>> inside
>> > executor and its fail safety.
>>
>> Sorry, I may have misunderstood your question(s). Seeing your first
>> question in the thread, I see that you're looking to query non-system
>> tables within the executor. AFAIU, most of the processing within executor
>> takes the form of some node in some execution pipeline of a plan tree.
>> Perhaps, you're imagining some kind of node, subnode or some such. By the
>> way, some places like ATRewriteTable(), validateCheckConstraint() scan
>> user tables directly using low-level utilities within a dummy executor
>> context. I think Jim suggested something like that upthread.
>>
>>

Sorry for top posting.

Regards,

Atri


Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma
Sorry, I missed this email.

I was specifically targeting accessing tables inside Node evaluation, hence
I do not want to add new nodes.

Thanks for your inputs!

Regards,

Atri

On Tue, Jan 5, 2016 at 11:43 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/01/05 14:30, Atri Sharma wrote:
> > On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp>
> >> On 2016/01/05 3:53, Atri Sharma wrote:
> >>> I was wary to use SPI inside the executor for node evaluation
> functions.
> >>> Does it seem safe?
> >>
> >> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
> >> that you are talking about? I guess you'd know to use ExecProcNode() or
> >> ExecEvalExpr() for them, respectively.
> >>
> > I fail to see the relevance of which node is getting evaluated (its a
> Plan
> > node BTW) for this question. The concern I had was around using SPI
> inside
> > executor and its fail safety.
>
> Sorry, I may have misunderstood your question(s). Seeing your first
> question in the thread, I see that you're looking to query non-system
> tables within the executor. AFAIU, most of the processing within executor
> takes the form of some node in some execution pipeline of a plan tree.
> Perhaps, you're imagining some kind of node, subnode or some such. By the
> way, some places like ATRewriteTable(), validateCheckConstraint() scan
> user tables directly using low-level utilities within a dummy executor
> context. I think Jim suggested something like that upthread.
>
> Thanks,
> Amit
>
>
>


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Accessing non catalog table in backend

2016-01-04 Thread Atri Sharma
Thanks!

I was wary of using SPI inside the executor for node evaluation functions.
Does it seem safe?

Regards,

Atri
On 5 Jan 2016 12:20 am, "Jim Nasby" <jim.na...@bluetreble.com> wrote:

> On 1/4/16 12:07 PM, Atri Sharma wrote:
>
>> Hi All,
>>
>> I wanted to check if it is possible to query a non catalog table in
>> backend.
>>
>> I understand that cql interface is only for catalog querying hence it is
>> not usable for this purpose per se.
>>
>
> AFAIK it's possible to do it with low level routines, but I think unless
> you have a really good reason to go that route you're much better off just
> using SPI. If it's good enough for plpgsql... :)
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Accessing non catalog table in backend

2016-01-04 Thread Atri Sharma
On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/01/05 3:53, Atri Sharma wrote:
> > I was wary to use SPI inside the executor for node evaluation functions.
> > Does it seem safe?
>
> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
> that you are talking about? I guess you'd know to use ExecProcNode() or
> ExecEvalExpr() for them, respectively.
>
> Thanks,
> Amit
>
>
>
I fail to see the relevance of which node is getting evaluated (it's a Plan
node, BTW) for this question. The concern I had was around using SPI inside
the executor and its fail-safety.

Regards,

Atri


[HACKERS] Accessing non catalog table in backend

2016-01-04 Thread Atri Sharma
Hi All,

I wanted to check if it is possible to query a non-catalog table in the
backend.

I understand that the CQL interface is only for catalog querying, hence it
is not usable for this purpose per se.

Please advise.

Regards,

Atri


Re: [HACKERS] Threads in PostgreSQL

2015-12-20 Thread Atri Sharma
On Mon, Dec 21, 2015 at 11:51 AM, sri harsha wrote:

> Hi,
>
>    Is it possible to use threads in PostgreSQL? I am using threads in my
> foreign data wrapper, and I get the following error when I use them:
>
> *ERROR:  stack depth limit exceeded*
> *HINT:  Increase the configuration parameter "max_stack_depth" (currently
> 2048kB), after ensuring the platform's stack depth limit is adequate.*
>
>

PostgreSQL uses a process-per-backend model, and backend code is not
written to be thread-safe.

Can you elaborate on what you are using the threads for?


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Atri Sharma
> I think the general idea is that if Commit is WAL-logged, then the
> operation is considered committed on the local node, and commit should
> happen on any node only once prepare from all nodes is successful.
> And after that the transaction is not supposed to abort.  But I think you
> are trying to optimize the DTM in some way to not follow that kind of
> protocol.
> By the way, how will the arbiter do the recovery in a scenario where it
> crashes? Won't it need to contact all nodes for the status of in-progress
> or prepared transactions?
> I think it would be better if a more detailed design of the DTM with
> respect to transaction management and recovery could be put up on the wiki
> for discussion on this topic.  I have seen that you have already updated
> many details of the system, but still the complete picture of the DTM is
> not clear.

I agree.

I have not been following this discussion, but from what I have read above
I think the recovery model in this design is broken. You have to follow
some protocol, whichever one you choose.

I think you can try using something like Paxos if you are looking for a
more reliable model but don't want the overhead of 3PC.


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Atri Sharma
> I'm thinking the client should get some kind of a token back from the
> commit, and it could use the token on the standby, to wait for that commit
> to be applied. The token could be just the XID, or the LSN of the commit
> record. Or the application could generate the token and pass it to the
> server in the commit, similar to how 2PC works. So the interaction would
> be something like:
>
> In master:
> BEGIN;
> INSERT INTO FOO ...;
> COMMIT;
> Server returns: COMMITted with token 1234
>
> Later, in standby:
> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> SELECT * FROM foo;

+1.

The LSN should be good enough IMO.
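
To illustrate: with the LSN as the token, something close to this is already
possible with functions that exist in 9.4-era PostgreSQL, by polling on the
standby. (The `BEGIN WAIT FOR COMMIT` syntax quoted above is only a
proposal, not implemented; this is a hedged sketch of the idea, not the
patch's actual mechanism.)

```sql
-- On the master, after COMMIT, capture a token for the commit:
SELECT pg_current_xlog_location();   -- e.g. returns '0/3000060'

-- On the standby, the application polls until the token has been replayed;
-- a non-negative difference means the standby has caught up to the commit:
SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3000060') >= 0;
```

The proposed server-side wait would replace this client-side polling loop.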


Re: [HACKERS] count_nulls(VARIADIC any)

2015-08-13 Thread Atri Sharma
On Thu, Aug 13, 2015 at 12:55 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2015-08-13 9:21 GMT+02:00 Marko Tiikkaja <ma...@joh.to>:
>
> On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:
>
>> nnulls()
>>
>> I think I'd prefer num_nulls() over that.
>
> can be
>
> what about similar twin function num_nonulls()?

+1
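
For the curious, the semantics being discussed can be sketched at the SQL
level (the actual patch was implemented in C with `VARIADIC "any"` so it can
accept mixed argument types; this illustrative version is limited to
arguments of a single type):

```sql
-- Rough sketch of num_nulls() semantics; not the committed implementation.
-- Note: must NOT be declared STRICT, or NULL arguments would make the
-- whole call return NULL.
CREATE FUNCTION num_nulls(VARIADIC args anyarray) RETURNS integer AS $$
    SELECT count(*)::integer FROM unnest($1) AS a WHERE a IS NULL;
$$ LANGUAGE sql IMMUTABLE;

-- SELECT num_nulls(1, NULL, 3, NULL);  -- would return 2
```

A num_nonulls() twin would just flip the WHERE clause to `a IS NOT NULL`.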


Re: [HACKERS] Commitfest remaining Needs Review items

2015-08-10 Thread Atri Sharma

> * Unique Joins
>
> Still needs to be reviewed. Any volunteers?

Can take this one up, if it's within my limits.


Re: [HACKERS] Updatable view?

2015-07-31 Thread Atri Sharma
On 31 Jul 2015 11:59, Tatsuo Ishii <is...@postgresql.org> wrote:

>> On 31 Jul 2015 10:15, Tatsuo Ishii <is...@postgresql.org> wrote:
>>
>>> I think it would be nice to have... but not to the point of working on
>>> it myself.
>>>
>>> Might be worth an email to -general to see how many people have
>>> immediate use for it.
>>
>> What I am thinking about is,
>>
>> 1) Implement certain class of updatable views allowed in SQL:1999
>> (UNION ALL, natural joins)
>>
>> 2) Anything beyond #1 (I have no idea for now)
>>
>> Let me see how people are interested in...
>>
>> How does the standard define it? Do they also follow the same MVCC
>> semantics as normal tables?
>
> In my understanding there's no such concept like MVCC in the standard.
> Anyway in our implementation, we should keep the MVCC semantics of
> course.

Yes, I meant our internal MVCC semantics. I will have to look at the way
MVCC handles views for the exact logic, though.

>> I am concerned that we may end up losing read
>> performance for views if we implement this (unless I am missing
>> something)
>
> Why do updatable views lose read performance? I thought the only
> performance concern will be in the update/delete/insert operations.

I meant update, sorry. Pre-coffee mails tend to be incorrect :)


Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Atri Sharma
On Wed, Jul 15, 2015 at 12:57 PM, Jeff Davis <pg...@j-davis.com> wrote:

> On Tue, 2015-07-14 at 16:19 -0400, Robert Haas wrote:
>> tuplesort.c does its own accounting, and TBH that seems like the right
>> thing to do here, too.  The difficulty is, I think, that some
>> transition functions use an internal data type for the transition
>> state, which might not be a single palloc'd chunk.  But since we can't
>> spill those aggregates to disk *anyway*, that doesn't really matter.
>
> So would it be acceptable to just ignore the memory consumed by
> internal, or come up with some heuristic?
>
> Regards,
> Jeff Davis

I think a heuristic would be better suited here; ignoring memory
consumption for internal types means that we are not making the memory
accounting useful for a set of use cases.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] [idea] table partition + hash join

2015-06-10 Thread Atri Sharma
On Wed, Jun 10, 2015 at 2:16 PM, Amit Langote <langote_amit...@lab.ntt.co.jp>
wrote:

> Perhaps the qual needs to be pushed all the way down
> to the Hash's underlying scan if that makes sense.

And that is a Pandora's box of troubles IMHO unless done in a very careful
manner.


Re: [HACKERS] Improving GEQO

2015-05-28 Thread Atri Sharma
On Fri, May 29, 2015 at 12:59 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Wed, May 27, 2015 at 3:06 PM, boix <b...@uclv.cu> wrote:
>> Hello, my partner and me are working with the goal of improve the GEQO's
>> performance, we tried with Ant Colony Optimization, but it does not
>> improve, actually we are trying with a new variant of Genetic Algorithm,
>> specifically Micro-GA. This algorithm finds a better solution than GEQO
>> in less time, however the total query execution time is higher. The
>> fitness is calculated by the geqo_eval function. Does anybody know why
>> this happens?
>>
>> We attach the patch made with the changes in postgresql-9.2.0.
>
> can you submit more details?  for example 'explain analyze' (perhaps
> here: http://explain.depesz.com/) of the plans generated GEQO vs GA vs
> stock?  It sounds like you might be facing an estimation miss which is
> not really an issue a better planner could solve.
>
> That said, assuming you're getting 'better' plans in less time suggests
> you might be on to something.
>
> merlin



What sort of tests are you running? I suspect that anything which is not
too well thought out and tested might end up performing well only on a
small subset of tests.

Also, what is the consistency of the plans generated? If you are only
targeting planning time, I feel it might be of lesser value. However, if
you can get large-order joins to be executed with a near-optimal (brute
force) plan, you might be on to something.

Something I would like to see done is removing the dead code that is
present in the existing GEQO. That alone might lead to shorter compile
times.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2015-02-13 Thread Atri Sharma
Hi all,

Sorry for the delay.

Please find attached the latest version of the UPDATE (*) patch. The patch
implements the review comments, and Tom's gripe about touching
transformTargetList is addressed now. I have added regression tests and
simplified the parser representation a bit.

Regards,

Atri
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 35b0699..1f68bdf 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -25,7 +25,9 @@ PostgreSQL documentation
 UPDATE [ ONLY ] <replaceable class="PARAMETER">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> ]
 SET { <replaceable class="PARAMETER">column_name</replaceable> = { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } |
   ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } [, ...] ) |
-  ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> )
+  ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> ) |
+  ( <replaceable class="PARAMETER">*</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } [, ...] ) |
+  ( <replaceable class="PARAMETER">*</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> )
 } [, ...]
 [ FROM <replaceable class="PARAMETER">from_list</replaceable> ]
 [ WHERE <replaceable class="PARAMETER">condition</replaceable> | WHERE CURRENT OF <replaceable class="PARAMETER">cursor_name</replaceable> ]
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index a68f2e8..6d08dbd 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1937,6 +1937,101 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
 	nsitem->p_lateral_only = false;
 	nsitem->p_lateral_ok = true;
 
+	/*
+	 * Check if (SET(*) = SELECT ...)  is present. If it is present we
+	 * need to resolve and populate the remaining needed MultiAssignRefs in the
+	 * target list. We modify target list in place and add needed MultiAssignRefs.
+	 */
+	if (list_length(stmt->targetList) == 1)
+	{
+		ResTarget *current_val = linitial(stmt->targetList);
+
+		/* Currently there is no way that ResTarget node's name value in UPDATE command
+		 * is set to NULL except for UPDATE SET (*) case.
+		 * Hence we can safely depend on name value being NULL as a check for
+		 * presence of UPDATE SET (*) case.
+		 */
+		if (current_val->name == NULL)
+		{
+			List *rel_cols_list;
+			List *expanded_tlist = NULL;
+			List *sub_list = NULL;
+			ListCell *lc_val;
+			ListCell *lc_relcol;
+			int rteindex = 0;
+			int sublevels_up = 0;
+			int i = 0;
+
+			rteindex = RTERangeTablePosn(pstate, pstate->p_target_rangetblentry,
+										 &sublevels_up);
+
+			expandRTE(pstate->p_target_rangetblentry, rteindex, sublevels_up,
+					  current_val->location, false,
+					  &(rel_cols_list), NULL);
+
+
+			/* SET(*) = (SELECT ...) case */
+			if (IsA(current_val->val, MultiAssignRef))
+			{
+				MultiAssignRef *orig_val = (MultiAssignRef *) (current_val->val);
+
+				orig_val->ncolumns = list_length(rel_cols_list);
+
+				Assert(sub_list == NULL);
+
+				sub_list = list_make1(orig_val);
+
+				/* Change targetlist to have corresponding ResTarget nodes
+				 * as corresponding to the columns in target relation */
+				for (i = 1; i < list_length(rel_cols_list); i++)
+				{
+					MultiAssignRef *r = makeNode(MultiAssignRef);
+
+					r->source = orig_val->source;
+					r->colno = i + 1;
+					r->ncolumns = orig_val->ncolumns;
+
+					lappend(sub_list, r);
+				}
+			}
+			else if (IsA(current_val->val, List))  /* SET(*) = (val, val,...) case */
+			{
+
+				Assert(sub_list == NULL);
+				sub_list = (List *) (current_val->val);
+			}
+			else
+			{
+				elog(ERROR, "Unknown type in UPDATE command");
+			}
+
+			if (list_length(rel_cols_list) != list_length(sub_list))
+				elog(ERROR, "number of columns does not match number of values");
+
+			/* Change targetlist to have corresponding ResTarget nodes
+			 * as corresponding to the columns in target relation */
+			forboth(lc_val, sub_list, lc_relcol, rel_cols_list)
+			{
+				ResTarget *current_res = makeNode(ResTarget);
+
+				current_res->name = strVal(lfirst(lc_relcol));
+				current_res->val = (Node *) (lfirst(lc_val));
+
+				if (expanded_tlist == NULL)
+				{
+					expanded_tlist = list_make1(current_res);
+				}
+				else
+				{
+					lappend(expanded_tlist, current_res);
+				}
+			}
+
+			stmt->targetList = expanded_tlist;
+		}
+	}
+
+
 	qry->targetList = transformTargetList(pstate, stmt->targetList,
 										  EXPR_KIND_UPDATE_SOURCE);
 
@@ -1982,18 +2077,20 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
 			continue;
 		}
 		if (origTargetList == NULL)
-			elog(ERROR, "UPDATE target count mismatch --- internal error");
+			elog(ERROR, "UPDATE target count mismatch 
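
To make the proposal concrete, the syntax added by the patch above might be
used as follows. The table and column names are made up for illustration,
and this syntax is a proposal from the patch, not part of stock PostgreSQL:

```sql
-- Hypothetical usage of the proposed UPDATE SET (*) syntax: assign all
-- columns of the target row positionally, without naming them.
UPDATE foo SET (*) = (1, 'renamed', DEFAULT);

-- Or from a sub-SELECT producing one value per column of foo:
UPDATE foo SET (*) = (SELECT b.id, b.name, b.created_at
                      FROM bar b
                      WHERE b.id = foo.id);
```

Per the parser change, the value list (or sub-SELECT) must produce exactly
as many values as the target table has columns, or an error is raised.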

Re: [HACKERS] [pgsql-advocacy] GSoC 2015 - mentors, students and admins.

2015-02-09 Thread Atri Sharma
I am up for mentoring again.
On 10 Feb 2015 02:23, "Thom Brown" <t...@linux.com> wrote:

> Hi all,
>
> Google Summer of Code 2015 is approaching.  I'm intending on registering
> PostgreSQL again this year.
>
> Before I do that, I'd like to have an idea of how many people are
> interested in being either a student or a mentor.
>
> I've volunteered to be admin again, but if anyone else has a strong
> interest of seeing the projects through this year, please let yourself be
> known.
>
> Who would be willing to mentor projects this year?
>
> Project ideas are welcome, and I've copied many from last year's
> submissions into this year's wiki page.  Please feel free to add more (or
> delete any that stand no chance or are redundant):
> https://wiki.postgresql.org/wiki/GSoC_2015
>
> Students can find more information about GSoC here:
> http://www.postgresql.org/developer/summerofcode
>
> Thanks
>
> Thom



Re: [HACKERS] Implementation of global temporary tables?

2015-02-02 Thread Atri Sharma
> Some was changed from 2009:
>
> * We have updatable CTE
>
> * We have unlogged tables
>
> Opened questions:
>
> 1. Name and conformance with standard of this feature - because we doesn't
> support modules, a mapping ANSI - PG should not be trivial
>
> 2. Implementation
>
> I see three possible ways how to implement it:
>
> 2.a - using on demand created temp tables - most simple solution, but
> doesn't help with catalogue bloating
>
> 2.b - using unlogged tables for holding statistics, relfilenode, and all
> necessary data
>
> 3.c - store ephemeral metadata only in memory without MVCC


With 2.a, essentially, we are defining a global definition of a temp table,
but the actual per-session objects still follow the same rules as our
current temp tables do?

2.b seems like a lot of new data, and if we are defining new paths for,
e.g., statistics for global temp tables, we might end up adding new logic
in the planner to use those tables. I am not seeing how this will work.

Could you elaborate a bit on 3.c, please?

Something that really bothers me here, on a different note, is the catalog
churn this could cause. Of course, you mentioned it as a point in your
email, but I feel that 2.a's showstopper could be the massive catalog churn
it causes. Maybe have a way to manage such tables without getting pg_class
to bloat pretty quickly (I am assuming you do not mean metadata as the
catalog metadata in 3.c).

Regards,

Atri


Re: [HACKERS] Implementation of global temporary tables?

2015-02-02 Thread Atri Sharma


>> 1. Main catalogue will be stable.
>> 2. There is not necessary to implement new storage and it can helps with
>> transaction support.
>
> The amount of complexity that'd be involved to store catalog data in a
> separate relation around the caches and accesses would be significant. I
> don't think that's a realistic option.

Not to mention the problems we might end up in. We still have corner cases
in our cache code, and a new heap on top of it all might be just too
painful.

>>>> 3.c - store ephemeral metadata only in memory without MVCC
>>>
>>> I think that's not an option. That'd end up being a massive amount of
>>> duplication at a low rate of functionality.
>>
>> I don't plan to implement a storage - I expect only few functions for
>> store/read data from session memory context
>
> What does it have to do with temp tables then?

I think what Pavel means here is that we do not need a full-fledged heap
layer but rather only a minimal API over a per-session memory context.
However, that might still be as painful, because we will eventually end up
inventing mechanisms for syscache and typcache to work with this storage,
which IMO is the biggest pain point of this idea.



Regards,

Atri
*l'apprenant*


Re: [HACKERS] Implementation of global temporary tables?

2015-02-02 Thread Atri Sharma
On Mon, Feb 2, 2015 at 6:34 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2015-02-02 13:36 GMT+01:00 Atri Sharma <atri.j...@gmail.com>:
>
>> 1. Main catalogue will be stable.
>> 2. There is not necessary to implement new storage and it can helps
>> with transaction support.
>
> The amount of complexity that'd be involved to store catalog data in a
> separate relation around the caches and accesses would be significant. I
> don't think that's a realistic option.
>
> Not to mention the problems we might end up in. We still have corner
> cases in our cache code, and a new heap on top of it all might be just too
> painful.
>
>>>> 3.c - store ephemeral metadata only in memory without MVCC
>>>
>>> I think that's not an option. That'd end up being a massive amount of
>>> duplication at a low rate of functionality.
>>
>> I don't plan to implement a storage - I expect only few functions for
>> store/read data from session memory context
>
> What does it have to do with temp tables then?
>
> I think what Pavel means here is that we do not need a full fledged heap
> layer and rather only a minimal API from a per session memory context.
> However, that might be still as painful because we will eventually end up
> inventing mechanisms for syscache and typcache to work with this storage,
> which IMO is the biggest pain point around this idea.
>
> It should be solvable - I see another risk - if we accelerate a work with
> temp tables, then 4 byte oid should not be enough.




Hrm, that might well be true. It might be worth the effort to find a better
way to materialize global temp tables then, like having a single OID and
only materializing a relfilenode for a session when the session inserts
into the temp table. Not sure here at all...



-- 
Regards,

Atri
*l'apprenant*


[HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-06 Thread Atri Sharma
On Tuesday, January 6, 2015, Amit Langote langote_amit...@lab.ntt.co.jp
wrote:

 On 06-01-2015 PM 04:26, Atri Sharma wrote:
  On Tue, Jan 6, 2015 at 12:43 PM, Amit Langote 
 langote_amit...@lab.ntt.co.jp javascript:;
  wrote:
  Though, I have no strong opinion on whether one thing is good or the
  other or whether they cover some particular use case all the same.
  Perhaps you can say that better.
 
 
  Personally, I think returning non-ordered rows when an ORDER BY clause is
  explicitly specified by the user is a gross violation of correctness and
  could lead to major application breakdowns, since the application will
  trust that Postgres returns the rows in order because ORDER BY was
  specified. Of course, what Ashutosh suggested makes the patch much
  simpler, but I would rather not go down that road.
 

 I think the same thing applies to IMMUTABLE declarations for example.
 Planner trusts (or take as a hint) such declarations during, say,
 constraint exclusion where quals involving non-immutable functions are
 kept out of the exclusion proof. If a miscreant declares a non-immutable
 function IMMUTABLE, then constraint violations may ensue simply because
 planner trusted the miscreant. That is, such unsafe restrict clauses
 would wrongly prove a partition as being unnecessary to scan. I am sure
 there are other sites where such bets are made. In that light, I might
 as well call them hints than anything.

 manual
 The volatility category is a *promise* to the optimizer about the
 behavior of the function
 /manual

 Though, as I said ordering behavior *may not be* a good candidate to
 make such promises.

 On the other hand, what such a thing might help with, are the situations
 where a developer is frustrated because planner would ignore (or is
 uninformed about) the order that the developer *knows* his function
 produces.

 But, if the node you propose to enforce the order is good enough, then
 it may be worthwhile to go that route, :)



The purpose of the patch is to give the planner an option to use the
preorder that the developer knows will be produced. However, since ensuring
against developer-induced errors in this case is relatively cheap, I think
the new node is worth it.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma



 The overhead of this patch is small. A new path is added for the preorder
 keys, and OrderCheck node's additional cost is pretty low, given that it
 only compares two rows and stores only a single row (previous row seen),
 hence the memory footprint is minuscule.


 We can eliminate the new node and put the onus of having the right order on
 the user, like we do with the volatility setting of the function.



That is exactly what the new node does, since we are not re-sorting right
now in case the order is wrong. Please see my explanation upthread: the
OrderCheck node's primary purpose is to check for a user error in the order
of the result rows. The onus to give the correct order is already on the
user.

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:

 On 06-01-2015 PM 04:00, Ashutosh Bapat wrote:
  On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com
 wrote:
  We can eliminate the new node and put the onus of having the right order on
  the user, like we do with the volatility setting of the function.
 
 
 
  That is exactly what the new node does, since we are not re-sorting
  right now in case the order is wrong. Please see my explanation
  upthread: the OrderCheck node's primary purpose is to check for a user
  error in the order of the result rows. The onus to give the correct
  order is already on the user.
 
 
  Even checking whether the output of the function is in the right order or
  not, has its cost. I am suggesting that we can eliminate this cost as
 well.
  For example, PostgreSQL does not check whether a function is really
  immutable or not.
 

 Sounds something like ORDERED BY x implying "output is known ordered by
 x" is perhaps enough hint for the planner to make the necessary plan
 choices, though I may be wrong.



I may be missing something, but isn't what you mentioned the exact
functionality this patch adds?


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Tue, Jan 6, 2015 at 12:30 PM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:



 On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote:




 Even checking whether the output of the function is in the right order
 or not, has its cost. I am suggesting that we can eliminate this cost as
 well. For example, PostgreSQL does not check whether a function is really
 immutable or not.



That implies possibly returning a non-ordered result set even when the user
explicitly specified an ORDER BY clause. If we are depending on an
optimization and it did not work out (even if it is a user error), I think
we should error out indicating that the order was incorrect rather than
returning non-ordered rows, which could be disastrous IMO.


Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Mon, Jan 5, 2015 at 11:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David G Johnston david.g.johns...@gmail.com writes:
  Atri Sharma wrote
  If order of result rows is not the same as required, an error is raised:
 
  SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
  ERROR:  Order not same as specified

  First reaction for the error was unfavorable but (see below) it likely is
  the best option and does adequately cover the reason for failure -
  programmer error.

 TBH, my first reaction to this entire patch is unfavorable: it's a
 solution in search of a problem.  It adds substantial complication not
 only for users but for PG developers in order to solve a rather narrow
 performance issue.


I could agree about the scope of the performance issue, but am not sure
about the added complication. It is essentially similar to, say, a
combination of how Unique is implemented with a flavour of the ORDINALITY
implementation. A new path that is added in a certain number of cases plus
a low-overhead node does not seem too bad to me. This is in line with a
lot of real-world cases I have seen, where the data is *bubbled* up to
SRFs, which does give a possibility of an existing order. Couple it with
the power to specify ORDER BY in your SRF function and you could save a lot.

I am not sure how it complicates things for hackers.  Could you please
elaborate a bit?


 What would make sense to me is to teach the planner about inlining
 SQL functions that include ORDER BY clauses, so that the performance
 issue of a double sort could be avoided entirely transparently to
 the user.


It sounds good, but inlining in the current way would restrict the scope of
the optimization (which is not applicable to the current design). For
example, you cannot inline RECORD-returning SRFs...


-- 
Regards,

Atri
*l'apprenant*


[HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE
FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record
as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that
might be present inherently in the SRF's input or algorithm (the above
example and think generate_series).

This allows for eliminating sorts based on the known existing preorder.
For example:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort
by e since the existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input
to the needed Sort node is large.

The obvious question is what happens if the specified ORDER BY clause is
wrong. To check the order, a new node is added which is the top node of the
plan and is responsible for projecting result rows. It tracks the previous
row seen and, given a sort order, ensures that the current tuple to be
projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
  QUERY PLAN
---
 OrderCheck
   -  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified
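A minimal sketch (in Python rather than the executor's C, with illustrative names) of the check the proposed OrderCheck node performs: it retains only the previously seen row and raises an error as soon as a projected row violates the declared sort order.

```python
def check_order(rows, keys):
    """Yield rows unchanged, verifying the declared sort order.

    keys: list of (column_index, descending, nulls_last) tuples, one per
    ORDER BY column.  Only the previous row is retained, so the memory
    footprint is a single row, as described above.
    """
    prev = None
    for row in rows:
        if prev is not None and not in_order(prev, row, keys):
            raise ValueError("Order not same as specified")
        prev = row          # only one row is kept at any time
        yield row

def in_order(a, b, keys):
    """True if row a may legitimately precede row b under keys."""
    for col, desc, nulls_last in keys:
        x, y = a[col], b[col]
        if x == y:
            continue
        if x is None or y is None:
            # with NULLS LAST, the NULL must be the later row (and vice versa)
            return (y is None) == nulls_last
        return (y < x) if desc else (x < y)
    return True  # rows equal on all keys are trivially in order
```

Rows in the required order stream through untouched; the first out-of-order row aborts the scan, matching the ERROR shown above.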



Preorder columns are first transformed into SortGroupClauses and then
stored directly in pg_proc.


This functionality addresses use cases seen in practice, and is especially
useful when SRF inputs are large and/or might be pipelined from another
function (SRFs are often used in pipelines in analytical systems, with
large data).

The overhead of this patch is small. A new path is added for the preorder
keys, and the OrderCheck node's additional cost is pretty low, given that it
only compares two rows at a time and stores only a single row (the previous
row seen), hence the memory footprint is minuscule.

In the inner joins thread, Tom mentioned having a new node which has
multiple plans, where the executor can decide which plan to execute given
runtime conditions. I played around with the idea, and am open to
experimenting with a new node which has a Sort-based plan and is executed in
case the OrderCheck node sees that the inherent order of result tuples is
not correct. Feedback here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri


orderbycreatefuncver1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Atri Sharma
   ChainAggregate is

 a bit like a node having two parents, a Sort and a GroupAggregate.
 However,
 the graph edge between ChainAggregate and its GroupAggregate is a
 tuplestore
 instead of the usual, synchronous ExecProcNode().


Well, I don't buy the two-parents theory. The Sort nodes are stacked in
between the ChainAggregate nodes, so there is still a single edge.
However, as you rightly said, there is a shared tuplestore, but note that
only the head-of-chain ChainAggregate has the top GroupAggregate as its
parent.


 Suppose one node orchestrated all sorting and aggregation.  Call it a
 MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
 performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
 would directly manage two Tuplesortstate, CUR and NEXT.  The node would
 have
 an initial phase similar to ExecSort(), in which it drains the outer node
 to
 populate the first CUR.  After that, it looks more like
 agg_retrieve_direct(),
 except that CUR is the input source, and each tuple drawn is also put into
 NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.
 This
 design does not add I/O consumption or require a nonstandard communication
 channel between executor nodes.  Tom, Andrew, does that look satisfactory?


So you are essentially proposing merging ChainAggregate and its
corresponding Sort node?

So the structure would be something like:

GroupAggregate
  -> MultiGroupAgg (a, b)
  -> MultiGroupAgg (c, d) ...

I am not sure if I understand you correctly. Only the top-level
GroupAggregate node projects the result of the entire operation. The key to
ChainAggregate nodes is that each ChainAggregate node handles grouping sets
that fit a single ROLLUP list, i.e. that can be done with a single sort
order. There can be multiple lists of this type in a single GROUPING SETS
operation; however, our current design has only a single top GroupAggregate
node but a ChainAggregate node + Sort node per sort order. If you are
proposing replacing the GroupAggregate node + the entire ChainAggregate +
Sort node stack with a single MultiGroupAggregate node, I am not able to
understand how it will handle all the multiple sort orders. If you are
proposing replacing only ChainAggregate + Sort node with a single
MultiGroupAgg node, that still shares the tuplestore with the top-level
GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please
correct me if I am wrong.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] mysql with postgres

2014-12-23 Thread Atri Sharma
On Tue, Dec 23, 2014 at 3:06 PM, Ravi Kiran ravi.kolanp...@gmail.com
wrote:

 hi all,


 Is postgres source code compatible with mysql database?? If it is, could
 someone could give me some links so that I can do that.

 I want to hack into the postgres source code, but as I am comfortable with
 mysql, I want to use the mysql database not postgres.

 any references would be fine.

 thank you


Eh...what?


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Atri Sharma
On Tuesday, December 23, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Dec 22, 2014 at 11:19 AM, Andrew Gierth
 and...@tao11.riddles.org.uk javascript:; wrote:
   Tom The other reason that's a bad comparison is that I've not seen
   Tom many queries that use more than a couple of window frames,
   Tom whereas we have to expect that the number of grouping sets in
   Tom typical queries will be significantly more than a couple.
 
  I would be interested in seeing more good examples of the size and
  type of grouping sets used in typical queries.

 From what I have seen, there is interest in being able to do things
 like GROUP BY CUBE(a, b, c, d) and have that be efficient.  That will
 require 16 different groupings, and we really want to minimize the
 number of times we have to re-sort to get all of those done.  For
 example, if we start by sorting on (a, b, c, d), we want to then make
 a single pass over the data computing the aggregates with (a, b, c,
 d), (a, b, c), (a, b), (a), and () as the grouping columns.



That is exactly what the ChainAggregate node does. A set of orderings that
fits in a single ROLLUP list (like your example) is processed in a single
pass.
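As a rough illustration (plain Python, nothing from the patch), CUBE(a, b, c, d) expands to the 16 subsets of the columns, and a single sort covers exactly the groupings that are prefixes of that sort order — which is why minimizing re-sorts matters:

```python
from itertools import combinations

def cube_groupings(cols):
    """CUBE(cols) generates one grouping per subset of cols: 2^n in total."""
    return [g for r in range(len(cols), -1, -1)
              for g in combinations(cols, r)]

def one_sort_covers(groupings, sort_order):
    """A single pass over input sorted on sort_order can compute exactly
    the groupings that are prefixes of that sort order."""
    prefixes = {tuple(sort_order[:i]) for i in range(len(sort_order) + 1)}
    return [g for g in groupings if g in prefixes]

gs = cube_groupings(("a", "b", "c", "d"))          # the 16 groupings
one_pass = one_sort_covers(gs, ("a", "b", "c", "d"))
# sorting on (a, b, c, d) handles (a,b,c,d), (a,b,c), (a,b), (a), ()
```

The remaining 11 groupings need additional sort orders, so grouping them into as few ROLLUP-shaped chains as possible is the optimization being discussed.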


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Commitfest problems

2014-12-18 Thread Atri Sharma
On Thu, Dec 18, 2014 at 8:44 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Wed, Dec 17, 2014 at 5:00 PM, Stephen Frost sfr...@snowman.net wrote:
 
 
  Another thought I had was to suggest we consider *everyone* to be a
  contributor and implement a way to tie together the mailing list
  archives with the commit history and perhaps the commitfest app and make
  it searchable and indexed on some website.  eg:
 
  contributors.postgresql.org/sfrost
- Recent commits
- Recent commit mentions
- Recent emails to any list
- Recent commitfest app activity
- Recent wiki page updates
  ...
 
  Ideally with a way for individuals to upload a photo, provide a company
  link, etc, similar to what the existing Major Contributors have today.
  Obviously, this is not a small task to develop and there is some risk of
  abuse (which I expect the other folks on the infra team will point out
  and likely tar and feather me for suggesting this at all..) but it might
  be along the same lines as Bruce's PgLife..
 



+1

It does feel good to be acknowledged for our work, especially when there is
a policy in our community to acknowledge it.

Regards,

Atri


Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Atri Sharma
On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:

 KaiGai, David Rowley and myself have all made mention of various ways
 we could optimize aggregates.

 Following WIP patch adds an extra function called a combining
 function, that is intended to allow the user to specify a
 semantically correct way of breaking down an aggregate into multiple
 steps.

 Gents, is this what you were thinking? If not...



A quick look at the patch makes me assume that it does not handle the
problem of combining transition values, or move at all in that direction
(which is fine, just reconfirming).

So, essentially, we are adding a grand total on top of individual sum()
or count() operations, right?

Also, should we not have a sanity check for the user-provided function?


Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Atri Sharma
On Wed, Dec 17, 2014 at 6:05 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Simon,

 Its concept is good to me. I think, the new combined function should be
 responsible to take a state data type as argument and update state object
 of the aggregate function. In other words, combined function performs like
 transition function but can update state object according to the summary
 of multiple rows. Right?

 It also needs some enhancement around Aggref/AggrefExprState structure to
 inform which function shall be called on execution time.
 Combined functions are usually no-thank-you. AggrefExprState updates its
 internal state using transition function row-by-row. However, once someone
 push-down aggregate function across table joins, combined functions have
 to be called instead of transition functions.
 I'd like to suggest Aggref has a new flag to introduce this aggregate
 expects
 state object instead of scalar value.

 Also, I'd like to suggest one other flag in Aggref not to generate final
 result, and returns state object instead.



So are you proposing not calling transition functions at all and just using
combine functions?

That sounds counterintuitive to me. I am not able to see why you would want
to avoid transition functions entirely, even for the case of pushing down
aggregates that you mentioned.

From Simon's example mentioned upthread:

PRE-AGGREGATED PLAN
Aggregate
  -> Join
       -> PreAggregate (doesn't call finalfn)
            -> Scan BaseTable1
       -> Scan BaseTable2

The finalfn wouldn't be called. Instead, the combine function would be
responsible for getting the pre-aggregated results and combining them
(unless, of course, I am missing something).

Special-casing transition-state updating in Aggref seems like a bad idea to
me. I would think that it would be better if we made it more explicit, i.e.
added a new node on top that does the combination (it would be primarily
responsible for calling the combine function).

Not a good source of inspiration, but seeing how SQL Server does it
(Exchange operator + Stream Aggregate) seems intuitive to me, and having
the combination operation as a separate top node might be a cleaner way.

I may be wrong though.
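For what it's worth, the transition/combine split can be illustrated with avg() in a few lines of Python (a sketch of the concept, not PostgreSQL's actual C functions): each partial aggregation runs the transition function and stops before the final function, and the top level merges the partial states with the combine function.

```python
# avg() with an explicit transition / combine / final split.
# The state is a (sum, count) pair rather than the final scalar.

def avg_trans(state, value):       # transition: consume one input row
    s, n = state
    return (s + value, n + 1)

def avg_combine(s1, s2):           # combine: merge two partial states
    return (s1[0] + s2[0], s1[1] + s2[1])

def avg_final(state):              # final: produce the scalar result
    s, n = state
    return s / n if n else None

# Partial aggregation per group/partition, combined at the top level:
parts = [[1, 2, 3], [4, 5]]
states = []
for chunk in parts:
    st = (0, 0)
    for v in chunk:
        st = avg_trans(st, v)
    states.append(st)              # NOTE: finalfn is NOT called here

total = (0, 0)
for st in states:
    total = avg_combine(total, st)
assert avg_final(total) == 3.0     # identical to avg over all five values
```

This is why a pushed-down AVG must return its state object instead of a scalar: averaging the partial averages (2.0 and 4.5 here) would give the wrong answer, while combining (sum, count) states does not.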

Regards,

Atri


Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Atri Sharma
On Wed, Dec 17, 2014 at 7:18 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 17 December 2014 at 12:35, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  Its concept is good to me. I think, the new combined function should be
  responsible to take a state data type as argument and update state object
  of the aggregate function. In other words, combined function performs
 like
  transition function but can update state object according to the summary
  of multiple rows. Right?

 That wasn't how I defined it, but I think your definition is better.

  It also needs some enhancement around Aggref/AggrefExprState structure to
  inform which function shall be called on execution time.
  Combined functions are usually no-thank-you. AggrefExprState updates its
  internal state using transition function row-by-row. However, once
 someone
  push-down aggregate function across table joins, combined functions have
  to be called instead of transition functions.
  I'd like to suggest Aggref has a new flag to introduce this aggregate
 expects
  state object instead of scalar value.
 
  Also, I'd like to suggest one other flag in Aggref not to generate final
  result, and returns state object instead.
 
  Let me use David's example but little bit adjusted.
 
  original)
  SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON p.product_id = s.product_id
GROUP BY p.name;
 
  modified)
  SELECT p.name, AVG(qty)
    FROM (SELECT product_id, AVG(qty) AS qty
          FROM sales GROUP BY product_id) s
         INNER JOIN product p ON p.product_id = s.product_id
   GROUP BY p.name;
 
  Let's assume the modifier set a flag of use_combine_func on the AVG(qty)
 of
  the main query, and also set a flag of not_generate_final on the
 AVG(qty) of
  the sub-query.
  It shall work as we expected.

 That matches my thinking exactly.

 David, if you can update your patch with some docs to explain the
 behaviour, it looks complete enough to think about committing it in
 early January, to allow other patches that depend upon it to stand a
 chance of getting into 9.5. (It is not yet ready, but I see it could
 be).

 The above example is probably the best description of the need, since
 user defined aggregates must also understand this.

 Could we please call these combine functions or other? MERGE is an
 SQL Standard statement type that we will add later, so it will be
 confusing if we use the merge word in this context.

 David, your patch avoids creating any mergefuncs for existing
 aggregates. We would need to supply working examples for at least a
 few of the builtin aggregates, so we can test the infrastructure. We
 can add examples for all cases later.



I am still missing how and why we skip transition functions. What am I
missing, please?


Re: [HACKERS] Commitfest problems

2014-12-16 Thread Atri Sharma
On Wed, Dec 17, 2014 at 12:03 AM, Stephen Frost sfr...@snowman.net wrote:

 David,

 * David Rowley (dgrowle...@gmail.com) wrote:
  I'd just like to add something which might be flying below the radar of
  more senior people. There are people out there (like me) working on
  PostgreSQL more for the challenge and perhaps the love of the product,
 who
  make absolutely zero money out of it. For these people getting credit
 where
  it's due is very important. I'm pretty happy with this at the moment and
 I
  can't imagine any situation where not crediting reviewers would be
  beneficial to anyone.

 Thanks for this.  One question which has been brought up in the past is
 the specific level of credit.  That you're happy with the credit
 you've been given thus far is great as it happens to support the side
 that I'm on. :D

 However, could you quantify what, exactly, you feel is appropriate
 credit for reviewers and authors..?  I'll intentionally omit the options
 that have been presented in the past to try and avoid influencing your
 response.



Mentioning them in list of contributors, for one.


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-12-15 Thread Atri Sharma
On Mon, Dec 8, 2014 at 6:06 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Wed, Nov 26, 2014 at 4:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think what's likely missing here is a clear design for the raw parse
  tree representation (what's returned by the bison grammar).  The patch
  seems to be trying to skate by without creating any new parse node types
  or fields, but that may well be a bad idea.  At the very least there
  needs to be some commentary added to parsenodes.h explaining what the
  representation actually is (cf commentary there for MultiAssignRef).
 
  Also, I think it's a mistake not to be following the MultiAssignRef
  model for the case of (*) = ctext_row.  We optimize the ROW-source
  case at the grammar stage when there's a fixed number of target columns,
  because that's a very easy transformation --- but you can't do it like
  that when there's not.  It's possible that that optimization should be
  delayed till later even in the existing case; in general, optimizing
  in gram.y is a bad habit that's better avoided ...
 Marking as returned with feedback based on those comments.


Thank you everybody for your comments.

I am indeed trying to avoid a new node, since my intuition says that we do
not need one for functionality which is present in other commands, albeit
in a different form. However, I agree that documentation explaining the
parse representation is lacking, and I shall improve that. Also, in
accordance with Tom's comments, I shall look at not touching the target
list directly and instead follow the path of MultiAssignRef.

I have moved the patch to the current CF and marked it as Waiting on
Author, since I plan to resubmit after addressing the concerns.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 5:00 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 3 December 2014 at 09:29, David Rowley dgrowle...@gmail.com wrote:
  *** Method 3: Marking scans as possibly skippable during planning and
  removing redundant join nodes at executor startup (Simon's method)
 
  Pros:
  1. The plan can be executed as normal if there are any foreign key
 triggers
  pending.
  2. Does not require extra code in all join types  (see cons #2 above)
  3. Does not suffer from extra node visiting overhead (see cons #3 above)
 
  Cons:
  1. Executor must modify the plan.
  2. Planner may have generated a plan which is not optimal for
 modification
  by the executor (e.g. Sort nodes for merge join, or index scans for
  pre-sorted input won't become seqscans which may be more efficient as
  ordering may not be required after removing a merge join)
 
  With each of the methods listed above, someone has had a problem with,
 and
  from the feedback given I've made changes based and ended up with the
 next
  revision of the patch.
 
  Tom has now pointed out that he does not like the executor modifying the
  plan, which I agree with to an extent as it I really do hate the extra
  useless nodes that I'm unable to remove from the plan.

 I guess we need an Option node. Tom and I discussed that about an aeon ago.

 The Option node has a plan for each situation. At execution time, we
 make the test specified in the plan and then select the appropriate
 subplan.

 That way we can see what is happening in the plan and the executor
 doesn't need to edit anything.
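A tiny Python sketch of the Option idea described above (purely illustrative; no such executor node exists): the planner supplies one subplan per situation plus the runtime test, and executor startup simply picks a subplan without editing any plan.

```python
class OptionNode:
    """Hypothetical Option node: one subplan per situation and a runtime
    test; at execution time the test selects which subplan to run, so the
    executor never modifies a plan tree."""

    def __init__(self, test, plan_if_true, plan_if_false):
        self.test = test
        self.plans = {True: plan_if_true, False: plan_if_false}

    def execute(self, ctx):
        # evaluate the planner-specified test, then run the chosen subplan
        return self.plans[bool(self.test(ctx))](ctx)

# Example for join removal: keep the join only while FK triggers are
# pending (the context key and plan labels are made up for illustration).
node = OptionNode(
    test=lambda ctx: ctx["fk_triggers_pending"],
    plan_if_true=lambda ctx: "full join plan",
    plan_if_false=lambda ctx: "join-removed plan",
)
```

Both subplans stay visible in EXPLAIN output, which is the stated advantage over having the executor edit the plan in place.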



So does the planner keep all plans satisfying the possibilities, or does it
look at the possible conditions (like the presence of a foreign key, in
this case) and then let the executor choose between them?

So is the idea essentially making the planner return a set of best plans,
one for each condition? Are we assured of their optimality at the local
level, i.e. for each possibility?

IMO this sounds like punting the planner's task to the executor. Not to
mention some overhead for maintaining various plans that might have been
discarded early in the planning and path-cost-evaluation phase (consider a
path with pathkeys specified, as with ORDINALITY: can there be edge cases
where we end up invalidating the entire path if we let the executor modify
it, or maybe just lose the ordinality optimization?).

I agree that the executor should not modify plans, but letting the executor
choose the plan to execute (out of a set from the planner, of course),
rather than the planner giving the executor a single plan whose semantics
the executor does not care about, seems a bit counterintuitive to me. It
might be just me though.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 8:32 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  So the planner keeps all possibility satisfying plans, or it looks at the
  possible conditions (like presence of foreign key for this case, for eg)
  and then lets executor choose between them?

 Right, this was one of the thoughts that I had.

  So is the idea essentially making the planner return a set of best
 plans,
  one for each condition? Are we assured of their optimality at the local
  level i.e. at each possibility?

 We *already* have an idea of there being multiple plans (see
 plancache.c).


 Thanks for pointing me there.

What I am concerned about is that in this case, the option plans are
competing plans rather than separate plans.

My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive about discarding potential paths. Keeping such plans around and
storing and returning them might have issues, but that is only my thought.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  However, even granting that that is a concern, so what?  You *have* to
  do the planning twice, or you're going to be generating a crap plan for
  one case or the other.

  Yeah, I don't see a way around that..

 Also, it occurs to me that it's only necessary to repeat the join search
 part of the process, which means that in principle the mechanisms already
 exist for that; see GEQO.  This means that for small join problems, the
 total planning time would much less than double anyway.  For large
 problems, where the join search is the bulk of the time, we could hope
 that removal of unnecessary joins would reduce the join search runtime
 enough that the second search would be pretty negligible next to the
 first (which is not optional).  So I think it'll double the runtime
 is an unfounded objection, or at least there's good reason to hope it's
 unfounded.


Is it possible to only replan part of the plan in case of this
optimization? I think that we might need to only replan parts of the
original plan (as you mentioned, join search and above). So we could reuse
the original plan in part and not do a lot of replanning (an obvious case
is scan strategy, which we can assume will not change for the two plans).

I wonder if we could have a rule-based system for replacing some plan nodes
with other types of nodes. As we discover more cases like this, we could
add more rules. A wild thought, though.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  Is it possible to only replan part of the plan in case of this
  optimization? I think that we might need to only replan parts of the
  original plan (as you mentioned, join search and above). So we could
 reuse
  the original plan in part and not do a lot of replanning (an obvious case
  is scan strategy, which we can assume will not change for the two plans).

 I think you assume wrong; or at least, I certainly would not wish to
 hard-wire any such assumption.  Skipping some joins could change the
 shape of the join tree *completely*, because the cost estimates will
 change so much.  And that could in turn lead to making different choices
 of scan methods, eg, we might or might not care about sort order of
 a scan result if we change join methods.

 regards, tom lane


Agreed, but in some cases we could possibly make some assumptions (if there
is no index, if a large fraction of the table will be returned by the scan,
or for a FunctionScan).


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:27 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  Agreed, but in some cases, we could possibly make some assumptions (if
  there is no index, if a large fraction of table will be returned in scan,
  FunctionScan).

 All neat ideas but how about we get something which works in the way
 being asked for before we start trying to optimize it..?  Maybe I'm
 missing something, but getting all of this infrastructure into place and
 making sure things aren't done to the plan tree which shouldn't be (or
 done to all of them if necessary..) is enough that we should get that
 bit done first and then worry if there are ways we can further improve
 things..



Right, sorry for digressing.

I think we are in agreement as to what needs to be done (start with a plan,
note potentially removable joins, and replan if necessary). The idea of the
executor modifying the plan (or, personally, even choosing the plan) seems
counterintuitive.

Does it also make sense to recalculate the costs from scratch for the
replan? It might; I am just asking.

Regards,

Atri


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  * Atri Sharma (atri.j...@gmail.com) wrote:
  Agreed, but in some cases, we could possibly make some assumptions (if
  there is no index, if a large fraction of table will be returned in
 scan,
  FunctionScan).

  All neat ideas but how about we get something which works in the way
  being asked for before we start trying to optimize it..?  Maybe I'm
  missing something, but getting all of this infrastructure into place and
  making sure things aren't done to the plan tree which shouldn't be (or
  done to all of them if necessary..) is enough that we should get that
  bit done first and then worry if there are ways we can further improve
  things..

 Yeah; moreover, there's no evidence that hard-wiring such assumptions
 would save anything.  In the example of a FunctionScan, guess what:
 there's only one Path for that relation anyway.

That is precisely what I meant :) I guess I was being overcautious and even
trying to save the time spent evaluating whatever paths we have and
building new FunctionScan paths...


 I think the right approach for now is to emulate the GEQO precedent as
 closely as possible.  Build all the single-relation Paths the same as
 now, then do a join search over all the relations, then (if we've noticed
 that some joins are potentially removable) do another join search over
 just the nonremovable relations.


How about using GEQO more liberally when replanning (since removing joins
decreases the number of relations in the join before the GEQO threshold is
hit)?
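
The two-pass join search described above could be sketched like this
(a toy Python model; `cost` and `cheapest_join_order` are invented
stand-ins for the planner's estimates and join search, not actual
PostgreSQL code):

```python
from itertools import permutations

def cost(order):
    # Toy cost model: a hypothetical stand-in for the planner's estimates.
    return sum((i + 1) * len(rel) for i, rel in enumerate(order))

def cheapest_join_order(rels):
    # Stand-in for the join search (exhaustive here; GEQO above a threshold).
    return min(permutations(rels), key=cost)

def plan(rels, removable):
    # First pass: join search over all relations, as today.
    full_plan = cheapest_join_order(rels)
    if not removable:
        return full_plan
    # Second pass: repeat the search over just the nonremovable relations;
    # the executor would pick between the two plans at run time.
    reduced = [r for r in rels if r not in removable]
    return cheapest_join_order(reduced)

print(plan(["orders", "customers", "audit"], removable={"audit"}))
```

The point is simply that the second search starts from the same
single-relation paths but over a smaller relation set, rather than
patching the first plan in place.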



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think the right approach for now is to emulate the GEQO precedent as
  closely as possible.  Build all the single-relation Paths the same as
  now, then do a join search over all the relations, then (if we've
 noticed
  that some joins are potentially removable) do another join search over
  just the nonremovable relations.

  How about using geqo more liberally when replanning (decrease the number
 of
  relations in join before geqo is hit?)

 This is going to be quite difficult enough without overcomplicating it.
 Or as a wise man once said, premature optimization is the root of all
 evil.  Get it working in the basic way and then see if improvement is
 necessary at all.


Sure, I can take a crack at it since I am working on a patch that does
require this alternative path approach. Let me try something and report my
experimental results.


Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-12 Thread Atri Sharma
On Wed, Nov 12, 2014 at 1:24 PM, David Rowley dgrowle...@gmail.com wrote:


 On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com
 wrote:


 This plan type is widely used in reporting queries, so will hit the
 mainline of BI applications and many Mat View creations.
 This will allow SELECT count(*) FROM foo to go faster also.


 We'd also need to add some infrastructure to merge aggregate states
 together for this to work properly. This means that could also work for
 avg() and stddev etc. For max() and min() the merge functions would likely
 just be the same as the transition functions.


It might make sense to add a new planner node responsible for pulling from
each of the individual parallel Agg nodes and then aggregating over the
results.


A couple of things worth considering are whether we want to force parallel
aggregation or let the planner decide between a parallel aggregate and a
single plan. For example, the average estimated size of groups might be one
factor the planner considers when choosing between a parallel and a single
execution plan.

I don't see merging states as an easy problem; it should perhaps be tackled
apart from this thread.

Also, do we want to allow parallelism only with GroupAggs?
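
To make the state-merging idea concrete, here is a minimal sketch
(Python, with invented function names, not the actual aggregate
infrastructure) for avg(), whose transition state is a (sum, count) pair:

```python
def avg_transfn(state, value):
    # Per-worker transition function: accumulate (sum, count).
    s, n = state
    return (s + value, n + 1)

def avg_mergefn(state_a, state_b):
    # Merge two partial states. For max()/min() the merge function
    # would essentially be the transition function itself.
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])

def avg_finalfn(state):
    s, n = state
    return s / n if n else None

# Two "workers" each aggregate half the input; a parent node merges.
left = right = (0, 0)
for v in [1, 2, 3]:
    left = avg_transfn(left, v)
for v in [4, 5, 6]:
    right = avg_transfn(right, v)
print(avg_finalfn(avg_mergefn(left, right)))  # 3.5
```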

Regards,

Atri


Re: [HACKERS] Representing a SRF return column in catalogs

2014-11-07 Thread Atri Sharma
On Fri, Nov 7, 2014 at 7:15 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Nov 5, 2014 at 8:24 AM, Atri Sharma atri.j...@gmail.com wrote:
  I am working on something that requires representing a SRF return column
 in
  pg_proc and being able to retrieve it, retrieve the name of the column
 and
  make a ColumnRef node from it.
 
  The catch here are aliases:
 
  SELECT generate_series(1,100) AS a ORDER BY a;
 
  I need to know that the return column of generate_series is being
 referred
  as a in this query (hence tlist, sortClause will have 'c').
 
  I discussed and think that a way can be to have position of column rather
  than the actual name but that raises the question of the functionality
  needed for back resolving position to name (to make a ColumnRef node) and
  then infer that the column is being referred as an alias in this query.

 It's not clear to me what you are trying to do, so I can't give you any
 advice.


Let me give an example:

Consider a user-defined SRF (or a built-in one, such as generate_series). I
am working on a pathkeys-tracking project (more details on it in a separate
email). I am interested in one of the columns of the result of the SRF and
want to store it in the catalogs in a manner that allows me to refer to it
later when executing the SRF.

One way could be to store the raw column name. However, I am not sure how
that would work with aliases without considerable fiddling with Alias
nodes at parse time.

Can I store relattnos or something similar? I need to fetch the stored
attribute in the planner and build pathkeys from it.

Please advise.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Representing a SRF return column in catalogs

2014-11-07 Thread Atri Sharma
On Saturday, November 8, 2014, Tom Lane t...@sss.pgh.pa.us wrote:


 I'm confused too.  The original example seemed to imagine that details
 of a query (not the function, but the calling query) would be stored in
 the catalogs, which is completely nuts.

 pg_proc already has provisions to remember the names of output parameters
 of a function, but it seems like you want something else than that, only
 it's not very clear what.  Are you trying to say that you'd like to
 represent the sort order of the output of a function?  If so, you'd have
 to add new columns to pg_proc for that, but I can't see why we'd represent
 that information in terms of column names.  A column number and a sort
 operator would make more sense.


Exactly. I would like to represent the sort order of the output.

Thanks for catching it, I really need to stop writing emails without
drinking coffee...
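
Tom's suggestion above (a column number plus a sort operator, rather than a
column name) could be pictured with a toy Python model (all names invented,
nothing here is actual catalog code):

```python
from operator import lt, gt

# Hypothetical pg_proc-style metadata: function name ->
# (sorted output column number, comparison operator).
SRF_SORT_ORDER = {
    "generate_series": (0, lt),  # column 0, ascending
}

def output_is_sorted(func, colno, op):
    # The planner could skip an explicit Sort node when the ordering
    # requested by the query matches the recorded one.
    return SRF_SORT_ORDER.get(func) == (colno, op)

print(output_is_sorted("generate_series", 0, lt))  # True
print(output_is_sorted("generate_series", 0, gt))  # False
```

Because the lookup is by column position, it is unaffected by whatever
alias the calling query attaches to the output column.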


-- 
Regards,

Atri
*l'apprenant*


[HACKERS] Representing a SRF return column in catalogs

2014-11-05 Thread Atri Sharma
Hi,

I am working on something that requires representing a SRF return column in
pg_proc and being able to retrieve it, retrieve the name of the column and
make a ColumnRef node from it.

The catch here is aliases:

SELECT generate_series(1,100) AS a ORDER BY a;

I need to know that the return column of generate_series is being referred
to as a in this query (hence the tlist and sortClause will contain 'a').

I have discussed this and think one way could be to store the position of
the column rather than the actual name, but that raises the question of how
to resolve a position back to a name (to make a ColumnRef node) and then
infer that the column is referred to by an alias in this query.

Any pointers will be really helpful.

Regards,

Atri


Re: [HACKERS] How to implent CONVERT ( data_type [ ( length ) ] , expression ) function in postgreSQL

2014-11-02 Thread Atri Sharma
On Sun, Nov 2, 2014 at 3:40 PM, rohtodeveloper rohtodevelo...@outlook.com
wrote:


 Dear


 In SQLServer, there'are two functions to converte an expression of one
 data type to another.

 1. CAST ( expression AS data_type [ ( length ) ] )
 2. CONVERT ( data_type [ ( length ) ] , expression )

 However, In PostgreSQL, there's only the CAST ( expression AS data_type [
 ( length ) ] ) function. I have tried the following two ways to implenting
 the CONVERT ( data_type [ ( length ) ] , expression ) function, but both
 are failed.

 1. CREATE FUNCTION .
 The function's arguments can only be expressions but not data_type .
 2. Modifying the gram.y .
 The CONVERT ( data_type [ ( length ) ] , expression ) is in grammer
 conflict with the PostgreSQL self's
 convert(data,src_encoding_name,dest_encoding_name) function. And the
 PostgreSQL self's convert(data,src_encoding_name,dest_encoding_name)
 function cannot be used.

 I wonder whether there's a better way to solve this problem.
 Any help will be appreciated.



Please do not cross post to various lists.

For the options you suggested:

1) Pass the data type in as a string, then deparse and process it inside
the function.
2) Are you referring to pg_convert here?

IMO the CONVERT function is not needed in the first place; CAST covers
these cases. You may want to refer to
http://www.postgresql.org/docs/9.3/static/typeconv.html


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-27 Thread Atri Sharma



 To solve #1, we could redesign CREATE DATABASE so that replaying the
 DBASE_CREATE record doesn't zap the old directory, and also doesn't copy
 any files. We could instead just assume that if the transaction commits,
 all the files have been copied and fsync'd already, like we assume that if
 a CREATE INDEX commits in wal_level=minimal, the underlying file was
 fsync'd before the commit.


Do you mean that during a recovery, we just let the database directory be
and assume that it is in good shape since the transaction committed
originally?


 I wonder if we should bite the bullet and start WAL-logging all the files
 that are copied from the template database to the new database. When the
 template database is small (template0 is 6.4MB currently), that wouldn't
 generate too much WAL. We could perhaps do that only if the template
 database is small, and do the checkpoints otherwise, although I wouldn't
 like to have subtly different behavior depending on database size like that.


For the sort of workload Tomas described above (creating a lot of databases
on the fly), we may end up with a lot of WAL eventually if we do this.

Regards,

Atri


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-27 Thread Atri Sharma
On Mon, Oct 27, 2014 at 4:44 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 10/27/2014 01:06 PM, Atri Sharma wrote:




  To solve #1, we could redesign CREATE DATABASE so that replaying the
 DBASE_CREATE record doesn't zap the old directory, and also doesn't copy
 any files. We could instead just assume that if the transaction commits,
 all the files have been copied and fsync'd already, like we assume that
 if
 a CREATE INDEX commits in wal_level=minimal, the underlying file was
 fsync'd before the commit.


 Do you mean that during a recovery, we just let the database directory be
 and assume that it is in good shape since the transaction committed
 originally?


 Right.


It does make sense. However, with the checkpoint after creating the files
gone, the window between file creation and actual commit grows, increasing
the chance that a crash in that period leaves an orphaned database
directory. My understanding of the consequences of removing the checkpoint
may be incorrect, though, so these fears might be unfounded.

Regards,

Atri


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-27 Thread Atri Sharma



 IMHO writing all the data into a WAL would be the cleanest solution.

 Also, what is a small database? I don't think a static value will work,
 because the sweet spot between the current approach (forcing two
 checkpoints) and writing everything in WAL depends on the amount of dirty
 buffers that need to be checkpointed. Which is mostly driven by the size
 of shared buffers and write activity - for small shared buffers and/or
 mostly-read workload, checkpoints are cheap, so the 'small database'
 threshold (when choosing the WAL approach) is much higher than for large
 shared buffers or write-heavy workloads.


So are you proposing having a heuristic based on the amount of data in
shared buffers and write activity? Do you have something in mind that works
for general workloads as well?


 So maybe if we could determine the amount of data to be checkpointed, and
 then base the decision on that, that'd work better? This would also have
 to take into account that writing into WAL is sequential, while
 checkpoints usually cause random writes all over the datafiles (which is
 more expensive).

 Another option might be forcing just a spread checkpoint, not the
 immediate one (which is what we do now). That would not fix the CREATE
 DATABASE duration (actually, it would make it longer), but it would lower
 the impact on other activity on the machine.



I believe this to be the cleanest way to reduce the amount of I/O
generated. If I understand correctly, the original problem you mentioned
was not the time CREATE DATABASE takes but rather the amount of I/O each
one generates.

This also leads me to wonder whether it makes sense to explore group
commits around the creation of files for a new database (for the same
backend, of course). This could be opt-in, for when the user knows he/she
is going to create a lot of databases in the near future and is fine with a
large spike in I/O in one go. Again, it might be even more broken than the
current scenario, but it depends on what the user wants...

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Atri Sharma
On Fri, Oct 17, 2014 at 7:45 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma atri.j...@gmail.com wrote:
 
 
  On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:
 
  Hi
 
  On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 wrote:
   Please find attached a patch which implements support for UPDATE
 table1
   SET(*)=...
 
  I presume you haven't read Tom Lane's proposal and discussion about
  multiple column assignment in UPDATE:
  http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
  (Assigning all columns was also discussed there)
 
  And there's a WIP patch:
  http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us
 
  Thanks for the links, but this patch only targets SET(*) case, which, if
 I
  understand correctly, the patch you mentioned doesn't directly handle
 (If I
  understand correctly, the target of the two patches is different).

 Yeah -- in fact, there was some discussion about this exact case.
 This patch solves a very important problem: when doing record
 operations to move data between databases with identical schema
 there's currently no way to 'update' in a generic way without building
 out the entire field list via complicated and nasty dynamic SQL.


Thanks!


 I'm
 not sure about the proposed syntax though; it seems a little weird to
 me.  Any particular reason why you couldn't have just done:

 UPDATE table1 SET * = a,b,c, ...

 also,

 UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);


I honestly have not spent a lot of time thinking about the exact syntax
that may be acceptable. If we have arguments for or against a specific
syntax, I will be glad to incorporate them.





[HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
Hi All,

Please find attached a patch which implements support for UPDATE table1
SET(*)=...
The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1
SET(*)=(SELECT a,b,c FROM...).
It solves the problem of doing UPDATE from a record variable of the same
type as the table e.g. update foo set (*) = (select foorec.*) where ...;

The design is simple. It basically expands the * at transform stage, does
the necessary type checking, and adds the result to the parse tree. This
allows normal execution for the rest of the stages.
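
As a rough illustration of that transform-stage expansion (a Python sketch
with invented helpers; the real patch of course works on the parse tree,
not on strings):

```python
def expand_set_star(table_columns, values):
    # Expand SET (*) = (v1, v2, ...) into an explicit per-column
    # assignment list, with arity checking done at transform time
    # (type checking would happen here too in the real thing).
    if len(values) != len(table_columns):
        raise ValueError("number of values does not match table columns")
    return list(zip(table_columns, values))

assignments = expand_set_star(["a", "b", "c"], [1, "x", True])
print(assignments)  # [('a', 1), ('b', 'x'), ('c', True)]
```

After the expansion, the rest of the pipeline sees an ordinary multi-column
UPDATE and needs no changes.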


Thoughts/Comments?

Regards,

Atri


updatestar_ver1.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:

 Hi

 On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 javascript:; wrote:
  Please find attached a patch which implements support for UPDATE table1
  SET(*)=...

 I presume you haven't read Tom Lane's proposal and discussion about
 multiple column assignment in UPDATE:
 http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
 (Assigning all columns was also discussed there)

 And there's a WIP patch:
 http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us



Thanks for the links, but this patch only targets the SET(*) case, which,
if I understand correctly, the patch you mentioned doesn't directly handle
(the targets of the two patches are different).

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wed, Oct 15, 2014 at 2:18 PM, Atri Sharma atri.j...@gmail.com wrote:





 On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:

 Hi

 On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 wrote:
  Please find attached a patch which implements support for UPDATE table1
  SET(*)=...

 I presume you haven't read Tom Lane's proposal and discussion about
 multiple column assignment in UPDATE:
 http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
 (Assigning all columns was also discussed there)

 And there's a WIP patch:
 http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us



 Thanks for the links, but this patch only targets SET(*) case, which, if I
 understand correctly, the patch you mentioned doesn't directly handle (If I
 understand correctly, the target of the two patches is different).


Digging more, I figured out that the patch I posted builds on top of Tom's
patch, since that one did not add the whole-row case.

Regards,

Atri


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-14 Thread Atri Sharma
On Sunday, September 14, 2014, Mark Kirkwood mark.kirkw...@catalyst.net.nz
wrote:

 On 14/09/14 05:36, Rohit Goyal wrote:

 Hi All,

 I want to work on the code of intermediate dataset of select and update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different version
 of salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I want to
 know the code portion which i need to look for working on all 4 versions
 in dataset. :)



 Hi Rohit,

 Currently in Postgres, these intermediate versions all exist - however a
 given session can only see one of them. Also VACUUM is allowed to destroy
 versions that no other transactions can see.

 So if I'm understanding you correctly, you would like to have some way for
 a session to see *all* these versions (and I guess preventing VACUUM from
 destroying them).



Any modifications of that sort are bound to introduce lots of pain, not to
mention performance degradation and the added responsibility of ensuring
that dead tuples don't bloat the system (prevent VACUUM from running at
regular intervals and you can get an xid wraparound).

I just mentioned that in case you are planning to go in that direction. If
you only want the data, use the triggers as Gavin mentioned.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-14 Thread Atri Sharma
On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 14/09/14 19:25, Atri Sharma wrote:



 On Sunday, September 14, 2014, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz

 wrote:

 On 14/09/14 05:36, Rohit Goyal wrote:

 Hi All,

 I want to work on the code of intermediate dataset of select and
 update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different
 version
 of salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I
 want to
 know the code portion which i need to look for working on all 4
 versions
 in dataset. :)



 Hi Rohit,

 Currently in Postgres, these intermediate versions all exist -
 however a given session can only see one of them. Also VACUUM is
 allowed to destroy versions that no other transactions can see.

 So if I'm understanding you correctly, you would like to have some
 way for a session to see *all* these versions (and I guess
 preventing VACUUM from destroying them).



 Any modifications of that sort are bound to introduce lots of pain, not
 to mention performance degradation and the added responsibility of
 ensuring that dead tuples don't bloat up the system (prevent vacuum from
 running at regular intervals and you can have a xid wraparound).

 I just mentioned that in case you are planning to go in that direction.
 If you only want the data, use the triggers as Gavin mentioned.


 Obviously in the general case sure - but (as yet) we don't have much idea
 about Rohit's use case and workload. If retrieving past versions is the
 *primary* workload bias and high update concurrency is not required then
 this could well work better than a trigger based solution.

 And it does not seem too onerous to have the ability to switch this on as
 required, viz:

 ALTER TABLE table1 VERSIONING;

 (or similar syntax) which makes VACUUM leave this table alone.


How do you plan to do all that VACUUM does for this table, then?

It seems to me that you are telling VACUUM it need not be concerned with
table 'A' while assuming ownership of all the tasks VACUUM performs for
that table. That seems pretty broken to me, not to mention the performance
degradation.

Regards,

Atri

Regards,

Atri


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-14 Thread Atri Sharma
On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 14/09/14 19:25, Atri Sharma wrote:



 On Sunday, September 14, 2014, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz

 wrote:

 On 14/09/14 05:36, Rohit Goyal wrote:

 Hi All,

 I want to work on the code of intermediate dataset of select and
 update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different
 version
 of salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I
 want to
 know the code portion which i need to look for working on all 4
 versions
 in dataset. :)



 Hi Rohit,

 Currently in Postgres, these intermediate versions all exist -
 however a given session can only see one of them. Also VACUUM is
 allowed to destroy versions that no other transactions can see.

 So if I'm understanding you correctly, you would like to have some
 way for a session to see *all* these versions (and I guess
 preventing VACUUM from destroying them).



 Any modifications of that sort are bound to introduce lots of pain, not
 to mention performance degradation and the added responsibility of
 ensuring that dead tuples don't bloat up the system (prevent vacuum from
 running at regular intervals and you can have a xid wraparound).

 I just mentioned that in case you are planning to go in that direction.
 If you only want the data, use the triggers as Gavin mentioned.


 Obviously in the general case sure - but (as yet) we don't have much idea
 about Rohit's use case and workload. If retrieving past versions is the
 *primary* workload bias and high update concurrency is not required then
 this could well work better than a trigger based solution.

 And it does not seem too onerous to have the ability to switch this on as
 required, viz:

 ALTER TABLE table1 VERSIONING;

 (or similar syntax) which makes VACUUM leave this table alone. It might
 make more sense to make such a concept apply to a TABLESPACE instead mind
 you (i.e things in here are for archive/versioning purposes)...




What I think can be done is to have a tuplestore holding the deltas of
updated rows, i.e. storing only the changes made by an UPDATE statement (it
could be part of RelationData). It should be simple enough to have the
tuplestore record the OID of the inserted tuple and the difference between
the new tuple and the old tuple. No changes need to be made to the old
tuple, since it is marked as deleted and VACUUM can remove it through the
normal logic.

Not a clean way, but it should work for what you proposed.
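
A toy model of that delta store (Python, with a plain dict standing in for
the tuplestore; the names and structure are invented for illustration):

```python
def tuple_delta(old, new):
    # Record only the columns the UPDATE actually changed.
    return {k: new[k] for k in new if old.get(k) != new[k]}

# Hypothetical per-relation delta store, keyed by the new tuple's OID.
delta_store = {}

old = {"name": "Rohit", "salary": 100}
new = {"name": "Rohit", "salary": 120}
delta_store[1001] = tuple_delta(old, new)
print(delta_store)  # {1001: {'salary': 120}}
```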


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-13 Thread Atri Sharma
On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal rhtgyl...@gmail.com wrote:


 Hi All,

 I want to work on the code of intermediate dataset of select and update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different version of
 salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I want to
 know the code portion which i need to look for working on all 4 versions in
 dataset. :)

 Thanks in advance!!



Not sure what you are looking for, but each update is an insert of a new
tuple with the new values, plus marking the old tuple as deleted.

There is no need to track versions of any changes in the data set. Each
update operation leaves only one visible tuple. If the transaction commits,
the inserted tuple becomes visible and the old row is marked deleted. If
the transaction rolls back, only the old tuple remains visible.
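
That version-chain behavior can be illustrated with a toy model (Python,
heavily simplified from the real xmin/xmax visibility rules, which also
involve snapshots and in-progress transactions):

```python
class TupleVersion:
    def __init__(self, value, xmin):
        self.value = value
        self.xmin = xmin   # transaction that inserted this version
        self.xmax = None   # transaction that deleted/updated it, if any

def update(versions, new_value, xid):
    # An UPDATE marks the old version deleted and inserts a new one.
    versions[-1].xmax = xid
    versions.append(TupleVersion(new_value, xid))

def visible(versions, committed):
    # Simplified check: the newest version whose inserting transaction
    # committed and which no committed transaction has deleted.
    for t in reversed(versions):
        if t.xmin in committed and (t.xmax is None or t.xmax not in committed):
            return t.value
    return None

salary = [TupleVersion(100, xmin=1)]
update(salary, 120, xid=2)
print(visible(salary, committed={1, 2}))  # 120
print(visible(salary, committed={1}))     # 100 (xid 2 rolled back)
```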
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-13 Thread Atri Sharma
On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston 
david.g.johns...@gmail.com wrote:

 Atri Sharma wrote
  On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal lt;



 Or rather even if you want to be able to reference the older versions of
 that record there is nothing in PostgreSQL to facilitate that. You have to
 manually create and manage the data so that you know during what time
 period
 a given record is valid.

 David J.





Sometimes I do miss 'time travel' we used to have :)

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Join push-down support for foreign tables

2014-09-05 Thread Atri Sharma
On Fri, Sep 5, 2014 at 2:20 AM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Sep 4, 2014 at 11:56 AM, Bruce Momjian br...@momjian.us wrote:
  I am thinking eventually we will need to cache the foreign server
  statistics on the local server.
 
  Wouldn't that lead to issues where the statistics get outdated and we
 have to
  anyways query the foreign server before planning any joins? Or are you
 thinking
  of dropping the foreign table statistics once the foreign join is
 complete?
 
  I am thinking we would eventually have to cache the statistics, then get
  some kind of invalidation message from the foreign server.  I am also
  thinking that cache would have to be global across all backends, I guess
  similar to our invalidation cache.

 Maybe ... but I think this isn't really related to the ostensible
 topic of this thread.  We can do join pushdown just fine without the
 ability to do anything like this.

 I'm in full agreement that we should probably have a way to cache some
 kind of statistics locally, but making that work figures to be tricky,
 because (as I'm pretty sure Tom has pointed out before) there's no
 guarantee that the remote side's statistics look anything like
 PostgreSQL statistics, and so we might not be able to easily store
 them or make sense of them.  But it would be nice to at least have the
 option to store such statistics if they do happen to be something we
 can store and interpret.


I agree that we need local statistics too (full agreement with Bruce's
proposal), but playing the Devil's advocate here: I am trying to figure out
how things like invalidation and, as you mentioned, cross-compatibility
will work.



 It's also coming to seem to me more and more that we need a way to
 designate several PostgreSQL machines as a cooperating cluster.  This
 would mean they'd keep connections to each other open and notify each
 other about significant events, which could include hey, I updated
 the statistics on this table, you might want to get the new ones or
 hey, i've replicated your definition for function X so it's safe to
 push it down now as well as hey, I have just been promoted to be the
 new master or even automatic negotiation of which of a group of
 machines should become the master after a server failure.


That's a brilliant idea, and it shouldn't be too much of a problem. One
possible race condition is that multiple backends may try to globally
propagate different statistics for the same table, but I think any standard
logical-ordering algorithm should handle that. Automatic master promotion
also seems like a brilliant idea, especially since we have time-tested
standard algorithms for it.

One thing I would like to see, assuming the interacting nodes do not all
have identical schemas, is whether we can maintain cross-node statistics
and use them for planning cross-node joins. That would raise problems
similar to the ones already noted for keeping local statistics for foreign
databases, but if we solve those anyway for storing local statistics, we
could potentially look at keeping cross-node relation statistics as well.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Join push-down support for foreign tables

2014-09-04 Thread Atri Sharma
On Thursday, September 4, 2014, Bruce Momjian br...@momjian.us wrote:

 On Thu, Sep  4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
  The main problem I see here is that accurate costing may require a
  round-trip to the remote server.  If there is only one path that is
  probably OK; the cost of asking the question will usually be more than
  paid for by hearing that the pushed-down join clobbers the other
  possible methods of executing the query.  But if there are many paths,
  for example because there are multiple sets of useful pathkeys, it
  might start to get a bit expensive.
 
  Probably both the initial cost and final cost calculations should be
  delegated to the FDW, but maybe within postgres_fdw, the initial cost
  should do only the work that can be done without contacting the remote
  server; then, let the final cost step do that if appropriate.  But I'm
  not entirely sure what is best here.

 I am thinking eventually we will need to cache the foreign server
 statistics on the local server.



Wouldn't that lead to issues where the statistics get outdated, so that we
have to query the foreign server before planning any joins anyway? Or are
you thinking of dropping the foreign table statistics once the foreign
join is complete?

Regards,

Atri




Re: [HACKERS] Join push-down support for foreign tables

2014-09-04 Thread Atri Sharma
On Thu, Sep 4, 2014 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Sep  4, 2014 at 08:41:43PM +0530, Atri Sharma wrote:
 
 
  On Thursday, September 4, 2014, Bruce Momjian br...@momjian.us wrote:
 
  On Thu, Sep  4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
   The main problem I see here is that accurate costing may require a
   round-trip to the remote server.  If there is only one path that is
   probably OK; the cost of asking the question will usually be more
 than
   paid for by hearing that the pushed-down join clobbers the other
   possible methods of executing the query.  But if there are many
 paths,
   for example because there are multiple sets of useful pathkeys, it
   might start to get a bit expensive.
  
   Probably both the initial cost and final cost calculations should
 be
   delegated to the FDW, but maybe within postgres_fdw, the initial
 cost
   should do only the work that can be done without contacting the
 remote
   server; then, let the final cost step do that if appropriate.  But
 I'm
   not entirely sure what is best here.
 
  I am thinking eventually we will need to cache the foreign server
  statistics on the local server.
 
 
 
 
  Wouldn't that lead to issues where the statistics get outdated and we
 have to
  anyways query the foreign server before planning any joins? Or are you
 thinking
  of dropping the foreign table statistics once the foreign join is
 complete?

 I am thinking we would eventually have to cache the statistics, then get
 some kind of invalidation message from the foreign server.  I am also
 thinking that cache would have to be global across all backends, I guess
 similar to our invalidation cache.



That could lead to some bloat in storing statistics, since we may have a
lot of tables across a lot of foreign servers. Also, will we have VACUUM
look at ANALYZE-ing the foreign tables?

Also, how will we decide that the statistics are invalid? Will we have the
FDW query the foreign server and do some sort of comparison between the
statistics the foreign server has and the statistics we have locally? I am
trying to understand how an invalidation message from the foreign server
would work.

Regards,

Atri


Re: [HACKERS] Join push-down support for foreign tables

2014-09-04 Thread Atri Sharma
On Thu, Sep 4, 2014 at 9:33 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Sep  4, 2014 at 09:31:20PM +0530, Atri Sharma wrote:
  I am thinking we would eventually have to cache the statistics, then
 get
  some kind of invalidation message from the foreign server.  I am also
  thinking that cache would have to be global across all backends, I
 guess
  similar to our invalidation cache.
 
 
 
 
  That could lead to some bloat in storing statistics since we may have a
 lot of
  tables for a lot of foreign servers. Also, will we have VACUUM look at
  ANALYZING the foreign tables?

  Also, how will we decide that the statistics are invalid? Will we have
 the FDW
  query the foreign server and do some sort of comparison between the
 statistics
  the foreign server has and the statistics we locally have? I am trying to
  understand how the idea of invalidation message from foreign server will
 work.

 Well, ANALYZING is running on the foreign server, and somehow it would
 be nice if it would send a message to us about its new statistics, or we
 can do it like http does and it gives us a last-refresh statistics date
 when we connect.


Not sure how that would work without changing the way ANALYZE works on the
foreign server. The HTTP idea could work, though.


 I am not sure how it will work --- I am just suspecting that we might
 get to a point where the statistics lookup overhead on the foreign
 server might become a bottleneck.


Totally agree, but doing the planning only locally opens the questions I
mentioned above, and also deprives the foreign database of any
optimizations it may want to do (assuming the foreign database and the
Postgres query planner do not generate identical plans). This is only my
thought, though; we could also be planning better than the foreign
database, so the optimization point I raised is debatable.

Regards,

Atri






Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347

2014-08-31 Thread Atri Sharma
On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers e...@xs4all.nl wrote:

 On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
  Erik == Erik Rijkers e...@xs4all.nl writes:
 
They apply cleanly for me at 2bde297 whether with git apply or
patch, except for the contrib one (which you don't need unless you
want to run the contrib regression tests without applying the
gsp-u patch).
 
   Erik Ah, I had not realised that.  Excluding that contrib-patch and
   Erik only applying these three:
 
   Erik gsp1.patch
   Erik gsp2.patch
   Erik gsp-doc.patch
 
   Erik does indeed work (applies, compiles).
 
  I put up a rebased contrib patch anyway (linked off the CF).
 
  Did the unrecognized node type error go away, or do we still need to
  look into that?
 

 I have found that the unrecognized node type error is caused by:

 shared_preload_libraries = pg_stat_statements

 in postgresql.conf (as my default compile script was doing).

 If I disable that line the error goes away.


I think that's more of a library-linking problem than a problem with the
patch. I couldn't reproduce it, though.

Regards,

Atri



Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347

2014-08-31 Thread Atri Sharma
On Sunday, August 31, 2014, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
  On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers e...@xs4all.nl wrote:
   I have found that the unrecognized node type error is caused by:

 It's a warning, not an error, right?

   shared_preload_libraries = pg_stat_statements
  
   in postgresql.conf (as my default compile script was doing).
  
   If I disable that line the error goes away.
  
  
  I  think thats more of a library linking problem rather than a problem
 with
  the patch. I couldnt reproduce it,though.

 I think it's vastly more likely that the patch simply didn't add the new
 expression types to pg_stat_statements.c:JumbleExpr().



I must have run the above diagnosis in a wrong manner, then; I will check.
Thanks for the heads-up!

Regards,

Atri




Re: [HACKERS] COPY and heap_sync

2014-08-30 Thread Atri Sharma
On Saturday, August 30, 2014, Amit Kapila amit.kapil...@gmail.com wrote:

 On Sat, Aug 30, 2014 at 11:56 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  If you insert tuples with COPY into a table created or truncated in the
 same transaction, at the end of the COPY it calls heap_sync.
 
  But there cases were people use COPY in a loop with a small amount of
 data in each statement.  Now it is calling heap_sync many times, and if
 NBuffers is large doing that gets very slow.
 
  Could the heap_sync be safely delayed until the end of the transaction,
 rather than the end of the COPY?

 Wouldn't unconditionally delaying sync until end of transaction
 can lead to burst of I/O at that time especially if there are many
 such copy commands in a transaction, leading to delay in some
 other operation's that might be happening concurrently in the
 system.




I agree with that, but it can provide us the same benefits as group
commit, especially when most of the COPY commands touch pages which are
nearby, hence reducing the seek-time overhead.

We could look at making it optional through a GUC, since it is useful,
albeit for some specific use cases.

Regards,

Atri




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote:

 On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
  If you're following the HashJoin model, then what you do is the same
 thing
  it does: you write the input tuple back out to the pending batch file for
  the hash partition that now contains key 1001, whence it will be
 processed
  when you get to that partition.  I don't see that there's any special
 case
  here.

 HashJoin only deals with tuples. With HashAgg, you have to deal with a
 mix of tuples and partially-computed aggregate state values. Not
 impossible, but it is a little more awkward than HashJoin.


+1

Not to mention future cases if we start maintaining multiple state
values, with regard to grouping sets.

Regards,

Atri




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thu, Aug 14, 2014 at 10:21 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 14 Srpen 2014, 18:02, Atri Sharma wrote:
  On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote:
 
  On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
   If you're following the HashJoin model, then what you do is the same
  thing
   it does: you write the input tuple back out to the pending batch file
  for
   the hash partition that now contains key 1001, whence it will be
  processed
   when you get to that partition.  I don't see that there's any special
  case
   here.
 
  HashJoin only deals with tuples. With HashAgg, you have to deal with a
  mix of tuples and partially-computed aggregate state values. Not
  impossible, but it is a little more awkward than HashJoin.
 
 
  +1
 
  Not to mention future cases if we start maintaining multiple state
  values,in regarded to grouping sets.

 So what would you do for aggregates where the state is growing quickly?
 Say, things like median() or array_agg()?

 I think that we can't do that for all aggregates does not imply we must
 not do that at all.

 There will always be aggregates not implementing dumping state for various
 reasons, and in those cases the proposed approach is certainly a great
 improvement. I like it, and I hope it will get committed.

 But maybe for aggregates supporting serialize/deserialize of the state
 (including all aggregates using known types, not just fixed-size types) a
 hashjoin-like batching would be better? I can name a few custom aggregates
 that'd benefit tremendously from this.


Yeah, that could work, but is it worth adding additional paths (assuming
this patch gets committed) for some aggregates? I think we should analyse
the use case further.


 Just to be clear - this is certainly non-trivial to implement, and I'm not
 trying to force anyone (e.g. Jeff) to implement the ideas I proposed. I'm
 ready to spend time on reviewing the current patch, implement the approach
 I proposed and compare the behaviour.


Totally agreed. It would be a different approach, albeit, as you said, one
that can be built on top of the current patch.


 Kudos to Jeff for working on this.

 Agreed :)






-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Atri Sharma
On Tuesday, July 22, 2014, Rohit Goyal rhtgyl...@gmail.com wrote:

 Hi All,

 I am working on postgresql code and having some problem. :)

 I need to create shared data structure, so that different client and
 connection can update and share the state of those data structures in
 memory. I planned to use top memory context but it can give me shared
 structure within one session/terminal.

 Please tel me how  postgresql do that and how i can do that?

 Regards,
 Rohit Goyal


How about making it a part of shared memory, like shared buffers?

Regards,

Atri




Re: [HACKERS] how to skip building certain paths in path tree

2014-07-21 Thread Atri Sharma
On Tue, Jul 22, 2014 at 10:53 AM, Rajmohan C csrajmo...@gmail.com wrote:

 How do I modify PostgreSQL 9.3 source to skip certain join combinations
 (skip building certain paths in path tree) by checking some condition



Have a look at joinpaths.c.


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-07-03 Thread Atri Sharma
On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 30.6.2014 23:12, Tomas Vondra wrote:
  Hi,
 
  attached is v5 of the patch. The main change is that scaling the number
  of buckets is done only once, after the initial hash table is build. The
  main advantage of this is lower price. This also allowed some cleanup of
  unecessary code.
 
  However, this new patch causes warning like this:
 
  WARNING:  temporary file leak: File 231 still referenced
 
  I've been eyeballing the code for a while now, but I still fail to see
  where this comes from :-( Any ideas?

 Meh, the patches are wrong - I haven't realized the tight coupling
 between buckets/batches in ExecHashGetBucketAndBatch:

   *bucketno = hashvalue & (nbuckets - 1);
   *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);

 The previous patches worked mostly by pure luck (the nbuckets was set
 only in the first batch), but once I moved the code at the end, it
 started failing. And by "worked" I mean "didn't throw an error, but
 probably returned bogus results" with (nbatch > 1).

 However, ISTM this nbuckets-nbatch coupling is not really necessary,
 it's only constructed this way to assign independent batch/bucket. So I
 went and changed the code like this:

   *bucketno = hashvalue & (nbuckets - 1);
   *batchno = (hashvalue >> (32 - hashtable->log2_nbatch));

 I.e. using the top bits for batchno, low bits for bucketno (as before).

 Hopefully I got it right this time. At least it seems to be working for
 cases that failed before (no file leaks, proper rowcounts so far).


Hi,

I had a look at the patch, and I was wondering if we could set a cap on
the resized number of buckets and increase the number of batches instead.
Since this patch weighs the growth in tuples against the increase in
space, we could look at increasing the number of batches rather than the
number of buckets whenever the resized bucket count exceeds a threshold.
It shouldn't be too hard; AIUI it would involve a call in MultiExecHash
right after the new cost evaluation the patch adds.

It isn't a target of the current patch, but I think it is a logical
extension of it.

Thoughts/Comments?

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Atri Sharma
On Thu, Jul 3, 2014 at 11:40 PM, Stephen Frost sfr...@snowman.net wrote:

 Tomas,

 * Tomas Vondra (t...@fuzzy.cz) wrote:
  However it's likely there are queries where this may not be the case,
  i.e. where rebuilding the hash table is not worth it. Let me know if you
  can construct such query (I wasn't).

 Thanks for working on this!  I've been thinking on this for a while and
 this seems like it may be a good approach.  Have you considered a bloom
 filter?


IIRC, the last time we tried bloom filters, I was short of real-world
useful hash functions that we could use for building the bloom filter.

If we are restarting experiments on this, I would be glad to assist.

Regards,

Atri


Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-16 Thread Atri Sharma
On Mon, Jun 16, 2014 at 11:28 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Mon, May 26, 2014 at 6:23 AM, Ronan Dunklau ronan.dunk...@dalibo.com
 wrote:
  Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit :
  On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:
   Hello,
  
   Since my last proposal didn't get any strong rebuttal, please find
   attached a more complete version of the IMPORT FOREIGN SCHEMA
 statement.
 
  Thanks!
 
  Please to send future patches to this thread so people can track them
  in their mail.
 
  I'll do.
 
  I didn't for the previous one because it was a few months ago, and no
 patch
  had been added to the commit fest.
 
 
   I tried to follow the SQL-MED specification as closely as possible.
  
   This adds discoverability to foreign servers. The structure of the
   statement as I understand it is simple enough:
  
   IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT
 TO |
   EXCEPT) table_list ] INTO local_schema.
  
   The import_foreign_schema patch adds the infrastructure, and a new FDW
   routine:
  
   typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
   ImportForeignSchemaStmt * parsetree);
  
   This routine must return a list of CreateForeignTableStmt mirroring
   whatever tables were found on the remote side, which will then be
   executed.
  
   The import_foreign_schema_postgres_fdw patch proposes an
 implementation of
   this API for postgres_fdw. It will import a foreign schema using the
 right
   types as well as nullable information.
 
  In the case of PostgreSQL, the right types are obvious until there's
  a user-defined one.  What do you plan to do in that case ?
 
 
  The current implementation fetches the types as regtype, and when
 receiving a
  custom type, two things can happen:
 
   - the type is defined locally: everything will work as expected
   - the type is not defined locally: the conversion function will fail,
 and
  raise an error of the form: ERROR:  type schema.typname does not exist

 Just wondering: what about the case where the same data type is
 defined on both local and remote, but with *different* definitions? Is
 it the responsibility of the fdw to check for type incompatibilities?


Logically, it should be.

Just wondering: couldn't the proposed function, typedef List
*(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt *parsetree), be changed a bit to return the exact
type definitions from the remote side as well?

Regards,

Atri




Re: [HACKERS] Congrats Andres Freund, the newest PostgreSQL Commiter!

2014-05-22 Thread Atri Sharma
Congrats Andres!


On Fri, May 23, 2014 at 7:54 AM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:

 Hi All,

 At the Developer Meeting that occurred 21th May 2014 was announced a new
 PostgreSQL commiter [1], Mr. Andres Freund.

 I had the opportunity to work and be mentored by him. He deserves very
 much this confidence, for the excellent work that has been doing for the
 community.

 Thank you and Congrats Andres!


 [1]
 https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#New_Committer

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira: http://www.timbira.com.br
  Blog sobre TI: http://fabriziomello.blogspot.com
  Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello




-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] is there a hook by which we can modify input query before postgresql builds raw parse tree

2014-05-01 Thread Atri Sharma
On Fri, May 2, 2014 at 10:07 AM, Rajmohan C csrajmo...@gmail.com wrote:

 Hi,

I want to know is there a way to use a hook to modify the input query
 before Postgresql parses and builds the parse tree for the query.


Uh...the rewriter?

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Hashable custom types

2014-04-26 Thread Atri Sharma
The plain UNION code supports either sorting or hashing, but
 we've not gotten around to supporting a sort-based approach
 to recursive UNION.  I'm not convinced that it's worth doing ...

 regards, tom lane



Without sorting, isn't the scope of a recursive UNION with custom
datatypes pretty restrictive?

As it is, even the sorting will be a bit restrictive due to the costs
associated with it. I feel that what David suggested upthread should be
good. Maybe an experimental patch, with a workload that gives a load
factor of 1 for the hash table, could prove some performance points.

Even if that's not the case, we should really do something to improve the
usability of recursive UNION with custom types.

Regards,

Atri




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Atri Sharma
On Tue, Apr 22, 2014 at 12:59 PM, Albe Laurenz laurenz.a...@wien.gv.atwrote:

 Jason Petersen wrote:
  Yes, we obviously want a virtual clock. Focusing on the use of
 gettimeofday seems silly to me: it was
  something quick for the prototype.
 
  The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the
  PostgreSQL system.

 Would it make sense to just cache the result of the latest gettimeofday()
 call
 and use that as an approximation for wall time?
 The busier the system is, the more accurate that should be.


That sounds... risky. How will the invalidation/update of the cache work?

How will we track the time window in which the cached value is still valid
and applicable?

These are just my first thoughts; I may be missing the point, though.

Regards,

Atri





Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-19 Thread Atri Sharma
On Sat, Apr 19, 2014 at 3:37 AM, Bruce Momjian br...@momjian.us wrote:


  One thing that I discussed with Merlin offline and am now concerned
 about is
  how will the actual eviction work. We cannot traverse the entire list
 and then
  find all the buffers with refcount 0 and then do another traversal to
 find the
  oldest one.

 I thought if there was memory pressure the clock sweep would run and we
 wouldn't have everything at the max counter access value.


Hmm, I see your point.

With that applicable as well, I feel that the clock-sweep counting /
logical clock system would be useful when deciding between multiple
candidates for eviction. At worst, it can serve to replace the
gettimeofday() calls.

One thing I have thought of, with ideas and inputs from Joshua Yanowski
offline, is that we could probably keep a max-heap ordered by the
logical-clock age of the buffers. Each time the clock sweep sees a buffer
whose refcount has become zero, it pushes the buffer into the heap. This
could be a new representation of the freelist or an additional data
structure.

This still does not solve the problem of the clock sweep scanning the
entire list, even if the max-heap makes the eviction process O(1).

I am working on a PoC patch but am stuck on this point: my current
approach scans the entire shared-buffers list to search for candidate
buffers.

Another pain point is the concurrency and locking overhead of introducing
a new data structure. Can the existing buffer-header spinlock handle this,
or would it stretch the granularity of the spinlock too far?

I still see some blockers for this idea. Nevertheless, the idea of
clock-sweep counts as logical clocks seems promising, at least
intuitively.

Thoughts and comments?

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Fri, Apr 18, 2014 at 7:27 AM, Peter Geoghegan p...@heroku.com wrote:

A way I have in mind for the eviction policy is to introduce an ageing
factor in each buffer and take that factor into consideration when
evicting a buffer.

Consider a case where a table is pretty huge and spread across multiple
pages. The querying pattern is like a time series pattern i.e. a set of
rows is queried pretty frequently for some time, making the corresponding
page hot. Then, the next set of rows is queried frequently making that page
hot and so on.

Consider a new page entering the shared buffers with refcount 1 and
usage_count 1. If that page is a part of the workload described above, it
is likely that it shall not be used for a considerable amount of time after
it has entered the buffers but will be used eventually.

Now, the current hypothetical situation is that we have three pages:

1) The page that used to be hot at the previous time window but is no
longer hot and is actually the correct candidate for eviction.
2) The current hot page (It wont be evicted anyway for now).
3) The new page which just got in and should not be evicted since it can be
hot soon (for this workload it will be hot in the next time window).

When the clock-sweep algorithm runs the next time, it will see the new
buffer page as the one to be evicted (since page (1) may still have
usage_count > 0, i.e. it may be 'cooling' but not 'cool' yet).

This can be changed by introducing an ageing factor that tracks how much
time the current buffer has spent in shared buffers. If the time the
buffer has spent is large enough (relatively) and it is not hot currently,
it has had its chance and can be evicted. This saves the new page (3) from
being evicted, since its time in shared buffers will not be high enough to
mandate eviction, and it will be given more chances.

Since gettimeofday() is an expensive call and hence cannot be done in the
tight loop, we can count the number of clocksweeps the current buffer has
seen (rather, survived). This shall give us a rough idea of the estimate of
the relative age of the buffer.

When an eviction happens, all the candidates with refcount = 0 shall be
considered. Then, among them, the one with the highest ageing factor shall
be evicted.

Of course, there may be better ways of doing the same, but I want to
highlight the point (or possibility) of introducing an ageing factor to
prevent eviction of relatively younger pages early in the eviction process.

The overhead isn't too big. We just need to add another attribute to the
buffer header for the number of clock sweeps seen (rather, survived) and
check it when an eviction is taking place. The existing spinlock for
buffer headers should be good enough to protect against contention; the
access rules can be similar to those for usage_count.

Thoughts and comments?

Regards,

Atri



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Sat, Apr 19, 2014 at 1:07 AM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Apr 18, 2014 at 04:46:31PM +0530, Atri Sharma wrote:
  This can be changed by introducing an ageing factor that sees how much
 time the
  current buffer has spend in shared buffers. If the time that the buffer
 has
  spent is large enough (relatively) and it is not hot currently, that
 means it
  has had its chance and can be evicted. This shall save the new page (3)
 from
  being evicted since it's time in shared buffers shall not be high enough
 to
  mandate eviction and it shall be given more chances.
 
  Since gettimeofday() is an expensive call and hence cannot be done in
 the tight
  loop, we can count the number of clocksweeps the current buffer has seen
  (rather, survived). This shall give us a rough idea of the estimate of
 the
  relative age of the buffer.

 Counting clock sweeps is an interesting idea.  I think one concern was
 tracking hot buffers in cases where there is no memory pressure, and
 hence the clock sweep isn't running --- I am not sure how this would
 help in that case.


I feel that if there is no memory pressure, frankly, it doesn't matter
much what gets evicted and what doesn't. The case I am specifically
targeting is when the clock sweep gets to move about a lot, i.e.
high-memory-pressure workloads. Of course, I may be totally wrong here.

One thing that I discussed with Merlin offline, and am now concerned
about, is how the actual eviction will work. We cannot traverse the entire
list to find all the buffers with refcount 0 and then do another traversal
to find the oldest one.

Any thoughts there would be appreciated.

Regards,

Atri



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday
 seems silly to me: it was something quick for the prototype.

 The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the PostgreSQL system.

 What’s wrong with using a transaction number or some similar sequence? It
 would accurately track “age” in the sense we care about: how long ago in
 “units of real work being done by the DB” something was added.



Well, AIUI, we only need the 'relative' age of buffers in relation to the
youngest buffer present. So, the buffer that has seen the maximum number
of clock sweeps is the one that has been around the longest.

I do not see a need for an accurate estimate of the time spent in the
buffer for any purpose right now. It may be useful in the future, though.

How do you get the transaction ID? By accessing a tuple on the page and
reading its xmin?

Regards,

Atri





Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-07 Thread Atri Sharma
On Mon, Apr 7, 2014 at 3:41 PM, Pavel Stehule pavel.steh...@gmail.comwrote:




 2014-04-07 11:59 GMT+02:00 Rajeev rastogi rajeev.rast...@huawei.com:

   On 07 April 2014 12:12, Pavel Stehule wrote:

  +1 for feature

 Thanks



 -1 for Oracle syntax - it is hardly inconsistent with Postgres

 We can discuss and come out with the syntax based on everyone agreement.

 Autonomous transactions should be used everywhere - not only in plpgsql



 Yes you are right. I am not planning to support only using plpgsql.
 Initially we can support this

 Using the standalone SQL-commands and then later we can enhance based on
 this infrastructure

 to be used using plpgsql, triggers.


 ok

 long time I though about this feature.

 I am thinking so this should be fully isolated transaction - it should not
 be subtransaction, because then you can break database consistency - RI



I am missing something here: how does making it a subtransaction break
consistency? Isn't that what should actually be happening, so that the
autonomous transaction's changes only become visible when the parent
transaction commits?

What am I missing here?

Regards,

Atri


Re: [HACKERS] Risk Estimation WAS: Planner hints in Postgresql

2014-03-20 Thread Atri Sharma
On Thu, Mar 20, 2014 at 8:10 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Mar 18, 2014 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Atri Sharma atri.j...@gmail.com writes:
  One of the factors that leads to bad estimates is that the histogram of
 the
  values of a column maintained by the planner gets old by time and the
 data
  in the column changes. So, the histogram is no longer a quite accurate
 view
  of the data and it leads to bad selectivity.
 
  TBH, this is so far down the list of problems that it'll be a long time
  before we need to worry about it.  It's certainly not the number one
  priority for any project to model risk in the planner.
 
  The thing that I think is probably the number one problem is estimates
  that depend on an assumption of uniform distribution of sought-after rows
  among those encountered by a scan.  This is usually where bad plans for
  LIMIT queries are coming from.  We could certainly add some sort of fudge
  factor to those costs, but I'd like to have a more-or-less principled
  framework for doing so.

 I think the problem is, in some sense, more basic than that.  I think
 the kind of query we're talking about here is:

 SELECT * FROM foo WHERE unlikely ORDER BY indexed_column LIMIT 1

 Assume for the sake of argument that there are 100 rows that would be
 returned in the absence of the limit.  Let SC and TC be the startup
 cost and total cost of the index scan.  As a matter of general policy,
 we're going to say that the cost of this is SC + 0.01 * (TC - SC).
 What makes this path look appealing to the planner is that SC is small
 relative to TC.  If we knew, for example, that we weren't going to
 find the first match until 90% of the way through the index scan, then
 we could set SC = 90% * TC and, all else being equal, the planner
 would make the right decision.

 So you might think that the problem here is that we're assuming
 uniform density.  Let's say there are a million rows in the table, and
 there are 100 that match our criteria, so the first one is going to
 happen 1/10,000'th of the way through the table.  Thus we set SC =
 0.0001 * TC, and that turns out to be an underestimate if the
 distribution isn't as favorable as we're hoping.  However, that is NOT
 what we are doing.  What we are doing is setting SC = 0.  I mean, not
 quite 0, but yeah, effectively 0. Essentially we're assuming that no
 matter how selective the filter condition may be, we assume that it
 will match *the very first row*.
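[To make the arithmetic above concrete, here is a small illustrative Python
sketch - not PostgreSQL code; the cost numbers are made up:]

```python
# Sketch of the LIMIT costing described above: the planner charges
# startup cost plus the fraction of matching rows it expects to fetch.
def limit_scan_cost(startup_cost, total_cost, rows_needed, rows_expected):
    """Cost for a LIMIT that stops after rows_needed of rows_expected
    matching rows, assuming matches are uniformly distributed."""
    fraction = rows_needed / rows_expected
    return startup_cost + fraction * (total_cost - startup_cost)

SC, TC = 0.0, 10000.0                     # effectively zero startup cost
print(limit_scan_cost(SC, TC, 1, 100))    # 100.0 -- looks very cheap

# If the first match is actually 90% of the way through the scan,
# the honest cost is closer to:
print(0.90 * TC)                          # 9000.0
```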



Can't we reuse the same histogram the planner already maintains for this? I
mean, AFAIK, the heuristic we have is that we divide the histogram into
equal-size buckets, find the bucket in which our predicate value lies, then
take some fraction of that bucket plus all of the buckets before it, right?

So, suppose a query is SELECT * FROM table WHERE a < 10; we shall find the
bucket that 10 lies in, right?

Now, why can't we take the estimate of all the buckets before the bucket in
which our value is present? Won't that estimate give us the fraction of
tuples expected to come before the first matching row?

It's a pretty wild idea, but I wanted to know whether my understanding of
this scenario is correct.
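[The bucket arithmetic being described can be sketched like this - purely
illustrative Python, with hypothetical equi-depth bucket bounds, not the
planner's actual code:]

```python
import bisect

def histogram_selectivity(bounds, const):
    """Estimated fraction of rows with a < const, given equi-depth bucket
    bounds: each of the len(bounds)-1 buckets holds the same row fraction."""
    nbuckets = len(bounds) - 1
    if const <= bounds[0]:
        return 0.0
    if const >= bounds[-1]:
        return 1.0
    i = bisect.bisect_right(bounds, const) - 1        # bucket holding const
    within = (const - bounds[i]) / (bounds[i + 1] - bounds[i])
    # All buckets before i, plus a linear fraction of bucket i.
    return (i + within) / nbuckets

bounds = [0, 5, 12, 30, 100]                 # 4 hypothetical buckets
print(histogram_selectivity(bounds, 10))     # ≈ 0.4286
```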

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Risk Estimation WAS: Planner hints in Postgresql

2014-03-20 Thread Atri Sharma
On Thu, Mar 20, 2014 at 8:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  Now, why cannot we take the estimate of all the buckets behind the bucket
  in which our value is present? Will that estimate not give us the
 fraction
  of tuples that are expected to be before the first matching row?

 Uh, no, not unless you assume that the table happens to be perfectly
 sorted by the column's value.




Yes, that is true. So, if an attribute has an index present, can we do this
somehow?

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Optimized out tags

2014-03-20 Thread Atri Sharma
On Fri, Mar 21, 2014 at 9:49 AM, Rajashree Mandaogane 
rajashree@gmail.com wrote:

 What can be done to get rid of the 'optimized out' tags while debugging?


Did you use the appropriate debugging flags when running ./configure?

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Risk Estimation WAS: Planner hints in Postgresql

2014-03-18 Thread Atri Sharma
On Tue, Mar 18, 2014 at 11:43 PM, Josh Berkus j...@agliodbs.com wrote:


  On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yeah.  I would like to see the planner's cost estimates extended to
  include some sort of uncertainty estimate, whereupon risk-averse people
  could ask it to prefer low-uncertainty plans over high-uncertainty ones
  (the plans we typically choose for ORDER BY ... LIMIT queries being
 great
  examples of the latter).  But it's a long way from wishing that to
 making
  it so.  Right now it's not even clear (to me anyway) how we'd measure or
  model such uncertainty.



I have been thinking of some ways to have a risk estimate of each
selectivity that our planner gives. I think a way to do it is as follows:

One of the factors that leads to bad estimates is that the histogram of the
values of a column maintained by the planner gets stale over time as the data
in the column changes. So the histogram is no longer an accurate view of the
data, and it leads to bad selectivity estimates.

One thing we can try is to attach an error factor to the selectivity we
produce. This lets us account for the probability that the data has changed,
i.e. an estimate of the difference between the current histogram and a
histogram of the data actually present in the column.

We can use the Central Limit Theorem (
http://en.wikipedia.org/wiki/Central_limit_theorem). Essentially, the theorem
says that, given a distribution with finite mean and variance, we can take
random independent samples from the data and calculate the mean and standard
deviation of each sample. If we take a large enough number of samples and
plot the sample means, they will follow a normal distribution.

What is interesting is that this curve lets us predict the SD of a given
dataset, and that SD should be directly proportional to how far the data has
drifted from the planner's histogram.
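[A rough illustration of the sampling idea, using Python's stdlib on a
synthetic dataset - just a sketch of the statistics, not tied to any planner
code:]

```python
import random
import statistics

random.seed(42)
# Synthetic "column" data: normal with mean 50, SD 10.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Draw many independent samples and record each sample's mean.
sample_means = [
    statistics.mean(random.sample(population, 200))
    for _ in range(500)
]

# By the CLT the sample means cluster normally around the true mean,
# with SD about sigma / sqrt(n) = 10 / sqrt(200) ~= 0.71.
print(statistics.mean(sample_means))     # close to 50
print(statistics.stdev(sample_means))    # close to 0.71
```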

I am no mathematician, so it's hard for me to explain; I think this link [1]
will be more helpful.

So, we can have a probability value for the random variable, and that models
the confidence we have in our estimate.

I may be wrong in some parts but I hope I have been able to convey the
general idea.

If this idea develops, I shall be happy to work on it, but my hands are full
with ROLLUP right now, so my part will take time. I just want to float the
idea and get a general feel for it right now.

Please let me know your comments and feedback on this.

Regards,

Atri

[1]: http://www.theriac.org/DeskReference/viewDocument.php?id=177
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Atri Sharma

  That's precisely what risk estimation was about.

 Yeah.  I would like to see the planner's cost estimates extended to
 include some sort of uncertainty estimate, whereupon risk-averse people
 could ask it to prefer low-uncertainty plans over high-uncertainty ones
 (the plans we typically choose for ORDER BY ... LIMIT queries being great
 examples of the latter).  But it's a long way from wishing that to making
 it so.  Right now it's not even clear (to me anyway) how we'd measure or
 model such uncertainty.


 Well, currently, selectivity estimates based on MCV should be pretty
 low-uncertainty, whereas certainty of other estimates could be modeled as a
 random variable if ANALYZE gathered a few statistical moments (for
 variables that are prone to that kind of statistical analysis).

 That alone could improve things considerably, and statistical info could
 be propagated along expressions to make it possible to model uncertainty in
 complex expressions as well.



That is similar to the solution I proposed yesterday on the mailing list:
take many independent samples of the data, then use the mean and standard
deviation of those samples to attach a probability to the histogram
selectivity estimate.


The problem is multi-faceted: outdated stats, bad guesses at the distribution
of the data, and the cases Merlin mentioned before (CASE statements, COALESCE
statements, etc.). Finding a general solution will require a lot of research
and time.

I agree with Tom, we should focus on some of the main problems we have in
that domain and then try to solve them first.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C csrajmo...@gmail.com wrote:

 I am implementing Planner hints in Postgresql to force the optimizer to
 select a particular plan for a query on request from sql input. I am having
 trouble in modifying the planner code. I want to create a path node of hint
 plan and make it the plan to be used by executor. How do I enforce this ?
 Should I create a new Plan for this ..how to create a plan node which can
 be then given directly to executor for a particular query?




Planner hints have been discussed a lot before, and AFAIK there is a wiki
page explaining why we shouldn't implement them. Have you read those
discussions?

Please share if you have any new points on the same.

Regards,

Atri


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Johnston pol...@yahoo.com writes:
  Need to discuss the general why before any meaningful help on the
 how is
  going to be considered by hackers.

 Possibly worth noting is that in past discussions, we've concluded that
 the most sensible type of hint would not be use this plan at all, but
 here's what to assume about the selectivity of this WHERE clause.
 That seems considerably less likely to break than any attempt to directly
 specify plan details.


Isn't using a user-given value for selectivity a pretty risky situation, as
it can horribly screw up plan selection?

Why not allow the user to specify an alternate plan and have the planner
assign a higher preference to it during plan evaluation? That would still let
us evaluate all possible plans fairly, as we do right now, while giving the
user-supplied plan a higher preference.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  Isn't using a user-given value for selectivity a pretty risky situation, as
  it can horribly screw up plan selection?
 
  Why not allow the user to specify an alternate plan and have the planner

 Uh, you're worried about the user giving us a garbage selectivity, but
 they're going to get a full-blown plan perfect?



I never said that the user's plan would be perfect. The entire point of
planner hints is the assumption that the user knows more about the data than
the planner does, so the user's ideas about the plan should be given some
preference. Garbage selectivity can screw up the cost estimation of *all* our
possible plans, and we could end up preferring a sequential scan over an
index-only scan, for example. I am trying to think of ways to give some
preference to a user plan without interfering with the cost estimation of
our other potential plans.



 What exactly would such a preference look like?  A cost modifier?
 We'd almost certainly have to make that into a GUC or a value passed in
 as part of the query, with a high likelihood of users figuring out how
 to use it to say use my plan forever and always..


A factor, determined experimentally, by which we decrease the cost of the
user-specified plan so that it gets a higher preference during plan
evaluation.

Of course, this is not a nice hack. Especially after our discussion on IRC
the other day, I am against planner hints; but if we are just discussing how
it could be done, these are some ways I can think of.

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
The larger question to answer first is whether we want to implement

 something that is deterministic...

 How about just dropping the whole concept of hinting and provide a way
 for
 someone to say use this plan, or die trying.  Maybe require it be used in
 conjunction with named PREPAREd statements:



You mean taking away the entire concept of query planning and cost
estimation? That's like replacing the optimizer with DBA decisions, and I am
not at all comfortable with that idea. Those are only my thoughts, though.




 PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS
 SELECT ...;

 Aside from whole-plan specification I can definitely see where join/where
 specification could be useful if it can overcome the current limitation of
 not being able to calculate inter-table estimations.



Prepared statements use a generic plan for execution; replacing that with a
totally user-defined plan does not seem clean.

The crux is that, IMHO, planner hints are a bad way of trying to circumvent
the need for cross-column statistics. We should get cross-column statistics
done and ignore planner hints completely.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
 There's lots of ways to implement planner hints, but I fail to see the
 point in discussing how to implement something we actively don't want.



+1. The original poster wanted a way to implement it as a personal project or
something (I think he only replied to me, not the entire list).

Planner hints should be ignored :)

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 10:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Possibly worth noting is that in past discussions, we've concluded that
  the most sensible type of hint would not be use this plan at all, but
  here's what to assume about the selectivity of this WHERE clause.
  That seems considerably less likely to break than any attempt to
 directly
  specify plan details.

  Isn't using a user-given value for selectivity a pretty risky situation, as
  it can horribly screw up plan selection?

 And forcing a plan to be used *isn't* that?  Please re-read the older
 threads, since you evidently have not.


I never said that we should force a plan to be used. I just said that we
should increase the preference for a user-given plan, without interfering in
the cost estimation of the other potential plans or in the evaluation of the
finally selected plan.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
There's a big difference between saying to the planner, Use plan X
 vs Here's some information describing the data supporting choosing
 plan X intelligently.  The latter allows for better plans in the face
 of varied/changing data, integrates with the planner in natural way,
 and encourages users to understand how the planner works.



+1

I was thinking of varying the 'weight' of a user-defined plan by a fixed,
experimentally determined factor, to tell the planner to give higher or lower
preference to that plan. But after your idea above, I think Stephen's point
about introducing a GUC for the factor is the only way possible, and I agree
with him that eventually the user will figure out how to use the GUC to force
usage of his plan.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*

