Re: [HACKERS] GSoC 2017

2017-01-10 Thread Atri Sharma
Count me in as a mentor

On 10-Jan-2017 3:24 PM, "Alexander Korotkov" wrote:

> Hi all!
>
> In 2016 the PostgreSQL project didn't get into the GSoC program.  In my
> understanding, the reasons for that are the following.
>
> 1. We did a last-minute submission of our application to GSoC.
> 2. In 2016 the GSoC application form for mentoring organizations was
> changed.  In particular, it required more detailed information about
> possible projects.
>
> As a result we didn't manage to make a good enough application that time,
> and our application was declined.  See [1] and [2] for details.
>
> I think the right way to manage this in 2017 would be to start
> collecting the required information in advance.  According to the GSoC
> 2017 timeline [3], mentoring organizations can submit their applications
> from January 19 to February 9.  Thus, now is a good time to start
> collecting project ideas and make a call for mentors.  Also, we need to
> decide who will be our admin this year.
>
> In sum, we have the following questions:
> 1. What project ideas do we have?
> 2. Who is going to mentor this year?
> 3. Who is going to be project admin this year?
>
> BTW, I'm ready to be a mentor this year.  I'm also open to being an
> admin if needed.
>
> [1] https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com
> [3] https://developers.google.com/open-source/gsoc/timeline
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Atri Sharma
On Tue, Mar 1, 2016 at 9:30 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:52 AM, Tom Lane  wrote:
> > Aleksander Alekseev  writes:
> >> There are applications that create and delete a lot of temporary
> >> tables. Currently PostgreSQL doesn't handle such a use case well.
>
>
> Yeah, I have a really hard time believing this can ever work.  There
> are MANY catalog tables potentially involved here - pg_class,
> pg_attribute, pg_attrdef, pg_description, pg_trigger, ... and loads
> more - and they all can have OID references to each other.  If you
> create a bunch of fake relcache and syscache entries, you're going to
> need to give them OIDs, but where will those OIDs come from?  What
> guarantees that they aren't in use, or won't be used later while your
> temporary object still exists?  I think making this work would make
> parallel query look like a minor feature.
>
>
Fair point; that would mean inventing a whole new OID generation structure.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Atri Sharma
>
>
>
> I think you have no concept how invasive that would be.  Tables not
> represented in the catalogs would be a disaster, because *every single
> part of the backend* would have to be modified to deal with them as
> a distinct code path --- parser, planner, executor, loads and loads
> of utility commands, etc.  I do not think we'd accept that.  Worse yet,
> you'd also break client-side code that expects to see temp tables in
> the catalogs (consider psql \d, for example).
>
>
I might be missing a point here, but I really do not see why we would need
an alternate code path for every part of the backend. I agree that all
utility commands and client-side code would break, but if we abstract out
the syscache API and/or modify only the syscache's underlying access paths,
then would the backend really care about whether the tuple comes from
physical catalogs or in-memory catalogs?



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] PROPOSAL: Fast temporary tables

2016-03-01 Thread Atri Sharma
On Tue, Mar 1, 2016 at 8:55 PM, Aleksander Alekseev <a.aleks...@postgrespro.ru> wrote:

> Hello
>
> There are applications that create and delete a lot of temporary
> tables. Currently PostgreSQL doesn't handle such a use case well.
> Consider the following benchmark/example.
>
>
FWIW, Pavel and I have been spending some time discussing global temporary
tables, and I have been taking a shot at it. This is pretty much in line
with that.

The approach you suggest sounds fine. I am personally a tad concerned about
the extra overhead of the locks and the sanity of concurrency for the
in-memory cache as well. Something I am not too clear about (I need to read
your email again) is cache invalidation. Do all pages stay in memory
always?

Also, are you proposing to change the behaviour of syscaches so that those
pages are not written out to disk, or do you create a new set of caches?

Regards,

Atri


Re: [HACKERS] about google summer of code 2016

2016-02-18 Thread Atri Sharma
On 19 Feb 2016 8:30 am, "Chapman Flack"  wrote:
>
> On 02/18/16 19:35, Amit Langote wrote:
>
> > Apparently, the deadline is: February 20, 2016 at 04:00 (+0900 UTC)
> >
> > https://summerofcode.withgoogle.com/
>
> For anybody finding that web site as anti-navigable as I did, here
> are more direct links to the actual rules, and terms of agreement
> for the various participants:
>
> https://summerofcode.withgoogle.com/rules/
> https://summerofcode.withgoogle.com/terms/org
> https://summerofcode.withgoogle.com/terms/mentor
> https://summerofcode.withgoogle.com/terms/student
>
> Here is a question: does it ever happen that PostgreSQL acts as
> the org for a project that is PostgreSQL-related but isn't
> directly PGDG-led?
>
> ... there are definitely interesting and promising areas for further
> development in PL/Java beyond what I would ever have time to tackle
> solo, and I could easily enjoy mentoring someone through one or
> another of them over a summer, which could also help reinvigorate
> the project and get another developer familiar with it at a
> non-superficial level.  While I could easily see myself mentoring,
> I think it would feel like overkill to apply individually as a
> one-trick 'organization'.
>
> I see that there was a "based on PL/Java" GSoC'12 project, so maybe
> there is some room for non-core ideas under the PostgreSQL ægis?

FWIW, it wasn't a PL/Java-based project per se; it was a JDBC FDW.

I agree, there might be scope for non-core projects, and PL/Java sounds like
a good area.

Regards,

Atri


Re: [HACKERS] 2016-01 Commitfest

2016-01-18 Thread Atri Sharma
On Mon, Jan 18, 2016 at 11:05 PM, Alvaro Herrera wrote:

> Two weeks into the commitfest, things have moved a bit:
>
>  Needs review: 53.
>  Waiting on Author: 20.
>  Ready for Committer: 10.
>  Committed: 16.
> Total: 99.  https://commitfest.postgresql.org/8/
>
> We have two committed patches since last report -- not a lot for a whole
> week.  We've managed to review a few patches though: WoA went from 14 to
> 20.  Patch authors have something to do rather than just twiddle their
> thumbs^W^W^W review others' patches.
>
> Remember: reviewing patches where somebody else has marked themselves as
> reviewer is still welcome.  You could spot things the other person might
> miss.
>
> Committers: There are patches for you to push!  Go pick one up right now!
> (And remember: marking yourself as committer for a patch makes it
> unlikely for anybody else to review/commit it.  If you do not intend to
> review+commit such a patch soon, it's probably better to remove yourself
> as committer.)
>
> Many patches seem stalled without review:
>Table partition + join pushdown
> https://commitfest.postgresql.org/8/325/
>multivariate statistics
> https://commitfest.postgresql.org/8/450/
>Implement failover on libpq level
> https://commitfest.postgresql.org/8/389/
>Unique Joins
> https://commitfest.postgresql.org/8/129/
>Replace buffer manager spinlock with atomic operations
>
> https://commitfest.postgresql.org/8/408/
>
>
I am guilty of ignoring the unique joins patch. I am still up for reviewing
it later this week or over the weekend, but will not stall anybody else from
picking it up.

Apologies!


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma
On Tue, Jan 5, 2016 at 11:09 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Atri Sharma <atri.j...@gmail.com> writes:
> > I fail to see the relevance of which node is getting evaluated (its a
> Plan
> > node BTW) for this question. The concern I had was around using SPI
> inside
> > executor and its fail safety.
>
> The code path executor -> PL function -> SPI certainly works, so
> presumably omitting the intermediate PL function could still work.
> Whether it's a good idea is another question entirely.  I do not
> offhand see a good reason why knowledge of non-system tables should
> exist in the core executor; so what is the need to use SPI?
>
>
Thanks!

This was a weird requirement and I managed to work around it, but I will
keep this hack in mind for future reference.
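
For reference, the working path Tom describes above (executor -> PL function
-> SPI) can be sketched in SQL alone; plpgsql runs its embedded queries
through SPI internally, so calling such a function from a query reaches SPI
from inside the executor without the core executor knowing about the user
table. The table and function names below are hypothetical, purely for
illustration:

```sql
-- Hypothetical non-catalog table and accessor function (illustrative names).
CREATE TABLE app_settings (name text PRIMARY KEY, value text);

CREATE FUNCTION lookup_setting(p_name text) RETURNS text AS $$
DECLARE
    v text;
BEGIN
    -- plpgsql executes this SELECT via SPI under the hood.
    SELECT value INTO v FROM app_settings WHERE name = p_name;
    RETURN v;
END;
$$ LANGUAGE plpgsql STABLE;

-- Evaluated by the executor while processing the target list:
-- SELECT relname, lookup_setting('feature_x') FROM pg_class LIMIT 1;
```

Whether skipping the intermediate PL function and calling SPI directly from
executor code is safe is, as noted above, a separate question.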

Regards,

Atri


Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma
On Mon, Jan 11, 2016 at 10:48 PM, Atri Sharma <atri.j...@gmail.com> wrote:

> Sorry, I missed this email.
>
> I was specifically targeting accessing tables inside Node evaluation hence
> do not want to add new nodes.
>
> Thanks for your inputs!
>
> Regards,
>
> Atri
>
> On Tue, Jan 5, 2016 at 11:43 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
>
>> On 2016/01/05 14:30, Atri Sharma wrote:
>> > On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <
>> langote_amit...@lab.ntt.co.jp>
>> >> On 2016/01/05 3:53, Atri Sharma wrote:
>> >>> I was wary to use SPI inside the executor for node evaluation
>> functions.
>> >>> Does it seem safe?
>> >>
>> >> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
>> >> that you are talking about? I guess you'd know to use ExecProcNode() or
>> >> ExecEvalExpr() for them, respectively.
>> >>
>> > I fail to see the relevance of which node is getting evaluated (its a
>> Plan
>> > node BTW) for this question. The concern I had was around using SPI
>> inside
>> > executor and its fail safety.
>>
>> Sorry, I may have misunderstood your question(s). Seeing your first
>> question in the thread, I see that you're looking to query non-system
>> tables within the executor. AFAIU, most of the processing within executor
>> takes the form of some node in some execution pipeline of a plan tree.
>> Perhaps, you're imagining some kind of node, subnode or some such. By the
>> way, some places like ATRewriteTable(), validateCheckConstraint() scan
>> user tables directly using low-level utilities within a dummy executor
>> context. I think Jim suggested something like that upthread.
>>
>>

Sorry for top posting.

Regards,

Atri


Re: [HACKERS] Accessing non catalog table in backend

2016-01-11 Thread Atri Sharma
Sorry, I missed this email.

I was specifically targeting accessing tables inside Node evaluation, hence
I do not want to add new nodes.

Thanks for your inputs!

Regards,

Atri

On Tue, Jan 5, 2016 at 11:43 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/01/05 14:30, Atri Sharma wrote:
> > On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp>
> >> On 2016/01/05 3:53, Atri Sharma wrote:
> >>> I was wary to use SPI inside the executor for node evaluation
> functions.
> >>> Does it seem safe?
> >>
> >> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
> >> that you are talking about? I guess you'd know to use ExecProcNode() or
> >> ExecEvalExpr() for them, respectively.
> >>
> > I fail to see the relevance of which node is getting evaluated (its a
> Plan
> > node BTW) for this question. The concern I had was around using SPI
> inside
> > executor and its fail safety.
>
> Sorry, I may have misunderstood your question(s). Seeing your first
> question in the thread, I see that you're looking to query non-system
> tables within the executor. AFAIU, most of the processing within executor
> takes the form of some node in some execution pipeline of a plan tree.
> Perhaps, you're imagining some kind of node, subnode or some such. By the
> way, some places like ATRewriteTable(), validateCheckConstraint() scan
> user tables directly using low-level utilities within a dummy executor
> context. I think Jim suggested something like that upthread.
>
> Thanks,
> Amit
>
>
>


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Accessing non catalog table in backend

2016-01-04 Thread Atri Sharma
Thanks!

I was wary of using SPI inside the executor for node evaluation functions.
Does it seem safe?

Regards,

Atri
On 5 Jan 2016 12:20 am, "Jim Nasby" <jim.na...@bluetreble.com> wrote:

> On 1/4/16 12:07 PM, Atri Sharma wrote:
>
>> Hi All,
>>
>> I wanted to check if it is possible to query a non catalog table in
>> backend.
>>
>> I understand that cql interface is only for catalog querying hence it is
>> not usable for this purpose per se.
>>
>
> AFAIK it's possible to do it with low level routines, but I think unless
> you have a really good reason to go that route you're much better off just
> using SPI. If it's good enough for plpgsql... :)
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] Accessing non catalog table in backend

2016-01-04 Thread Atri Sharma
On Tue, Jan 5, 2016 at 9:54 AM, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/01/05 3:53, Atri Sharma wrote:
> > I was wary to use SPI inside the executor for node evaluation functions.
> > Does it seem safe?
>
> What is "node evaluation functions"? Is it "Plan" nodes or "Expr" nodes
> that you are talking about? I guess you'd know to use ExecProcNode() or
> ExecEvalExpr() for them, respectively.
>
> Thanks,
> Amit
>
>
>
I fail to see the relevance of which node is getting evaluated (it's a Plan
node, BTW) for this question. The concern I had was around using SPI inside
the executor and its fail-safety.

Regards,

Atri


[HACKERS] Accessing non catalog table in backend

2016-01-04 Thread Atri Sharma
Hi All,

I wanted to check if it is possible to query a non-catalog table in the
backend.

I understand that the CQL interface is only for catalog querying, hence it
is not usable for this purpose per se.

Please advise.

Regards,

Atri


Re: [HACKERS] Threads in PostgreSQL

2015-12-20 Thread Atri Sharma
On Mon, Dec 21, 2015 at 11:51 AM, sri harsha wrote:

> Hi,
>
>    Is it possible to use threads in PostgreSQL? I am using threads in my
> foreign data wrapper, and I get the following error when I use them:
>
> *ERROR:  stack depth limit exceeded*
> *HINT:  Increase the configuration parameter "max_stack_depth" (currently
> 2048kB), after ensuring the platform's stack depth limit is adequate.*
>
>

PostgreSQL uses a process-per-backend model, and backend code is not
written to be thread-safe.

Can you elaborate on what you are using the threads for?


Re: [HACKERS] Question concerning XTM (eXtensible Transaction Manager API)

2015-11-16 Thread Atri Sharma
> I think the general idea is that if Commit is WAL-logged, then the
> operation is considered committed on the local node, and commit should
> happen on any node only once prepare from all nodes is successful.
> And after that the transaction is not supposed to abort.  But I think you
> are trying to optimize the DTM in some way to not follow that kind of
> protocol.
> By the way, how will the arbiter do the recovery in a scenario where it
> crashes? Won't it need to contact all nodes for the status of in-progress
> or prepared transactions?
> I think it would be better if a more detailed design of the DTM with
> respect to transaction management and recovery could be put up on the wiki
> for discussion on this topic.  I have seen that you have already updated
> many details of the system, but still the complete picture of the DTM is
> not clear.

I agree.

I have not been following this discussion, but from what I have read above
I think the recovery model in this design is broken. You have to follow
some protocol, whichever one you choose.

I think you can try using something like Paxos if you are looking for a
more reliable model but don't want the overhead of 3PC.


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2015-11-11 Thread Atri Sharma
> I'm thinking the client should get some kind of a token back from the
> commit, and it could use the token on the standby, to wait for that commit
> to be applied. The token could be just the XID, or the LSN of the commit
> record. Or the application could generate the token and pass it to the
> server in the commit, similar to how 2PC works. So the interaction would
> be something like:
>
> In master:
> BEGIN;
> INSERT INTO FOO ...;
> COMMIT;
> Server returns: COMMITted with token 1234
>
> Later, in standby:
> BEGIN WAIT FOR COMMIT 1234 TO BE VISIBLE;
> SELECT * FROM foo;

+1.

The LSN should be good enough IMO.
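
To illustrate: with the LSN as the token, something close to this is already
possible with functions that exist in 9.4-era PostgreSQL, by polling on the
standby. (The `BEGIN WAIT FOR COMMIT` syntax quoted above is only a
proposal, not implemented; this is a hedged sketch of the idea, not the
patch's actual mechanism.)

```sql
-- On the master, after COMMIT, capture a token for the commit:
SELECT pg_current_xlog_location();   -- e.g. returns '0/3000060'

-- On the standby, the application polls until the token has been replayed;
-- a non-negative difference means the standby has caught up to the commit:
SELECT pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3000060') >= 0;
```

The proposed server-side wait would replace this client-side polling loop.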


Re: [HACKERS] count_nulls(VARIADIC any)

2015-08-13 Thread Atri Sharma
On Thu, Aug 13, 2015 at 12:55 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2015-08-13 9:21 GMT+02:00 Marko Tiikkaja <ma...@joh.to>:
>
> On 8/13/15 9:18 AM, Shulgin, Oleksandr wrote:
>
>> nnulls()
>>
>> I think I'd prefer num_nulls() over that.
>
> can be
>
> what about similar twin function num_nonulls()?

+1
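
For the curious, the semantics being discussed can be sketched at the SQL
level (the actual patch was implemented in C with `VARIADIC "any"` so it can
accept mixed argument types; this illustrative version is limited to
arguments of a single type):

```sql
-- Rough sketch of num_nulls() semantics; not the committed implementation.
-- Note: must NOT be declared STRICT, or NULL arguments would make the
-- whole call return NULL.
CREATE FUNCTION num_nulls(VARIADIC args anyarray) RETURNS integer AS $$
    SELECT count(*)::integer FROM unnest($1) AS a WHERE a IS NULL;
$$ LANGUAGE sql IMMUTABLE;

-- SELECT num_nulls(1, NULL, 3, NULL);  -- would return 2
```

A num_nonulls() twin would just flip the WHERE clause to `a IS NOT NULL`.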


Re: [HACKERS] Commitfest remaining Needs Review items

2015-08-10 Thread Atri Sharma

> * Unique Joins
>
> Still needs to be reviewed. Any volunteers?

Can take this one up, if it's within my limits.


Re: [HACKERS] Updatable view?

2015-07-31 Thread Atri Sharma
On 31 Jul 2015 11:59, Tatsuo Ishii <is...@postgresql.org> wrote:

>> On 31 Jul 2015 10:15, Tatsuo Ishii <is...@postgresql.org> wrote:
>>
>>> I think it would be nice to have... but not to the point of working on
>>> it myself.
>>>
>>> Might be worth an email to -general to see how many people have
>>> immediate use for it.
>>
>> What I am thinking about is,
>>
>> 1) Implement certain class of updatable views allowed in SQL:1999
>> (UNION ALL, natural joins)
>>
>> 2) Anything beyond #1 (I have no idea for now)
>>
>> Let me see how people are interested in...
>>
>> How does the standard define it? Do they also follow the same MVCC
>> semantics as normal tables?
>
> In my understanding there's no such concept like MVCC in the standard.
> Anyway in our implementation, we should keep the MVCC semantics of
> course.

Yes, I meant our internal MVCC semantics. I will have to look at the way
MVCC handles views for the exact logic, though.

>> I am concerned that we may end up losing read
>> performance for views if we implement this (unless I am missing
>> something)
>
> Why do updatable views lose read performance? I thought the only
> performance concern will be in the update/delete/insert operations.

I meant update, sorry. Pre-coffee mails tend to be incorrect :)


Re: [HACKERS] Memory Accounting v11

2015-07-15 Thread Atri Sharma
On Wed, Jul 15, 2015 at 12:57 PM, Jeff Davis <pg...@j-davis.com> wrote:

> On Tue, 2015-07-14 at 16:19 -0400, Robert Haas wrote:
>> tuplesort.c does its own accounting, and TBH that seems like the right
>> thing to do here, too.  The difficulty is, I think, that some
>> transition functions use an internal data type for the transition
>> state, which might not be a single palloc'd chunk.  But since we can't
>> spill those aggregates to disk *anyway*, that doesn't really matter.
>
> So would it be acceptable to just ignore the memory consumed by
> internal, or come up with some heuristic?
>
> Regards,
> Jeff Davis

I think a heuristic would be better suited here; ignoring memory
consumption for internal types means that we are not making the memory
accounting useful for a set of use cases.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] [idea] table partition + hash join

2015-06-10 Thread Atri Sharma
On Wed, Jun 10, 2015 at 2:16 PM, Amit Langote <langote_amit...@lab.ntt.co.jp>
wrote:

> Perhaps the qual needs to be pushed all the way down
> to the Hash's underlying scan if that makes sense.

And that is a Pandora's box of troubles IMHO unless done in a very careful
manner.


Re: [HACKERS] Improving GEQO

2015-05-28 Thread Atri Sharma
On Fri, May 29, 2015 at 12:59 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Wed, May 27, 2015 at 3:06 PM, boix <b...@uclv.cu> wrote:
>> Hello, my partner and me are working with the goal of improve the GEQO's
>> performance, we tried with Ant Colony Optimization, but it does not
>> improve, actually we are trying with a new variant of Genetic Algorithm,
>> specifically Micro-GA. This algorithm finds a better solution than GEQO
>> in less time, however the total query execution time is higher. The
>> fitness is calculated by the geqo_eval function. Does anybody know why
>> this happens?
>>
>> We attach the patch made with the changes in postgresql-9.2.0.
>
> can you submit more details?  for example 'explain analyze' (perhaps
> here: http://explain.depesz.com/) of the plans generated GEQO vs GA vs
> stock?  It sounds like you might be facing an estimation miss which is
> not really an issue a better planner could solve.
>
> That said, assuming you're getting 'better' plans in less time suggests
> you might be on to something.
>
> merlin



What sort of tests are you running? I suspect that anything which is not
too well thought out and tested might end up performing well only on a
small subset of tests.

Also, what is the consistency of the plans generated? If you are only
targeting planning time, I feel it might be of lesser value. However, if
you can get large-order joins to be executed with a near-optimal (brute
force) plan, you might be on to something.

Something I would like to see done is removing the dead code that is
present in the existing GEQO. That alone might lead to shorter compile
times.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2015-02-13 Thread Atri Sharma
Hi all,

Sorry for the delay.

Please find attached the latest version of the UPDATE (*) patch. The patch
implements the review comments, and Tom's gripe about touching
transformTargetList is addressed now. I have added regression tests and
simplified the parser representation a bit.

Regards,

Atri
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 35b0699..1f68bdf 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -25,7 +25,9 @@ PostgreSQL documentation
 UPDATE [ ONLY ] <replaceable class="PARAMETER">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> ]
 SET { <replaceable class="PARAMETER">column_name</replaceable> = { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } |
   ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } [, ...] ) |
-  ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> )
+  ( <replaceable class="PARAMETER">column_name</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> ) |
+  ( <replaceable class="PARAMETER">*</replaceable> [, ...] ) = ( { <replaceable class="PARAMETER">expression</replaceable> | DEFAULT } [, ...] ) |
+  ( <replaceable class="PARAMETER">*</replaceable> [, ...] ) = ( <replaceable class="PARAMETER">sub-SELECT</replaceable> )
 } [, ...]
 [ FROM <replaceable class="PARAMETER">from_list</replaceable> ]
 [ WHERE <replaceable class="PARAMETER">condition</replaceable> | WHERE CURRENT OF <replaceable class="PARAMETER">cursor_name</replaceable> ]
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index a68f2e8..6d08dbd 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1937,6 +1937,101 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
 	nsitem->p_lateral_only = false;
 	nsitem->p_lateral_ok = true;
 
+	/*
+	 * Check if (SET(*) = SELECT ...)  is present. If it is present we
+	 * need to resolve and populate the remaining needed MultiAssignRefs in the
+	 * target list. We modify target list in place and add needed MultiAssignRefs.
+	 */
+	if (list_length(stmt->targetList) == 1)
+	{
+		ResTarget *current_val = linitial(stmt->targetList);
+
+		/* Currently there is no way that ResTarget node's name value in UPDATE command
+		 * is set to NULL except for UPDATE SET (*) case.
+		 * Hence we can safely depend on name value being NULL as a check for
+		 * presence of UPDATE SET (*) case.
+		 */
+		if (current_val->name == NULL)
+		{
+			List *rel_cols_list;
+			List *expanded_tlist = NULL;
+			List *sub_list = NULL;
+			ListCell *lc_val;
+			ListCell *lc_relcol;
+			int rteindex = 0;
+			int sublevels_up = 0;
+			int i = 0;
+
+			rteindex = RTERangeTablePosn(pstate, pstate->p_target_rangetblentry,
+										 &sublevels_up);
+
+			expandRTE(pstate->p_target_rangetblentry, rteindex, sublevels_up,
+					  current_val->location, false,
+					  &(rel_cols_list), NULL);
+
+
+			/* SET(*) = (SELECT ...) case */
+			if (IsA(current_val->val, MultiAssignRef))
+			{
+				MultiAssignRef *orig_val = (MultiAssignRef *) (current_val->val);
+
+				orig_val->ncolumns = list_length(rel_cols_list);
+
+				Assert(sub_list == NULL);
+
+				sub_list = list_make1(orig_val);
+
+				/* Change targetlist to have corresponding ResTarget nodes
+				 * as corresponding to the columns in target relation */
+				for (i = 1; i < list_length(rel_cols_list); i++)
+				{
+					MultiAssignRef *r = makeNode(MultiAssignRef);
+
+					r->source = orig_val->source;
+					r->colno = i + 1;
+					r->ncolumns = orig_val->ncolumns;
+
+					lappend(sub_list, r);
+				}
+			}
+			else if (IsA(current_val->val, List))  /* SET(*) = (val, val,...) case */
+			{
+
+				Assert(sub_list == NULL);
+				sub_list = (List *) (current_val->val);
+			}
+			else
+			{
+				elog(ERROR, "Unknown type in UPDATE command");
+			}
+
+			if (list_length(rel_cols_list) != list_length(sub_list))
+				elog(ERROR, "number of columns does not match number of values");
+
+			/* Change targetlist to have corresponding ResTarget nodes
+			 * as corresponding to the columns in target relation */
+			forboth(lc_val, sub_list, lc_relcol, rel_cols_list)
+			{
+				ResTarget *current_res = makeNode(ResTarget);
+
+				current_res->name = strVal(lfirst(lc_relcol));
+				current_res->val = (Node *) (lfirst(lc_val));
+
+				if (expanded_tlist == NULL)
+				{
+					expanded_tlist = list_make1(current_res);
+				}
+				else
+				{
+					lappend(expanded_tlist, current_res);
+				}
+			}
+
+			stmt->targetList = expanded_tlist;
+		}
+	}
+
+
 	qry->targetList = transformTargetList(pstate, stmt->targetList,
 										  EXPR_KIND_UPDATE_SOURCE);
 
@@ -1982,18 +2077,20 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
 			continue;
 		}
 		if (origTargetList == NULL)
-			elog(ERROR, "UPDATE target count mismatch --- internal error");
+			elog(ERROR, "UPDATE target count mismatch 
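
To make the proposal concrete, the syntax added by the patch above might be
used as follows. The table and column names are made up for illustration,
and this syntax is a proposal from the patch, not part of stock PostgreSQL:

```sql
-- Hypothetical usage of the proposed UPDATE SET (*) syntax: assign all
-- columns of the target row positionally, without naming them.
UPDATE foo SET (*) = (1, 'renamed', DEFAULT);

-- Or from a sub-SELECT producing one value per column of foo:
UPDATE foo SET (*) = (SELECT b.id, b.name, b.created_at
                      FROM bar b
                      WHERE b.id = foo.id);
```

Per the parser change, the value list (or sub-SELECT) must produce exactly
as many values as the target table has columns, or an error is raised.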

Re: [HACKERS] [pgsql-advocacy] GSoC 2015 - mentors, students and admins.

2015-02-09 Thread Atri Sharma
I am up for mentoring again.
On 10 Feb 2015 02:23, "Thom Brown" <t...@linux.com> wrote:

> Hi all,
>
> Google Summer of Code 2015 is approaching.  I'm intending on registering
> PostgreSQL again this year.
>
> Before I do that, I'd like to have an idea of how many people are
> interested in being either a student or a mentor.
>
> I've volunteered to be admin again, but if anyone else has a strong
> interest of seeing the projects through this year, please let yourself be
> known.
>
> Who would be willing to mentor projects this year?
>
> Project ideas are welcome, and I've copied many from last year's
> submissions into this year's wiki page.  Please feel free to add more (or
> delete any that stand no chance or are redundant):
> https://wiki.postgresql.org/wiki/GSoC_2015
>
> Students can find more information about GSoC here:
> http://www.postgresql.org/developer/summerofcode
>
> Thanks
>
> Thom



Re: [HACKERS] Implementation of global temporary tables?

2015-02-02 Thread Atri Sharma
> Some was changed from 2009:
>
> * We have updatable CTE
>
> * We have unlogged tables
>
> Opened questions:
>
> 1. Name and conformance with standard of this feature - because we doesn't
> support modules, a mapping ANSI - PG should not be trivial
>
> 2. Implementation
>
> I see three possible ways how to implement it:
>
> 2.a - using on demand created temp tables - most simple solution, but
> doesn't help with catalogue bloating
>
> 2.b - using unlogged tables for holding statistics, relfilenode, and all
> necessary data
>
> 3.c - store ephemeral metadata only in memory without MVCC


With 2.a, essentially, we are defining a global definition of a temp table,
but the actual per-session objects still follow the same rules as our
current temp tables do?

2.b seems like a lot of new data, and if we are defining new paths for,
e.g., statistics for global temp tables, we might end up adding new logic
in the planner to use those tables. I am not seeing how this will work.

Could you elaborate a bit on 3.c, please?

Something that really bothers me here, on a different note, is the catalog
churn this could cause. Of course, you mentioned it as a point in your
email, but I feel that 2.a's showstopper could be the massive catalog churn
it causes. Maybe have a way to manage such tables without getting pg_class
to bloat pretty quickly (I am assuming you do not mean metadata as the
catalog metadata in 3.c).

Regards,

Atri


Re: [HACKERS] Implementation of global temporary tables?

2015-02-02 Thread Atri Sharma


>> 1. Main catalogue will be stable.
>> 2. There is not necessary to implement new storage and it can helps with
>> transaction support.
>
> The amount of complexity that'd be involved to store catalog data in a
> separate relation around the caches and accesses would be significant. I
> don't think that's a realistic option.

Not to mention the problems we might end up in. We still have corner cases
in our cache code, and a new heap on top of it all might be just too
painful.

>>>> 3.c - store ephemeral metadata only in memory without MVCC
>>>
>>> I think that's not an option. That'd end up being a massive amount of
>>> duplication at a low rate of functionality.
>>
>> I don't plan to implement a storage - I expect only few functions for
>> store/read data from session memory context
>
> What does it have to do with temp tables then?

I think what Pavel means here is that we do not need a full-fledged heap
layer but rather only a minimal API over a per-session memory context.
However, that might still be as painful, because we will eventually end up
inventing mechanisms for syscache and typcache to work with this storage,
which IMO is the biggest pain point of this idea.



Regards,

Atri
*l'apprenant*


Re: [HACKERS] Implementation of global temporary tables?

2015-02-02 Thread Atri Sharma
On Mon, Feb 2, 2015 at 6:34 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2015-02-02 13:36 GMT+01:00 Atri Sharma <atri.j...@gmail.com>:
>
>> 1. Main catalogue will be stable.
>> 2. There is not necessary to implement new storage and it can helps
>> with transaction support.
>
> The amount of complexity that'd be involved to store catalog data in a
> separate relation around the caches and accesses would be significant. I
> don't think that's a realistic option.
>
> Not to mention the problems we might end up in. We still have corner
> cases in our cache code, and a new heap on top of it all might be just too
> painful.
>
>>>> 3.c - store ephemeral metadata only in memory without MVCC
>>>
>>> I think that's not an option. That'd end up being a massive amount of
>>> duplication at a low rate of functionality.
>>
>> I don't plan to implement a storage - I expect only few functions for
>> store/read data from session memory context
>
> What does it have to do with temp tables then?
>
> I think what Pavel means here is that we do not need a full fledged heap
> layer and rather only a minimal API from a per session memory context.
> However, that might be still as painful because we will eventually end up
> inventing mechanisms for syscache and typcache to work with this storage,
> which IMO is the biggest pain point around this idea.
>
> It should be solvable - I see another risk - if we accelerate a work with
> temp tables, then 4 byte oid should not be enough.




Hrm, that might well be true. It might be worth the effort to find a better
way to materialize global temp tables then, like having a single OID and
only materializing a relfilenode for a session when the session inserts
into the temp table. Not sure here at all...



-- 
Regards,

Atri
*l'apprenant*


[HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-06 Thread Atri Sharma
On Tuesday, January 6, 2015, Amit Langote langote_amit...@lab.ntt.co.jp
wrote:

 On 06-01-2015 PM 04:26, Atri Sharma wrote:
  On Tue, Jan 6, 2015 at 12:43 PM, Amit Langote 
 langote_amit...@lab.ntt.co.jp javascript:;
  wrote:
  Though, I have no strong opinion on whether one thing is good or the
  other or whether they cover some particular use case all the same.
  Perhaps you can say that better.
 
 
  Personally, I think returning non-ordered rows when an ORDER BY clause is
  explicitly specified by the user is a gross violation of correctness and
  could lead to major application breakdowns, since the application will
  trust that Postgres returns the rows in order because ORDER BY was
  specified. Of course, what Ashutosh suggested makes the patch much
  simpler, but I would rather not go down that road.
 

 I think the same thing applies to IMMUTABLE declarations for example.
 Planner trusts (or take as a hint) such declarations during, say,
 constraint exclusion where quals involving non-immutable functions are
 kept out of the exclusion proof. If a miscreant declares a non-immutable
 function IMMUTABLE, then constraint violations may ensue simply because
 planner trusted the miscreant. That is, such unsafe restrict clauses
 would wrongly prove a partition as being unnecessary to scan. I am sure
 there are other sites where such bets are made. In that light, I might
 as well call them hints than anything.

 manual
 The volatility category is a *promise* to the optimizer about the
 behavior of the function
 /manual

 Though, as I said ordering behavior *may not be* a good candidate to
 make such promises.

 On the other hand, what such a thing might help with, are the situations
 where a developer is frustrated because planner would ignore (or is
 uninformed about) the order that the developer *knows* his function
 produces.

 But, if the node you propose to enforce the order is good enough, then
 it may be worthwhile to go that route, :)



The purpose of the patch is to give the planner an option to use the
preorder that the developer knows will be produced. However, since ensuring
against developer-induced errors in this case is relatively cheap, I think
the new node is worth it.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma



 The overhead of this patch is small. A new path is added for the preorder
 keys, and OrderCheck node's additional cost is pretty low, given that it
 only compares two rows and stores only a single row (previous row seen),
 hence the memory footprint is minuscule.


 We can eliminate the new node and put the onus of having the right order on
 the user, like we do with the volatility setting of the function.



That is exactly what the new node does, since we are not re-sorting right
now in case the order is wrong. Please see my explanation upthread: the
OrderCheck node's primary purpose is to check for a user error in the order
of the result rows. The onus to give the correct order is already on the
user.

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Tue, Jan 6, 2015 at 12:29 PM, Amit Langote langote_amit...@lab.ntt.co.jp
 wrote:

 On 06-01-2015 PM 04:00, Ashutosh Bapat wrote:
  On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com
 wrote:
  We can eliminate the new node and put the onus of having the right order on
  the user, like we do with the volatility setting of the function.
 
 
 
  That is exactly what the new node does, since we are not re-sorting
  right now in case the order is wrong. Please see my explanation
  upthread: the OrderCheck node's primary purpose is to check for a user
  error in the order of the result rows. The onus to give the correct
  order is already on the user.
 
 
  Even checking whether the output of the function is in the right order or
  not, has its cost. I am suggesting that we can eliminate this cost as
 well.
  For example, PostgreSQL does not check whether a function is really
  immutable or not.
 

 Sounds something like ORDERED BY x implying "output is known ordered by
 x" is perhaps enough hint for the planner to make the necessary plan
 choices, though I may be wrong.



I may be missing something, but isn't what you mentioned the exact
functionality this patch adds?


Re: [HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Tue, Jan 6, 2015 at 12:30 PM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:



 On Tue, Jan 6, 2015 at 12:23 PM, Atri Sharma atri.j...@gmail.com wrote:




 Even checking whether the output of the function is in the right order
 or not, has its cost. I am suggesting that we can eliminate this cost as
 well. For example, PostgreSQL does not check whether a function is really
 immutable or not.



That implies possibly returning a non-ordered result set even when the user
explicitly specified an ORDER BY clause. If we are depending on an
optimization and it did not work out (even if it is a user error), I think
we should error out indicating that the order was incorrect rather than
returning non-ordered rows, which could be disastrous IMO.


Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
On Mon, Jan 5, 2015 at 11:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David G Johnston david.g.johns...@gmail.com writes:
  Atri Sharma wrote
  If order of result rows is not the same as required, an error is raised:
 
  SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
  ERROR:  Order not same as specified

  First reaction for the error was unfavorable but (see below) it likely is
  the best option and does adequately cover the reason for failure -
  programmer error.

 TBH, my first reaction to this entire patch is unfavorable: it's a
 solution in search of a problem.  It adds substantial complication not
 only for users but for PG developers in order to solve a rather narrow
 performance issue.


I could agree about the scope of the performance issue, but am not sure
about the added complication. It is essentially similar to, say, a
combination of how Unique is implemented with a flavour of the ORDINALITY
implementation. A new path that is added in a certain number of cases plus
a low-overhead node does not seem too bad to me. This is in line with a
lot of real-world cases I have seen, where the data is *bubbled* up to
SRFs, which does give a possibility of an existing order. Couple it with
the power to specify ORDER BY in your SRF function and you could save a lot.

I am not sure how it complicates things for hackers.  Could you please
elaborate a bit?


 What would make sense to me is to teach the planner about inlining
 SQL functions that include ORDER BY clauses, so that the performance
 issue of a double sort could be avoided entirely transparently to
 the user.


It sounds good, but inlining in the current way would restrict the scope of
the optimization (which is not applicable to the current design). For
example, you cannot inline RECORD-returning SRFs...


-- 
Regards,

Atri
*l'apprenant*


[HACKERS] Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-05 Thread Atri Sharma
Hi All,

Please forgive if this is a repost.

Please find attached patch for supporting ORDER BY clause in CREATE
FUNCTION for SRFs. Specifically:

CREATE OR REPLACE FUNCTION func1(OUT e int, OUT f int) returns setof record
as ' SELECT a,b FROM table1 ORDER BY a; ' language 'sql' ORDER BY e;

This shall allow for capturing information about existing preorder that
might be present inherently in the SRF's input or algorithm (the above
example and think generate_series).

This allows for eliminating sorts based on the known existing preorder.
For example:

SELECT * FROM correct_order_singlecol() ORDER BY e; # Does not need to sort
by e since the existing preorder is known.

Eliminating such sorts can be a huge gain, especially if the expected input
to the needed Sort node is large.

The obvious question is what happens if the specified ORDER BY clause is
wrong. To check the order, a new node is added which is the top node of the
plan and is responsible for projecting result rows. It tracks the previous
row seen and, given a sort order, ensures that the current tuple to be
projected is in the required sort order.

So, for above example

EXPLAIN (COSTS OFF) SELECT * FROM correct_order_multicol() ORDER BY e;
  QUERY PLAN
---
 OrderCheck
   -  Function Scan on correct_order_multicol
(2 rows)


If order of result rows is not the same as required, an error is raised:

SELECT * FROM incorrect_order_nulls() ORDER BY e NULLS LAST;
ERROR:  Order not same as specified
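A minimal sketch (in Python rather than the executor's C, with illustrative names) of the check the proposed OrderCheck node performs: it retains only the previously seen row and raises an error as soon as a projected row violates the declared sort order.

```python
def check_order(rows, keys):
    """Yield rows unchanged, verifying the declared sort order.

    keys: list of (column_index, descending, nulls_last) tuples, one per
    ORDER BY column.  Only the previous row is retained, so the memory
    footprint is a single row, as described above.
    """
    prev = None
    for row in rows:
        if prev is not None and not in_order(prev, row, keys):
            raise ValueError("Order not same as specified")
        prev = row          # only one row is kept at any time
        yield row

def in_order(a, b, keys):
    """True if row a may legitimately precede row b under keys."""
    for col, desc, nulls_last in keys:
        x, y = a[col], b[col]
        if x == y:
            continue
        if x is None or y is None:
            # with NULLS LAST, the NULL must be the later row (and vice versa)
            return (y is None) == nulls_last
        return (y < x) if desc else (x < y)
    return True  # rows equal on all keys are trivially in order
```

Rows in the required order stream through untouched; the first out-of-order row aborts the scan, matching the ERROR shown above.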



Preorder columns are first transformed into SortGroupClauses and then
stored directly in pg_proc.


This functionality addresses use cases seen in practice, and is especially
useful when SRF inputs are large and/or might be pipelined from another
function (SRFs are often used in pipelines in analytical systems, with
large data).

The overhead of this patch is small. A new path is added for the preorder
keys, and the OrderCheck node's additional cost is pretty low, given that it
only compares two rows at a time and stores only a single row (the previous
row seen), hence the memory footprint is minuscule.

In the inner joins thread, Tom mentioned having a new node which has
multiple plans, where the executor can decide which plan to execute given
runtime conditions. I played around with the idea, and am open to
experimenting with a new node which has a Sort-based plan and is executed in
case the OrderCheck node sees that the inherent order of result tuples is
not correct. Feedback here would be very welcome.


I will add the patch to current commitfest.

Thoughts?

Regards,

Atri


orderbycreatefuncver1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-31 Thread Atri Sharma
   ChainAggregate is

 a bit like a node having two parents, a Sort and a GroupAggregate.
 However,
 the graph edge between ChainAggregate and its GroupAggregate is a
 tuplestore
 instead of the usual, synchronous ExecProcNode().


Well, I don't buy the two-parents theory. The Sort nodes are stacked in
between the ChainAggregate nodes, so there is still a single edge.
However, as you rightly said, there is a shared tuplestore, but note that
only the head-of-chain ChainAggregate has the top GroupAggregate as its
parent.


 Suppose one node orchestrated all sorting and aggregation.  Call it a
 MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
 performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
 would directly manage two Tuplesortstate, CUR and NEXT.  The node would
 have
 an initial phase similar to ExecSort(), in which it drains the outer node
 to
 populate the first CUR.  After that, it looks more like
 agg_retrieve_direct(),
 except that CUR is the input source, and each tuple drawn is also put into
 NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.
 This
 design does not add I/O consumption or require a nonstandard communication
 channel between executor nodes.  Tom, Andrew, does that look satisfactory?


So you are essentially proposing merging ChainAggregate and its
corresponding Sort node?

So the structure would be something like:

GroupAggregate
  -> MultiGroupAgg (a, b)
  -> MultiGroupAgg (c, d) ...

I am not sure if I understand you correctly. Only the top-level
GroupAggregate node projects the result of the entire operation. The key to
ChainAggregate nodes is that each ChainAggregate node handles grouping sets
that fit a single ROLLUP list, i.e. that can be done with a single sort
order. There can be multiple lists of this type in a single GROUPING SETS
operation; however, our current design has only a single top GroupAggregate
node but a ChainAggregate node + Sort node per sort order. If you are
proposing replacing the GroupAggregate node + the entire ChainAggregate +
Sort node stack with a single MultiGroupAggregate node, I am not able to
understand how it will handle all the multiple sort orders. If you are
proposing replacing only ChainAggregate + Sort node with a single
MultiGroupAgg node, that still shares the tuplestore with the top-level
GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please
correct me if I am wrong.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] mysql with postgres

2014-12-23 Thread Atri Sharma
On Tue, Dec 23, 2014 at 3:06 PM, Ravi Kiran ravi.kolanp...@gmail.com
wrote:

 hi all,


 Is postgres source code compatible with mysql database?? If it is, could
 someone could give me some links so that I can do that.

 I want to hack into the postgres source code, but as I am comfortable with
 mysql, I want to use the mysql database not postgres.

 any references would be fine.

 thank you


Eh...what?


Re: [HACKERS] Final Patch for GROUPING SETS

2014-12-22 Thread Atri Sharma
On Tuesday, December 23, 2014, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Dec 22, 2014 at 11:19 AM, Andrew Gierth
 and...@tao11.riddles.org.uk javascript:; wrote:
   Tom The other reason that's a bad comparison is that I've not seen
   Tom many queries that use more than a couple of window frames,
   Tom whereas we have to expect that the number of grouping sets in
   Tom typical queries will be significantly more than a couple.
 
  I would be interested in seeing more good examples of the size and
  type of grouping sets used in typical queries.

 From what I have seen, there is interest in being able to do things
 like GROUP BY CUBE(a, b, c, d) and have that be efficient.  That will
 require 16 different groupings, and we really want to minimize the
 number of times we have to re-sort to get all of those done.  For
 example, if we start by sorting on (a, b, c, d), we want to then make
 a single pass over the data computing the aggregates with (a, b, c,
 d), (a, b, c), (a, b), (a), and () as the grouping columns.



That is exactly what the ChainAggregate node does. A set of orderings that
fits in a single ROLLUP list (like your example) is processed in a single
pass.
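As a rough illustration (plain Python, nothing from the patch), CUBE(a, b, c, d) expands to the 16 subsets of the columns, and a single sort covers exactly the groupings that are prefixes of that sort order — which is why minimizing re-sorts matters:

```python
from itertools import combinations

def cube_groupings(cols):
    """CUBE(cols) generates one grouping per subset of cols: 2^n in total."""
    return [g for r in range(len(cols), -1, -1)
              for g in combinations(cols, r)]

def one_sort_covers(groupings, sort_order):
    """A single pass over input sorted on sort_order can compute exactly
    the groupings that are prefixes of that sort order."""
    prefixes = {tuple(sort_order[:i]) for i in range(len(sort_order) + 1)}
    return [g for g in groupings if g in prefixes]

gs = cube_groupings(("a", "b", "c", "d"))          # the 16 groupings
one_pass = one_sort_covers(gs, ("a", "b", "c", "d"))
# sorting on (a, b, c, d) handles (a,b,c,d), (a,b,c), (a,b), (a), ()
```

The remaining 11 groupings need additional sort orders, so grouping them into as few ROLLUP-shaped chains as possible is the optimization being discussed.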


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Commitfest problems

2014-12-18 Thread Atri Sharma
On Thu, Dec 18, 2014 at 8:44 PM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:


 On Wed, Dec 17, 2014 at 5:00 PM, Stephen Frost sfr...@snowman.net wrote:
 
 
  Another thought I had was to suggest we consider *everyone* to be a
  contributor and implement a way to tie together the mailing list
  archives with the commit history and perhaps the commitfest app and make
  it searchable and indexed on some website.  eg:
 
  contributors.postgresql.org/sfrost
- Recent commits
- Recent commit mentions
- Recent emails to any list
- Recent commitfest app activity
- Recent wiki page updates
  ...
 
  Ideally with a way for individuals to upload a photo, provide a company
  link, etc, similar to what the existing Major Contributors have today.
  Obviously, this is not a small task to develop and there is some risk of
  abuse (which I expect the other folks on the infra team will point out
  and likely tar and feather me for suggesting this at all..) but it might
  be along the same lines as Bruce's PgLife..
 



+1

It does feel good to be acknowledged for our work, especially when there is
a policy in our community to acknowledge it.

Regards,

Atri


Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Atri Sharma
On Wed, Dec 17, 2014 at 3:23 PM, Simon Riggs si...@2ndquadrant.com wrote:

 KaiGai, David Rowley and myself have all made mention of various ways
 we could optimize aggregates.

 Following WIP patch adds an extra function called a combining
 function, that is intended to allow the user to specify a
 semantically correct way of breaking down an aggregate into multiple
 steps.

 Gents, is this what you were thinking? If not...



A quick look at the patch makes me assume that it does not handle the
problem of combining transition values, or move at all in that direction
(which is fine, just reconfirming).

So, essentially, we are adding a grand total on top of individual sum()
or count() operations, right?

Also, should we not have a sanity check for the user-provided function?


Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Atri Sharma
On Wed, Dec 17, 2014 at 6:05 PM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

 Simon,

 Its concept is good to me. I think, the new combined function should be
 responsible to take a state data type as argument and update state object
 of the aggregate function. In other words, combined function performs like
 transition function but can update state object according to the summary
 of multiple rows. Right?

 It also needs some enhancement around Aggref/AggrefExprState structure to
 inform which function shall be called on execution time.
 Combined functions are usually no-thank-you. AggrefExprState updates its
 internal state using transition function row-by-row. However, once someone
 push-down aggregate function across table joins, combined functions have
 to be called instead of transition functions.
 I'd like to suggest Aggref has a new flag to introduce this aggregate
 expects
 state object instead of scalar value.

 Also, I'd like to suggest one other flag in Aggref not to generate final
 result, and returns state object instead.



So are you proposing not calling transition functions at all and just using
combine functions?

That sounds counterintuitive to me. I am not able to see why you would want
to avoid transition functions entirely, even for the case of pushing down
aggregates that you mentioned.

From Simon's example mentioned upthread:

PRE-AGGREGATED PLAN
Aggregate
  -> Join
       -> PreAggregate (doesn't call finalfn)
            -> Scan BaseTable1
       -> Scan BaseTable2

The finalfn wouldn't be called. Instead, the combine function would be
responsible for getting the pre-aggregated results and combining them
(unless, of course, I am missing something).

Special-casing transition-state updating in Aggref seems like a bad idea to
me. I would think that it would be better if we made it more explicit, i.e.
added a new node on top that does the combination (it would be primarily
responsible for calling the combine function).

Not a good source of inspiration, but seeing how SQL Server does it
(Exchange operator + Stream Aggregate) seems intuitive to me, and having
the combination operation as a separate top node might be a cleaner way.

I may be wrong though.
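For what it's worth, the transition/combine split can be illustrated with avg() in a few lines of Python (a sketch of the concept, not PostgreSQL's actual C functions): each partial aggregation runs the transition function and stops before the final function, and the top level merges the partial states with the combine function.

```python
# avg() with an explicit transition / combine / final split.
# The state is a (sum, count) pair rather than the final scalar.

def avg_trans(state, value):       # transition: consume one input row
    s, n = state
    return (s + value, n + 1)

def avg_combine(s1, s2):           # combine: merge two partial states
    return (s1[0] + s2[0], s1[1] + s2[1])

def avg_final(state):              # final: produce the scalar result
    s, n = state
    return s / n if n else None

# Partial aggregation per group/partition, combined at the top level:
parts = [[1, 2, 3], [4, 5]]
states = []
for chunk in parts:
    st = (0, 0)
    for v in chunk:
        st = avg_trans(st, v)
    states.append(st)              # NOTE: finalfn is NOT called here

total = (0, 0)
for st in states:
    total = avg_combine(total, st)
assert avg_final(total) == 3.0     # identical to avg over all five values
```

This is why a pushed-down AVG must return its state object instead of a scalar: averaging the partial averages (2.0 and 4.5 here) would give the wrong answer, while combining (sum, count) states does not.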

Regards,

Atri


Re: [HACKERS] Combining Aggregates

2014-12-17 Thread Atri Sharma
On Wed, Dec 17, 2014 at 7:18 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 17 December 2014 at 12:35, Kouhei Kaigai kai...@ak.jp.nec.com wrote:

  Its concept is good to me. I think, the new combined function should be
  responsible to take a state data type as argument and update state object
  of the aggregate function. In other words, combined function performs
 like
  transition function but can update state object according to the summary
  of multiple rows. Right?

 That wasn't how I defined it, but I think your definition is better.

  It also needs some enhancement around Aggref/AggrefExprState structure to
  inform which function shall be called on execution time.
  Combined functions are usually no-thank-you. AggrefExprState updates its
  internal state using transition function row-by-row. However, once
 someone
  push-down aggregate function across table joins, combined functions have
  to be called instead of transition functions.
  I'd like to suggest Aggref has a new flag to introduce this aggregate
 expects
  state object instead of scalar value.
 
  Also, I'd like to suggest one other flag in Aggref not to generate final
  result, and returns state object instead.
 
  Let me use David's example but little bit adjusted.
 
  original)
  SELECT p.name, AVG(s.qty)
FROM sales s INNER JOIN product p ON p.product_id = s.product_id
GROUP BY p.name;
 
  modified)
  SELECT p.name, AVG(qty)
    FROM (SELECT product_id, AVG(qty) AS qty
          FROM sales GROUP BY product_id) s
         INNER JOIN product p ON p.product_id = s.product_id
   GROUP BY p.name;
 
  Let's assume the modifier set a flag of use_combine_func on the AVG(qty)
 of
  the main query, and also set a flag of not_generate_final on the
 AVG(qty) of
  the sub-query.
  It shall work as we expected.

 That matches my thinking exactly.

 David, if you can update your patch with some docs to explain the
 behaviour, it looks complete enough to think about committing it in
 early January, to allow other patches that depend upon it to stand a
 chance of getting into 9.5. (It is not yet ready, but I see it could
 be).

 The above example is probably the best description of the need, since
 user defined aggregates must also understand this.

 Could we please call these combine functions or other? MERGE is an
 SQL Standard statement type that we will add later, so it will be
 confusing if we use the merge word in this context.

 David, your patch avoids creating any mergefuncs for existing
 aggregates. We would need to supply working examples for at least a
 few of the builtin aggregates, so we can test the infrastructure. We
 can add examples for all cases later.



I am still missing how and why we skip transition functions. What am I
missing, please?


Re: [HACKERS] Commitfest problems

2014-12-16 Thread Atri Sharma
On Wed, Dec 17, 2014 at 12:03 AM, Stephen Frost sfr...@snowman.net wrote:

 David,

 * David Rowley (dgrowle...@gmail.com) wrote:
  I'd just like to add something which might be flying below the radar of
  more senior people. There are people out there (like me) working on
  PostgreSQL more for the challenge and perhaps the love of the product,
 who
  make absolutely zero money out of it. For these people getting credit
 where
  it's due is very important. I'm pretty happy with this at the moment and
 I
  can't imagine any situation where not crediting reviewers would be
  beneficial to anyone.

 Thanks for this.  One question which has been brought up in the past is
 the specific level of credit.  That you're happy with the credit
 you've been given thus far is great as it happens to support the side
 that I'm on. :D

 However, could you quantify what, exactly, you feel is appropriate
 credit for reviewers and authors..?  I'll intentionally omit the options
 that have been presented in the past to try and avoid influencing your
 response.



Mentioning them in list of contributors, for one.


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-12-15 Thread Atri Sharma
On Mon, Dec 8, 2014 at 6:06 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Wed, Nov 26, 2014 at 4:26 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think what's likely missing here is a clear design for the raw parse
  tree representation (what's returned by the bison grammar).  The patch
  seems to be trying to skate by without creating any new parse node types
  or fields, but that may well be a bad idea.  At the very least there
  needs to be some commentary added to parsenodes.h explaining what the
  representation actually is (cf commentary there for MultiAssignRef).
 
  Also, I think it's a mistake not to be following the MultiAssignRef
  model for the case of (*) = ctext_row.  We optimize the ROW-source
  case at the grammar stage when there's a fixed number of target columns,
  because that's a very easy transformation --- but you can't do it like
  that when there's not.  It's possible that that optimization should be
  delayed till later even in the existing case; in general, optimizing
  in gram.y is a bad habit that's better avoided ...
 Marking as returned with feedback based on those comments.


Thank you everybody for your comments.

I am indeed trying to avoid a new node, since my intuition says that we do
not need one for functionality which is present in other commands, albeit
in a different form. However, I agree that documentation explaining the
parse representation is lacking, and I shall improve that. Also, in
accordance with Tom's comments, I shall look at not touching the target
list directly and instead follow the path of MultiAssignRef.

I have moved the patch to the current CF and marked it as Waiting on
Author, since I plan to resubmit after addressing the concerns.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 5:00 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 3 December 2014 at 09:29, David Rowley dgrowle...@gmail.com wrote:
  *** Method 3: Marking scans as possibly skippable during planning and
  removing redundant join nodes at executor startup (Simon's method)
 
  Pros:
  1. The plan can be executed as normal if there are any foreign key
 triggers
  pending.
  2. Does not require extra code in all join types  (see cons #2 above)
  3. Does not suffer from extra node visiting overhead (see cons #3 above)
 
  Cons:
  1. Executor must modify the plan.
  2. Planner may have generated a plan which is not optimal for
 modification
  by the executor (e.g. Sort nodes for merge join, or index scans for
  pre-sorted input won't become seqscans which may be more efficient as
  ordering may not be required after removing a merge join)
 
  With each of the methods listed above, someone has had a problem with,
 and
  from the feedback given I've made changes based and ended up with the
 next
  revision of the patch.
 
  Tom has now pointed out that he does not like the executor modifying the
  plan, which I agree with to an extent as it I really do hate the extra
  useless nodes that I'm unable to remove from the plan.

 I guess we need an Option node. Tom and I discussed that about an aeon ago.

 The Option node has a plan for each situation. At execution time, we
 make the test specified in the plan and then select the appropriate
 subplan.

 That way we can see what is happening in the plan and the executor
 doesn't need to edit anything.
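A tiny Python sketch of the Option idea described above (purely illustrative; no such executor node exists): the planner supplies one subplan per situation plus the runtime test, and executor startup simply picks a subplan without editing any plan.

```python
class OptionNode:
    """Hypothetical Option node: one subplan per situation and a runtime
    test; at execution time the test selects which subplan to run, so the
    executor never modifies a plan tree."""

    def __init__(self, test, plan_if_true, plan_if_false):
        self.test = test
        self.plans = {True: plan_if_true, False: plan_if_false}

    def execute(self, ctx):
        # evaluate the planner-specified test, then run the chosen subplan
        return self.plans[bool(self.test(ctx))](ctx)

# Example for join removal: keep the join only while FK triggers are
# pending (the context key and plan labels are made up for illustration).
node = OptionNode(
    test=lambda ctx: ctx["fk_triggers_pending"],
    plan_if_true=lambda ctx: "full join plan",
    plan_if_false=lambda ctx: "join-removed plan",
)
```

Both subplans stay visible in EXPLAIN output, which is the stated advantage over having the executor edit the plan in place.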



So does the planner keep all plans satisfying the possibilities, or does it
look at the possible conditions (like the presence of a foreign key, in
this case) and then let the executor choose between them?

So is the idea essentially making the planner return a set of best plans,
one for each condition? Are we assured of their optimality at the local
level, i.e. for each possibility?

IMO this sounds like punting the planner's task to the executor. Not to
mention some overhead for maintaining various plans that might have been
discarded early in the planning and path-cost-evaluation phase (consider a
path with pathkeys specified, as with ORDINALITY: can there be edge cases
where we end up invalidating the entire path if we let the executor modify
it, or maybe just lose the ordinality optimization?).

I agree that the executor should not modify plans, but letting the executor
choose the plan to execute (out of a set from the planner, of course),
rather than the planner giving the executor a single plan whose semantics
the executor does not care about, seems a bit counterintuitive to me. It
might be just me though.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 8:32 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  So the planner keeps all possibility satisfying plans, or it looks at the
  possible conditions (like presence of foreign key for this case, for eg)
  and then lets executor choose between them?

 Right, this was one of the thoughts that I had.

  So is the idea essentially making the planner return a set of best
 plans,
  one for each condition? Are we assured of their optimality at the local
  level i.e. at each possibility?

 We *already* have an idea of there being multiple plans (see
 plancache.c).


 Thanks for pointing me there.

What I am concerned about is that in this case, the option plans are
competing plans rather than separate plans.

My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive about discarding potential paths. Keeping such plans around and
storing and returning them might have issues, but that is only my thought.



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  * Tom Lane (t...@sss.pgh.pa.us) wrote:
  However, even granting that that is a concern, so what?  You *have* to
  do the planning twice, or you're going to be generating a crap plan for
  one case or the other.

  Yeah, I don't see a way around that..

 Also, it occurs to me that it's only necessary to repeat the join search
 part of the process, which means that in principle the mechanisms already
 exist for that; see GEQO.  This means that for small join problems, the
 total planning time would much less than double anyway.  For large
 problems, where the join search is the bulk of the time, we could hope
 that removal of unnecessary joins would reduce the join search runtime
 enough that the second search would be pretty negligible next to the
 first (which is not optional).  So I think it'll double the runtime
 is an unfounded objection, or at least there's good reason to hope it's
 unfounded.


Is it possible to only replan part of the plan in case of this
optimization? I think that we might need to only replan parts of the
original plan (as you mentioned, join search and above). So we could reuse
the original plan in part and not do a lot of replanning (an obvious case
is scan strategy, which we can assume will not change for the two plans).

I wonder if we could have a rule-based system for replacing some plan nodes
with other types of nodes. As we discover more cases like this, we could
add more rules. A wild thought, though.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  Is it possible to only replan part of the plan in case of this
  optimization? I think that we might need to only replan parts of the
  original plan (as you mentioned, join search and above). So we could
 reuse
  the original plan in part and not do a lot of replanning (an obvious case
  is scan strategy, which we can assume will not change for the two plans).

 I think you assume wrong; or at least, I certainly would not wish to
 hard-wire any such assumption.  Skipping some joins could change the
 shape of the join tree *completely*, because the cost estimates will
 change so much.  And that could in turn lead to making different choices
 of scan methods, eg, we might or might not care about sort order of
 a scan result if we change join methods.

 regards, tom lane


Agreed, but in some cases we could possibly make some assumptions (if there
is no index, if a large fraction of the table will be returned by the scan,
or for a FunctionScan).


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:27 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  Agreed, but in some cases, we could possibly make some assumptions (if
  there is no index, if a large fraction of table will be returned in scan,
  FunctionScan).

 All neat ideas but how about we get something which works in the way
 being asked for before we start trying to optimize it..?  Maybe I'm
 missing something, but getting all of this infrastructure into place and
 making sure things aren't done to the plan tree which shouldn't be (or
 done to all of them if necessary..) is enough that we should get that
 bit done first and then worry if there are ways we can further improve
 things..



Right, sorry for digressing.

I think we are in agreement as to what needs to be done (start with a plan,
note potentially removable joins, and replan if necessary). The idea of the
executor modifying the plan (or, personally, even choosing the plan) seems
counterintuitive.

Does it also make sense to recalculate the costs from scratch for the
replan? It might; I am just asking.

Regards,

Atri


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Stephen Frost sfr...@snowman.net writes:
  * Atri Sharma (atri.j...@gmail.com) wrote:
  Agreed, but in some cases, we could possibly make some assumptions (if
  there is no index, if a large fraction of table will be returned in
 scan,
  FunctionScan).

  All neat ideas but how about we get something which works in the way
  being asked for before we start trying to optimize it..?  Maybe I'm
  missing something, but getting all of this infrastructure into place and
  making sure things aren't done to the plan tree which shouldn't be (or
  done to all of them if necessary..) is enough that we should get that
  bit done first and then worry if there are ways we can further improve
  things..

 Yeah; moreover, there's no evidence that hard-wiring such assumptions
 would save anything.  In the example of a FunctionScan, guess what:
 there's only one Path for that relation anyway.

That is precisely what I meant :) I guess I was being overcautious and even
trying to save the time spent evaluating whatever paths we have and
building new FunctionScan paths...


 I think the right approach for now is to emulate the GEQO precedent as
 closely as possible.  Build all the single-relation Paths the same as
 now, then do a join search over all the relations, then (if we've noticed
 that some joins are potentially removable) do another join search over
 just the nonremovable relations.


How about using GEQO more liberally when replanning (since removing joins
decreases the number of relations in the join before the GEQO threshold is
hit)?
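
The two-pass join search described above could be sketched like this
(a toy Python model; `cost` and `cheapest_join_order` are invented
stand-ins for the planner's estimates and join search, not actual
PostgreSQL code):

```python
from itertools import permutations

def cost(order):
    # Toy cost model: a hypothetical stand-in for the planner's estimates.
    return sum((i + 1) * len(rel) for i, rel in enumerate(order))

def cheapest_join_order(rels):
    # Stand-in for the join search (exhaustive here; GEQO above a threshold).
    return min(permutations(rels), key=cost)

def plan(rels, removable):
    # First pass: join search over all relations, as today.
    full_plan = cheapest_join_order(rels)
    if not removable:
        return full_plan
    # Second pass: repeat the search over just the nonremovable relations;
    # the executor would pick between the two plans at run time.
    reduced = [r for r in rels if r not in removable]
    return cheapest_join_order(reduced)

print(plan(["orders", "customers", "audit"], removable={"audit"}))
```

The point is simply that the second search starts from the same
single-relation paths but over a smaller relation set, rather than
patching the first plan in place.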



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing INNER JOINs

2014-12-03 Thread Atri Sharma
On Wed, Dec 3, 2014 at 11:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think the right approach for now is to emulate the GEQO precedent as
  closely as possible.  Build all the single-relation Paths the same as
  now, then do a join search over all the relations, then (if we've
 noticed
  that some joins are potentially removable) do another join search over
  just the nonremovable relations.

  How about using geqo more liberally when replanning (decrease the number
 of
  relations in join before geqo is hit?)

 This is going to be quite difficult enough without overcomplicating it.
 Or as a wise man once said, premature optimization is the root of all
 evil.  Get it working in the basic way and then see if improvement is
 necessary at all.


Sure, I can take a crack at it since I am working on a patch that does
require this alternative path approach. Let me try something and report my
experimental results.


Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-12 Thread Atri Sharma
On Wed, Nov 12, 2014 at 1:24 PM, David Rowley dgrowle...@gmail.com wrote:


 On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com
 wrote:


 This plan type is widely used in reporting queries, so will hit the
 mainline of BI applications and many Mat View creations.
 This will allow SELECT count(*) FROM foo to go faster also.


 We'd also need to add some infrastructure to merge aggregate states
 together for this to work properly. This means that could also work for
 avg() and stddev etc. For max() and min() the merge functions would likely
 just be the same as the transition functions.


It might make sense to add a new planner node responsible for pulling from
each of the individual parallel Agg nodes and then aggregating over the
results.


A couple of things worth considering are whether we want to force parallel
aggregation or let the planner decide between a parallel aggregate and a
single plan. For example, the average estimated size of groups might be one
factor the planner considers when choosing between a parallel and a single
execution plan.

I don't see merging states as an easy problem; it should perhaps be tackled
apart from this thread.

Also, do we want to allow parallelism only with GroupAggs?
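
To make the state-merging idea concrete, here is a minimal sketch
(Python, with invented function names, not the actual aggregate
infrastructure) for avg(), whose transition state is a (sum, count) pair:

```python
def avg_transfn(state, value):
    # Per-worker transition function: accumulate (sum, count).
    s, n = state
    return (s + value, n + 1)

def avg_mergefn(state_a, state_b):
    # Merge two partial states. For max()/min() the merge function
    # would essentially be the transition function itself.
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])

def avg_finalfn(state):
    s, n = state
    return s / n if n else None

# Two "workers" each aggregate half the input; a parent node merges.
left = right = (0, 0)
for v in [1, 2, 3]:
    left = avg_transfn(left, v)
for v in [4, 5, 6]:
    right = avg_transfn(right, v)
print(avg_finalfn(avg_mergefn(left, right)))  # 3.5
```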

Regards,

Atri


Re: [HACKERS] Representing a SRF return column in catalogs

2014-11-07 Thread Atri Sharma
On Fri, Nov 7, 2014 at 7:15 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Nov 5, 2014 at 8:24 AM, Atri Sharma atri.j...@gmail.com wrote:
  I am working on something that requires representing a SRF return column
 in
  pg_proc and being able to retrieve it, retrieve the name of the column
 and
  make a ColumnRef node from it.
 
  The catch here are aliases:
 
  SELECT generate_series(1,100) AS a ORDER BY a;
 
  I need to know that the return column of generate_series is being
 referred
  as a in this query (hence tlist, sortClause will have 'c').
 
  I discussed and think that a way can be to have position of column rather
  than the actual name but that raises the question of the functionality
  needed for back resolving position to name (to make a ColumnRef node) and
  then infer that the column is being referred as an alias in this query.

 It's not clear to me what you are trying to do, so I can't give you any
 advice.


Let me give an example:

Consider a user-defined SRF (or a built-in one, such as generate_series). I
am working on a pathkeys-tracking project (more details on it in a separate
email). I am interested in one of the columns of the result of the SRF and
want to store it in the catalogs in a manner that allows me to refer to it
later when executing the SRF.

One way could be to store the raw column name. However, I am not sure how
that would work with aliases without considerable fiddling with Alias
nodes at parse time.

Can I store relattnos or something similar? I need to fetch the stored
attribute in the planner and build pathkeys from it.

Please advise.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Representing a SRF return column in catalogs

2014-11-07 Thread Atri Sharma
On Saturday, November 8, 2014, Tom Lane t...@sss.pgh.pa.us wrote:


 I'm confused too.  The original example seemed to imagine that details
 of a query (not the function, but the calling query) would be stored in
 the catalogs, which is completely nuts.

 pg_proc already has provisions to remember the names of output parameters
 of a function, but it seems like you want something else than that, only
 it's not very clear what.  Are you trying to say that you'd like to
 represent the sort order of the output of a function?  If so, you'd have
 to add new columns to pg_proc for that, but I can't see why we'd represent
 that information in terms of column names.  A column number and a sort
 operator would make more sense.


Exactly. I would like to represent the sort order of the output.

Thanks for catching it, I really need to stop writing emails without
drinking coffee...
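
Tom's suggestion above (a column number plus a sort operator, rather than a
column name) could be pictured with a toy Python model (all names invented,
nothing here is actual catalog code):

```python
from operator import lt, gt

# Hypothetical pg_proc-style metadata: function name ->
# (sorted output column number, comparison operator).
SRF_SORT_ORDER = {
    "generate_series": (0, lt),  # column 0, ascending
}

def output_is_sorted(func, colno, op):
    # The planner could skip an explicit Sort node when the ordering
    # requested by the query matches the recorded one.
    return SRF_SORT_ORDER.get(func) == (colno, op)

print(output_is_sorted("generate_series", 0, lt))  # True
print(output_is_sorted("generate_series", 0, gt))  # False
```

Because the lookup is by column position, it is unaffected by whatever
alias the calling query attaches to the output column.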


-- 
Regards,

Atri
*l'apprenant*


[HACKERS] Representing a SRF return column in catalogs

2014-11-05 Thread Atri Sharma
Hi,

I am working on something that requires representing a SRF return column in
pg_proc and being able to retrieve it, retrieve the name of the column and
make a ColumnRef node from it.

The catch here is aliases:

SELECT generate_series(1,100) AS a ORDER BY a;

I need to know that the return column of generate_series is being referred
to as a in this query (hence the tlist and sortClause will contain 'a').

I have discussed this and think one way could be to store the position of
the column rather than the actual name, but that raises the question of how
to resolve a position back to a name (to make a ColumnRef node) and then
infer that the column is referred to by an alias in this query.

Any pointers will be really helpful.

Regards,

Atri


Re: [HACKERS] How to implent CONVERT ( data_type [ ( length ) ] , expression ) function in postgreSQL

2014-11-02 Thread Atri Sharma
On Sun, Nov 2, 2014 at 3:40 PM, rohtodeveloper rohtodevelo...@outlook.com
wrote:


 Dear


 In SQLServer, there'are two functions to converte an expression of one
 data type to another.

 1. CAST ( expression AS data_type [ ( length ) ] )
 2. CONVERT ( data_type [ ( length ) ] , expression )

 However, In PostgreSQL, there's only the CAST ( expression AS data_type [
 ( length ) ] ) function. I have tried the following two ways to implenting
 the CONVERT ( data_type [ ( length ) ] , expression ) function, but both
 are failed.

 1. CREATE FUNCTION .
 The function's arguments can only be expressions but not data_type .
 2. Modifying the gram.y .
 The CONVERT ( data_type [ ( length ) ] , expression ) is in grammer
 conflict with the PostgreSQL self's
 convert(data,src_encoding_name,dest_encoding_name) function. And the
 PostgreSQL self's convert(data,src_encoding_name,dest_encoding_name)
 function cannot be used.

 I wonder whether there's a better way to solve this problem.
 Any help will be appreciated.



Please do not cross post to various lists.

For the options you suggested:

1) Pass the data type in as a string, then deparse and process it inside
the function.
2) Are you referring to pg_convert here?

IMO the CONVERT function is not needed in the first place; CAST covers
these cases. You may want to refer to
http://www.postgresql.org/docs/9.3/static/typeconv.html


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-27 Thread Atri Sharma



 To solve #1, we could redesign CREATE DATABASE so that replaying the
 DBASE_CREATE record doesn't zap the old directory, and also doesn't copy
 any files. We could instead just assume that if the transaction commits,
 all the files have been copied and fsync'd already, like we assume that if
 a CREATE INDEX commits in wal_level=minimal, the underlying file was
 fsync'd before the commit.


Do you mean that during a recovery, we just let the database directory be
and assume that it is in good shape since the transaction committed
originally?


 I wonder if we should bite the bullet and start WAL-logging all the files
 that are copied from the template database to the new database. When the
 template database is small (template0 is 6.4MB currently), that wouldn't
 generate too much WAL. We could perhaps do that only if the template
 database is small, and do the checkpoints otherwise, although I wouldn't
 like to have subtly different behavior depending on database size like that.


For the sort of workload Tomas described above (creating a lot of databases
on the fly), we may end up with a lot of WAL eventually if we do this.

Regards,

Atri


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-27 Thread Atri Sharma
On Mon, Oct 27, 2014 at 4:44 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 10/27/2014 01:06 PM, Atri Sharma wrote:




  To solve #1, we could redesign CREATE DATABASE so that replaying the
 DBASE_CREATE record doesn't zap the old directory, and also doesn't copy
 any files. We could instead just assume that if the transaction commits,
 all the files have been copied and fsync'd already, like we assume that
 if
 a CREATE INDEX commits in wal_level=minimal, the underlying file was
 fsync'd before the commit.


 Do you mean that during a recovery, we just let the database directory be
 and assume that it is in good shape since the transaction committed
 originally?


 Right.


It does make sense. However, with the checkpoint after creating the files
gone, the window between file creation and actual commit grows, increasing
the chance that a crash in that period leaves an orphaned database
directory. My understanding of the consequences of removing the checkpoint
may be incorrect, though, so these fears might be unfounded.

Regards,

Atri


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-27 Thread Atri Sharma



 IMHO writing all the data into a WAL would be the cleanest solution.

 Also, what is a small database? I don't think a static value will work,
 because the sweet spot between the current approach (forcing two
 checkpoints) and writing everything in WAL depends on the amount of dirty
 buffers that need to be checkpointed. Which is mostly driven by the size
 of shared buffers and write activity - for small shared buffers and/or
 mostly-read workload, checkpoints are cheap, so the 'small database'
 threshold (when choosing the WAL approach) is much higher than for large
 shared buffers or write-heavy workloads.


So are you proposing having a heuristic based on the amount of data in
shared buffers and write activity? Do you have something in mind that works
for general workloads as well?


 So maybe if we could determine the amount of data to be checkpointed, and
 then base the decision on that, that'd work better? This would also have
 to take into account that writing into WAL is sequential, while
 checkpoints usually cause random writes all over the datafiles (which is
 more expensive).

 Another option might be forcing just a spread checkpoint, not the
 immediate one (which is what we do now). That would not fix the CREATE
 DATABASE duration (actually, it would make it longer), but it would lower
 the impact on other activity on the machine.



I believe this to be the cleanest way to reduce the amount of I/O
generated. If I understand correctly, the original problem you mentioned
was not the time CREATE DATABASE takes but rather the amount of I/O each
one generates.

This also leads me to wonder whether it makes sense to explore group
commits around the creation of files for a new database (for the same
backend, of course). This could be opt-in, for when the user knows he/she
is going to create a lot of databases in the near future and is fine with a
large spike in I/O in one go. Again, it might be even more broken than the
current scenario, but it depends on what the user wants...

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-17 Thread Atri Sharma
On Fri, Oct 17, 2014 at 7:45 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Wed, Oct 15, 2014 at 3:48 AM, Atri Sharma atri.j...@gmail.com wrote:
 
 
  On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:
 
  Hi
 
  On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 wrote:
   Please find attached a patch which implements support for UPDATE
 table1
   SET(*)=...
 
  I presume you haven't read Tom Lane's proposal and discussion about
  multiple column assignment in UPDATE:
  http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
  (Assigning all columns was also discussed there)
 
  And there's a WIP patch:
  http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us
 
  Thanks for the links, but this patch only targets SET(*) case, which, if
 I
  understand correctly, the patch you mentioned doesn't directly handle
 (If I
  understand correctly, the target of the two patches is different).

 Yeah -- in fact, there was some discussion about this exact case.
 This patch solves a very important problem: when doing record
 operations to move data between databases with identical schema
 there's currently no way to 'update' in a generic way without building
 out the entire field list via complicated and nasty dynamic SQL.


Thanks!


 I'm
 not sure about the proposed syntax though; it seems a little weird to
 me.  Any particular reason why you couldn't have just done:

 UPDATE table1 SET * = a,b,c, ...

 also,

 UPDATE table1 t SET t = (SELECT (a,b,c)::t FROM...);


I honestly have not spent a lot of time thinking about the exact syntax
that may be acceptable. If we have arguments for or against a specific
syntax, I will be glad to incorporate them.





[HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
Hi All,

Please find attached a patch which implements support for UPDATE table1
SET(*)=...
The patch supports both UPDATE table SET(*)=(a,b,c) and UPDATE table1
SET(*)=(SELECT a,b,c FROM...).
It solves the problem of doing UPDATE from a record variable of the same
type as the table e.g. update foo set (*) = (select foorec.*) where ...;

The design is simple. It basically expands the * at transform stage, does
the necessary type checking, and adds the result to the parse tree. This
allows normal execution for the rest of the stages.
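
As a rough illustration of that transform-stage expansion (a Python sketch
with invented helpers; the real patch of course works on the parse tree,
not on strings):

```python
def expand_set_star(table_columns, values):
    # Expand SET (*) = (v1, v2, ...) into an explicit per-column
    # assignment list, with arity checking done at transform time
    # (type checking would happen here too in the real thing).
    if len(values) != len(table_columns):
        raise ValueError("number of values does not match table columns")
    return list(zip(table_columns, values))

assignments = expand_set_star(["a", "b", "c"], [1, "x", True])
print(assignments)  # [('a', 1), ('b', 'x'), ('c', True)]
```

After the expansion, the rest of the pipeline sees an ordinary multi-column
UPDATE and needs no changes.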


Thoughts/Comments?

Regards,

Atri


updatestar_ver1.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:

 Hi

 On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 javascript:; wrote:
  Please find attached a patch which implements support for UPDATE table1
  SET(*)=...

 I presume you haven't read Tom Lane's proposal and discussion about
 multiple column assignment in UPDATE:
 http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
 (Assigning all columns was also discussed there)

 And there's a WIP patch:
 http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us



Thanks for the links, but this patch only targets the SET(*) case, which,
if I understand correctly, the patch you mentioned doesn't directly handle
(the targets of the two patches are different).

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Support UPDATE table SET(*)=...

2014-10-15 Thread Atri Sharma
On Wed, Oct 15, 2014 at 2:18 PM, Atri Sharma atri.j...@gmail.com wrote:





 On Wednesday, October 15, 2014, Marti Raudsepp ma...@juffo.org wrote:

 Hi

 On Wed, Oct 15, 2014 at 11:02 AM, Atri Sharma atri.j...@gmail.com
 wrote:
  Please find attached a patch which implements support for UPDATE table1
  SET(*)=...

 I presume you haven't read Tom Lane's proposal and discussion about
 multiple column assignment in UPDATE:
 http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
 (Assigning all columns was also discussed there)

 And there's a WIP patch:
 http://www.postgresql.org/message-id/20930.1402931...@sss.pgh.pa.us



 Thanks for the links, but this patch only targets SET(*) case, which, if I
 understand correctly, the patch you mentioned doesn't directly handle (If I
 understand correctly, the target of the two patches is different).


Digging more, I figured out that the patch I posted builds on top of Tom's
patch, since that one did not add the whole-row case.

Regards,

Atri


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-14 Thread Atri Sharma
On Sunday, September 14, 2014, Mark Kirkwood mark.kirkw...@catalyst.net.nz
wrote:

 On 14/09/14 05:36, Rohit Goyal wrote:

 Hi All,

 I want to work on the code of intermediate dataset of select and update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different version
 of salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I want to
 know the code portion which i need to look for working on all 4 versions
 in dataset. :)



 Hi Rohit,

 Currently in Postgres, these intermediate versions all exist - however a
 given session can only see one of them. Also VACUUM is allowed to destroy
 versions that no other transactions can see.

 So if I'm understanding you correctly, you would like to have some way for
 a session to see *all* these versions (and I guess preventing VACUUM from
 destroying them).



Any modifications of that sort are bound to introduce lots of pain, not to
mention performance degradation and the added responsibility of ensuring
that dead tuples don't bloat the system (prevent VACUUM from running at
regular intervals and you can get an xid wraparound).

I just mentioned that in case you are planning to go in that direction. If
you only want the data, use the triggers as Gavin mentioned.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-14 Thread Atri Sharma
On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 14/09/14 19:25, Atri Sharma wrote:



 On Sunday, September 14, 2014, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz

 wrote:

 On 14/09/14 05:36, Rohit Goyal wrote:

 Hi All,

 I want to work on the code of intermediate dataset of select and
 update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different
 version
 of salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I
 want to
 know the code portion which i need to look for working on all 4
 versions
 in dataset. :)



 Hi Rohit,

 Currently in Postgres, these intermediate versions all exist -
 however a given session can only see one of them. Also VACUUM is
 allowed to destroy versions that no other transactions can see.

 So if I'm understanding you correctly, you would like to have some
 way for a session to see *all* these versions (and I guess
 preventing VACUUM from destroying them).



 Any modifications of that sort are bound to introduce lots of pain, not
 to mention performance degradation and the added responsibility of
 ensuring that dead tuples don't bloat up the system (prevent vacuum from
 running at regular intervals and you can have a xid wraparound).

 I just mentioned that in case you are planning to go in that direction.
 If you only want the data, use the triggers as Gavin mentioned.


 Obviously in the general case sure - but (as yet) we don't have much idea
 about Rohit's use case and workload. If retrieving past versions is the
 *primary* workload bias and high update concurrency is not required then
 this could well work better than a trigger based solution.

 And it does not seem too onerous to have the ability to switch this on as
 required, viz:

 ALTER TABLE table1 VERSIONING;

 (or similar syntax) which makes VACUUM leave this table alone.


How do you plan to do all that VACUUM does for this table, then?

It seems to me that you are telling VACUUM it need not be concerned with
table 'A' while assuming ownership of all the tasks VACUUM performs for
that table. That seems pretty broken to me, not to mention the performance
degradation.

Regards,

Atri

Regards,

Atri


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-14 Thread Atri Sharma
On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood 
mark.kirkw...@catalyst.net.nz wrote:

 On 14/09/14 19:25, Atri Sharma wrote:



 On Sunday, September 14, 2014, Mark Kirkwood
 mark.kirkw...@catalyst.net.nz mailto:mark.kirkw...@catalyst.net.nz

 wrote:

 On 14/09/14 05:36, Rohit Goyal wrote:

 Hi All,

 I want to work on the code of intermediate dataset of select and
 update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different
 version
 of salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I
 want to
 know the code portion which i need to look for working on all 4
 versions
 in dataset. :)



 Hi Rohit,

 Currently in Postgres, these intermediate versions all exist -
 however a given session can only see one of them. Also VACUUM is
 allowed to destroy versions that no other transactions can see.

 So if I'm understanding you correctly, you would like to have some
 way for a session to see *all* these versions (and I guess
 preventing VACUUM from destroying them).



 Any modifications of that sort are bound to introduce lots of pain, not
 to mention performance degradation and the added responsibility of
 ensuring that dead tuples don't bloat up the system (prevent vacuum from
 running at regular intervals and you can have a xid wraparound).

 I just mentioned that in case you are planning to go in that direction.
 If you only want the data, use the triggers as Gavin mentioned.


 Obviously in the general case sure - but (as yet) we don't have much idea
 about Rohit's use case and workload. If retrieving past versions is the
 *primary* workload bias and high update concurrency is not required then
 this could well work better than a trigger based solution.

 And it does not seem too onerous to have the ability to switch this on as
 required, viz:

 ALTER TABLE table1 VERSIONING;

 (or similar syntax) which makes VACUUM leave this table alone. It might
 make more sense to make such a concept apply to a TABLESPACE instead mind
 you (i.e things in here are for archive/versioning purposes)...




What I think can be done is to have a tuplestore holding the deltas of
updated rows, i.e. storing only the changes made by an UPDATE statement (it
could be part of RelationData). It should be simple enough to have the
tuplestore record the OID of the inserted tuple and the difference between
the new tuple and the old tuple. No changes need to be made to the old
tuple, since it is marked as deleted and VACUUM can remove it through the
normal logic.

Not a clean way, but it should work for what you proposed.
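
A toy model of that delta store (Python, with a plain dict standing in for
the tuplestore; the names and structure are invented for illustration):

```python
def tuple_delta(old, new):
    # Record only the columns the UPDATE actually changed.
    return {k: new[k] for k in new if old.get(k) != new[k]}

# Hypothetical per-relation delta store, keyed by the new tuple's OID.
delta_store = {}

old = {"name": "Rohit", "salary": 100}
new = {"name": "Rohit", "salary": 120}
delta_store[1001] = tuple_delta(old, new)
print(delta_store)  # {1001: {'salary': 120}}
```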


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-13 Thread Atri Sharma
On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal rhtgyl...@gmail.com wrote:


 Hi All,

 I want to work on the code of intermediate dataset of select and update
 query.

 For example.

 Rohit's salary has been updated 4 times, so it has 4 different version of
 salary.

 I want to select  salary of person named Rohit. Now suppose , in
 intermediate result, I found 4 different versions of the data. I want to
 know the code portion which i need to look for working on all 4 versions in
 dataset. :)

 Thanks in advance!!



Not sure what you are looking for, but each update is an insert of a new
tuple with the new values, plus marking the old tuple as deleted.

There is no need to track versions of any changes in the data set. Each
update operation leaves only one visible tuple. If the transaction commits,
the inserted tuple becomes visible and the old row is marked deleted. If
the transaction rolls back, only the old tuple remains visible.
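
That version-chain behavior can be illustrated with a toy model (Python,
heavily simplified from the real xmin/xmax visibility rules, which also
involve snapshots and in-progress transactions):

```python
class TupleVersion:
    def __init__(self, value, xmin):
        self.value = value
        self.xmin = xmin   # transaction that inserted this version
        self.xmax = None   # transaction that deleted/updated it, if any

def update(versions, new_value, xid):
    # An UPDATE marks the old version deleted and inserts a new one.
    versions[-1].xmax = xid
    versions.append(TupleVersion(new_value, xid))

def visible(versions, committed):
    # Simplified check: the newest version whose inserting transaction
    # committed and which no committed transaction has deleted.
    for t in reversed(versions):
        if t.xmin in committed and (t.xmax is None or t.xmax not in committed):
            return t.value
    return None

salary = [TupleVersion(100, xmin=1)]
update(salary, 120, xid=2)
print(visible(salary, committed={1, 2}))  # 120
print(visible(salary, committed={1}))     # 100 (xid 2 rolled back)
```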
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Postgres code for a query intermediate dataset

2014-09-13 Thread Atri Sharma
On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston 
david.g.johns...@gmail.com wrote:

 Atri Sharma wrote
  On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal lt;



 Or rather even if you want to be able to reference the older versions of
 that record there is nothing in PostgreSQL to facilitate that. You have to
 manually create and manage the data so that you know during what time
 period
 a given record is valid.

 David J.





Sometimes I do miss 'time travel' we used to have :)

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Join push-down support for foreign tables

2014-09-05 Thread Atri Sharma
On Fri, Sep 5, 2014 at 2:20 AM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Sep 4, 2014 at 11:56 AM, Bruce Momjian br...@momjian.us wrote:
  I am thinking eventually we will need to cache the foreign server
  statistics on the local server.
 
  Wouldn't that lead to issues where the statistics get outdated and we
 have to
  anyways query the foreign server before planning any joins? Or are you
 thinking
  of dropping the foreign table statistics once the foreign join is
 complete?
 
  I am thinking we would eventually have to cache the statistics, then get
  some kind of invalidation message from the foreign server.  I am also
  thinking that cache would have to be global across all backends, I guess
  similar to our invalidation cache.

 Maybe ... but I think this isn't really related to the ostensible
 topic of this thread.  We can do join pushdown just fine without the
 ability to do anything like this.

 I'm in full agreement that we should probably have a way to cache some
 kind of statistics locally, but making that work figures to be tricky,
 because (as I'm pretty sure Tom has pointed out before) there's no
 guarantee that the remote side's statistics look anything like
 PostgreSQL statistics, and so we might not be able to easily store
 them or make sense of them.  But it would be nice to at least have the
 option to store such statistics if they do happen to be something we
 can store and interpret.


I agree that we need local statistics too (full agreement with Bruce's
proposal), but playing the Devil's advocate here: I am trying to figure out
how things like invalidation and, as you mentioned, cross-compatibility
will work.



 It's also coming to seem to me more and more that we need a way to
 designate several PostgreSQL machines as a cooperating cluster.  This
 would mean they'd keep connections to each other open and notify each
 other about significant events, which could include hey, I updated
 the statistics on this table, you might want to get the new ones or
 hey, i've replicated your definition for function X so it's safe to
 push it down now as well as hey, I have just been promoted to be the
 new master or even automatic negotiation of which of a group of
 machines should become the master after a server failure.


That's a brilliant idea, and it shouldn't be too much of a problem. One
possible race condition is that multiple backends may try to globally
propagate different statistics for the same table, but I think any standard
logical-ordering algorithm should handle that. Automatic master promotion
also seems like a brilliant idea, especially since we have time-tested
standard algorithms for it.

One thing I would like to see, assuming the interacting nodes do not all
have identical schemas, is whether we can maintain cross-node statistics
and use them for planning cross-node joins. That would raise problems
similar to the ones already noted for keeping local statistics for foreign
databases, but if we solve those anyway for storing local statistics, we
could potentially look at keeping cross-node relation statistics as well.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Join push-down support for foreign tables

2014-09-04 Thread Atri Sharma
On Thursday, September 4, 2014, Bruce Momjian br...@momjian.us wrote:

 On Thu, Sep  4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
  The main problem I see here is that accurate costing may require a
  round-trip to the remote server.  If there is only one path that is
  probably OK; the cost of asking the question will usually be more than
  paid for by hearing that the pushed-down join clobbers the other
  possible methods of executing the query.  But if there are many paths,
  for example because there are multiple sets of useful pathkeys, it
  might start to get a bit expensive.
 
  Probably both the initial cost and final cost calculations should be
  delegated to the FDW, but maybe within postgres_fdw, the initial cost
  should do only the work that can be done without contacting the remote
  server; then, let the final cost step do that if appropriate.  But I'm
  not entirely sure what is best here.

 I am thinking eventually we will need to cache the foreign server
 statistics on the local server.



Wouldn't that lead to issues where the statistics get outdated, so that we
have to query the foreign server before planning any joins anyway? Or are
you thinking of dropping the foreign table statistics once the foreign
join is complete?

Regards,

Atri




Re: [HACKERS] Join push-down support for foreign tables

2014-09-04 Thread Atri Sharma
On Thu, Sep 4, 2014 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Sep  4, 2014 at 08:41:43PM +0530, Atri Sharma wrote:
 
 
  On Thursday, September 4, 2014, Bruce Momjian br...@momjian.us wrote:
 
  On Thu, Sep  4, 2014 at 08:37:08AM -0400, Robert Haas wrote:
   The main problem I see here is that accurate costing may require a
   round-trip to the remote server.  If there is only one path that is
   probably OK; the cost of asking the question will usually be more
 than
   paid for by hearing that the pushed-down join clobbers the other
   possible methods of executing the query.  But if there are many
 paths,
   for example because there are multiple sets of useful pathkeys, it
   might start to get a bit expensive.
  
   Probably both the initial cost and final cost calculations should
 be
   delegated to the FDW, but maybe within postgres_fdw, the initial
 cost
   should do only the work that can be done without contacting the
 remote
   server; then, let the final cost step do that if appropriate.  But
 I'm
   not entirely sure what is best here.
 
  I am thinking eventually we will need to cache the foreign server
  statistics on the local server.
 
 
 
 
  Wouldn't that lead to issues where the statistics get outdated and we
 have to
  anyways query the foreign server before planning any joins? Or are you
 thinking
  of dropping the foreign table statistics once the foreign join is
 complete?

 I am thinking we would eventually have to cache the statistics, then get
 some kind of invalidation message from the foreign server.  I am also
 thinking that cache would have to be global across all backends, I guess
 similar to our invalidation cache.



That could lead to some bloat in storing statistics, since we may have a
lot of tables across a lot of foreign servers. Also, will we have VACUUM
look at ANALYZE-ing the foreign tables?

Also, how will we decide that the statistics are invalid? Will we have the
FDW query the foreign server and do some sort of comparison between the
statistics the foreign server has and the statistics we have locally? I am
trying to understand how an invalidation message from the foreign server
would work.

Regards,

Atri


Re: [HACKERS] Join push-down support for foreign tables

2014-09-04 Thread Atri Sharma
On Thu, Sep 4, 2014 at 9:33 PM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Sep  4, 2014 at 09:31:20PM +0530, Atri Sharma wrote:
  I am thinking we would eventually have to cache the statistics, then
 get
  some kind of invalidation message from the foreign server.  I am also
  thinking that cache would have to be global across all backends, I
 guess
  similar to our invalidation cache.
 
 
 
 
  That could lead to some bloat in storing statistics since we may have a
 lot of
  tables for a lot of foreign servers. Also, will we have VACUUM look at
  ANALYZING the foreign tables?

  Also, how will we decide that the statistics are invalid? Will we have
 the FDW
  query the foreign server and do some sort of comparison between the
 statistics
  the foreign server has and the statistics we locally have? I am trying to
  understand how the idea of invalidation message from foreign server will
 work.

 Well, ANALYZING is running on the foreign server, and somehow it would
 be nice if it would send a message to us about its new statistics, or we
 can do it like http does and it gives us a last-refresh statistics date
 when we connect.


Not sure how that would work without changing the way ANALYZE works on the
foreign server. The HTTP idea could work, though.


 I am not sure how it will work --- I am just suspecting that we might
 get to a point where the statistics lookup overhead on the foreign
 server might become a bottleneck.


Totally agree, but doing the planning only locally opens the questions I
mentioned above, and also deprives the foreign database of any
optimizations it may want to do (assuming the foreign database and the
Postgres query planner do not generate identical plans). This is only my
thought, though; we could also be planning better than the foreign
database, so the optimization point I raised is debatable.

Regards,

Atri






Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347

2014-08-31 Thread Atri Sharma
On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers e...@xs4all.nl wrote:

 On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
  Erik == Erik Rijkers e...@xs4all.nl writes:
 
They apply cleanly for me at 2bde297 whether with git apply or
patch, except for the contrib one (which you don't need unless you
want to run the contrib regression tests without applying the
gsp-u patch).
 
   Erik Ah, I had not realised that.  Excluding that contrib-patch and
   Erik only applying these three:
 
   Erik gsp1.patch
   Erik gsp2.patch
   Erik gsp-doc.patch
 
   Erik does indeed work (applies, compiles).
 
  I put up a rebased contrib patch anyway (linked off the CF).
 
  Did the unrecognized node type error go away, or do we still need to
  look into that?
 

 I have found that the unrecognized node type error is caused by:

 shared_preload_libraries = pg_stat_statements

 in postgresql.conf (as my default compile script was doing).

 If I disable that line the error goes away.


I think that's more of a library-linking problem than a problem with the
patch. I couldn't reproduce it, though.

Regards,

Atri



Re: [HACKERS] Final Patch for GROUPING SETS - unrecognized node type: 347

2014-08-31 Thread Atri Sharma
On Sunday, August 31, 2014, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
  On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers e...@xs4all.nl wrote:
   I have found that the unrecognized node type error is caused by:

 It's a warning, not an error, right?

   shared_preload_libraries = pg_stat_statements
  
   in postgresql.conf (as my default compile script was doing).
  
   If I disable that line the error goes away.
  
  
  I  think thats more of a library linking problem rather than a problem
 with
  the patch. I couldnt reproduce it,though.

 I think it's vastly more likely that the patch simply didn't add the new
 expression types to pg_stat_statements.c:JumbleExpr().



I must have run the above diagnosis in a wrong manner, then; I will check.
Thanks for the heads-up!

Regards,

Atri




Re: [HACKERS] COPY and heap_sync

2014-08-30 Thread Atri Sharma
On Saturday, August 30, 2014, Amit Kapila amit.kapil...@gmail.com wrote:

 On Sat, Aug 30, 2014 at 11:56 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  If you insert tuples with COPY into a table created or truncated in the
 same transaction, at the end of the COPY it calls heap_sync.
 
  But there cases were people use COPY in a loop with a small amount of
 data in each statement.  Now it is calling heap_sync many times, and if
 NBuffers is large doing that gets very slow.
 
  Could the heap_sync be safely delayed until the end of the transaction,
 rather than the end of the COPY?

 Wouldn't unconditionally delaying sync until end of transaction
 can lead to burst of I/O at that time especially if there are many
 such copy commands in a transaction, leading to delay in some
 other operation's that might be happening concurrently in the
 system.




I agree with that, but it can provide us the same benefits as group
commit, especially when most of the COPY commands touch pages which are
nearby, hence reducing the seek-time overhead.

We could look at making it optional through a GUC, since it is useful,
albeit for some specific use cases.

Regards,

Atri




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote:

 On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
  If you're following the HashJoin model, then what you do is the same
 thing
  it does: you write the input tuple back out to the pending batch file for
  the hash partition that now contains key 1001, whence it will be
 processed
  when you get to that partition.  I don't see that there's any special
 case
  here.

 HashJoin only deals with tuples. With HashAgg, you have to deal with a
 mix of tuples and partially-computed aggregate state values. Not
 impossible, but it is a little more awkward than HashJoin.


+1

Not to mention future cases if we start maintaining multiple state
values, with regard to grouping sets.

Regards,

Atri




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thu, Aug 14, 2014 at 10:21 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 14 Srpen 2014, 18:02, Atri Sharma wrote:
  On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote:
 
  On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote:
   If you're following the HashJoin model, then what you do is the same
  thing
   it does: you write the input tuple back out to the pending batch file
  for
   the hash partition that now contains key 1001, whence it will be
  processed
   when you get to that partition.  I don't see that there's any special
  case
   here.
 
  HashJoin only deals with tuples. With HashAgg, you have to deal with a
  mix of tuples and partially-computed aggregate state values. Not
  impossible, but it is a little more awkward than HashJoin.
 
 
  +1
 
  Not to mention future cases if we start maintaining multiple state
  values,in regarded to grouping sets.

 So what would you do for aggregates where the state is growing quickly?
 Say, things like median() or array_agg()?

 I think that we can't do that for all aggregates does not imply we must
 not do that at all.

 There will always be aggregates not implementing dumping state for various
 reasons, and in those cases the proposed approach is certainly a great
 improvement. I like it, and I hope it will get committed.

 But maybe for aggregates supporting serialize/deserialize of the state
 (including all aggregates using known types, not just fixed-size types) a
 hashjoin-like batching would be better? I can name a few custom aggregates
 that'd benefit tremendously from this.


Yeah, that could work, but is it worth adding additional paths (assuming
this patch gets committed) for some aggregates? I think we should analyse
the use case further.


 Just to be clear - this is certainly non-trivial to implement, and I'm not
 trying to force anyone (e.g. Jeff) to implement the ideas I proposed. I'm
 ready to spend time on reviewing the current patch, implement the approach
 I proposed and compare the behaviour.


Totally agreed. It would be a different approach, albeit, as you said, one
that can be built on top of the current patch.


 Kudos to Jeff for working on this.

 Agreed :)






-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Shared Data Structure b/w clients

2014-07-22 Thread Atri Sharma
On Tuesday, July 22, 2014, Rohit Goyal rhtgyl...@gmail.com wrote:

 Hi All,

 I am working on postgresql code and having some problem. :)

 I need to create shared data structure, so that different client and
 connection can update and share the state of those data structures in
 memory. I planned to use top memory context but it can give me shared
 structure within one session/terminal.

 Please tel me how  postgresql do that and how i can do that?

 Regards,
 Rohit Goyal


How about making it a part of shared memory, like shared buffers?

Regards,

Atri




Re: [HACKERS] how to skip building certain paths in path tree

2014-07-21 Thread Atri Sharma
On Tue, Jul 22, 2014 at 10:53 AM, Rajmohan C csrajmo...@gmail.com wrote:

 How do I modify PostgreSQL 9.3 source to skip certain join combinations
 (skip building certain paths in path tree) by checking some condition



Have a look at joinpaths.c.


Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins

2014-07-03 Thread Atri Sharma
On Tue, Jul 1, 2014 at 4:54 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 30.6.2014 23:12, Tomas Vondra wrote:
  Hi,
 
  attached is v5 of the patch. The main change is that scaling the number
  of buckets is done only once, after the initial hash table is build. The
  main advantage of this is lower price. This also allowed some cleanup of
  unecessary code.
 
  However, this new patch causes warning like this:
 
  WARNING:  temporary file leak: File 231 still referenced
 
  I've been eyeballing the code for a while now, but I still fail to see
  where this comes from :-( Any ideas?

 Meh, the patches are wrong - I haven't realized the tight coupling
 between buckets/batches in ExecHashGetBucketAndBatch:

   *bucketno = hashvalue & (nbuckets - 1);
   *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);

 The previous patches worked mostly by pure luck (the nbuckets was set
 only in the first batch), but once I moved the code at the end, it
 started failing. And by "worked" I mean "didn't throw an error, but
 probably returned bogus results" with (nbatch > 1).

 However, ISTM this nbuckets-nbatch coupling is not really necessary,
 it's only constructed this way to assign independent batch/bucket. So I
 went and changed the code like this:

   *bucketno = hashvalue & (nbuckets - 1);
   *batchno = (hashvalue >> (32 - hashtable->log2_nbatch));

 I.e. using the top bits for batchno, low bits for bucketno (as before).

 Hopefully I got it right this time. At least it seems to be working for
 cases that failed before (no file leaks, proper rowcounts so far).


Hi,

I had a look at the patch, and I was wondering if we could set a cap on
the resized number of buckets and increase the number of batches instead.
Since this patch weighs the growth in tuples against the increase in
space, we could look at increasing the number of batches rather than the
number of buckets whenever the resized bucket count exceeds a threshold.
It shouldn't be too hard; AIUI it would involve a call in MultiExecHash
right after the new cost evaluation the patch adds.

It isn't a target of the current patch, but I think it is a logical
extension of it.

Thoughts/Comments?

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Atri Sharma
On Thu, Jul 3, 2014 at 11:40 PM, Stephen Frost sfr...@snowman.net wrote:

 Tomas,

 * Tomas Vondra (t...@fuzzy.cz) wrote:
  However it's likely there are queries where this may not be the case,
  i.e. where rebuilding the hash table is not worth it. Let me know if you
  can construct such query (I wasn't).

 Thanks for working on this!  I've been thinking on this for a while and
 this seems like it may be a good approach.  Have you considered a bloom
 filter?


IIRC, the last time we tried bloom filters, I was short of real-world
useful hash functions that we could use for building the bloom filter.

If we are restarting experiments on this, I would be glad to assist.

Regards,

Atri


Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-16 Thread Atri Sharma
On Mon, Jun 16, 2014 at 11:28 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Mon, May 26, 2014 at 6:23 AM, Ronan Dunklau ronan.dunk...@dalibo.com
 wrote:
  Le dimanche 25 mai 2014 12:41:18 David Fetter a écrit :
  On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:
   Hello,
  
   Since my last proposal didn't get any strong rebuttal, please find
   attached a more complete version of the IMPORT FOREIGN SCHEMA
 statement.
 
  Thanks!
 
  Please to send future patches to this thread so people can track them
  in their mail.
 
  I'll do.
 
  I didn't for the previous one because it was a few months ago, and no
 patch
  had been added to the commit fest.
 
 
   I tried to follow the SQL-MED specification as closely as possible.
  
   This adds discoverability to foreign servers. The structure of the
   statement as I understand it is simple enough:
  
   IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT
 TO |
   EXCEPT) table_list ] INTO local_schema.
  
   The import_foreign_schema patch adds the infrastructure, and a new FDW
   routine:
  
   typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
   ImportForeignSchemaStmt * parsetree);
  
   This routine must return a list of CreateForeignTableStmt mirroring
   whatever tables were found on the remote side, which will then be
   executed.
  
   The import_foreign_schema_postgres_fdw patch proposes an
 implementation of
   this API for postgres_fdw. It will import a foreign schema using the
 right
   types as well as nullable information.
 
  In the case of PostgreSQL, the right types are obvious until there's
  a user-defined one.  What do you plan to do in that case ?
 
 
  The current implementation fetches the types as regtype, and when
 receiving a
  custom type, two things can happen:
 
   - the type is defined locally: everything will work as expected
   - the type is not defined locally: the conversion function will fail,
 and
  raise an error of the form: ERROR:  type schema.typname does not exist

 Just wondering: what about the case where the same data type is
 defined on both local and remote, but with *different* definitions? Is
 it the responsibility of the fdw to check for type incompatibilities?


Logically, it should be.

Just wondering: couldn't the proposed function, typedef List
*(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt *parsetree), be changed a bit to return the exact
type definitions from the remote side as well?

Regards,

Atri




Re: [HACKERS] Congrats Andres Freund, the newest PostgreSQL Commiter!

2014-05-22 Thread Atri Sharma
Congrats Andres!


On Fri, May 23, 2014 at 7:54 AM, Fabrízio de Royes Mello 
fabriziome...@gmail.com wrote:

 Hi All,

 At the Developer Meeting that occurred 21th May 2014 was announced a new
 PostgreSQL commiter [1], Mr. Andres Freund.

 I had the opportunity to work and be mentored by him. He deserves very
 much this confidence, for the excellent work that has been doing for the
 community.

 Thank you and Congrats Andres!


 [1]
 https://wiki.postgresql.org/wiki/PgCon_2014_Developer_Meeting#New_Committer

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira: http://www.timbira.com.br
  Blog sobre TI: http://fabriziomello.blogspot.com
  Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello




-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] is there a hook by which we can modify input query before postgresql builds raw parse tree

2014-05-01 Thread Atri Sharma
On Fri, May 2, 2014 at 10:07 AM, Rajmohan C csrajmo...@gmail.com wrote:

 Hi,

I want to know is there a way to use a hook to modify the input query
 before Postgresql parses and builds the parse tree for the query.


Uh...the rewriter?

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Hashable custom types

2014-04-26 Thread Atri Sharma
The plain UNION code supports either sorting or hashing, but
 we've not gotten around to supporting a sort-based approach
 to recursive UNION.  I'm not convinced that it's worth doing ...

 regards, tom lane



Without sorting, isn't the scope of a recursive UNION with custom
datatypes pretty restrictive?

As it is, even the sorting will be a bit restrictive due to the costs
associated with it. I feel that what David suggested upthread should be
good. Maybe an experimental patch, with a workload that gives a load
factor of 1 for the hash table, could prove some performance points.

Even if that's not the case, we should really do something to improve the
usability of recursive UNION with custom types.

Regards,

Atri




Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-22 Thread Atri Sharma
On Tue, Apr 22, 2014 at 12:59 PM, Albe Laurenz laurenz.a...@wien.gv.atwrote:

 Jason Petersen wrote:
  Yes, we obviously want a virtual clock. Focusing on the use of
 gettimeofday seems silly to me: it was
  something quick for the prototype.
 
  The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the
  PostgreSQL system.

 Would it make sense to just cache the result of the latest gettimeofday()
 call
 and use that as an approximation for wall time?
 The busier the system is, the more accurate that should be.


That sounds... risky. How will the invalidation/update of the cache work?

How will we track the time window in which the cached value is still valid
and applicable?

These are just my first thoughts; I may be missing the point, though.

Regards,

Atri





Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-19 Thread Atri Sharma
On Sat, Apr 19, 2014 at 3:37 AM, Bruce Momjian br...@momjian.us wrote:


  One thing that I discussed with Merlin offline and am now concerned
 about is
  how will the actual eviction work. We cannot traverse the entire list
 and then
  find all the buffers with refcount 0 and then do another traversal to
 find the
  oldest one.

 I thought if there was memory pressure the clock sweep would run and we
 wouldn't have everything at the max counter access value.


Hmm, I see your point.

With that applicable as well, I feel that the clock-sweep counting /
logical clock system would be useful when deciding between multiple
candidates for eviction. At worst, it can serve to replace the
gettimeofday() calls.

One thing I have thought of, with ideas and inputs from Joshua Yanowski
offline, is that we could probably keep a max-heap ordered by the
logical-clock age of the buffers. Each time the clock sweep sees a buffer
whose refcount has become zero, it pushes the buffer into the heap. This
could be a new representation of the freelist or an additional data
structure.

This still does not solve the problem of the clock sweep scanning the
entire list, even if the max-heap makes the eviction process O(1).

I am working on a PoC patch but am stuck on this point: my current
approach scans the entire shared-buffers list to search for candidate
buffers.

Another pain point is the concurrency and locking overhead of introducing
a new data structure. Can the existing buffer-header spinlock handle this,
or would it stretch the granularity of the spinlock too far?

I still see some blockers for this idea. Nevertheless, the idea of
clock-sweep counts as logical clocks seems promising, at least
intuitively.

Thoughts and comments?

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Fri, Apr 18, 2014 at 7:27 AM, Peter Geoghegan p...@heroku.com wrote:

A way I have in mind for the eviction policy is to introduce an ageing
factor in each buffer and take that factor into consideration when
evicting a buffer.

Consider a case where a table is pretty huge and spread across multiple
pages. The querying pattern is like a time series pattern i.e. a set of
rows is queried pretty frequently for some time, making the corresponding
page hot. Then, the next set of rows is queried frequently making that page
hot and so on.

Consider a new page entering the shared buffers with refcount 1 and
usage_count 1. If that page is a part of the workload described above, it
is likely that it shall not be used for a considerable amount of time after
it has entered the buffers but will be used eventually.

Now, the current hypothetical situation is that we have three pages:

1) The page that used to be hot at the previous time window but is no
longer hot and is actually the correct candidate for eviction.
2) The current hot page (It wont be evicted anyway for now).
3) The new page which just got in and should not be evicted since it can be
hot soon (for this workload it will be hot in the next time window).

When the clock-sweep algorithm runs the next time, it will see the new
buffer page as the one to be evicted (since page (1) may still have
usage_count > 0, i.e. it may be 'cooling' but not 'cool' yet).

This can be changed by introducing an ageing factor that tracks how much
time the current buffer has spent in shared buffers. If the time the
buffer has spent is large enough (relatively) and it is not hot currently,
it has had its chance and can be evicted. This saves the new page (3) from
being evicted, since its time in shared buffers will not be high enough to
mandate eviction, and it will be given more chances.

Since gettimeofday() is an expensive call and hence cannot be done in the
tight loop, we can count the number of clocksweeps the current buffer has
seen (rather, survived). This shall give us a rough idea of the estimate of
the relative age of the buffer.

When an eviction happens, all the candidates with refcount = 0 shall be
considered. Then, among them, the one with the highest ageing factor shall
be evicted.

Of course, there may be better ways of doing the same, but I want to
highlight the point (or possibility) of introducing an ageing factor to
prevent eviction of relatively younger pages early in the eviction process.

The overhead isn't too big. We just need to add another attribute to the
buffer header for the number of clock sweeps seen (rather, survived) and
check it when an eviction is taking place. The existing spinlock for
buffer headers should be good enough to protect against contention; the
access rules can be similar to those for usage_count.

Thoughts and comments?

Regards,

Atri



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
On Sat, Apr 19, 2014 at 1:07 AM, Bruce Momjian br...@momjian.us wrote:

 On Fri, Apr 18, 2014 at 04:46:31PM +0530, Atri Sharma wrote:
  This can be changed by introducing an ageing factor that sees how much
 time the
  current buffer has spend in shared buffers. If the time that the buffer
 has
  spent is large enough (relatively) and it is not hot currently, that
 means it
  has had its chance and can be evicted. This shall save the new page (3)
 from
  being evicted since it's time in shared buffers shall not be high enough
 to
  mandate eviction and it shall be given more chances.
 
  Since gettimeofday() is an expensive call and hence cannot be done in
 the tight
  loop, we can count the number of clocksweeps the current buffer has seen
  (rather, survived). This shall give us a rough idea of the estimate of
 the
  relative age of the buffer.

 Counting clock sweeps is an interesting idea.  I think one concern was
 tracking hot buffers in cases where there is no memory pressure, and
 hence the clock sweep isn't running --- I am not sure how this would
 help in that case.


I feel that if there is no memory pressure, frankly, it doesn't matter
much what gets evicted and what doesn't. The case I am specifically
targeting is when the clock sweep gets to move about a lot, i.e.
high-memory-pressure workloads. Of course, I may be totally wrong here.

One thing that I discussed with Merlin offline, and am now concerned
about, is how the actual eviction will work. We cannot traverse the entire
list to find all the buffers with refcount 0 and then do another traversal
to find the oldest one.

Any thoughts there would be appreciated.

Regards,

Atri



Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-18 Thread Atri Sharma
Yes, we obviously want a virtual clock. Focusing on the use of gettimeofday
 seems silly to me: it was something quick for the prototype.

 The problem with the clocksweeps is they don’t actually track the
 progression of “time” within the PostgreSQL system.

 What’s wrong with using a transaction number or some similar sequence? It
 would accurately track “age” in the sense we care about: how long ago in
 “units of real work being done by the DB” something was added.



Well, AIUI, we only need the 'relative' age of buffers in relation to the
youngest buffer present. So, the buffer that has seen the maximum number
of clock sweeps is the one that has been around the longest.

I do not see a need for an accurate estimate of the time spent in the
buffer for any purpose right now. It may be useful in the future, though.

How do you get the transaction ID? By accessing a tuple on the page and
reading its xmin?

Regards,

Atri





Re: [HACKERS] Autonomous Transaction (WIP)

2014-04-07 Thread Atri Sharma
On Mon, Apr 7, 2014 at 3:41 PM, Pavel Stehule pavel.steh...@gmail.comwrote:




 2014-04-07 11:59 GMT+02:00 Rajeev rastogi rajeev.rast...@huawei.com:

   On 07 April 2014 12:12, Pavel Stehule wrote:

  +1 for feature

 Thanks



 -1 for Oracle syntax - it is hardly inconsistent with Postgres

 We can discuss and come out with the syntax based on everyone agreement.

 Autonomous transactions should be used everywhere - not only in plpgsql



 Yes you are right. I am not planning to support only using plpgsql.
 Initially we can support this

 Using the standalone SQL-commands and then later we can enhance based on
 this infrastructure

 to be used using plpgsql, triggers.


 ok

 long time I though about this feature.

 I am thinking so this should be fully isolated transaction - it should not
 be subtransaction, because then you can break database consistency - RI



I am missing something here: how does making it a subtransaction break
consistency? Isn't that what should actually be happening, so that the
autonomous transaction's changes only become visible when the parent
transaction commits?

What am I missing here?

Regards,

Atri


Re: [HACKERS] Risk Estimation WAS: Planner hints in Postgresql

2014-03-20 Thread Atri Sharma
On Thu, Mar 20, 2014 at 8:10 PM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Mar 18, 2014 at 2:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Atri Sharma atri.j...@gmail.com writes:
  One of the factors that leads to bad estimates is that the histogram of
 the
  values of a column maintained by the planner gets old by time and the
 data
  in the column changes. So, the histogram is no longer a quite accurate
 view
  of the data and it leads to bad selectivity.
 
  TBH, this is so far down the list of problems that it'll be a long time
  before we need to worry about it.  It's certainly not the number one
  priority for any project to model risk in the planner.
 
  The thing that I think is probably the number one problem is estimates
  that depend on an assumption of uniform distribution of sought-after rows
  among those encountered by a scan.  This is usually where bad plans for
  LIMIT queries are coming from.  We could certainly add some sort of fudge
  factor to those costs, but I'd like to have a more-or-less principled
  framework for doing so.

 I think the problem is, in some sense, more basic than that.  I think
 the kind of query we're talking about here is:

 SELECT * FROM foo WHERE unlikely ORDER BY indexed_column LIMIT 1

 Assume for the sake of argument that there are 100 rows that would be
 returned in the absence of the limit.  Let SC and TC be the startup
 cost and total cost of the index scan.  As a matter of general policy,
 we're going to say that the cost of this is SC + 0.01 * (TC - SC).
 What makes this path look appealing to the planner is that SC is small
 relative to TC.  If we knew, for example, that we weren't going to
 find the first match until 90% of the way through the index scan, then
 we could set SC = 90% * TC and, all else being equal, the planner
 would make the right decision.

 So you might think that the problem here is that we're assuming
 uniform density.  Let's say there are a million rows in the table, and
 there are 100 that match our criteria, so the first one is going to
 happen 1/10,000'th of the way through the table.  Thus we set SC =
 0.0001 * TC, and that turns out to be an underestimate if the
 distribution isn't as favorable as we're hoping.  However, that is NOT
 what we are doing.  What we are doing is setting SC = 0.  I mean, not
 quite 0, but yeah, effectively 0. Essentially we're assuming that no
 matter how selective the filter condition may be, we assume that it
 will match *the very first row*.
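[To make the arithmetic above concrete, here is a small illustrative Python
sketch - not PostgreSQL code; the cost numbers are made up:]

```python
# Sketch of the LIMIT costing described above: the planner charges
# startup cost plus the fraction of matching rows it expects to fetch.
def limit_scan_cost(startup_cost, total_cost, rows_needed, rows_expected):
    """Cost for a LIMIT that stops after rows_needed of rows_expected
    matching rows, assuming matches are uniformly distributed."""
    fraction = rows_needed / rows_expected
    return startup_cost + fraction * (total_cost - startup_cost)

SC, TC = 0.0, 10000.0                     # effectively zero startup cost
print(limit_scan_cost(SC, TC, 1, 100))    # 100.0 -- looks very cheap

# If the first match is actually 90% of the way through the scan,
# the honest cost is closer to:
print(0.90 * TC)                          # 9000.0
```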



Can't we reuse the same histogram the planner already maintains for this? I
mean, AFAIK, the heuristic we have is that we divide the histogram into
equal-size buckets, find the bucket in which our predicate value lies, then
take some fraction of that bucket plus all of the buckets before it, right?

So, suppose a query is SELECT * FROM table WHERE a < 10; we shall find the
bucket that 10 lies in, right?

Now, why can't we take the estimate of all the buckets before the bucket in
which our value is present? Won't that estimate give us the fraction of
tuples expected to come before the first matching row?

It's a pretty wild idea, but I wanted to know whether my understanding of
this scenario is correct.
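[The bucket arithmetic being described can be sketched like this - purely
illustrative Python, with hypothetical equi-depth bucket bounds, not the
planner's actual code:]

```python
import bisect

def histogram_selectivity(bounds, const):
    """Estimated fraction of rows with a < const, given equi-depth bucket
    bounds: each of the len(bounds)-1 buckets holds the same row fraction."""
    nbuckets = len(bounds) - 1
    if const <= bounds[0]:
        return 0.0
    if const >= bounds[-1]:
        return 1.0
    i = bisect.bisect_right(bounds, const) - 1        # bucket holding const
    within = (const - bounds[i]) / (bounds[i + 1] - bounds[i])
    # All buckets before i, plus a linear fraction of bucket i.
    return (i + within) / nbuckets

bounds = [0, 5, 12, 30, 100]                 # 4 hypothetical buckets
print(histogram_selectivity(bounds, 10))     # ≈ 0.4286
```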

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Risk Estimation WAS: Planner hints in Postgresql

2014-03-20 Thread Atri Sharma
On Thu, Mar 20, 2014 at 8:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  Now, why cannot we take the estimate of all the buckets behind the bucket
  in which our value is present? Will that estimate not give us the
 fraction
  of tuples that are expected to be before the first matching row?

 Uh, no, not unless you assume that the table happens to be perfectly
 sorted by the column's value.




Yes, that is true. So, if an attribute has an index present, can we do this
somehow?

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Optimized out tags

2014-03-20 Thread Atri Sharma
On Fri, Mar 21, 2014 at 9:49 AM, Rajashree Mandaogane 
rajashree@gmail.com wrote:

 What can be done to get rid of the 'optimized out' tags while debugging?


Did you use the appropriate debugging flags when running ./configure?

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Risk Estimation WAS: Planner hints in Postgresql

2014-03-18 Thread Atri Sharma
On Tue, Mar 18, 2014 at 11:43 PM, Josh Berkus j...@agliodbs.com wrote:


  On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yeah.  I would like to see the planner's cost estimates extended to
  include some sort of uncertainty estimate, whereupon risk-averse people
  could ask it to prefer low-uncertainty plans over high-uncertainty ones
  (the plans we typically choose for ORDER BY ... LIMIT queries being
 great
  examples of the latter).  But it's a long way from wishing that to
 making
  it so.  Right now it's not even clear (to me anyway) how we'd measure or
  model such uncertainty.



I have been thinking of some ways to have a risk estimate of each
selectivity that our planner gives. I think a way to do it is as follows:

One of the factors that leads to bad estimates is that the histogram of the
values of a column maintained by the planner gets stale over time as the data
in the column changes. So the histogram is no longer an accurate view of the
data, and it leads to bad selectivity estimates.

One thing we can try is to attach an error factor to the selectivity we
produce. This lets us account for the probability that the data has changed,
i.e. an estimate of the difference between the current histogram and a
histogram of the data actually present in the column.

We can use the Central Limit Theorem (
http://en.wikipedia.org/wiki/Central_limit_theorem). Essentially, the theorem
says that, given a distribution with finite mean and variance, we can take
random independent samples from the data and calculate the mean and standard
deviation of each sample. If we take a large enough number of samples and
plot the sample means, they will follow a normal distribution.

What is interesting is that this curve lets us predict the SD of a given
dataset, and that SD should be directly proportional to how far the data has
drifted from the planner's histogram.
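[A rough illustration of the sampling idea, using Python's stdlib on a
synthetic dataset - just a sketch of the statistics, not tied to any planner
code:]

```python
import random
import statistics

random.seed(42)
# Synthetic "column" data: normal with mean 50, SD 10.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Draw many independent samples and record each sample's mean.
sample_means = [
    statistics.mean(random.sample(population, 200))
    for _ in range(500)
]

# By the CLT the sample means cluster normally around the true mean,
# with SD about sigma / sqrt(n) = 10 / sqrt(200) ~= 0.71.
print(statistics.mean(sample_means))     # close to 50
print(statistics.stdev(sample_means))    # close to 0.71
```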

I am no mathematician, so it's hard for me to explain; I think this link [1]
will be more helpful.

So, we can have a probability value for the random variable, and that models
the confidence we have in our estimate.

I may be wrong in some parts but I hope I have been able to convey the
general idea.

If this idea develops, I shall be happy to work on it, but my hands are full
with ROLLUP right now, so my part will take time. I just want to float the
idea and get a general feel for it right now.

Please let me know your comments and feedback on this.

Regards,

Atri

[1]: http://www.theriac.org/DeskReference/viewDocument.php?id=177
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-18 Thread Atri Sharma

  That's precisely what risk estimation was about.

 Yeah.  I would like to see the planner's cost estimates extended to
 include some sort of uncertainty estimate, whereupon risk-averse people
 could ask it to prefer low-uncertainty plans over high-uncertainty ones
 (the plans we typically choose for ORDER BY ... LIMIT queries being great
 examples of the latter).  But it's a long way from wishing that to making
 it so.  Right now it's not even clear (to me anyway) how we'd measure or
 model such uncertainty.


 Well, currently, selectivity estimates based on MCV should be pretty
 low-uncertainty, whereas certainty of other estimates could be modeled as a
 random variable if ANALYZE gathered a few statistical moments (for
 variables that are prone to that kind of statistical analysis).

 That alone could improve things considerably, and statistical info could
 be propagated along expressions to make it possible to model uncertainty in
 complex expressions as well.



That is similar to the solution I proposed yesterday on the mailing list:
take many independent samples of the data, then use the mean and standard
deviation of those samples to attach a probability to the histogram
selectivity estimate.


The problem is multi-faceted: outdated stats, bad guesses at the distribution
of the data, and the cases Merlin mentioned before (CASE statements, COALESCE
statements, etc.). Finding a general solution will require a lot of research
and time.

I agree with Tom, we should focus on some of the main problems we have in
that domain and then try to solve them first.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 9:22 PM, Rajmohan C csrajmo...@gmail.com wrote:

 I am implementing Planner hints in Postgresql to force the optimizer to
 select a particular plan for a query on request from sql input. I am having
 trouble in modifying the planner code. I want to create a path node of hint
 plan and make it the plan to be used by executor. How do I enforce this ?
 Should I create a new Plan for this ..how to create a plan node which can
 be then given directly to executor for a particular query?




Planner hints have been discussed a lot before, and AFAIK there is a wiki
page explaining why we shouldn't implement them. Have you read those
discussions?

Please share if you have any new points on the same.

Regards,

Atri


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 David Johnston pol...@yahoo.com writes:
  Need to discuss the general why before any meaningful help on the
 how is
  going to be considered by hackers.

 Possibly worth noting is that in past discussions, we've concluded that
 the most sensible type of hint would not be use this plan at all, but
 here's what to assume about the selectivity of this WHERE clause.
 That seems considerably less likely to break than any attempt to directly
 specify plan details.


Isn't using a user-given value for selectivity a pretty risky situation, as
it can horribly screw up plan selection?

Why not allow the user to specify an alternate plan and have the planner
assign a higher preference to it during plan evaluation? That would still let
us evaluate all possible plans fairly, as we do right now, while giving the
user-supplied plan a higher preference.

Regards,

Atri

-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 10:58 PM, Stephen Frost sfr...@snowman.net wrote:

 * Atri Sharma (atri.j...@gmail.com) wrote:
  Isn't using a user-given value for selectivity a pretty risky situation, as
  it can horribly screw up plan selection?
 
  Why not allow the user to specify an alternate plan and have the planner

 Uh, you're worried about the user giving us a garbage selectivity, but
 they're going to get a full-blown plan perfect?



I never said that the user's plan would be perfect. The entire point of
planner hints is the assumption that the user knows more about the data than
the planner does, so the user's ideas about the plan should be given some
preference. Garbage selectivity can screw up the cost estimation of *all* our
possible plans, and we could end up preferring a sequential scan over an
index-only scan, for example. I am trying to think of ways to give some
preference to a user plan without interfering with the cost estimation of
our other potential plans.



 What exactly would such a preference look like?  A cost modifier?
 We'd almost certainly have to make that into a GUC or a value passed in
 as part of the query, with a high likelihood of users figuring out how
 to use it to say use my plan forever and always..


A factor, determined experimentally, by which we decrease the cost of the
user-specified plan so that it gets a higher preference during plan
evaluation.

Of course, this is not a nice hack. Especially after our discussion on IRC
the other day, I am against planner hints; but if we are just discussing how
it could be done, these are some ways I can think of.

Regards,

Atri
-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
The larger question to answer first is whether we want to implement

 something that is deterministic...

 How about just dropping the whole concept of hinting and provide a way
 for
 someone to say use this plan, or die trying.  Maybe require it be used in
 conjunction with named PREPAREd statements:



You mean taking away the entire concept of query planning and cost
estimation? That's like replacing the optimizer with DBA decisions, and I am
not at all comfortable with that idea. Those are only my thoughts, though.




 PREPARE s1 (USING /path/to/plan_def_on_server_or_something_similar) AS
 SELECT ...;

 Aside from whole-plan specification I can definitely see where join/where
 specification could be useful if it can overcome the current limitation of
 not being able to calculate inter-table estimations.



Prepared statements use a generic plan for execution; replacing that with a
totally user-defined plan does not seem clean.

The crux is that, IMHO, planner hints are a bad way of trying to circumvent
the need for cross-column statistics. We should get cross-column statistics
done and ignore planner hints completely.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
 There's lots of ways to implement planner hints, but I fail to see the
 point in discussing how to implement something we actively don't want.



+1. The original poster wanted a way to implement it as a personal project or
something (I think he only replied to me, not the entire list).

Planner hints should be ignored :)

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
On Mon, Mar 17, 2014 at 10:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  On Mon, Mar 17, 2014 at 9:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Possibly worth noting is that in past discussions, we've concluded that
  the most sensible type of hint would not be use this plan at all, but
  here's what to assume about the selectivity of this WHERE clause.
  That seems considerably less likely to break than any attempt to
 directly
  specify plan details.

  Isn't using a user-given value for selectivity a pretty risky situation, as
  it can horribly screw up plan selection?

 And forcing a plan to be used *isn't* that?  Please re-read the older
 threads, since you evidently have not.


I never said that we should force a plan to be used. I just said that we
should increase the preference for a user-given plan, without interfering in
the cost estimation of the other potential plans or in the evaluation of the
finally selected plan.

Regards,

Atri



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Planner hints in Postgresql

2014-03-17 Thread Atri Sharma
There's a big difference between saying to the planner, Use plan X
 vs Here's some information describing the data supporting choosing
 plan X intelligently.  The latter allows for better plans in the face
 of varied/changing data, integrates with the planner in natural way,
 and encourages users to understand how the planner works.



+1

I was thinking of varying the 'weight' of a user-defined plan by a fixed,
experimentally determined factor, to tell the planner to give higher or lower
preference to that plan. But after your idea above, I think Stephen's point
about introducing a GUC for the factor is the only way possible, and I agree
with him that eventually the user will figure out how to use the GUC to force
usage of his plan.

Regards,

Atri


-- 
Regards,

Atri
*l'apprenant*

