Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Alexander Korotkov
On Wed, Feb 15, 2012 at 2:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alexander Korotkov aekorot...@gmail.com writes:
  ISTM, I found the problem. This piece of code is triggering an error. It
  assumes each corresponding page has an initialized buffer. That should
  be true because we're inserting index tuples from the top down while
  splits propagate from the bottom up.
  But this assumption becomes false when we turn buffering off on the root
  page. So, a root page split can produce pages without initialized
  buffers.

 Hmm ... can we tighten the error check rather than just remove it?  It
 feels less than safe to assume that a hash-entry-not-found condition
 *must* reflect a corner-case situation like that.  At the very least
 I'd like to see it verify that we'd turned off buffering before deciding
 this is OK.  Better, would it be practical to make dummy entries in the
 hash table even after turning buffers off, so that the logic here
 becomes

if (!found) error;
else if (entry is dummy) return without doing anything;
else proceed;

regards, tom lane


OK, here is another patch that fixes this problem. Instead of removing the
error trigger, it adds empty buffers on root page split when needed.

--
With best regards,
Alexander Korotkov.
*** a/src/backend/access/gist/gistbuild.c
--- b/src/backend/access/gist/gistbuild.c
***
*** 668,677  gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
  	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
  	{
  		GISTBufferingInsertStack *oldroot = gfbb->rootitem;
! 		Page		page = BufferGetPage(buffer);
! 		ItemId		iid;
! 		IndexTuple	idxtuple;
! 		BlockNumber leftmostchild;
  
  		gfbb->rootitem = (GISTBufferingInsertStack *) MemoryContextAlloc(
  			gfbb->context, sizeof(GISTBufferingInsertStack));
--- 668,678 
  	if (is_split && BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO)
  	{
  		GISTBufferingInsertStack *oldroot = gfbb->rootitem;
! 		Page		 page = BufferGetPage(buffer);
! 		ItemId		 iid;
! 		IndexTuple	 idxtuple;
! 		BlockNumber  leftmostchild;
! 		OffsetNumber maxoff, i;
  
  		gfbb->rootitem = (GISTBufferingInsertStack *) MemoryContextAlloc(
  			gfbb->context, sizeof(GISTBufferingInsertStack));
***
*** 694,699  gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer,
--- 695,719 
  		oldroot->parent = gfbb->rootitem;
  		oldroot->blkno = leftmostchild;
  		oldroot->downlinkoffnum = InvalidOffsetNumber;
+ 
+ 		/*
+ 		 * If the root page split produced new pages on a level which has
+ 		 * buffers, then initialize empty buffers there.
+ 		 */
+ 		if (LEVEL_HAS_BUFFERS(oldroot->level, gfbb))
+ 		{
+ 			maxoff = PageGetMaxOffsetNumber(page);
+ 			for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+ 			{
+ 				iid = PageGetItemId(page, i);
+ 				idxtuple = (IndexTuple) PageGetItem(page, iid);
+ 				gistGetNodeBuffer(gfbb,
+ 								  buildstate->giststate,
+ 								  ItemPointerGetBlockNumber(&(idxtuple->t_tid)),
+ 								  i,
+ 								  gfbb->rootitem);
+ 			}
+ 		}
  	}
  
  	if (splitinfo)
  
  	if (splitinfo)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_fsync performance

2012-02-15 Thread Magnus Hagander
On Wed, Feb 15, 2012 at 02:23, Bruce Momjian br...@momjian.us wrote:
 On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
 On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
   +1, I was about to suggest the same thing.  Running any of these tests
   for a fixed number of iterations will result in drastic degradation of
   accuracy as soon as the machine's behavior changes noticeably from what
   you were expecting.  Run them for a fixed time period instead.  Or maybe
   do a few, then check elapsed time and estimate a number of iterations to
   use, if you're worried about the cost of doing gettimeofday after each
   write.
 
   Good idea, and it worked out very well.  I changed the -o loops
   parameter to -s seconds which calls alarm() after (default) 2 seconds,
   and then once the operation completes, computes a duration per
   operation.
 
  I was kind of wondering how portable alarm() is, and the answer
  according to the buildfarm is that it isn't.

 I'm using following simplistic alarm() implementation for win32:

   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

 this works with fake sigaction()/SIGALARM hack below - to remember
 function to call.

 Good enough for simple stats printing, and avoids win32-specific
 code spreading around.

 Wow, I wasn't even aware this compiled in Win32;  I thought it was
 ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
 work.  (Me wonders if the GoGrid Windows images have compilers.)

They don't, since most of the compilers people would ask for don't
allow that kind of redistribution.

Ping me on im if you need one preconfigured, though...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Different gettext domain needed for error context

2012-02-15 Thread Heikki Linnakangas
I just noticed that we use the same gettext domain for all messages 
attached to one error. That is wrong in case of context information, 
where you have a stack of context lines, originating from different 
modules. The result is that context lines don't always get translated.


For example:

postgres=# set lc_messages ='de_DE.utf8';
SET
postgres=# do $$
begin
  select 1 / 0;
end
$$;
FEHLER:  Division durch Null
CONTEXT:  SQL-Anweisung »select 1 / 0«
PL/pgSQL function inline_code_block line 3 at SQL-Anweisung

Notice how the string PL/pgSQL function ... is not translated. The 
ereport call that raises that error is in int4div, which is in the 
backend gettext domain, postgres. But the errcontext() call comes from 
pl_exec.c.


If the error originates from src/pl/plpgsql, then it works:

postgres=# do $$
begin
  raise;
end
$$;
FEHLER:  RAISE ohne Parameter kann nicht außerhalb einer 
Ausnahmebehandlung verwendet werden

CONTEXT:  PL/pgSQL-Funktion »inline_code_block« Zeile 3 bei RAISE

In theory, I guess the other fields like errhint() potentially have the 
same problem, but they're not currently used to provide extra 
information to messages originating from another module, so I'm inclined 
to leave them alone for now.


To fix this, we need to somehow pass the caller's text domain to 
errcontext(). The most straightforward way is to pass it as an extra 
argument. Ideally, errcontext() would be a macro that passes TEXTDOMAIN 
to the underlying function, so that you don't need to change all the 
callers of errcontext():


#define errcontext(...) errcontext_domain(TEXTDOMAIN, ...)

But that doesn't work, because it would require varargs macros. Anyone
know a trick to make that work?
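
One trick that might work is to avoid the varargs macro altogether by
abusing the comma operator: make errcontext an object-like macro that
first stashes the caller's domain, then expands to the real reporting
function. A sketch (set_errcontext_domain() and errcontext_msg() are
hypothetical helpers, not existing elog API):

#define errcontext  set_errcontext_domain(TEXTDOMAIN), errcontext_msg

int set_errcontext_domain(const char *domain);	/* saves the domain in the
						 * current error stack entry */
int errcontext_msg(const char *fmt, ...);	/* the old errcontext() */

Then errcontext("...") expands to set_errcontext_domain(TEXTDOMAIN),
errcontext_msg("..."), which works with any number of arguments.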


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml

2012-02-15 Thread Christoph Berg
diff --git a/doc/src/sgml/sepgsql.sgml b/doc/src/sgml/sepgsql.sgml
index e45c258..ee0a255 100644
*** a/doc/src/sgml/sepgsql.sgml
--- b/doc/src/sgml/sepgsql.sgml
*** UPDATE t1 SET x = 2, y = md5sum(y) WHERE
*** 358,364 
  </synopsis>
  
  In this case we must have <literal>db_table:select</> in addition to
! <literal>db_table:update</>, because <literal>t1.a</> is referenced
  within the <literal>WHERE</> clause.  Column-level permissions will also be
  checked for each referenced column.
 </para>
--- 358,364 
  </synopsis>
  
  In this case we must have <literal>db_table:select</> in addition to
! <literal>db_table:update</>, because <literal>t1.z</> is referenced
  within the <literal>WHERE</> clause.  Column-level permissions will also be
  checked for each referenced column.
 </para>


(It is unclear to me why the same example is cited twice here, but the
text around them is consistent with that.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


signature.asc
Description: Digital signature


Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)

2012-02-15 Thread Kohei KaiGai
The attached patch adds regression tests of ALTER FUNCTION with
LEAKPROOF, based on your patch.
It also moves create_function_3 into the group with create_aggregate and so on.

Thanks,

2012/2/14 Kohei KaiGai kai...@kaigai.gr.jp:
 2012/2/14 Robert Haas robertmh...@gmail.com:
 On Tue, Feb 14, 2012 at 4:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I could not find the origin of the grammar conflicts, although
 it does not conflict with any options within ALTER FUNCTION.

 Do you think the idea of ALTER ... NOT LEAKPROOF should be
 integrated within v9.2 timeline also?

 Yes.  Did you notice that I attached a patch to make that work?  I'll
 commit that today or tomorrow unless someone comes up with a better
 solution.

 Yes. I'll be available to work on the feature based on this patch.
 Implementing the ALTER statement to add/remove the leakproof
 attribute had been a headache of mine.

 I also think we ought to stick create_function_3 into one of the
 parallel groups in the regression tests, if possible.  Can you
 investigate that?

 Not yet. This test does not depend on other tests, so I'm
 optimistic that create_function_3 can run concurrently.

 Me, too.

 I tried moving create_function_3 into the group of create_view and
 create_index, and it works correctly.

 Thanks,
 --
 KaiGai Kohei kai...@kaigai.gr.jp



-- 
KaiGai Kohei kai...@kaigai.gr.jp


pgsql-v9.2-alter-function-leakproof-regtest.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] controlling the location of server-side SSL files

2012-02-15 Thread Peter Eisentraut
On ons, 2012-02-08 at 09:16 +0100, Magnus Hagander wrote:
  My best idea at the moment is that we should set these parameters to
  empty by default, and make users point them to existing files if they
  want to use that functionality.  Comments?
 
 
 +1. Anybody who actually cares about setting up security is likely not
 going to rely on defaults anyway - and is certainly going to review
 whatever they are. So there should be no big problem there. 

Updated patch to reflect this.
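
For example, the resulting settings might look like this (illustrative;
the defaults are as described in the documentation below, and the CA/CRL
parameters are empty unless explicitly set):

ssl = on
ssl_cert_file = 'server.crt'	# server certificate (default)
ssl_key_file = 'server.key'	# server private key (default)
ssl_ca_file = 'root.crt'	# enables client certificate verification
ssl_crl_file = 'root.crl'	# certificate revocation list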
*** i/doc/src/sgml/config.sgml
--- w/doc/src/sgml/config.sgml
***
*** 668,673  SET ENABLE_SEQSCAN TO OFF;
--- 668,737 
       </listitem>
      </varlistentry>
  
+      <varlistentry id="guc-ssl-ca-file" xreflabel="ssl_ca_file">
+       <term><varname>ssl_ca_file</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>ssl_ca_file</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         Specifies the name of the file containing the SSL server certificate
+         authority (CA).  The default is empty, meaning no CA file is loaded,
+         and client certificate verification is not performed.  (In previous
+         releases of PostgreSQL, the name of this file was hard-coded
+         as <filename>root.crt</filename>.)  Relative paths are relative to the
+         data directory.  This parameter can only be set at server start.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-ssl-cert-file" xreflabel="ssl_cert_file">
+       <term><varname>ssl_cert_file</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>ssl_cert_file</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         Specifies the name of the file containing the SSL server certificate.
+         The default is <filename>server.crt</filename>.  Relative paths are
+         relative to the data directory.  This parameter can only be set at
+         server start.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-ssl-crl-file" xreflabel="ssl_crl_file">
+       <term><varname>ssl_crl_file</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>ssl_crl_file</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         Specifies the name of the file containing the SSL server certificate
+         revocation list (CRL).  The default is empty, meaning no CRL file is
+         loaded.  (In previous releases of PostgreSQL, the name of this file was
+         hard-coded as <filename>root.crl</filename>.)  Relative paths are
+         relative to the data directory.  This parameter can only be set at
+         server start.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-ssl-key-file" xreflabel="ssl_key_file">
+       <term><varname>ssl_key_file</varname> (<type>string</type>)</term>
+       <indexterm>
+        <primary><varname>ssl_key_file</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+         Specifies the name of the file containing the SSL server private key.
+         The default is <filename>server.key</filename>.  Relative paths are
+         relative to the data directory.  This parameter can only be set at
+         server start.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
     <varlistentry id="guc-ssl-renegotiation-limit" xreflabel="ssl_renegotiation_limit">
      <term><varname>ssl_renegotiation_limit</varname> (<type>integer</type>)</term>
      <indexterm>
*** i/doc/src/sgml/runtime.sgml
--- w/doc/src/sgml/runtime.sgml
***
*** 1831,1840  pg_dumpall -p 5432 | psql -d postgres -p 5433
     SSL certificates and make sure that clients check the server's certificate.
     To do that, the server
     must be configured to accept only <literal>hostssl</> connections (<xref
!    linkend="auth-pg-hba-conf">) and have SSL
!    <filename>server.key</filename> (key) and
!    <filename>server.crt</filename> (certificate) files (<xref
!    linkend="ssl-tcp">). The TCP client must connect using
     <literal>sslmode=verify-ca</> or
     <literal>verify-full</> and have the appropriate root certificate
     file installed (<xref linkend="libpq-connect">).
--- 1831,1838 
     SSL certificates and make sure that clients check the server's certificate.
     To do that, the server
     must be configured to accept only <literal>hostssl</> connections (<xref
!    linkend="auth-pg-hba-conf">) and have SSL key and certificate files
!    (<xref linkend="ssl-tcp">). The TCP client must connect using
     <literal>sslmode=verify-ca</> or
     <literal>verify-full</> and have the appropriate root certificate
     file installed (<xref linkend="libpq-connect">).
***
*** 2053,2062  pg_dumpall -p 5432 | psql -d postgres -p 5433
    </note>
  
    <para>
!    To start in <acronym>SSL</> mode, the files <filename>server.crt</>
!    and <filename>server.key</> must exist in the server's data directory.
!    These files should contain the server certificate and private key,
!

Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-15 Thread Etsuro Fujita
(2012/02/14 23:50), Tom Lane wrote:
 Shigeru Hanada shigeru.han...@gmail.com writes:
 (2012/02/14 17:40), Etsuro Fujita wrote:
 As discussed at
 that thread, it would have to change the PlanForeignScan API to let the
 FDW generate multiple paths and dump them all to add_path instead of
 returning a FdwPlan struct.
 
 Multiple valuable paths for a scan of a foreign table by an FDW would be
 nice, but changing PlanForeignScan to return a list of FdwPlan in 9.2
 seems too hasty.
 
 I would really like to see that happen in 9.2, because the longer we let
 that mistake live, the harder it will be to change.  More and more FDWs
 are getting written.  I don't think it's that hard to do: we just have
 to agree that PlanForeignScan should return void and call add_path for
 itself, possibly more than once.

Agreed.  I fixed the PlanForeignScan API.  Please find attached a patch.

 If we do that, I'm inclined to think
 we could get rid of the separate Node type FdwPlan, and just incorporate
 List *fdw_private into ForeignPath and ForeignScan.

+1.  While the patch retains the struct FdwPlan, I would like to get rid
of it in the next version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/file_fdw/file_fdw.c
--- b/contrib/file_fdw/file_fdw.c
***
*** 25,30 
--- 25,31 
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "optimizer/cost.h"
+ #include "optimizer/pathnode.h"
  #include "utils/rel.h"
  #include "utils/syscache.h"
  
***
*** 93,99  PG_FUNCTION_INFO_V1(file_fdw_validator);
  /*
   * FDW callback routines
   */
! static FdwPlan *filePlanForeignScan(Oid foreigntableid,
					PlannerInfo *root,
					RelOptInfo *baserel);
  static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
--- 94,100 
  /*
   * FDW callback routines
   */
! static void filePlanForeignScan(Oid foreigntableid,
					PlannerInfo *root,
					RelOptInfo *baserel);
  static void fileExplainForeignScan(ForeignScanState *node, ExplainState *es);
***
*** 406,432  get_file_fdw_attribute_options(Oid relid)
  
  /*
   * filePlanForeignScan
!  *		Create a FdwPlan for a scan on the foreign table
   */
! static FdwPlan *
  filePlanForeignScan(Oid foreigntableid,
					PlannerInfo *root,
					RelOptInfo *baserel)
  {
  	FdwPlan    *fdwplan;
  	char	   *filename;
  	List	   *options;
  
  	/* Fetch options --- we only need filename at this point */
  	fileGetOptions(foreigntableid, &filename, &options);
  
  	/* Construct FdwPlan with cost estimates */
  	fdwplan = makeNode(FdwPlan);
  	estimate_costs(root, baserel, filename,
  				   &fdwplan->startup_cost, &fdwplan->total_cost);
- 	fdwplan->fdw_private = NIL;		/* not used */
  
! 	return fdwplan;
  }
  
  /*
--- 407,447 
  
  /*
   * filePlanForeignScan
!  *		Create the (single) path for a scan on the foreign table
   */
! static void
  filePlanForeignScan(Oid foreigntableid,
					PlannerInfo *root,
					RelOptInfo *baserel)
  {
+ 	ForeignPath *pathnode = makeNode(ForeignPath);
  	FdwPlan    *fdwplan;
  	char	   *filename;
  	List	   *options;
  
+ 	pathnode->path.pathtype = T_ForeignScan;
+ 	pathnode->path.parent = baserel;
+ 	pathnode->path.pathkeys = NIL;	/* result is always unordered */
+ 	pathnode->path.required_outer = NULL;
+ 	pathnode->path.param_clauses = NIL;
+ 
  	/* Fetch options --- we only need filename at this point */
  	fileGetOptions(foreigntableid, &filename, &options);
  
  	/* Construct FdwPlan with cost estimates */
  	fdwplan = makeNode(FdwPlan);
+ 	fdwplan->fdw_private = NIL;		/* not used */
  	estimate_costs(root, baserel, filename,
  				   &fdwplan->startup_cost, &fdwplan->total_cost);
  
! 	pathnode->fdwplan = fdwplan;
! 
! 	/* Use costs estimated by FDW */
! 	pathnode->path.rows = baserel->rows;
! 	pathnode->path.startup_cost = fdwplan->startup_cost;
! 	pathnode->path.total_cost = fdwplan->total_cost;
! 
! 	add_path(baserel, (Path *) pathnode);
  }
  
  /*
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***
*** 88,108 
  
  <para>
  <programlisting>
! FdwPlan *
  PlanForeignScan (Oid foreigntableid,
                   PlannerInfo *root,
                   RelOptInfo *baserel);
  </programlisting>
  
!  Plan a scan on a foreign table. This is called when a query is planned.
   <literal>foreigntableid</> is the <structname>pg_class</> OID of the
   foreign table.  <literal>root</> is the planner's global information
   about the query, and <literal>baserel</> is 
Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Heikki Linnakangas

On 15.02.2012 10:18, Alexander Korotkov wrote:

On Wed, Feb 15, 2012 at 2:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:


Alexander Korotkov aekorot...@gmail.com writes:

ISTM, I found the problem. This piece of code is triggering an error. It
assumes each corresponding page has an initialized buffer. That should
be true because we're inserting index tuples from the top down while
splits propagate from the bottom up.
But this assumption becomes false when we turn buffering off on the root
page. So, a root page split can produce pages without initialized
buffers.


Hmm ... can we tighten the error check rather than just remove it?  It
feels less than safe to assume that a hash-entry-not-found condition
*must* reflect a corner-case situation like that.  At the very least
I'd like to see it verify that we'd turned off buffering before deciding
this is OK.  Better, would it be practical to make dummy entries in the
hash table even after turning buffers off, so that the logic here
becomes

if (!found) error;
else if (entry is dummy) return without doing anything;
else proceed;

regards, tom lane



OK, here is another patch that fixes this problem. Instead of removing the
error trigger, it adds empty buffers on root page split when needed.


Actually, I think it made sense to simply do nothing if the buffer 
doesn't exist. The algorithm doesn't require that all the buffers must 
exist at all times. It is quite accidental that whenever we call 
gistRelocateBuildBuffersOnSplit(), the page must already have its buffer 
created (and as we found out, the assumption doesn't hold after a root 
split, anyway). Also, we talked earlier that it would be good to destroy 
buffers that become completely empty, to save memory. If we do that, 
we'd have to remove that check anyway.


So, I think we should go with your original fix and simply do nothing in 
gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer. 
Moreover, if the page has a buffer but it's empty, 
gistRelocateBuildBuffersOnSplit() doesn't need to create buffers for the 
new sibling pages. In the final emptying phase, that's a waste of time, 
the buffers we create will never be used, and even before that I think 
it's better to create the buffers lazily.
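
To be concrete, the do-nothing case in gistRelocateBuildBuffersOnSplit()
could look roughly like this (a sketch; the hash table and variable names
are approximate, following the buffering build code):

	nodeBuffer = hash_search(gfbb->nodeBuffersTab, &blocknum,
							 HASH_FIND, &found);
	if (!found)
	{
		/*
		 * The page has no buffer, so there are no buffered tuples to
		 * relocate.  This is expected, e.g. after a root page split
		 * once buffering has been turned off.
		 */
		return;
	}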


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-15 Thread Peter Geoghegan
On 15 February 2012 06:16, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Feb 10, 2012 at 10:30 AM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 [ new patch ]

 I spent quite a bit of time looking at this today - the patch
 specifically, and the issue of making quicksort fast more generally.
 It seemed to me that if we're going to have separate copies of the
 quicksort code for tuple sorting, we might as well go whole hog and
 specialize those copies to the particular needs of tuplesort.c as much
 as possible.  Accordingly, I whacked the code around so that it knows
 that it is sorting SortTuple objects and need not conditionalize at
 runtime on the size of the objects being swapped.  You suggested
 upthread that this might be worthwhile, and it seems that it is, so I
 think we should do it.

Cool. I agree that we should do this. It doesn't need to be justified
as a performance optimisation - it makes sense to refactor in this
way. If that makes things faster, then so much the better.

 Your patch removes the CHECK_FOR_INTERRUPTS() call from
 comparetup_heap, which is no good.  However, since I'd already decided
 to specialize the copies of quicksort intended for sorting as much as
 possible, it made sense to me to refactor things so that the qsort
 routine itself, rather than the comparator, is responsible for calling
 CHECK_FOR_INTERRUPTS().  This slightly reduces the number of times we
 CHECK_FOR_INTERRUPTS(), but never allows more than a few comparisons
 before doing it.

Well, the idea of that was to have one and only one CHECK_FOR_INTERRUPTS()
call, in the leading comparator. I should perhaps have further stressed
that that patch was only intended to establish the idea of having the
function pointer call within an inlined leading-key comparator that calls
out to the full comparator.

 I find that your pg_always_inline macro is equivalent to just plain
 inline on my system (MacOS X v10.6.8, gcc 4.2.1).  It seems to need
 something like this:

 +#elif __GNUC__
 +#define pg_always_inline inline __attribute__((always_inline))

 ...but I'm not very happy about relying on that, because I don't know
 that it will work on every gcc version (never mind non-gcc compilers),
 and I'm not convinced it's actually improving performance even on this
 one.  The documentation seems to indicate that this is intended to
 force inlining even when not optimizing, which may have something to
 do with the lack of effect: that's not really the point here anyway.

There is a scant reference that suggests this, yes, but that's
contradicted by other scanty sources. The macro was observed to be
helpful in the multi-key case, and the effect was quite apparent and
reproducible. At any rate, its appearance in my most recent patch was
vestigial - I think that there ought to be no need for it, now that
the compiler cost/benefit analysis probably has things right by
inlining.
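
For reference, a more defensive spelling might look like this (a sketch;
__forceinline is the MSVC counterpart, and the fallback is plain inline):

#if defined(__GNUC__)
#define pg_always_inline inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define pg_always_inline __forceinline
#else
#define pg_always_inline inline
#endif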

 What I did instead is to replace template_qsort_arg.h with a script
 called gen_qsort_tuple.pl, which generates a file called qsort_tuple.c
 that tuplesort.c then #includes.  This seems more flexible to me than
 the macro-based approach.  In particular, it allows me to generate
 versions of qsort with different call signatures.  The attached patch
 generates two:

 static void qsort_tuple(SortTuple *a, size_t n, SortTupleComparator
 cmp_tuple, Tuplesortstate *state);
 static void qsort_ssup(SortTuple *a, size_t n, SortSupport ssup);

 The first of these is a drop-in replacement for qsort_arg() - any
 tuplesort can use it, not just heap sorts.  But it is faster than
 qsort_arg() because of the specializations for the SortTuple data
 type.  The second handles the special case where we are sorting by a
 single key that has an associated SortSupport object.  In this case we
 don't need to carry the overhead of passing around the Tuplesortstate
 and dereferencing it, nor do we need the SortTupleComparator: we can
 just pass the SortSupport itself.  Maybe there's a way to get this
 effect using macros, but I couldn't figure it out.

I've seen the pattern where generic programming is aped with the
preprocessor in the style of my patch in, of all things, wxWidgets,
where it continues to be used as part of pgAdmin, or was when I worked
on wxWidgets 3 support exactly one year ago. One thing that is
particularly bizarre about that code is that clients actually #include
a .cpp file. This is, I'd guess, a legacy of having to target pre-C++98
compilers without decent template support. MSVC 6, in particular, was
completely inadequate in this area.

I am inclined to agree that given that we already use Perl to generate
source code like this, it seems natural that we should prefer to do
that, if only to avoid paranoia about the inclusion of a dial-a-bloat
knob. I am at a disadvantage here, since I've never written a line of
Perl.

 With this patch, I get the following results, as compared with your
 2012-02-10 version and master, using the same test cases I tested
 before.

 select * 

Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Alexander Korotkov
On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 Actually, I think it made sense to simply do nothing if the buffer doesn't
 exist. The algorithm doesn't require that all the buffers must exist at all
 times. It is quite accidental that whenever we call
 gistRelocateBuildBuffersOnSplit(), the page must already have its
 buffer created (and as we found out, the assumption doesn't hold after a
 root split, anyway). Also, we talked earlier that it would be good to
 destroy buffers that become completely empty, to save memory. If we do
 that, we'd have to remove that check anyway.

 So, I think we should go with your original fix and simply do nothing in
 gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.
 Moreover, if the page has a buffer but it's empty,
 gistRelocateBuildBuffersOnSplit() doesn't need to create buffers for
 the new sibling pages. In the final emptying phase, that's a waste of time,
 the buffers we create will never be used, and even before that I think it's
 better to create the buffers lazily.


I agree.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 5:38 AM, Christoph Berg c...@df7cb.de wrote:
 diff --git a/doc/src/sgml/sepgsql.sgml b/doc/src/sgml/sepgsql.sgml
 index e45c258..ee0a255 100644
 *** a/doc/src/sgml/sepgsql.sgml
 --- b/doc/src/sgml/sepgsql.sgml
 *** UPDATE t1 SET x = 2, y = md5sum(y) WHERE
 *** 358,364 
  </synopsis>

      In this case we must have <literal>db_table:select</> in addition to
 !     <literal>db_table:update</>, because <literal>t1.a</> is referenced
      within the <literal>WHERE</> clause.  Column-level permissions will also be
      checked for each referenced column.
     </para>
 --- 358,364 
  </synopsis>

      In this case we must have <literal>db_table:select</> in addition to
 !     <literal>db_table:update</>, because <literal>t1.z</> is referenced
      within the <literal>WHERE</> clause.  Column-level permissions will also be
      checked for each referenced column.
     </para>

Fixed, but note that I had to recreate the patch by manual
examination. Including it inline tends to garble things.

 (It is unclear to me why the same example is cited twice here, but the
 text around them is consistent with that.)

Fixed this too, and did some related rewording and proofreading.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2012-02-15 Thread Kohei KaiGai
Harada-san,

I checked the v9 patch; however, it still has some questionable implementation details.

[memory context of tuple store]
It calls tuplestore_begin_heap() under the memory context of
festate->scan_cxt at pgsqlBeginForeignScan.
On the other hand, tuplestore_gettupleslot() is called under the
memory context of festate->tuples.
I could not find a callback function being invoked on errors,
so I suspect the memory objects acquired within tuplestore_begin_heap()
may be leaked, even though it was my suggestion to create a sub-context
under the existing one.

In my opinion, it is a good choice to use es_query_cxt of the supplied EState.
What prevents applying this per-query memory context?

You mentioned that the PGresult is malloc()'ed. However, it seems to me
that fetch_result() and store_result() copy the contents of the malloc()'ed
area into a palloc()'ed area, and the PGresult is released on error using
a PG_TRY() ... PG_CATCH() block.
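
That is, the pattern in question looks roughly like this (a sketch, not
the pgsql_fdw code verbatim):

	PGresult   *res = PQexec(conn, sql);

	PG_TRY();
	{
		/* copy the tuples out of res into palloc()'ed storage */
	}
	PG_CATCH();
	{
		PQclear(res);		/* free the malloc()'ed result on error */
		PG_RE_THROW();
	}
	PG_END_TRY();
	PQclear(res);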

[Minor comments]
Please initialize the sql variable to NULL in begin_remote_tx().
The compiler raises a warning about a reference to an uninitialized
variable, even though that code path never runs.

It potentially causes a problem when fetch_result() raises an
error because of an unexpected status (!= PGRES_TUPLES_OK).
One code path is not protected with PG_TRY(), and the other code path
will call PQclear() on an already released PGresult.

Although it is just a preference of mine, is the exprFunction necessary?
It seems to me the point of the push-down check is whether the supplied
node is a built-in object or not. So, a sufficient check is is_builtin() on
FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno and so on.
It does not depend on whether the functions implementing these nodes
are built-in or not.
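
In other words, something like this should suffice (a sketch;
FirstBootstrapObjectId is the usual boundary below which OIDs are
manually assigned to built-in objects):

static bool
is_builtin(Oid oid)
{
	return (oid < FirstBootstrapObjectId);
}

applied to FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno, etc.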

Thanks,

2012/2/14 9:09 Shigeru Hanada shigeru.han...@gmail.com:
 (2012/02/14 15:15), Shigeru Hanada wrote:
 Good catch, thanks.  I'll revise the pgsql_fdw tests a little more.

 Here are the updated patches.  In addition to Fujita-san's comment, I
 moved the DROP OPERATOR statements to the clean-up section of the test
 script.

 Regards,
 --
 Shigeru Hanada



-- 
KaiGai Kohei kai...@kaigai.gr.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_fsync performance

2012-02-15 Thread Bruce Momjian
On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:
 On Wed, Feb 15, 2012 at 02:23, Bruce Momjian br...@momjian.us wrote:
  On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
  On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
+1, I was about to suggest the same thing.  Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from 
what
you were expecting.  Run them for a fixed time period instead.  Or 
maybe
do a few, then check elapsed time and estimate a number of iterations 
to
use, if you're worried about the cost of doing gettimeofday after each
write.
  
Good idea, and it worked out very well.  I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.
  
   I was kind of wondering how portable alarm() is, and the answer
   according to the buildfarm is that it isn't.
 
  I'm using following simplistic alarm() implementation for win32:
 
    https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
 
  this works with fake sigaction()/SIGALARM hack below - to remember
  function to call.
 
  Good enough for simple stats printing, and avoids win32-specific
  code spreading around.
 
  Wow, I wasn't even aware this compiled in Win32;  I thought it was
  ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
  work.  (Me wonders if the GoGrid Windows images have compilers.)
 
 They don't, since most of the compilers people would ask for don't
 allow that kind of redistribution.

Shame.

 Ping me on im if you need one preconfigured, though...

How do you do that?  Also, once you create a Windows VM on a public
cloud, how do you connect to it?  SSH?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml

2012-02-15 Thread Christoph Berg
Re: Robert Haas 2012-02-15 
ca+tgmozjutszhpxwwzcm4kjjwh__1admo3kosbbq+fhscqp...@mail.gmail.com
 Fixed, but note that I had to recreate the patch by manual
 examination. Including it inline tends to garble things.

Hmm, I thought I had :set paste and everything...

  (It is unclear to me why the same example is cited twice here, but the
  text around them is consistent with that.)
 
 Fixed this too, and did some related rewording and proofreading.

Makes much more sense now. Thanks!

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


signature.asc
Description: Digital signature


Re: [HACKERS] pg_test_fsync performance

2012-02-15 Thread Bruce Momjian
On Tue, Feb 14, 2012 at 08:23:10PM -0500, Bruce Momjian wrote:
 On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
  On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
+1, I was about to suggest the same thing.  Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting.  Run them for a fixed time period instead.  Or 
maybe
do a few, then check elapsed time and estimate a number of iterations 
to
use, if you're worried about the cost of doing gettimeofday after each
write.
   
Good idea, and it worked out very well.  I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.
   
   I was kind of wondering how portable alarm() is, and the answer
   according to the buildfarm is that it isn't.
  
  I'm using following simplistic alarm() implementation for win32:
  
https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
  
  this works with fake sigaction()/SIGALARM hack below - to remember
  function to call.
  
  Good enough for simple stats printing, and avoids win32-specific
  code spreading around.
 
 Wow, I wasn't even aware this compiled in Win32;  I thought it was
 ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
 work.  (Me wonders if the GoGrid Windows images have compilers.)
 
 I see backend/port/win32/timer.c, so I might go with a simple "create a
 thread, sleep(2), set flag, exit" solution.

Yeah, two Windows buildfarm machines have now successfully compiled my
patches, so I guess I fixed it;  patch attached.

The fix was surprisingly easy given the use of threads;  scheduling the
timeout in the operating system was just too invasive.

I would like to eventually know if this fix actually produces the right
output.  How would I test that?  Are the buildfarm output binaries
available somewhere?  Should I add this as a 9.2 TODO item?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_test_fsync/pg_test_fsync.c b/contrib/pg_test_fsync/pg_test_fsync.c
new file mode 100644
index 02a9e21..7f92bc8
*** a/contrib/pg_test_fsync/pg_test_fsync.c
--- b/contrib/pg_test_fsync/pg_test_fsync.c
***
*** 28,39 
--- 28,54 
  #define OPS_FORMAT			"%9.3f ops/sec"
  
  /* These are macros to avoid timing the function call overhead. */
+ #ifndef WIN32
  #define START_TIMER	\
  do { \
  	alarm_triggered = false; \
  	alarm(secs_per_test); \
  	gettimeofday(&start_t, NULL); \
  } while (0)
+ #else
+ /* WIN32 doesn't support alarm, so we create a thread and sleep there */
+ #define START_TIMER	\
+ do { \
+ 	alarm_triggered = false; \
+ 	if (CreateThread(NULL, 0, process_alarm, NULL, 0, NULL) == \
+ 		INVALID_HANDLE_VALUE) \
+ 	{ \
+ 		fprintf(stderr, "Cannot create thread for alarm\n"); \
+ 		exit(1); \
+ 	} \
+ 	gettimeofday(&start_t, NULL); \
+ } while (0)
+ #endif
  
  #define STOP_TIMER	\
  do { \
*** static void test_sync(int writes_per_op)
*** 62,68 
--- 77,87 
  static void test_open_syncs(void);
  static void test_open_sync(const char *msg, int writes_size);
  static void test_file_descriptor_sync(void);
+ #ifndef WIN32
  static void process_alarm(int sig);
+ #else
+ static DWORD WINAPI process_alarm(LPVOID param);
+ #endif
  static void signal_cleanup(int sig);
  
  #ifdef HAVE_FSYNC_WRITETHROUGH
*** main(int argc, char *argv[])
*** 82,88 
--- 101,109 
  	/* Prevent leaving behind the test file */
  	signal(SIGINT, signal_cleanup);
  	signal(SIGTERM, signal_cleanup);
+ #ifndef WIN32
  	signal(SIGALRM, process_alarm);
+ #endif
  #ifdef SIGHUP
  	/* Not defined on win32 */
  	signal(SIGHUP, signal_cleanup);
*** print_elapse(struct timeval start_t, str
*** 550,560 
--- 571,592 
  	printf(OPS_FORMAT "\n", per_second);
  }
  
+ #ifndef WIN32
  static void
  process_alarm(int sig)
  {
  	alarm_triggered = true;
  }
+ #else
+ static DWORD WINAPI
+ process_alarm(LPVOID param)
+ {
+ 	/* WIN32 doesn't support alarm, so we create a thread and sleep here */
+ 	Sleep(secs_per_test * 1000);
+ 	alarm_triggered = true;
+ 	ExitThread(0);
+ }
+ #endif
  
  static void
  die(const char *str)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 8:29 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Cool. I agree that we should do this. It doesn't need to be justified
 as a performance optimisation - it makes sense to refactor in this
 way. If that makes things faster, then so much the better.

Well, maybe so, but I think if the performance had been the same I
might not have messed with it.

 I am inclined to agree that given that we already use Perl to generate
 source code like this, it seems natural that we should prefer to do
 that, if only to avoid paranoia about the inclusion of a dial-a-bloat
 knob. I am at a disadvantage here, since I've never written a line of
 Perl.

I think it's still dial-a-bloat, but I feel pretty comfortable about
how we've got that knob adjusted in this version.  It's almost as much
improvement as any previous version, it applies to more cases, and the
code footprint is the least of any version I've measured.

 Hmm. I'll run some tests myself when I get a chance, and attempt to
 determine what's going on here. I don't have immediate access to a
 good benchmarking server, which would be preferable, and I'd like to
 post a revision of pg_stat_statements normalisation today, as it's
 been too long. Nice work though.

Thanks.

 It strikes me that if we wanted to take this further, we could look at
 squeezing out ApplySortComparator.  For example, suppose that, upon
 discovering that we can do an in-memory quicksort on a single sort
 key, we make an initial pass over the data where we check whether it's
 sorted and, as we go, swap all the entries with isnull1 = true to the
 end of the memtuples array.  We then sort the isnull1 = true entries
 with the standard comparator, and the isnull1 = false entries with an
 optimized comparator that elides most of ApplySortComparator and
 instead just calls the comparison function directly.  We then decide
 on whether to emit the isnull1 = true entries first or last based on
 NULLS FIRST/LAST, and decide whether to emit the remaining entries in
 forward or reverse order based on ASC/DESC.  Or maybe not exactly that
 thing, but something like that, so that we sort the null and non-null
 entries separately.  The additional complexity in the read-out logic
 would probably be more than offset by being able to use a simpler
 comparator.

 While it's really easy to be wrong about these things, I'd venture to
 guess that branch prediction makes this a less than compelling win in
 practice. That said, if you want to know how good a win it might be,
 you need only knock out a quick-and-dirty prototype and see how fast a
 sort is - however well this does will be pretty close to your best
 case, but not quite, since presumably such a prototype wouldn't worry
 about the applicability of the optimisation.

You might be right.  But note that the swapcode improvements were also
vulnerable to branch prediction, though there could be other effects
there.  One effect that should perhaps be considered is the
cost of saving and restoring registers.  Even if the branch itself
doesn't cost much because we know what the result will be, the
operands still have to be loaded into registers, which isn't free of
itself and also potentially forces those registers to be saved and
restored across function calls.  Still, I'm inclined not to poke at
this right now: there are a lot of patches that still need to be
looked at, and at the rate we're currently disposing of patches this
CommitFest will still be ongoing come Easter.
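
If someone does want to experiment, the null-partitioning pass itself
could be as simple as this standalone sketch (SortTuple reduced to the
one field that matters here; the pass isn't stable, but the two groups
would be sorted separately afterward anyway):

#include <stddef.h>

typedef struct
{
	void	   *tuple;
	int			isnull1;		/* stand-in for the real SortTuple */
} SortTuple;

/*
 * Move all isnull1 entries to the end of a[]; return the count of
 * non-null entries, which end up in a[0 .. result-1].
 */
static size_t
partition_nulls_last(SortTuple *a, size_t n)
{
	size_t		i = 0,
				j = n;

	while (i < j)
	{
		if (!a[i].isnull1)
			i++;
		else
		{
			SortTuple	tmp = a[i];

			j--;
			a[i] = a[j];
			a[j] = tmp;
		}
	}
	return i;
}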

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bugs/slowness inserting and indexing cubes

2012-02-15 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
 So, I think we should go with your original fix and simply do nothing in
 gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer.

 I agree.

OK, I won't object.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [Bug fix] postmaster always crashes on a debug version of Windows

2012-02-15 Thread MauMau

Hello

I encountered a bug which always causes PostgreSQL to crash on Windows. 
Attached is a patch that fixes it. Please review it and include it in the 
upcoming minor releases of supported versions.


The following is a bug report.


Your name  :MauMau
Your email address :maumau...@gmail.com


System Configuration:
-
 Architecture (example: Intel Pentium)  :
Intel Xeon

 Operating System (example: Linux 2.4.18) :
I found the bug on Windows 2008 R2 Enterprise x64 Edition, but it should 
apply to all versions of Windows.


 PostgreSQL version (example: PostgreSQL 9.1.1):  PostgreSQL 8.3.12
However, the problem should happen on all versions of PostgreSQL.

 Compiler used (example: gcc 3.3.5)  :
Visual Studio 2005 Professional Edition

Please enter a FULL description of your problem:

On a debug version of Windows, the PostgreSQL server fails to start. When I
run "pg_ctl start" at the command prompt, pg_ctl displays "server starting",
but the PostgreSQL server does not start.


No message is left indicating the cause. pg_ctl does not show any error
messages on the command prompt. Even when I include 'eventlog' in the
log_destination parameter, no error message is recorded in the application
log or the system log of the Windows event log.


This problem does not occur on a release version of Windows.


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
--
On a debug version of Windows, do the following:

1. Run initdb. initdb's option is not relevant.
2. Set logging_collector to on in postgresql.conf.
3. Run pg_ctl start on the command prompt.

The problem does not happen when logging_collector is off.


If you know how this problem might be fixed, list the solution below:
-
[Cause]
postmaster crashes in the following CloseHandle() call, which is at line 591
in src/backend/postmaster/syslogger.c (in the case of PostgreSQL 9.1).


/* Now we are done with the write end of the pipe. */
CloseHandle(syslogPipe[1]);

The CloseHandle() call crashes because it receives an already closed handle.
According to the following MSDN article, CloseHandle() causes a crash under
a debugger. Though I'm not exactly sure, I guess running on a debug version
of Windows is equivalent to running under a debugger.


http://msdn.microsoft.com/en-us/library/windows/desktop/ms724211(v=vs.85).aspx

[Excerpt]
If the application is running under a debugger, the function will throw an 
exception if it receives either a handle value that is not valid or a 
pseudo-handle value. This can happen if you close a handle twice, or if you 
call CloseHandle on a handle returned by the FindFirstFile function instead 
of calling the FindClose function.



Then, where was the handle closed? It was closed by the close() call three
lines above. The pipe handle is associated with a file descriptor by
_open_osfhandle(). Calling close() on the file descriptor also closes the
underlying handle. See the MSDN article:


http://msdn.microsoft.com/en-us/library/bdts1c9x(v=vs.80).aspx

[Excerpt]
To close a file opened with _open_osfhandle, call _close. The underlying 
handle is also closed by a call to _close, so it is not necessary to call 
the Win32 function CloseHandle on the original handle.



[Fix]
Remove the unnecessary CloseHandle() call. I looked for similar extra closes
by checking the places where _open_osfhandle() or _get_osfhandle() is used,
but there was nothing elsewhere.
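
To illustrate the rule, here is a self-contained sketch (illustrative
only, not the actual patch):

#include <windows.h>
#include <fcntl.h>
#include <io.h>

int
main(void)
{
	HANDLE		rd,
				wr;
	int			fd;

	if (!CreatePipe(&rd, &wr, NULL, 0))
		return 1;

	/* From here on, the CRT file descriptor owns the write-end handle. */
	fd = _open_osfhandle((intptr_t) wr, _O_APPEND);
	if (fd < 0)
		return 1;

	_close(fd);			/* closes fd AND the underlying handle wr */

	/*
	 * CloseHandle(wr) here would be a double close, which throws an
	 * exception under a debugger (or on a debug version of Windows).
	 */
	CloseHandle(rd);
	return 0;
}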


I attached a patch against the master HEAD. Is it OK with a patch that has 
CR-LF line endings?


Regards


postmaster_crash_on_debug_windows.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [v9.2] LEAKPROOF attribute of FUNCTION (Re: [HACKERS] [v9.2] Fix Leaky View Problem)

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 6:14 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached patch adds regression tests of ALTER FUNCTION with
 LEAKPROOF, based on your patch.
 It also moves create_function_3 into the group with create_aggregate and
 so on.

Committed, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-15 Thread Heikki Linnakangas

On 13.02.2012 19:13, Fujii Masao wrote:

On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 13.02.2012 01:04, Jeff Janes wrote:


Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD.
  I have no idea if I did it correctly, in particular if calling
GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side
effects that make that a bad thing to do.  I'm not proposing it as the
real fix, I just wanted to get around this problem in order to do more
testing.



Thanks. That's basically the right approach. Attached patch contains a
cleaned up version of that.



It does get rid of the "there is no contrecord flag" errors, but
recovery still does not work.

Now the count of tuples in the table is always correct (I never
provoke a crash during the initial table load), but sometimes updates
to those tuples that were reported to have been committed are lost.

This is more subtle, it does not happen on every crash.

It seems that when recovery ends on "record with zero length at ...",
the recovery is correct.

But when it ends on "invalid magic number ... in log file ...", then the
recovery is screwed up.



Can you write a self-contained test case for that? I've been trying to
reproduce that by running the regression tests and pgbench with a streaming
replication standby, which should be pretty much the same as crash recovery.
No luck this far.


Probably I could reproduce the same problem as Jeff got. Here is the test case:

$ initdb -D data
$ pg_ctl -D data start
$ psql -c "create table t (i int); insert into t
values(generate_series(1,1)); delete from t"
$ pg_ctl -D data stop -m i
$ pg_ctl -D data start

The crash recovery emitted the following server logs:

LOG:  database system was interrupted; last known up at 2012-02-14 02:07:01 JST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/179CC90
LOG:  invalid magic number  in log file 0, segment 1, offset 8060928
LOG:  redo done at 0/17AD858
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

After recovery, I could not see the table t which I created before:

$ psql -c "select count(*) from t"
ERROR:  relation t does not exist


Are you still seeing this failure with the latest patch I posted 
(http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)? 
That includes Jeff's fix for the original crash you and Jeff saw. With 
that version, I can't get a crash anymore. I also can't reproduce the 
inconsistency that Jeff still saw with his fix 
(http://archives.postgresql.org/message-id/CAMkU=1zGWp2QnTjiyFe0VMu4gc+MoEexXYaVC2u=+orfiyj...@mail.gmail.com). 
Jeff, can you clarify if you're still seeing an issue with the latest 
version of the patch? If so, can you give a self-contained test case for 
that?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_test_fsync performance

2012-02-15 Thread Magnus Hagander
On Wed, Feb 15, 2012 at 16:14, Bruce Momjian br...@momjian.us wrote:
 On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:
 On Wed, Feb 15, 2012 at 02:23, Bruce Momjian br...@momjian.us wrote:
  On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
  On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
+1, I was about to suggest the same thing.  Running any of these 
tests
for a fixed number of iterations will result in drastic degradation 
of
accuracy as soon as the machine's behavior changes noticeably from 
what
you were expecting.  Run them for a fixed time period instead.  Or 
maybe
do a few, then check elapsed time and estimate a number of 
iterations to
use, if you're worried about the cost of doing gettimeofday after 
each
write.
  
Good idea, and it worked out very well.  I changed the -o loops
parameter to -s seconds which calls alarm() after (default) 2 seconds,
and then once the operation completes, computes a duration per
operation.
  
   I was kind of wondering how portable alarm() is, and the answer
   according to the buildfarm is that it isn't.
 
  I'm using following simplistic alarm() implementation for win32:
 
    https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
 
  this works with fake sigaction()/SIGALARM hack below - to remember
  function to call.
 
  Good enough for simple stats printing, and avoids win32-specific
  code spreading around.
 
  Wow, I wasn't even aware this compiled in Win32;  I thought it was
  ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
  work.  (Me wonders if the GoGrid Windows images have compilers.)

 They don't, since most of the compilers people would ask for don't
 allow that kind of redistribution.

 Shame.

 Ping me on im if you need one preconfigured, though...

 How do you do that?  Also, once you create a Windows VM on a public
 cloud, how do you connect to it?  SSH?

rdesktop.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2012-02-15 Thread Robert Haas
On Tue, Feb 14, 2012 at 4:29 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 An ack about the way it's now implemented would be awesome.
 I'm still missing that, which is only fair, just a ping from me here.

I took a brief look at this just now, and in general I'm pleasantly
surprised.  But, as you might imagine, I have some reservations:

1. I fear that the collection of commands to which this applies is
going to end up being kind of a random selection.  I suggest that for
the first version of the patch, just pick a couple of simple commands
like VACUUM and ANALYZE, and do just those.  We can add on more later
easily enough once the basic patch is committed.  Also, that way, if
you don't get to absolutely everything, we'll have a coherent subset
of the functionality, rather than a subset defined by what Dimitri
got to.

2. Currently, we have some objects (views) that support INSTEAD OF
triggers and others (tables) that support BEFORE and AFTER triggers.
I don't see any advantage in supporting both.

3. I am not at all convinced that it's safe to allow command triggers
to silently (i.e. without throwing an error) cancel the operation.  I
don't have very much confidence that that is in general a safe thing
to do; there may be code calling this code that expects that such
things will not happen.  This diff hunk, for example, scares the crap
out of me:

-   /* check that the locales can be loaded */
-   CommandCounterIncrement();
-   (void) pg_newlocale_from_collation(newoid);
+   /* before or instead of command trigger might have cancelled the command */
+   if (OidIsValid(newoid))
+   {
+       /* check that the locales can be loaded */
+       CommandCounterIncrement();
+       (void) pg_newlocale_from_collation(newoid);
+   }

I don't immediately understand why that's necessary, but I'm sure
there's a good reason, and I bet a nickel that there are other places
where similar adjustments are necessary but you haven't found them.  I
think we should rip out the option to silently cancel the command.  We
can always add that later, but if we add it here and in the first
commit I think we are going to be chasing bugs for months.

4. This API forces all the work of setting up the CommandContext to be
done regardless of whether any command triggers are present:

+   cmd->objectId = InvalidOid;
+   cmd->objectname = (char *) aggName;
+   cmd->schemaname = get_namespace_name(aggNamespace);

I'll grant you that most people probably do not execute enough DDL for
the cost of those extra get_namespace_name() calls to add up, but I'm
not completely sure it's a negligible overhead in general, and in any
case I think it's a safe bet that there will be continuing demand to
add more information to the set of things that are supplied to the
trigger.  So I think that should be rethought.  It'd be nice to have a
cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do
stuff ... }, emphasis on cheap.  There's a definitional problem here,
too, which is that you're supplying to the trigger the aggregate name
*as supplied by the user* and the schema name as it exists at the time
we do the reverse lookup.  That reverse lookup isn't guaranteed to
work at all, and it's definitely not guaranteed to produce an answer
that's consistent with the aggName field.  Maybe there's no better
way, but it would at least be better to pass the namespace OID rather
than the name.  That way, the name lookup can be deferred until we are
sure that we actually need to call something.
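To sketch the shape I have in mind (AnyCommandTriggersFor, the Exec* call
and the namespaceOid field are all hypothetical names, not things in your
patch):

if (AnyCommandTriggersFor(cmdtag))      /* hypothetical cheap test */
{
    CommandContextData cmd;

    cmd.objectId = InvalidOid;
    cmd.objectname = (char *) aggName;
    cmd.namespaceOid = aggNamespace;    /* defer get_namespace_name() */
    ExecBeforeCommandTriggers(&cmd);    /* hypothetical */
}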

5. I'm not entirely convinced that it makes sense to support command
triggers on commands that affect shared objects.  It seems odd that
such triggers will fire or not fire depending on which database is
currently selected.  I think that could lead to user confusion, too.

6. Why do we need all this extra
copyfuncs/equalfuncs/outfuncs/readfuncs support?  I thought we were
dropping passing node trees for 9.2.

7. I don't have a strong feeling on what the psql command should be
called, but \dcT seems odd.  Why one lower-case letter and one
upper-case letter?

In general, I like the direction that this is going.  But I think it
will greatly improve its chances of being successful and relatively
non-buggy if we strip it down to something very simple for an initial
commit, and then add more stuff piece by piece.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Marti Raudsepp
On Mon, Feb 13, 2012 at 20:48, Greg Stark st...@mit.edu wrote:
 I don't think we should be looking at either CUDA or OpenCL directly.
 We should be looking for a generic library that can target either and
 is well maintained and actively developed.

I understand your point about using some external library for the
primitives, but I don't see why it needs to support both CUDA and
OpenCL. Libraries for GPU-accelerated primitives generally target
OpenCL *or* CUDA, not both.

As far as I understand (and someone correct me if I'm wrong), the
difference between them is mostly the API and the fact that CUDA had a
head start, and thus a larger developer community around it. (All the
early adopters went to CUDA)

But OpenCL already acts as an abstraction layer. CUDA is
NVIDIA-specific, but OpenCL is supported by AMD, Intel as well as
NVIDIA. It's pretty rare for servers to have separate graphics cards,
but recent Intel and AMD CPUs already have a GPU included on die,
which is another bonus for OpenCL.

So I'd say, the way things are heading, it's only a matter of time
before OpenCL takes over and there will be little reason to look back.

Regards,
Marti



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-15 Thread Fujii Masao
On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 13.02.2012 19:13, Fujii Masao wrote:

 On Mon, Feb 13, 2012 at 8:37 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com  wrote:

 On 13.02.2012 01:04, Jeff Janes wrote:


 Attached is my quick and dirty attempt to set XLP_FIRST_IS_CONTRECORD.
  I have no idea if I did it correctly, in particular if calling
 GetXLogBuffer(CurrPos) twice is OK or if GetXLogBuffer has side
 effects that make that a bad thing to do.  I'm not proposing it as the
 real fix, I just wanted to get around this problem in order to do more
 testing.



 Thanks. That's basically the right approach. Attached patch contains a
 cleaned up version of that.


 It does get rid of the "there is no contrecord flag" errors, but
 recovery still does not work.

 Now the count of tuples in the table is always correct (I never
 provoke a crash during the initial table load), but sometimes updates
 to those tuples that were reported to have been committed are lost.

 This is more subtle, it does not happen on every crash.

 It seems that when recovery ends on "record with zero length at...",
 that recovery is correct.

 But when it ends on "invalid magic number  in log file.." then the
 recovery is screwed up.



 Can you write a self-contained test case for that? I've been trying to
 reproduce that by running the regression tests and pgbench with a
 streaming
 replication standby, which should be pretty much the same as crash
 recovery.
 No luck this far.


 Probably I could reproduce the same problem as Jeff got. Here is the test
 case:

 $ initdb -D data
 $ pg_ctl -D data start
 $ psql -c "create table t (i int); insert into t
 values(generate_series(1,1)); delete from t"
 $ pg_ctl -D data stop -m i
 $ pg_ctl -D data start

 The crash recovery emitted the following server logs:

 LOG:  database system was interrupted; last known up at 2012-02-14
 02:07:01 JST
 LOG:  database system was not properly shut down; automatic recovery in
 progress
 LOG:  redo starts at 0/179CC90
 LOG:  invalid magic number  in log file 0, segment 1, offset 8060928
 LOG:  redo done at 0/17AD858
 LOG:  database system is ready to accept connections
 LOG:  autovacuum launcher started

 After recovery, I could not see the table t which I created before:

 $ psql -c "select count(*) from t"
 ERROR:  relation t does not exist


 Are you still seeing this failure with the latest patch I posted
 (http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)?

Yes. Just to be safe, I again applied the latest patch to HEAD,
compiled that and tried
the same test. Then unfortunately I got the same failure again.

I ran the configure with '--enable-debug' '--enable-cassert'
'CPPFLAGS=-DWAL_DEBUG',
and make with -j 2 option.

When I ran the test with wal_debug = on, I got the following assertion failure.

LOG:  INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap -
insert: rel 1663/12277/16384; tid 0/197
STATEMENT:  create table t (i int); insert into t
values(generate_series(1,1)); delete from t
LOG:  INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap -
insert: rel 1663/12277/16384; tid 0/198
STATEMENT:  create table t (i int); insert into t
values(generate_series(1,1)); delete from t
TRAP: FailedAssertion(!(((bool) ((((void*)(&(target->tid))) != ((void
*)0)) && ((&(target->tid))->ip_posid != 0)))), File: "heapam.c",
Line: 5578)
LOG:  xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
LOG:  xlog bg flush request 0/17B4000; write 0/17B; flush 0/17B
LOG:  server process (PID 16806) was terminated by signal 6: Abort trap

This might be related to the original problem which Jeff and I saw.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] bitfield and gcc

2012-02-15 Thread Robert Haas
On Mon, Feb 13, 2012 at 5:38 AM, Marti Raudsepp ma...@juffo.org wrote:
 On Sat, Feb 11, 2012 at 01:54, Gaetano Mendola mend...@gmail.com wrote:
 I wonder if somewhere in the Postgres source we are relying on correct
 GCC behaviour regarding the read-modify-write of bitfields in
 structures.

 Probably not. I'm pretty sure that we don't have any bitfields, since
 not all compilers are happy with them. And it looks like this behavior
 doesn't affect other kinds of struct fields.

We do, actually: see spgist_private.h, itemid.h, regis.h, spell.h, and
ts_type.h.  And maybe some others.

I'm not aware, however, of any cases where we put a lock in the same
structure as a bitfield, so I think we might be OK in that regard.
But the bit about 64-bit spinlocks next to other stuff is a bit
alarming.  I continue to be astonished at the degree to which the gcc
developers seem not to care about the POLA.  Padding out all of our
spinlocks to 64 bits would not be free: it would cost us significantly
in memory usage, if nothing else.  I understand that it's not possible
to modify individual bits in a bitfield atomically, but generating a
64-bit-wide read-modify-write when the underlying base type is 4 bytes
or less is almost pure evil, IMHO.
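To illustrate the hazard with a contrived struct (this is not code from
our tree, just the shape of the problem):

typedef struct
{
    unsigned int   flags:4;     /* bitfield with a 4-byte base type */
    unsigned int   count:28;
    volatile char  mutex;       /* lock byte packed right after it */
} Hazard;

void
set_flags(Hazard *h)
{
    /*
     * If the compiler emits an 8-byte read-modify-write here, it loads
     * and stores h->mutex as well, transiently undoing a concurrent
     * lock acquisition made by another CPU.
     */
    h->flags = 7;
}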

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_upgrade message

2012-02-15 Thread Peter Eisentraut
pg_upgrade prints something like this:

Restoring user relation files
  /var/lib/postgresql/8.4/main/base/35338/37229

But it's not actually restoring anything, is it?

Maybe transferring would be better?  (Or copying/linking, to be more
precise.)




[HACKERS] REASSIGN OWNED lacks support for FDWs

2012-02-15 Thread Tom Lane
As per http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php
there is no switch case in shdepReassignOwned for foreign data wrappers.

The obvious short-term answer (and probably the only back-patchable one)
is to add a case for that object type.  But after all the refactoring
that's been done in the general area of this type of command, I'm a bit
surprised that shdepReassignOwned still looks like this.  Can't we merge
this knowledge into someplace where it doesn't have to be maintained
separately?

regards, tom lane



Re: [HACKERS] Different gettext domain needed for error context

2012-02-15 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 To fix this, we need to somehow pass the caller's text domain to 
 errcontext(). The most straightforward way is to pass it as an extra 
 argument. Ideally, errcontext() would be a macro that passes TEXTDOMAIN 
 to the underlying function, so that you don't need to change all the 
 callers of errcontext():

 #define errcontext(...) errcontext_domain(TEXTDOMAIN, ...)

 But that doesn't work, because it would require varags macros. Anyone 
 know a trick to make that work?

This is pretty ugly, but AFAICS it should work:

#define errcontext set_errcontext_domain(TEXTDOMAIN), real_errcontext
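Here set_errcontext_domain and real_errcontext are assumed to have
roughly these shapes (a sketch, not existing functions):

extern void set_errcontext_domain(const char *domain);  /* stash domain */
extern int  real_errcontext(const char *fmt, ...);      /* use stashed domain */

so that a call such as

    errcontext("while scanning relation %s", relname);

preprocesses into the comma expression

    set_errcontext_domain(TEXTDOMAIN), real_errcontext("while scanning relation %s", relname);

capturing the caller's TEXTDOMAIN before the real function runs, with no
need for varargs macros.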

But it might be better in the long run to bite the bullet and make
people change all their errcontext calls.  There aren't that many,
especially not outside the core code.

regards, tom lane



Re: [HACKERS] run GUC check hooks on RESET

2012-02-15 Thread Robert Haas
On Sat, Feb 11, 2012 at 1:36 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Kevin Grittner  wrote:
 Tom Lane wrote:

 I agree it's a bug that you can do what Kevin's example shows.

 I'll look at it and see if I can pull together a patch.

 Attached.

 Basically, if a GUC has a check function, this patch causes it to be
 run on a RESET just like it is on a SET, to make sure that the
 resulting value is valid to set within the context.  Some messages
 needed adjustment.  While I was there, I made code a little more
 consistent among related GUCs.

 I also added a little to the regression tests to cover this.

This patch makes me a little nervous, because the existing behavior
seems to have been coded for quite deliberately.  Sadly, I don't see
any comments explaining why the RESET case was excluded originally.
On the other hand, I can't see what it would break, either.  Have you
gone through all the check hooks and verified that we're not violating
any of their assumptions?

I assume that you're thinking we'd only fix this in master?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml

2012-02-15 Thread Alvaro Herrera

Excerpts from Christoph Berg's message of mié feb 15 12:16:52 -0300 2012:
 Re: Robert Haas 2012-02-15 
 ca+tgmozjutszhpxwwzcm4kjjwh__1admo3kosbbq+fhscqp...@mail.gmail.com
  Fixed, but note that I had to recreate the patch by manual
  examination. Including it inline tends to garble things.
 
 Hmm, I thought I had :set paste and everything...

If you copied from a pager, those tend to expand tabs to spaces, so the
patch gets mangled at that point.  At least less does that.  OTOH if
you :r the patch file, it works fine.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] client performance v.s. server statistics

2012-02-15 Thread Amit Kapila
So, is the client interface (ODBC, libpq) cost mainly due to TCP?

 

The difference you are seeing compared to your embedded DB seems to be mainly
due to TCP.

One optimization you can use is PostgreSQL's Unix-domain socket mode. You can
refer to the unix_socket_directory parameter in postgresql.conf and other
related parameters. 

I am suggesting this because you were using an embedded DB earlier, so your
client and server should be on the same machine. If that is not the case, then
it will not work.
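For illustration, the relevant settings would look roughly like this (the
directory and database name are only examples):

# postgresql.conf on the server
unix_socket_directory = '/var/run/postgresql'

# client side: a host value starting with '/' selects the unix-domain
# socket in that directory instead of TCP
psql "host=/var/run/postgresql dbname=mydb"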

 

Can you please clarify some more things like

1.  After doing the sequential scan, does the client need all the records for
which the scan is happening? If only a few records are needed, why have you
not created an index?

2.  What is the exact scenario for fetching records?

 

 

 

From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Zhou Han
Sent: Wednesday, February 15, 2012 9:30 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] client performance v.s. server statistics

 

Hi,

I am investigating a performance problem encountered after porting an old
embedded DB to PostgreSQL. Since the system is real-time sensitive, we are
concerned about per-query cost. In our environment a sequential scan (select *
from ...) over a table with tens of thousands of records costs 1 - 2 seconds,
whether measured through the ODBC driver or via the timing shown in the psql
client (which, in turn, relies on libpq). However, according to EXPLAIN
ANALYZE, or the statistics in the pg_stat_statement view, the query costs less
than 100ms.

So, is the client interface (ODBC, libpq) cost mainly due to TCP? Has
pg_stat_statement or EXPLAIN ANALYZE included the cost of copying tuples
from shared buffers to result sets?

Could you experts share your views on this big gap? And any suggestions to
optimise?

P.S. In our original embedded DB a fastpath interface is provided to read
records directly from shared memory, which gives extremely fast real-time
access (of course sacrificing some other features, such as consistency).

Best regards,
Han



Re: [HACKERS] client performance v.s. server statistics

2012-02-15 Thread Amit Kapila
So I want to know what operations exactly are involved in the server-side
statistics in EXPLAIN ANALYZE

It gives the time for executing the query on the server. To my knowledge, it
doesn't account for the data sent over TCP.
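As a rough illustration (the table name is made up), you can see both
numbers from psql: EXPLAIN ANALYZE reports only the server-side execution
time, while \timing includes protocol handling, the TCP transfer and
client-side work:

EXPLAIN ANALYZE SELECT * FROM big_table;  -- server-side execution time only

\timing
SELECT * FROM big_table;                  -- wall-clock time seen by the client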

 

From: Zhou Han [mailto:zhou...@gmail.com] 
Sent: Wednesday, February 15, 2012 12:32 PM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] client performance v.s. server statistics

 

Hi,

I have tried a unix domain socket and the performance is similar to the TCP
socket. It is a MIPS architecture, so memory copies to/from the kernel can
take much time, and apparently a unix domain socket is no different from TCP
in terms of memory copying. 

But the ten-fold gap between the client-side statistics and the server-side
statistics is still hard to believe. So I want to know what operations exactly
are involved in the server-side statistics in EXPLAIN ANALYZE. I may check the
code later on when I get time.

For the query itself, it was just for performance comparison. There are
other index-based queries, which are of course much faster, but they still
show a similar ten-fold time gap between client-side and server-side
statistics.

I am thinking of a client interface that does not involve the kernel; is there
such an option, or do I have to develop one from scratch?

Best regards,
Han

On Wed, Feb 15, 2012 at 1:23 PM, Amit Kapila amit.kap...@huawei.com wrote:

So, is the client interface (ODBC, libpq) cost mainly due to TCP?

 

The difference you are seeing compared to your embedded DB seems to be mainly
due to TCP.

One optimization you can use is PostgreSQL's Unix-domain socket mode. You can
refer to the unix_socket_directory parameter in postgresql.conf and other
related parameters. 

I am suggesting this because you were using an embedded DB earlier, so your
client and server should be on the same machine. If that is not the case, then
it will not work.

 

Can you please clarify some more things like

1.  After doing the sequential scan, does the client need all the records for
which the scan is happening? If only a few records are needed, why have you
not created an index?

2.  What is the exact scenario for fetching records?

 

 

 

pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Zhou Han
Sent: Wednesday, February 15, 2012 9:30 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] client performance v.s. server statistics

 

Hi,

I am investigating a performance problem encountered after porting an old
embedded DB to PostgreSQL. Since the system is real-time sensitive, we are
concerned about per-query cost. In our environment a sequential scan (select *
from ...) over a table with tens of thousands of records costs 1 - 2 seconds,
whether measured through the ODBC driver or via the timing shown in the psql
client (which, in turn, relies on libpq). However, according to EXPLAIN
ANALYZE, or the statistics in the pg_stat_statement view, the query costs less
than 100ms.

So, is the client interface (ODBC, libpq) cost mainly due to TCP? Has
pg_stat_statement or EXPLAIN ANALYZE included the cost of copying tuples from
shared buffers to result sets?

Could you experts share your views on this big gap? And any suggestions to
optimise?

P.S. In our original embedded DB a fastpath interface is provided to read
records directly from shared memory, which gives extremely fast real-time
access (of course sacrificing some other features, such as consistency).

Best regards,
Han

 



Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-02-15 Thread Robert Haas
On Sun, Feb 5, 2012 at 4:09 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The attached part-1 patch moves related routines from hooks.c to
 label.c because of references to static variables.

I have committed this part.  Seems like a better location for that code, anyhow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [trivial patch] typo in doc/src/sgml/sepgsql.sgml

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 1:45 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Christoph Berg's message of mié feb 15 12:16:52 -0300 2012:
 Re: Robert Haas 2012-02-15 
 ca+tgmozjutszhpxwwzcm4kjjwh__1admo3kosbbq+fhscqp...@mail.gmail.com
  Fixed, but note that I had to recreate the patch by manual
  examination. Including it inline tends to garble things.

 Hmm, I thought I had :set paste and everything...

 If you copied from a pager, those tend to expand tabs to spaces, so the
 patch gets mangled at that point.  At least less does that.  OTOH if
 you :r the patch file, it works fine.

It is also possible that my email client is to blame.  All I know for
sure is I had to regenerate the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Assertion failure in AtCleanup_Portals

2012-02-15 Thread Tom Lane
Pavan Deolasee pavan.deola...@gmail.com writes:
 On Tue, Feb 7, 2012 at 3:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Hmm.  It works fine if you issue an actual ROLLBACK command there,
 so my gut reaction is that AbortOutOfAnyTransaction isn't sufficiently
 duplicating the full-fledged ROLLBACK code path.  No time to dig further
 right now though.

 It works OK for a ROLLBACK command because we create a new unnamed
 portal for the ROLLBACK command, silently dropping the old one if it
 already exists. Since the ROLLBACK command then runs successfully, we
 don't see the same assertion. Would it be safe to drop FAILED unnamed
 portals during AbortOutOfAnyTransaction?  Maybe it is, if we can do that
 before creating a new portal for the ROLLBACK command itself.

I poked at this some more, and noticed another dangerous-seeming issue:
at the time PortalCleanup is called, if it does get called, the portal's
stmts and sourceText are already pointing at garbage.  The reason is
that exec_simple_query blithely does this:

/*
 * We don't have to copy anything into the portal, because everything
 * we are passing here is in MessageContext, which will outlive the
 * portal anyway.
 */
PortalDefineQuery(portal,
  NULL,
  query_string,
  commandTag,
  plantree_list,
  NULL);

That comment is a true statement for the normal successful path of
control, wherein we reach the PortalDrop a few dozen lines below.
But it's not true if the command suffers an error.  We will mark
the portal PORTAL_FAILED right away (in one of various PG_CATCH
blocks) but we don't clean it up until another command arrives,
so the current contents of MessageContext are long gone when portal
cleanup happens.

It seems to me that the most reliable fix for these issues is to
institute the same policy for transitioning a portal to FAILED state
as we already have for transitions to DONE state, ie, we should run the
cleanup hook immediately.  IOW it would be a good idea to create a
function MarkPortalFailed that is just like MarkPortalDone except for
the specific portal state transition, and use that instead of merely
setting portal->status = PORTAL_FAILED in error cleanup situations.
This would ensure that the cleanup hook gets to run before we possibly
cut the portal's memory out from under it.  It's more than is probably
needed to resolve the specific crash you're reporting, but it seems
likely to forestall future issues.

I haven't actually tested such a fix yet, but will go do that now.
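For concreteness, I'd expect MarkPortalFailed to look roughly like this,
assuming it mirrors the shape of MarkPortalDone (an untested sketch):

void
MarkPortalFailed(Portal portal)
{
    /* Perform the state transition */
    Assert(portal->status != PORTAL_DONE);
    portal->status = PORTAL_FAILED;

    /*
     * Run the cleanup hook now, while the portal's stmts and sourceText
     * are still valid, instead of waiting for the next command.
     */
    if (PointerIsValid(portal->cleanup))
    {
        (*portal->cleanup) (portal);
        portal->cleanup = NULL;
    }
}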

regards, tom lane



Re: [HACKERS] pg_upgrade message

2012-02-15 Thread Bruce Momjian
On Wed, Feb 15, 2012 at 07:50:41PM +0200, Peter Eisentraut wrote:
 pg_upgrade prints something like this:
 
 Restoring user relation files
   /var/lib/postgresql/8.4/main/base/35338/37229
 
 But it's not actually restoring anything, is it?
 
 Maybe transferring would be better?  (Or copying/linking, to be more
 precise.)

What an excellent idea; I changed it to say link/copy, with the
attached, applied patch.  The new output is:

Creating databases in the new cluster   ok
Adding support functions to new cluster ok
Restoring database schema to new clusterok
Removing support functions from new cluster ok
-- Linking user relation files
ok
Setting next OID for new clusterok
Creating script to delete old cluster   ok

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/relfilenode.c b/contrib/pg_upgrade/relfilenode.c
new file mode 100644
index 54ee5f0..a1e30b1
*** a/contrib/pg_upgrade/relfilenode.c
--- b/contrib/pg_upgrade/relfilenode.c
*** transfer_all_new_dbs(DbInfoArr *old_db_a
*** 37,43 
  	int			old_dbnum, new_dbnum;
  	const char *msg = NULL;
  
! 	prep_status("Restoring user relation files\n");
  
  	/* Scan the old cluster databases and transfer their files */
  	for (old_dbnum = new_dbnum = 0;
--- 37,44 
  	int			old_dbnum, new_dbnum;
  	const char *msg = NULL;
  
! 	prep_status("%s user relation files\n",
! 		user_opts.transfer_mode == TRANSFER_MODE_LINK ? "Linking" : "Copying");
  
  	/* Scan the old cluster databases and transfer their files */
  	for (old_dbnum = new_dbnum = 0;



Re: [HACKERS] [COMMITTERS] pgsql: Speed up in-memory tuplesorting.

2012-02-15 Thread Tom Lane
Robert Haas rh...@postgresql.org writes:
 Speed up in-memory tuplesorting.

This patch appears to have broken those members of the buildfarm that
use VPATH builds.  I assume you didn't think about the path to
qsort_tuple.c very carefully.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Speed up in-memory tuplesorting.

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 2:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas rh...@postgresql.org writes:
 Speed up in-memory tuplesorting.

 This patch appears to have broken those members of the buildfarm that
 use VPATH builds.  I assume you didn't think about the path to
 qsort_tuple.c very carefully.

So it appears. I copied the rule that modifies errcodes.h, but clearly
that's not enough.   I think I might need to steal and adapt the
following logic from src/backend/catalog/Makefile:

# locations of headers that genbki.pl needs to read
pg_includes = -I$(top_srcdir)/src/include/catalog
-I$(top_builddir)/src/include/catalog

I'll go test that now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)

2012-02-15 Thread Heikki Linnakangas

On 15.02.2012 18:52, Fujii Masao wrote:

On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Are you still seeing this failure with the latest patch I posted
(http://archives.postgresql.org/message-id/4f38f5e5.8050...@enterprisedb.com)?


Yes. Just to be safe, I again applied the latest patch to HEAD,
compiled that and tried
the same test. Then unfortunately I got the same failure again.


Ok.


I ran the configure with '--enable-debug' '--enable-cassert'
'CPPFLAGS=-DWAL_DEBUG',
and make with -j 2 option.

When I ran the test with wal_debug = on, I got the following assertion failure.

LOG:  INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap -
insert: rel 1663/12277/16384; tid 0/197
STATEMENT:  create table t (i int); insert into t
values(generate_series(1,1)); delete from t
LOG:  INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap -
insert: rel 1663/12277/16384; tid 0/198
STATEMENT:  create table t (i int); insert into t
values(generate_series(1,1)); delete from t
TRAP: FailedAssertion(!(((bool) ((((void*)(&(target->tid))) != ((void
*)0)) && ((&(target->tid))->ip_posid != 0)))), File: "heapam.c",
Line: 5578)
LOG:  xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
LOG:  xlog bg flush request 0/17B4000; write 0/17B; flush 0/17B
LOG:  server process (PID 16806) was terminated by signal 6: Abort trap

This might be related to the original problem which Jeff and I saw.


That's strange. I made a fresh checkout, too, and applied the patch, but 
still can't reproduce. I used the attached script to test it.


It's surprising that the crash happens when the records are inserted, 
not at recovery. I don't see anything obviously wrong there, so could 
you please take a look around in gdb and see if you can get a clue 
what's going on? What's the stack trace?
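In case it helps, one way to capture it (the binary and data directory 
paths below are just examples):

ulimit -c unlimited                # let the backend dump core on the TRAP
pg_ctl -D data start
# ... reproduce the assertion failure ...
gdb /usr/local/pgsql/bin/postgres data/core -ex bt -ex quit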


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


fujii-crash.sh
Description: Bourne shell script



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Gaetano Mendola

On 13/02/2012 19:48, Greg Stark wrote:

I don't think we should be looking at either CUDA or OpenCL directly.
We should be looking for a generic library that can target either and
is well maintained and actively developed. Any GPU code we write
ourselves would rapidly be overtaken by changes in the hardware and
innovations in parallel algorithms. If we find a library that provides
a sorting api and adapt our code to use it then we'll get the benefits
of any new hardware feature as the library adds support for them.



I think one option is to make the sort function pluggable with a shared
library/dll. I see several benefits from this:

 - It could be in the interest of the hardware vendor to provide the 
most powerful sort implementation (I'm sure, for example, that the TBB 
sort implementation is faster than pg_sort)


 - It can permit people to play with it without being deeply involved 
in pg development and related matters.


 - It can relieve the postgres core group of the choice about the right 
language/tool/implementation to use.


 - It also lets people who are unwilling (or unable, for that matter) to
upgrade the postgres engine change just the sort function upon a hardware
upgrade.


Of course, if this happens the postgres engine has to perform some sort of
sanity check (for example, that the function actually sorts) before 
trusting the plugged-in sort.

The engine could even have multiple sort implementations available and
use the most proficient one (imagine that some sorts perform better on
a certain value range or element size).
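As a strawman (only the qsort_arg signature below is existing code; the
hook variable and its wiring are my assumptions), the plug point could be
as small as:

#include <stddef.h>

typedef int (*qsort_arg_comparator) (const void *a, const void *b, void *arg);

typedef void (*sort_hook_type) (void *base, size_t nel, size_t elsize,
                                qsort_arg_comparator cmp, void *arg);

/* NULL means: fall back to the built-in qsort_arg */
sort_hook_type sort_hook = NULL;

/*
 * A vendor shared library would install its implementation from _PG_init():
 *     sort_hook = my_gpu_sort;
 * and the engine would call (*sort_hook)(...) wherever it now calls
 * qsort_arg(...), after the sanity check described above.
 */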


Regards
Gaetano Mendola



Re: [HACKERS] Command Triggers

2012-02-15 Thread Dimitri Fontaine
Hi,

Thanks for your time reviewing that patch!

Robert Haas robertmh...@gmail.com writes:
 I took a brief look at this just now, and in general I'm pleasantly
 surprised.  But, as you might imagine, I have some reservations:

Good starting point, let's see about the details :)

 1. I fear that the collection of commands to which this applies is
 going to end up being kind of a random selection.  I suggest that for
 the first version of the patch, just pick a couple of simple commands
 like VACUUM and ANALYZE, and do just those.  We can add on more later
 easily enough once the basic patch is committed.  Also, that way, if
 you don't get to absolutely everything, we'll have a coherent subset
 of the functionality, rather than a subset defined by what Dimitri
 got to.

I share that feeling here. Now, given the current scope of the patch, I
think we can add in 9.2 all the commands that make sense to support
(yes, including alter operator family). FWIW, I've been adding support
for 16 forms of ALTER commands today, in triplicate (the implementations
of alter rename, owner and namespace are separate).

So while your reaction is perfectly understandable, I don't think that's
the main thing here, you've just happened to see an intermediate state
of things.

 2. Currently, we have some objects (views) that support INSTEAD OF
 triggers and others (tables) that support BEFORE and AFTER triggers.
 I don't see any advantage in supporting both.

That's because you can cancel an INSERT from a BEFORE trigger; that's
how you can write an INSTEAD OF trigger on a TABLE.  Since a VIEW does not
support INSERT (directly), an INSTEAD OF trigger is what makes sense
here.

I don't think it's possible to transpose that thinking to command
triggers, so I've been providing for either INSTEAD OF triggers or
BEFORE/AFTER triggers. Note that you can't have both with my current
patch. It's a convenience-only thing, but I think it's worth having it:
you don't need to read the trigger procedure to decide if that BEFORE
command trigger is in fact an INSTEAD OF command trigger.

Also, the code footprint for this feature is very small.

 3. I am not at all convinced that it's safe to allow command triggers
 to silently (i.e. without throwing an error) cancel the operation.  I
 don't have very much confidence that that is in general a safe thing

I was trying to offer a parallel with returning NULL from a BEFORE
INSERT trigger, which allows you to cancel it. If that turns out to be
so bad an idea that we need to withdraw it, so be it.

It's still possible to cancel a command by means of RAISE EXCEPTION in a
BEFORE command trigger, or by means of an INSTEAD OF trigger that does
nothing.

 to do; there may be code calling this code that expects that such
 things will not happen.  This diff hunk, for example, scares the crap
 out of me:
[...]
 I don't immediately understand why that's necessary, but I'm sure
 there's a good reason, and I bet a nickel that there are other places
 where similar adjustments are necessary but you haven't found them.  I

Might be.  That idea seemed sound to me while I was under the illusion that I
wouldn't have to edit each and every command implementation in order to
implement command triggers.  I'm ok to step back now, but read on.

 think we should rip out the option to silently cancel the command.  We
 can always add that later, but if we add it here and in the first
 commit I think we are going to be chasing bugs for months.

Ok, I'm going to remove that from the BEFORE command implementation.

What about having both INSTEAD OF triggers (the command is not executed)
and BEFORE triggers (you can cancel execution only by raising an
exception, which cancels the whole transaction)?  I'd like us to be
able to keep that feature.

 4. This API forces all the work of setting up the CommandContext to be
 done regardless of whether any command triggers are present:

 +   cmd->objectId = InvalidOid;
 +   cmd->objectname = (char *)aggName;
 +   cmd->schemaname = get_namespace_name(aggNamespace);

At some point while developing the patch I had something way smarter
than that, and centralized.  I've not revised the API to get back that
smartness when breaking out of the centralized implementation, because
those elements can only be set from the innards of each command.

The API to list which triggers to run already exists and only need the
command tag, which we might have earlier in the processing. Note that we
have to deal with commands acting on more than one object, as in
RemoveObjects() where we loop over a list and call triggers each time:

  https://github.com/dimitri/postgres/compare/master...command_triggers#diff-19

The best I can think of would be to build the list of triggers to call
as soon as we have the command tag, and skip preparing the rest of the
CommandContextData structure when that list comes back empty.

Still no general stop-gap, but that would address your point I think.

Now, about INSTEAD OF command triggers, 

[HACKERS] Designing an extension for feature-space similarity search

2012-02-15 Thread Jay Levitt
[Preamble: I've been told that the hackers list is appropriate for 
extension-related topics like this, even if it's not about contributing to 
core. If I'm misappropriating, please let me know.]


Goal: Personalized, context-relevant query results

We are building a deeply personalized site; think OKCupid for product 
recommendations or Pinterest for people with your tastes. We use psych 
research to measure and predict your personality and traits along a number 
of scales (dimensions), and then we connect you with people, products and 
content we think you'll like.


I won't go into the design history, but you can read a little here:

http://parapoetica.wordpress.com/2012/02/15/feature-space-similarity-search-in-postgresql/

Suffice to say, this ends up needing something like KNN-GiST cubes, only:

- The overall concept is more like N-dimensional vectors than cubes
- But a dimension might be in any domain, not just floats
- All vectors have the same number of dimensions with the same meanings
- The distance along each dimension is a domain-specific function
- NULLs are allowed (the distance function will handle the semantics)
- The distance between two vectors is a function that aggregates the 
distances of each dimension, along with arbitrary other arguments - for 
instance, it might take the weighted average of the dimensions


That aggregation (which may not literally be an aggregate; I'm not sure yet) 
needs to happen in a SELECT list, which means it needs to be fast, which 
means all this (or at least much of it) has to be C.
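As a toy illustration with a completely made-up schema (one float
dimension, one categorical, one NULL-able), the aggregation I mean would
sit in the SELECT list like this:

SELECT p.id,
         0.5 * abs(p.openness - 0.82)                  -- float: absolute difference
       + 0.3 * (p.region <> 'NE')::int                 -- categorical: 0/1 mismatch
       + 0.2 * coalesce(abs(p.age - 31) / 50.0, 1.0)   -- NULL maps to max distance
         AS distance
FROM people p
ORDER BY distance
LIMIT 20;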


The simplest thing that works is probably to hack up the cube extension, 
declare that everything (except inner pages) must be a zero-volume cube 
(cube_is_point()), map our non-float features onto floats somehow, and 
hard-code all the distance functions and the aggregation function.


But I think this sort of similarity-search engine has general utility, and I 
also want to make it easy for us to add and subtract dimensions without too 
much pain; that should be DDL, not code. So thinking about how this might 
evolve...


- I'm not sure how to represent arbitrary column-like features without 
reinventing the wheel and putting a database in the database.  hstore only 
stores text, probably for this reason; I took a look at the earlier json 
patch and saw that it handled only a few core data types.  Have there been 
any other PoCs that involved collections of heterogeneous data? I almost 
want an actual instance of an anyarray.


- Alternatively, is there a way to index an entire, arbitrary row, rather 
than on a column on that row? I'm fine with this extension requiring its own 
table, so I leave the data where it is in the row, and only worry about 
indexing it. I can't just use functional indexes, because I'll need to 
provide operators and support functions to GiST. Maybe I have a fake 
sentinel column, where all the operators use SPI to introspect the row, 
treat each column as a feature dimension, call the underlying operators on 
each column's data type, etc.


- Can domains have operators, or are operators defined on types?

- Does KNN-GiST run into problems when <-> returns values that don't make 
sense in the physical world? For instance, let's say NULL <-> NULL returns 
a distance of 1.0. That means that NULL1 <-> NULL2 = 1.0, and NULL2 <-> 
NULL3 = 1.0, but NULL1 <-> NULL3 = 1.0 as well. I think that's fine - that 
could even describe a triangle - but my spidey sense is tingling on this.


- Are there previous discussions, patches, abandoned projects, etc. that 
this reminds you of and that I should go research?


Thanks for any thoughts, and I'd love collaborators or even mentors - we 
plan to open source whatever we produce here, and I don't have quite the 
theoretical background it takes to do this properly.


Jay Levitt



Re: [HACKERS] [COMMITTERS] pgsql: Speed up in-memory tuplesorting.

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 2:54 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 15, 2012 at 2:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas rh...@postgresql.org writes:
 Speed up in-memory tuplesorting.

 This patch appears to have broken those members of the buildfarm that
 use VPATH builds.  I assume you didn't think about the path to
 qsort_tuple.c very carefully.

 So it appears. I copied the rule that modifies errcodes.h, but clearly
 that's not enough.   I think I might need to steal and adapt the
 following logic from src/backend/catalog/Makefile:

 # locations of headers that genbki.pl needs to read
 pg_includes = -I$(top_srcdir)/src/include/catalog
 -I$(top_builddir)/src/include/catalog

 I'll go test that now.

Nope, that's not it.  Committed what seems to be the correct fix,
after testing it locally.
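For the archives, the usual VPATH-safe shape for this kind of rule (my
guess at the fix from the file names in this thread, not a quote of the
commit) is to reference the generator script via $(srcdir):

qsort_tuple.c: gen_qsort_tuple.pl
	$(PERL) $(srcdir)/gen_qsort_tuple.pl > $@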

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 3:25 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 1. I fear that the collection of commands to which this applies is
 going to end up being kind of a random selection.  I suggest that for
 the first version of the patch, just pick a couple of simple commands
 like VACUUM and ANALYZE, and do just those.  We can add on more later
 easily enough once the basic patch is committed.  Also, that way, if
 you don't get to absolutely everything, we'll have a coherent subset
 of the functionality, rather than a subset defined by what Dimitri
 got to.

 I share that feeling here. Now, given the current scope of the patch, I
 think we can add in 9.2 all the commands that make sense to support
 (yes, including alter operator family). FWIW, I've been adding support
 for 16 forms of ALTER commands today, triple (implementation of alter
 rename, owner and namespace are separated).

 So while your reaction is perfectly understandable, I don't think that's
 the main thing here, you've just happened to see an intermediate state
 of things.

I'm just saying that nobody's realistically going to be able to verify
a patch of this size.  It's either going to get committed warts and
all, or it's going to not get committed.  Decomposing it into a series
of patches would make it possible to actually verify the logic.  I
guess on reflection I don't really care whether you decompose it at
this point; the parts are pretty independent and it's easy enough to
revert pieces of it.  But if I commit any of this it's certainly not
going to be the whole thing in one go.

 It's still possible to cancel a command by means of RAISE EXCEPTION in a
 BEFORE command trigger, or by means of an INSTEAD OF trigger that does
 nothing.

I think that there is no problem with cancelling a command via RAISE
EXCEPTION.  It's an established precedent that errors can be thrown
anywhere, and any code that doesn't deal with that is flat broken.
But I think letting either a BEFORE or INSTEAD trigger cancel the
command is going to break things, and shouldn't be allowed without a
lot of careful study.  So -1 from me on supporting INSTEAD triggers in
the first version of this.

 I'll grant you that most people probably do not execute enough DDL for
 the cost of those extra get_namespace_name() calls to add up, but I'm
 not completely sure it's a negligible overhead in general, and in any
 case I think it's a safe bet that there will be continuing demand to
 add more information to the set of things that are supplied to the
 trigger.  So I think that should be rethought.  It'd be nice to have a
 cheap way to say if (AnyCommandTriggersFor(command_tag)) { ... do
 stuff ... }, emphasis on cheap.  There's a definitional problem here,

 It's as cheap as scanning a catalog, as of now. I didn't install a new
 catalog cache for command triggers, as I don't think this code path is
 performance critical enough to pay back for the maintenance cost. Also
 consider that a big user of command triggers might have as many as a
 couple of triggers (BEFORE/AFTER) per command, that's about 300 of them.

Yowza.  A catalog scan is WAY more expensive than a syscache lookup.
I definitely don't think you can afford to have every command result
in an extra index probe into pg_cmdtrigger.  You definitely need some
kind of caching there.

Or at least, I think you do.  You could try pgbench -f foo.sql, where
foo.sql repeatedly creates and drops a function.  See if there's a
significant slowdown with your patch vs. HEAD.  If there is, you need
some caching.  You might actually need some whole new type of sinval
message to make this work efficiently.
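Something like this, say (the function name and pgbench flags are just an
example):

cat > foo.sql <<'EOF'
CREATE FUNCTION cmdtrig_probe() RETURNS int LANGUAGE sql AS 'SELECT 1';
DROP FUNCTION cmdtrig_probe();
EOF
pgbench -n -f foo.sql -T 30 postgres   # compare tps, patched vs. HEAD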

 too, which is that you're supplying to the trigger the aggregate name
 *as supplied by the user* and the schema name as it exists at the time
 we do the reverse lookup.  That reverse lookup isn't guaranteed to
 work at all, and it's definitely not guaranteed to produce an answer
 that's consistent with the aggName field.  Maybe there's no better
 way, but it would at least be better to pass the namespace OID rather
 than the name.  That way, the name lookup can be deferred until we are
 sure that we actually need to call something.

 I'm confused here, because all error messages that need to contain the
 namespace are doing exactly the same thing as I'm doing in my patch.

Hmm.  I wonder what happens if those errors fire after the schema has
been dropped?  I suppose the real answer here is probably to add
enough locking that that can't happen in the first place... so maybe
this isn't an issue for your patch to worry about.

 5. I'm not entirely convinced that it makes sense to support command
 triggers on commands that affect shared objects.  It seems odd that
 such triggers will fire or not fire depending on which database is
 currently selected.  I think that could lead to user confusion, too.

 You mean tablespace here, I guess, what else? I don't think I've added
 other shared objects in there yet. I share your 

Re: [HACKERS] Command Triggers

2012-02-15 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I'm just saying that nobody's realistically going to be able to verify
 a patch of this size.  It's either going to get committed warts and
 all, or it's going to not get committed.  Decomposing it into a series
 of patches would make it possible to actually verify the logic.  I
 guess on reflection I don't really care whether you decompose it at
 this point; the parts are pretty independent and it's easy enough to
 revert pieces of it.  But if I commit any of this it's certainly not
 going to be the whole thing in one go.

Ok, I can perfectly understand that.  The principled implementation is
not saving us here; we still need to review each call site.  The way I
read your comment, I should continue working on my big patch and we'll see
what pieces get in, right?

 I think that there is no problem with cancelling a command via RAISE
 EXCEPTION.  It's an established precedent that errors can be thrown
 anywhere, and any code that doesn't deal with that is flat broken.

Sure.

 But I think letting either a BEFORE or INSTEAD trigger cancel the
 command is going to break things, and shouldn't be allowed without a
 lot of careful study.  So -1 from me on supporting INSTEAD triggers in
 the first version of this.

I'm sad about that, but I hear you.  Removing.

 Yowza.  A catalog scan is WAY more expensive than a syscache lookup.
 I definitely don't think you can afford to have every command result
 in an extra index probe into pg_cmdtrigger.  You definitely need some
 kind of caching there.

 Or at least, I think you do.  You could try pgbench -f foo.sql, where
 foo.sql repeatedly creates and drops a function.  See if there's a
 significant slowdown with your patch vs. HEAD.  If there is, you need
 some caching.  You might actually need some whole new type of sinval
 message to make this work efficiently.

Ok, I will test that down the road (before the end of this week).

 I'm confused here, because all error messages that need to contain the
 namespace are doing exactly the same thing as I'm doing in my patch.

 Hmm.  I wonder what happens if those errors fire after the schema has
 been dropped?  I suppose the real answer here is probably to add
 enough locking that that can't happen in the first place... so maybe
 this isn't an issue for your patch to worry about.

I take it that I should continue doing things this way; get_namespace_name()
and friends are trustworthy as far as I'm concerned.

 You mean tablespace here, I guess, what else? I don't think I've added
 other shared objects in there yet. I share your analyze btw, will
 remove support.

 And databases.

Right, on their way.

 something else, e.g. \dct would be your choice here?

 Yeah, probably.

Next version will sports that.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] run GUC check hooks on RESET

2012-02-15 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 11, 2012 at 1:36 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Kevin Grittner  wrote:
 Tom Lane wrote:

 I agree it's a bug that you can do what Kevin's example shows.

 I'll look at it and see if I can pull together a patch.

 Attached.

 Basically, if a GUC has a check function, this patch causes it to
 be run on a RESET just like it is on a SET, to make sure that the
 resulting value is valid to set within the context.  Some
 messages needed adjustment.  While I was there, I made code a
 little more consistent among related GUCs.

 I also added a little to the regression tests to cover this.
 
 This patch makes me a little nervous, because the existing
 behavior seems to have been coded for quite deliberately.
 
It does, although I'm not clear *why* it was.  I suspect it may have
been based on an assumption that whatever value is in the reset_val
field had to have been already determined to be good, so it was a
waste of cycles to check it again -- without considering that the
validity of making a change might depend on context.
 
 Sadly, I don't see any comments explaining why the RESET case was
 excluded originally.
 
That is unfortunate.  I guess it shows the value of adding a
comment to point out why we would want to check these values even on
a reset to a previously-used value.
 
 On the other hand, I can't see what it would break, either.  Have
 you gone through all the check hooks and verified that we're not
 violating any of their assumptions?
 
I studied the code enough to be convinced that the patch as it
stands can't break a check hook which only validates the value
and/or changes the value to a canonical form.  There appear to be 34
check hooks, and I reviewed the 10 in the variable.c file, although
not at great depth.  I could set aside some time this weekend to
look at all of them, in depth, if you think that is warranted.  I do
think that a check hook would have to be doing something which is
probably more appropriate for an assign hook to cause trouble, but I
can't swear that that isn't happening without spending about a full
day in reviewing it.
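For reference, the benign pattern I have in mind is roughly this shape
(the hook and GUC names are invented; the signature is the standard
check-hook convention):

static bool
check_my_setting(char **newval, void **extra, GucSource source)
{
    char   *p;

    /* validate */
    if (*newval == NULL || **newval == '\0')
    {
        GUC_check_errmsg("my_setting must not be empty");
        return false;
    }

    /* canonicalize in place (same length, so no reallocation needed) */
    for (p = *newval; *p; p++)
        *p = pg_tolower((unsigned char) *p);

    return true;
}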
 
 I assume that you're thinking we'd only fix this in master?
 
Without this, I don't think it's possible for someone to enforce
protection of their data through SSI in an ironclad way.  So there
is at least some case to be made to take it back as far as 9.1.  I
don't think it makes sense to take it further, because read-only was
broken in other ways before 9.1, and I'm not aware of specific
threats further back.  On the other hand, it is a change in behavior
with at least some chance to break code which is functioning as
intended, so it's a pretty marginal candidate for back-patching from
that point of view.  I don't think a decision either way on that
would be crazy.  Personally I would hope to see it included in a 9.1
patch, perhaps after some settling time on master, but totally
understand if the consensus is to just patch master.
 
-Kevin



Re: [HACKERS] run GUC check hooks on RESET

2012-02-15 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas robertmh...@gmail.com wrote:
 This patch makes me a little nervous, because the existing
 behavior seems to have been coded for quite deliberately.
 
 It does, although I'm not clear *why* it was.  I suspect it may have
 been based on an assumption that whatever value is in the reset_val
 field had to have been already determined to be good, so it was a
 waste of cycles to check it again -- without considering that the
 validity of making a change might depend on context.

Yes, I'm inclined to think the same, although obviously we need to
review the patch carefully.  The GUC code is a bit ticklish.

The main thing I would be worried about is whether you're sure that
you have separated the RESET-as-a-command case from the cases where
we actually are rolling back to a previous state.

regards, tom lane



Re: [HACKERS] run GUC check hooks on RESET

2012-02-15 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 The main thing I would be worried about is whether you're sure
 that you have separated the RESET-as-a-command case from the cases
 where we actually are rolling back to a previous state.
 
I will double-check that, and make sure there is regression test
coverage of that case.
 
-Kevin



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Peter Geoghegan
On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote:
 On 13/02/2012 19:48, Greg Stark wrote:

 I don't think we should be looking at either CUDA or OpenCL directly.
 We should be looking for a generic library that can target either and
 is well maintained and actively developed. Any GPU code we write
 ourselves would rapidly be overtaken by changes in the hardware and
 innovations in parallel algorithms. If we find a library that provides
 a sorting api and adapt our code to use it then we'll get the benefits
 of any new hardware feature as the library adds support for them.


 I think one option is to make the sort function pluggable with a shared
 library/dll. I see several benefits from this:

  - It could be in the interest of the hardware vendor to provide the most
 powerful sort implementation (I'm sure for example that TBB sort
 implementation is faster than pg_sort)

  - It can permit people to play with it without being deeply involved in pg
 development and such.

Sorry, but I find it really hard to believe that the non-availability
of pluggable sorting is what's holding people back here. Some vanguard
needs to go and prove the idea by building a rough prototype before we
can even really comment on what an API should look like. For example,
I am given to understand that GPUs generally sort using radix sort -
resolving the impedance mismatch that prevents someone from using a
non-comparison based sort sure sounds like a lot of work for an
entirely speculative reward.

Someone who cannot understand tuplesort, which is not all that
complicated, has no business trying to build GPU sorting into
Postgres.

I had a patch committed a few hours ago that almost included the
capability of assigning an alternative sorting function, but only one
with the exact same signature as my variant of qsort_arg. pg_qsort
isn't used to sort tuples at all, by the way.

Threading Building Blocks is not going to form the basis of any novel
sorting implementation, because comparators in general are not thread
safe, and it isn't available on all the platforms we support, and
because of how longjmp interacts with C++ stack unwinding and so on
and so on. Now, you could introduce some kind of parallelism into
sorting integers and floats, but that's an awful lot of work for a
marginal reward.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



[HACKERS] Google Summer of Code? Call for mentors.

2012-02-15 Thread Josh Berkus
Hackers,

The call is now open for Google Summer of Code.

If you are interested in being a GSoC mentor this summer, please reply
to this email.  I want to gauge whether or not we should participate
this summer.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Command Triggers

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 4:32 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Ok, I can perfectly understand that.  The principled implementation is
 not saving us here; we still need to review each call site.  The way I
 read your comment, I'll continue working on my big patch and we'll see
 what pieces get in, right?

Yeah, I think that makes sense.  I'll have a look at your next version
when it shows up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] run GUC check hooks on RESET

2012-02-15 Thread Robert Haas
On Wed, Feb 15, 2012 at 4:47 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 That is unfortunate.  I guess it points out the value of adding a
 comment to point out why we would want to check these values even on
 a reset to a previously-used value.

+1 for such a comment.

 I assume that you're thinking we'd only fix this in master?

 Without this, I don't think it's possible for someone to enforce
 protection of their data through SSI in an ironclad way.  So there
 is at least some case to be made to take it back as far as 9.1.

I'm OK with that, but perhaps the only-tangentially-related changes
where you swap the order of certain error messages ought to be
separated out and committed only to master?  That stuff doesn't seem
like material for a back-patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Designing an extension for feature-space similarity search

2012-02-15 Thread Tom Lane
Jay Levitt jay.lev...@gmail.com writes:
 - I'm not sure how to represent arbitrary column-like features without 
 reinventing the wheel and putting a database in the database.

ISTM you could define a composite type and then create operators and an
operator class over that type.  If you were trying to make a btree
opclass there might be a conflict with the built-in record_ops opclass,
but since you're only interested in GIST I don't see any real
roadblocks.  The main potential disadvantage of this is that you'd have
the standard tuple header as overhead in index entries --- but maybe the
entries are large enough that that doesn't matter, and in any case you
could probably make use of the GIST compress method to get rid of most
of the header.  Maybe convert to MinimalTuple, for instance, if you want
to still be able to leverage existing support code for field extraction.
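
As a concrete sketch of that compress step (the function name and
header choices are assumptions following the standard GiST
support-function pattern, not code from any existing opclass):

#include "postgres.h"
#include "access/gist.h"
#include "access/htup.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(feature_compress);

/*
 * Hypothetical GiST compress method: replace the full composite datum
 * (a HeapTupleHeader) with a MinimalTuple, shaving the standard tuple
 * header off each index entry.
 */
Datum
feature_compress(PG_FUNCTION_ARGS)
{
	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
	GISTENTRY  *retval = entry;

	if (entry->leafkey)
	{
		HeapTupleHeader th = DatumGetHeapTupleHeader(entry->key);
		HeapTupleData htup;
		MinimalTuple mtup;

		htup.t_len = HeapTupleHeaderGetDatumLength(th);
		htup.t_data = th;
		mtup = minimal_tuple_from_heap_tuple(&htup);

		retval = (GISTENTRY *) palloc(sizeof(GISTENTRY));
		gistentryinit(*retval, PointerGetDatum(mtup),
					  entry->rel, entry->page, entry->offset, FALSE);
	}
	PG_RETURN_POINTER(retval);
}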

 - Can domains have operators, or are operators defined on types?

I think the current state of play is that you can have such things but
the system will only consider them for exact type matches, so you might
need more explicit casts than you ordinarily would.  However, we only
support domains over base types not composites, so this isn't really
going to be a profitable direction for you anyway.

 - Does KNN-GiST run into problems when <-> returns values that don't make 
 sense in the physical world?

Wouldn't surprise me.  In general, non-strict index operators are a bad
idea.  However, if the indexed entities are records, it would be
entirely your own business how you handled individual fields being NULL.

regards, tom lane



Re: [HACKERS] run GUC check hooks on RESET

2012-02-15 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 15, 2012 at 4:47 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 That is unfortunate.  I guess it points out the value of adding a
 comment to point out why we would want to check these values even
 on a reset to a previously-used value.
 
 +1 for such a comment.
 
Will do.
 
 I assume that you're thinking we'd only fix this in master?

 Without this, I don't think it's possible for someone to enforce
 protection of their data through SSI in an ironclad way.  So
 there is at least some case to be made to take it back as far as
 9.1.
 
 I'm OK with that, but perhaps the only-tangentially-related
 changes where you swap the order of certain error messages ought
 to be separated out and committed only to master?  That stuff
 doesn't seem like material for a back-patch.
 
Agreed.  I'm not sure we want to change the message text at all in
9.1.  Translations and all that.
 
-Kevin



[HACKERS] Notes about fixing regexes and UTF-8 (yet again)

2012-02-15 Thread Tom Lane
In bug #6457 it's pointed out that we *still* don't have full
functionality for locale-dependent regexp behavior with UTF8 encoding.
The reason is that there's old crufty code in regc_locale.c that only
considers character codes up to 255 when searching for characters that
should be considered letters, digits, etc.  We could fix that, for
some value of "fix", by iterating up to perhaps 0x10FFFF when dealing with
UTF8 encoding, but the time that would take is unappealing.  Especially
so considering that this code is executed afresh anytime we compile a
regex that requires locale knowledge.

I looked into the upstream Tcl code and observed that they deal with
this by having hard-wired tables of which Unicode code points are to be
considered letters etc.  The tables are directly traceable to the
Unicode standard (they provide a script to regenerate them from files
available from unicode.org).  Nonetheless, I do not find that approach
appealing, mainly because we'd be risking deviating from the libc locale
code's behavior within regexes when we follow it everywhere else.
It seems entirely likely to me that a particular locale setting might
consider only some of what Unicode says are letters to be letters.

However, we could possibly compromise by using Unicode-derived tables
as a guide to which code points are worth probing libc for.  That is,
assume that a utf8-based locale will never claim that some code point is a
letter that unicode.org doesn't think is a letter.  That would cut the
number of required probes by a pretty large factor.

The other thing that seems worth doing is to install some caching.
We could presumably assume that the behavior of iswupper() et al is
fixed for the duration of a database session, so that we only need to
run the probe loop once when first asked to create a cvec for a
particular category.
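
A rough sketch of how those two ideas could combine -- the range table
and all names here are illustrative, not the actual regc_locale.c
structures, and error handling is omitted:

#include <stdbool.h>
#include <stdlib.h>
#include <wctype.h>

typedef struct
{
	wint_t		first;
	wint_t		last;
} cprange;

/* Assumed table of Unicode letter ranges, generated offline from
 * unicode.org data; only a few entries are shown. */
static const cprange unicode_letters[] = {
	{0x0041, 0x005A}, {0x0061, 0x007A},
	{0x00C0, 0x00D6}, {0x00D8, 0x00F6},
	/* ... remaining ranges elided ... */
};

static wint_t *letter_cache;	/* code points libc agrees are letters */
static size_t letter_cache_len;

static void
build_letter_cache(void)
{
	size_t		cap = 1024, n = 0, r;
	wint_t	   *buf = malloc(cap * sizeof(wint_t));

	for (r = 0; r < sizeof(unicode_letters) / sizeof(cprange); r++)
	{
		wint_t		c;

		for (c = unicode_letters[r].first; c <= unicode_letters[r].last; c++)
		{
			/* Probe libc only for code points Unicode calls letters. */
			if (iswalpha(c))
			{
				if (n == cap)
					buf = realloc(buf, (cap *= 2) * sizeof(wint_t));
				buf[n++] = c;
			}
		}
	}
	letter_cache = buf;			/* cached for the rest of the session */
	letter_cache_len = n;
}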

Thoughts, better ideas?

regards, tom lane



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Gaetano Mendola

On 15/02/2012 23:11, Peter Geoghegan wrote:

On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote:

On 13/02/2012 19:48, Greg Stark wrote:


I don't think we should be looking at either CUDA or OpenCL directly.
We should be looking for a generic library that can target either and
is well maintained and actively developed. Any GPU code we write
ourselves would rapidly be overtaken by changes in the hardware and
innovations in parallel algorithms. If we find a library that provides
a sorting api and adapt our code to use it then we'll get the benefits
of any new hardware feature as the library adds support for them.



I think one option is to make the sort function pluggable with a shared
library/dll. I see several benefits from this:

  - It could be in the interest of the hardware vendor to provide the most
powerful sort implementation (I'm sure for example that TBB sort
 implementation is faster than pg_sort)

  - It can permit people to play with it without being deeply involved in pg
 development and such.


Sorry, but I find it really hard to believe that the non-availability
of pluggable sorting is what's holding people back here. Some vanguard
needs to go and prove the idea by building a rough prototype before we
can even really comment on what an API should look like. For example,
I am given to understand that GPUs generally sort using radix sort -
resolving the impedance mismatch that prevents someone from using a
non-comparison based sort sure sounds like a lot of work for an
entirely speculative reward.


AFAIK the thrust library uses radix sort if the keys you are sorting are
POD data comparable with a < operator; otherwise it does a
comparison-based sort using the operator provided.

http://docs.thrust.googlecode.com/hg/modules.html

I'm not saying that the non-availability of pluggable sort completely
holds people back, I'm saying that it will simplify the process now
and in the future; of course that's my opinion.


Someone who cannot understand tuplesort, which is not all that
complicated, has no business trying to build GPU sorting into
Postgres.


That sounds a bit harsh. I'm one of those, indeed; I haven't looked into
the details, not having enough time for it. At work we do GPU computing
(not the sort type of stuff), and given that I'm a Postgres enthusiast I
asked myself: my server is able to sort around 500 million integers per
second; if Postgres were able to do that as well, it would be very nice.

What can I say? Sorry for my thoughts.


I had a patch committed a few hours ago that almost included the
capability of assigning an alternative sorting function, but only one
with the exact same signature as my variant of qsort_arg. pg_qsort
isn't used to sort tuples at all, by the way.


Then I was looking in the wrong direction. Thank you for pointing that out.


Threading Building Blocks is not going to form the basis of any novel
sorting implementation, because comparators in general are not thread
safe, and it isn't available on all the platforms we support, and
because of how longjmp interacts with C++ stack unwinding and so on
and so on. Now, you could introduce some kind of parallelism into
sorting integers and floats, but that's an awful lot of work for a
marginal reward.


TBB was just an example that came to mind.
What do you mean by "you could introduce some kind of parallelism"?
As far as I know, any divide-and-conquer algorithm can be
parallelized.
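
As a toy illustration of that point (a sketch only, not from any
patch): sort each half of an integer array in its own POSIX thread,
then merge.  Plain int keys with a stateless comparator sidestep the
comparator thread-safety issue raised above.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
	int		   *base;
	size_t		n;
} span;

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

static void *
sort_span(void *arg)
{
	span	   *s = arg;

	qsort(s->base, s->n, sizeof(int), cmp_int);
	return NULL;
}

static void
parallel_sort(int *a, size_t n)
{
	span		left = {a, n / 2};
	span		right = {a + n / 2, n - n / 2};
	int		   *out = malloc(n * sizeof(int));
	pthread_t	t;
	size_t		i = 0, j = 0, k = 0;

	pthread_create(&t, NULL, sort_span, &left);
	sort_span(&right);			/* sort the other half in this thread */
	pthread_join(t, NULL);

	/* Standard two-way merge of the sorted halves. */
	while (i < left.n && j < right.n)
		out[k++] = left.base[i] <= right.base[j] ?
			left.base[i++] : right.base[j++];
	while (i < left.n)
		out[k++] = left.base[i++];
	while (j < right.n)
		out[k++] = right.base[j++];
	memcpy(a, out, n * sizeof(int));
	free(out);
}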

Regards
Gaetano Mendola






Re: [HACKERS] Progress on fast path sorting, btree index creation time

2012-02-15 Thread Peter Geoghegan
On 15 February 2012 15:27, Robert Haas robertmh...@gmail.com wrote:
 I am inclined to agree that given that we already use Perl to generate
 source code like this, it seems natural that we should prefer to do
 that, if only to avoid paranoia about the inclusion of a dial-a-bloat
 knob. I am at a disadvantage here, since I've never written a line of
 Perl.

 I think it's still dial-a-bloat, but I feel pretty comfortable about
 how we've got that knob adjusted in this version.  It's almost as much
 improvement as any previous version, it applies to more cases, and the
 code footprint is the least of any version I've measured.

I'm happy that the principle that a dial-a-bloat knob isn't
necessarily a bad thing has been accepted, though that term is kind of
pejorative in that it implies that the knob necessarily adds bloat to
the binary.

I define bloat here as the addition of dead instructions to the
binary, or at least code that doesn't pull its weight. Clearly, that
isn't the case here, and I suspect that we will find that it isn't the
case in other places too.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Peter Geoghegan
On 15 February 2012 22:54, Gaetano Mendola mend...@gmail.com wrote:
 That sounds a bit harsh. I'm one of those, indeed; I haven't looked into
 the details, not having enough time for it. At work we do GPU computing
 (not the sort type of stuff), and given that I'm a Postgres enthusiast I
 asked myself: my server is able to sort around 500 million integers per
 second; if Postgres were able to do that as well, it would be very nice.

 What can I say? Sorry for my thoughts.

I'm not trying to sound harsh.

The only reason that my patch *nearly* had support for this was
because the implementation that we nearly went with would have only
needed another couple of lines of code to support it. It very probably
wouldn't have turned out to have been useful for any novel sorting
idea, and was really only intended to be used to support user-defined
full sorting specialisations. That didn't end up making the cut.

My point is that whatever is holding back the development of a useful
prototype here, it definitely isn't the lack of an existing API. We
don't know what such an API should look like, and just how invasive it
needs to be. More importantly, it remains to be seen how useful this
idea is in the real world - we don't have so much as a synthetic test
case with a single client, as far as I'm aware.

I'd encourage the OP to share his work on github or something along those lines.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Gaetano Mendola
Sent: Wednesday, February 15, 2012 2:54 PM
To: Peter Geoghegan; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CUDA Sorting

On 15/02/2012 23:11, Peter Geoghegan wrote:
 On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote:
 On 13/02/2012 19:48, Greg Stark wrote:

 I don't think we should be looking at either CUDA or OpenCL directly.
 We should be looking for a generic library that can target either 
 and is well maintained and actively developed. Any GPU code we write 
 ourselves would rapidly be overtaken by changes in the hardware and 
 innovations in parallel algorithms. If we find a library that 
 provides a sorting api and adapt our code to use it then we'll get 
 the benefits of any new hardware feature as the library adds support for 
 them.


 I think one option is to make the sort function pluggable with a 
 shared library/dll. I see several benefits from this:

   - It could be in the interest of the hardware vendor to provide the 
 most powerful sort implementation (I'm sure for example that TBB sort 
 implementation is faster than pg_sort)

   - It can permit people to play with it without being deeply
 involved in pg development and such.

 Sorry, but I find it really hard to believe that the non-availability 
 of pluggable sorting is what's holding people back here. Some vanguard 
 needs to go and prove the idea by building a rough prototype before we 
 can even really comment on what an API should look like. For example, 
 I am given to understand that GPUs generally sort using radix sort - 
 resolving the impedance mismatch that prevents someone from using a 
 non-comparison based sort sure sounds like a lot of work for an 
 entirely speculative reward.

AFAIK the thrust library uses radix sort if the keys you are sorting are POD
data comparable with a < operator; otherwise it does a comparison-based sort
using the operator provided.

http://docs.thrust.googlecode.com/hg/modules.html

I'm not saying that the non-availability of pluggable sort completely holds
people back, I'm saying that it will simplify the process now and in the
future; of course that's my opinion.

 Someone who cannot understand tuplesort, which is not all that 
 complicated, has no business trying to build GPU sorting into 
 Postgres.

That sounds a bit harsh. I'm one of those, indeed; I haven't looked into the
details, not having enough time for it. At work we do GPU computing (not the
sort type of stuff), and given that I'm a Postgres enthusiast I asked myself:
my server is able to sort around 500 million integers per second; if Postgres
were able to do that as well, it would be very nice.

What can I say? Sorry for my thoughts.

 I had a patch committed a few hours ago that almost included the 
 capability of assigning an alternative sorting function, but only one 
 with the exact same signature as my variant of qsort_arg. pg_qsort 
 isn't used to sort tuples at all, by the way.

Then I was looking in the wrong direction. Thank you for pointing that out.

 Threading Building Blocks is not going to form the basis of any novel 
 sorting implementation, because comparators in general are not thread 
 safe, and it isn't available on all the platforms we support, and 
 because of how longjmp interacts with C++ stack unwinding and so on 
 and so on. Now, you could introduce some kind of parallelism into 
 sorting integers and floats, but that's an awful lot of work for a 
 marginal reward.

TBB was just an example that came to mind.
What do you mean by "you could introduce some kind of parallelism"?
As far as I know, any divide-and-conquer algorithm can be parallelized.

Radix sorting can be used for any data type, if you create a callback that 
provides the most significant bits in fixed-width buckets.  At any rate, I can't 
imagine why anyone would want to complain about sorting 40 times faster than 
before, considering the amount of time databases spend ordering data.
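
As a sketch of that callback idea (the names are illustrative, not an
existing PostgreSQL or GPU API): an LSD radix sort driven by a function
that maps each datum to an unsigned 32-bit key whose byte order matches
the desired sort order.  Signed or floating-point keys would need the
usual bit flips inside the callback.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t (*key_fn) (const void *datum);

static void
radix_sort_u32(const void **items, size_t n, key_fn get_key)
{
	const void **tmp = malloc(n * sizeof(*tmp));
	int			shift;
	size_t		i;

	for (shift = 0; shift < 32; shift += 8)
	{
		size_t		count[257] = {0};
		int			b;

		/* Histogram of this byte, offset by one for the prefix sum. */
		for (i = 0; i < n; i++)
			count[((get_key(items[i]) >> shift) & 0xFF) + 1]++;
		/* Prefix sum turns counts into starting offsets. */
		for (b = 0; b < 256; b++)
			count[b + 1] += count[b];
		/* Stable scatter of this pass into the temporary array. */
		for (i = 0; i < n; i++)
			tmp[count[(get_key(items[i]) >> shift) & 0xFF]++] = items[i];
		memcpy(items, tmp, n * sizeof(*tmp));
	}
	free(tmp);
}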

I have a CUDA card in this machine (NVIDIA GeForce GTX 460) and I would not 
mind it a bit if my database's ORDER BY clause suddenly started running ten 
times faster than before when I am dealing with a huge volume of data.

There have been other experiments along these lines such as:
GPU-based Sorting in PostgreSQL, Naju Mancheril, School of Computer Science, 
Carnegie Mellon University
www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf (This is for 
SQLite, but the grammar of SQLite is almost a pure subset of PostgreSQL, 
including things like vacuum...)
http://wiki.postgresql.org/images/6/65/Pgopencl.pdf
http://dl.acm.org/citation.cfm?id=1807207
http://www.scribd.com/doc/51484335/PostgreSQL-OpenCL-Procedural-Language-pgEast-March-2011

See also
http://highscalability.com/scaling-postgresql-using-cuda





Re: [HACKERS] Speed dblink using alternate libpq tuple storage

2012-02-15 Thread Kyotaro HORIGUCHI
Hello, sorry for the long absence.

As far as I can see, an out-of-memory error in getAnotherTuple() makes
conn->result->resultStatus = PGRES_FATAL_ERROR, and pqParseInput[23]()
consequently skips the succeeding 'D' messages.

When an exception is raised within the row processor, pg_conn->inCursor
is always left in a consistent position and result->resultStatus ==
PGRES_TUPLES_OK.

The choices for the libpq user at that point are:

- Continue to read the succeeding tuples.

  Call PQgetResult() to read the 'D' messages and hand them to the row
  processor as before.

- Throw away the remaining results.

  Call pqClearAsyncResult() and pqSaveErrorResult(), then call
  PQgetResult() to skip over the succeeding 'D' messages. (Of course
  the user can't do that in the current implementation.)

To let users select the second choice (which I think is the more common
one), we need only provide and export a new PQ* function to do that, I
think.

void
PQskipRemainingResult(PGconn *conn)
{
  pqClearAsyncResult(conn);

  /* conn->result is always NULL here */
  pqSaveErrorResult(conn);

  /* Skip over the remaining 'D' messages. */
  PQgetResult(conn);
}

A user could then write code with this function like so:

...
PG_TRY();
{
  ...
  res = PQexec(conn, "...");
  ...
}
PG_CATCH();
{
  PQskipRemainingResult(conn);
  goto error;
}
PG_END_TRY();


Of course, this is applicable to a C++ client in the same manner:

try {
  ...
  res = PQexec(conn, "...");
  ...
} catch (const myexcep &ex) {
  PQskipRemainingResult(conn);
  throw;
}
 


By the way, where should I insert this function?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Google Summer of Code? Call for mentors.

2012-02-15 Thread Pavel Golub
Hello, Josh.

You wrote:

JB Hackers,

JB The call is now open for Google Summer of Code.

JB If you are interested in being a GSoC mentor this summer, please reply
JB to this email.  I want to gauge whether or not we should participate
JB this summer.

JB -- 
JB Josh Berkus
JB PostgreSQL Experts Inc.
JB http://pgexperts.com

What are the requirements for mentors?

-- 
With best wishes,
 Pavel  mailto:pa...@gf.microolap.com

