Re: [HACKERS] Avoid endless futile table locks in vacuuming.

2015-12-28 Thread Tom Lane
I wrote:
> Jeff Janes  writes:
>> If a partially-active table develops a slug of stable all-visible,
>> non-empty pages at the end of it, then every autovacuum of that table
>> will skip the end pages on the forward scan, think they might be
>> truncatable, and take the access exclusive lock to do the truncation.
>> And then immediately fail when it sees the last page is not empty.
>> This can persist for an indefinite number of autovac rounds.
>> The simple solution is to always scan the last page of a table, so it
>> can be noticed that it is not empty and avoid the truncation attempt.

> This seems like a reasonable proposal, but I find your implementation
> unconvincing: there are two places in lazy_scan_heap() that pay attention
> to scan_all, and you touched only one of them.

After further investigation, there is another pre-existing bug: the short
circuit path for pages not requiring freezing doesn't bother to update
vacrelstats->nonempty_pages, causing the later logic to think that the
page is potentially truncatable even if we fix the second check of
scan_all!  So this is pretty broken, and I almost think we should treat it
as a back-patchable bug fix.

In the attached proposed patch, I added another refinement, which is to
not bother with forcing the last page to be scanned if we already know
that we're not going to attempt truncation, because we already found a
nonempty page too close to the end of the rel.  I'm not quite sure this
is worthwhile, but it takes very little added logic, and saving an I/O
is probably worth the trouble.
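
As a rough sketch, assuming the new helper simply encapsulates the threshold
test that the patch removes from lazy_vacuum_rel() (see the diff below), it
would look something like this:

static bool
should_attempt_truncation(LVRelStats *vacrelstats)
{
	BlockNumber	possibly_freeable;

	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;

	/*
	 * Same heuristic as before: only bother if a goodly number of pages,
	 * in absolute or relative terms, could be given back to the OS.
	 */
	return (possibly_freeable > 0 &&
			(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
			 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION));
}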

regards, tom lane

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 2429889..2833909 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*** static BufferAccessStrategy vac_strategy
*** 138,144 
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  			   Relation *Irel, int nindexes, bool scan_all);
  static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
! static bool lazy_check_needs_freeze(Buffer buf);
  static void lazy_vacuum_index(Relation indrel,
    IndexBulkDeleteResult **stats,
    LVRelStats *vacrelstats);
--- 138,144 
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  			   Relation *Irel, int nindexes, bool scan_all);
  static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
! static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
  static void lazy_vacuum_index(Relation indrel,
    IndexBulkDeleteResult **stats,
    LVRelStats *vacrelstats);
*** static void lazy_cleanup_index(Relation 
*** 147,152 
--- 147,153 
     LVRelStats *vacrelstats);
  static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
   int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+ static bool should_attempt_truncation(LVRelStats *vacrelstats);
  static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
  static BlockNumber count_nondeletable_pages(Relation onerel,
  		 LVRelStats *vacrelstats);
*** lazy_vacuum_rel(Relation onerel, int opt
*** 175,181 
  	LVRelStats *vacrelstats;
  	Relation   *Irel;
  	int			nindexes;
- 	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
  	TimestampTz starttime = 0;
  	long		secs;
--- 176,181 
*** lazy_vacuum_rel(Relation onerel, int opt
*** 263,276 
  
  	/*
  	 * Optionally truncate the relation.
- 	 *
- 	 * Don't even think about it unless we have a shot at releasing a goodly
- 	 * number of pages.  Otherwise, the time taken isn't worth it.
  	 */
! 	possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
! 	if (possibly_freeable > 0 &&
! 		(possibly_freeable >= REL_TRUNCATE_MINIMUM ||
! 		 possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))
  		lazy_truncate_heap(onerel, vacrelstats);
  
  	/* Vacuum the Free Space Map */
--- 263,270 
  
  	/*
  	 * Optionally truncate the relation.
  	 */
! 	if (should_attempt_truncation(vacrelstats))
  		lazy_truncate_heap(onerel, vacrelstats);
  
  	/* Vacuum the Free Space Map */
*** lazy_scan_heap(Relation onerel, LVRelSta
*** 498,503 
--- 492,504 
  	 * started skipping blocks, we may as well skip everything up to the next
  	 * not-all-visible block.
  	 *
+ 	 * We don't skip the last page, unless we've already determined that it's
+ 	 * not worth trying to truncate the table.  This avoids having every
+ 	 * vacuum take access exclusive lock on the table to attempt a truncation
+ 	 * that just fails immediately (if there are tuples in the last page).
+ 	 * This is worth avoiding mainly because such a lock must be replayed on
+ 	 * any hot standby, where it can be disruptive.
+ 	 *
  	 * Note: if scan_all is true, we won't actually skip any pages; but we
  	 * maintain next_not_all_visible_block 

Re: [HACKERS] pgbench stats per script & other stuff

2015-12-28 Thread Fabien COELHO


Hello Michaël,


And then I also had a look at src/port/snprintf.c, where things get
actually weird when no transactions are run for a script (emulated
with 2 scripts, one with @1 and the second with @1):
 - 0 transactions (0.0% of total, tps = 0.00)
 - latency average = -1.#IO ms
 - latency stddev = -1.#IO ms
And it seems that this is a bug in fmtfloat(), because it does not
handle nan values correctly. Others, any thoughts about that?
It is possible to address things within your patch by using isnan()
and printing another message, but I think that we had better patch
snprintf.c if my analysis is right.


FWIW, I just had a closer look at this portion and I arrived at the
conclusion that the sprintf implementation on Windows is just broken, as
it is not able to handle inf or nan appropriately as exceptions.
fmtfloat@src/port/snprintf.c relies on the system's implementation of
sprintf to handle those exceptions; however, even directly calling
sprintf results in the same weird output, inf showing up as "1.#IO"
and nan as "-1.#IO". Anyone, feel free to disagree if I am missing
something.


I have no opinion about the M$ implementation of double prettyprinting,
but I agree that "-1.#IO" looks strange. The web seems to say that "-1.INF"
and "-1.IND" are the "normal" way for Windows to say infinity or not a
number. Well, if someone there thought it looks good, I cannot help it.
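
As a sketch only (not the actual patch), if snprintf.c's fmtfloat() were to
special-case non-finite values itself instead of trusting the platform
sprintf, the guard could look roughly like this; the helper name is made up:

#include <math.h>

/*
 * Return a portable spelling for non-finite doubles, or NULL for finite
 * values so the normal formatting path handles them.
 */
static const char *
nonfinite_str(double value)
{
	if (isnan(value))
		return "nan";
	if (isinf(value))
		return (value > 0) ? "inf" : "-inf";
	return NULL;
}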



Still, it would be cool to have better error message when there is no
value to show up to the user, like "no latency average" or "undefined
latency average". That would be more elegant, and the patches proposed
still lack that.


Hmmm. I do not buy that for several reasons:

For --progress-style reporting you want NaN or whatever, because the
output could be processed further unix-style from a pipe (grep/cut/...).
This is also true for the final report. I would not want to change the
output organisation for some special values; I would just like to get the
value whatever it is, "NaN" or "Infinity" or even "-1.IND", so that the
pipe commands would work.


Also, for the final report, it seems to me overkill to try to work around
cases where pgbench does not run any transactions, which basically does not
happen often, as the point is to run many transactions.


Finally, this behavior already exists; the patch does not change it
AFAICS, and that is not its purpose.


So I would suggest keeping it that way.

--
Fabien.


Re: [HACKERS] Fix compiler warnings in Cube Extension

2015-12-28 Thread David Rowley
On 29 December 2015 at 07:38, Tom Lane  wrote:

> David Rowley  writes:
> > My compiler is complaining about cube_coord() and cube_coord_llur() not
> > returning a value on all code paths.
>
> Yeah, looking at that code, this would be expected from any compiler
> that doesn't know that ereport() doesn't return.  On it now.  The
> documentation aspect of that commit leaves much to be desired as well.
>

Great. Thanks for committing.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2015-12-28 Thread Aleksander Alekseev
Here is another preliminary result I would like to share. As always, you
will find the corresponding patch attached. It is of work-in-progress
quality.

The idea was suggested by a colleague of mine, Aleksander Lebedev.

freeList is partitioned like in the "no lock" patch. When there are not
enough free items in a freeList, we borrow AN items from a global list.
When a freeList becomes too large, we return AN items to the global list.
This global list is also partitioned into PN partitions. Each partition
is protected by a spinlock.

This way we have less lock contention than in the "lwlock" or "spinlock
array" versions, since we borrow multiple free elements at a time, not one.
Also, in the worst case only AN*(NUM_LOCK_PARTITIONS-1) free items go
unused instead of (Total/NUM_LOCK_PARTITIONS)*(NUM_LOCK_PARTITIONS-1).
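
As an illustration only (the spinlock-protected partition fields below are
hypothetical and need not match the WIP patch; only the GLOBAL_FREE_LIST_*
names come from it), borrowing a batch of AN elements from one global
partition could look roughly like this:

/*
 * Detach up to GLOBAL_FREE_LIST_ALLOC_NUMBER elements from one partition
 * of the global free list and hand them back as a chain that the caller
 * splices into its local freeList.
 */
static HASHELEMENT *
borrow_from_global_freelist(HASHHDR *hctl, int partition)
{
	HASHELEMENT *chain = NULL;
	int			n = 0;

	SpinLockAcquire(&hctl->global_free_list_locks[partition]);
	while (n < GLOBAL_FREE_LIST_ALLOC_NUMBER &&
		   hctl->global_free_list[partition] != NULL)
	{
		HASHELEMENT *elem = hctl->global_free_list[partition];

		hctl->global_free_list[partition] = elem->link;
		elem->link = chain;
		chain = elem;
		n++;
	}
	SpinLockRelease(&hctl->global_free_list_locks[partition]);

	return chain;
}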

On a 60-core server, TPS depends on AN and PN like this:

        | AN = 1 | AN = 2 | AN = 4 | AN = 8 | AN =16 | AN =32
--------|--------|--------|--------|--------|--------|-------
        |  733.0 | 1120.6 | 1605.5 | 1842.5 | 1545.5 | 1237.0
PN = 1  |  740.3 | 1127.0 | 1634.2 | 1800.8 | 1573.5 | 1245.1
        |  742.9 | 1102.1 | 1647.2 | 1853.6 | 1533.4 | 1251.9
--------|--------|--------|--------|--------|--------|-------
        | 1052.0 | 1438.1 | 1755.6 | 1981.0 | 2022.0 | 1816.8
PN = 2  | 1044.8 | 1453.1 | 1784.0 | 1958.3 | 2033.2 | 1819.2
        | 1028.7 | 1419.8 | 1809.2 | 1981.2 | 2028.2 | 1790.2
--------|--------|--------|--------|--------|--------|-------
        | 1182.0 | 1521.5 | 1813.2 | 1932.6 | 2035.2 | 1948.4
PN = 4  | 1212.4 | 1535.4 | 1816.8 | 1927.0 | 2018.7 | 2014.6
        | 1189.4 | 1528.9 | 1816.9 | 1942.6 | 2011.9 | 2018.3
--------|--------|--------|--------|--------|--------|-------
        | 1148.1 | 1522.2 | 1795.4 | 1926.6 | 2031.7 | 2015.6
PN = 8  | 1175.6 | 1529.4 | 1807.6 | 1913.5 | 2007.3 | 2062.0
        | 1169.9 | 1528.0 | 1796.3 | 1926.0 | 2011.1 | 2042.8
--------|--------|--------|--------|--------|--------|-------
        | 1117.7 | 1491.0 | 1803.9 | 1925.3 | 2029.4 | 2056.2
PN = 16 | 1132.8 | 1481.0 | 1809.6 | 1968.1 | 2033.8 | 2068.5
        | 1131.4 | 1481.8 | 1819.4 | 1946.2 | 2071.1 | 2073.8

AN = GLOBAL_FREE_LIST_ALLOC_NUMBER
PN = GLOBAL_FREE_LIST_PARTITIONS_NUM

There is no performance degradation on Core i7. Increasing PN or AN
any further doesn't gain any more TPS.

As you may see, this version is about 30% faster than "lwlock" or
"spinlock array" and 3.1 times faster than master. Still, it's about 2.5
times slower than the "no locks" version, which I find frustrating.

Next I will try to speed up this version by modifying the global_free_list_*
procedures. The current implementations are not the most efficient ones. I'm
also planning to explore approaches that involve lock-free algorithms.

I would like to know your opinion about such an approach. For instance,
can we afford to leave AN*(NUM_LOCK_PARTITIONS-1) items unused in the worst
case, or can't we, for the same reason we can't do it for
(Total / NUM_LOCK_PARTITIONS) * (NUM_LOCK_PARTITIONS-1)?
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 78f15f0..91fcc05 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -265,7 +265,7 @@ InitShmemIndex(void)
  */
 HTAB *
 ShmemInitHash(const char *name, /* table string name for shmem index */
-			  long init_size,	/* initial table size */
+			  long init_size,	/* initial table size */ // AALEKSEEV: is ignored, refactor!
 			  long max_size,	/* max size of the table */
 			  HASHCTL *infoP,	/* info about key and bucket size */
 			  int hash_flags)	/* info about infoP */
@@ -299,7 +299,7 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
 	/* Pass location of hashtable header to hash_create */
 	infoP->hctl = (HASHHDR *) location;
 
-	return hash_create(name, init_size, infoP, hash_flags);
+	return hash_create(name, max_size, infoP, hash_flags);
 }
 
 /*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index eacffc4..8375c3b 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -87,6 +87,8 @@
 #include "access/xact.h"
 #include "storage/shmem.h"
 #include "storage/spin.h"
+#include "storage/lock.h"
+#include "storage/lwlock.h"
 #include "utils/dynahash.h"
 #include "utils/memutils.h"
 
@@ -118,6 +120,13 @@ typedef HASHELEMENT *HASHBUCKET;
 /* A hash segment is an array of bucket headers */
 typedef HASHBUCKET *HASHSEGMENT;
 
+// AALEKSEEV: TODO: comment, should be power of two
+#define GLOBAL_FREE_LIST_PARTITIONS_NUM 16
+#define GLOBAL_FREE_LIST_PARTITIONS_MASK (GLOBAL_FREE_LIST_PARTITIONS_NUM-1)
+
+// AALEKSEEV: TODO: comment
+#define GLOBAL_FREE_LIST_ALLOC_NUMBER 16
+
 /*
  * Header structure for a hash table --- contains all changeable info
  *
@@ -129,11 +138,20 @@ typedef HASHBUCKET *HASHSEGMENT;
 struct HASHHDR
 {
 	/* In a partitioned table, take this lock to touch nentries or 

[HACKERS] Fix compiler warnings in Cube Extension

2015-12-28 Thread David Rowley
Hi,

My compiler is complaining about cube_coord() and cube_coord_llur() not
returning a value on all code paths.  On looking at this I noticed that
this is happening because the pattern used is not quite aligned with other
code; normally we do:

if ()
   ereport();

do stuff;
PG_RETURN_(value);

I've rearranged the code to align more with what's normal.  In passing
I also added some missing parentheses, removed some unneeded ones, and
adjusted some whitespace.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


cube_warning_fix.patch
Description: Binary data



Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread David Rowley
On 28 December 2015 at 23:23, Tomas Vondra 
wrote:

> On 12/28/2015 03:15 AM, David Rowley wrote:
>
>> Maybe it would be better to, once the filter is built, simply count the
>>
> number of 1 bits and only use the filter if there's less than
>>  1 bits compared to the size of the filter in bits. There's
>> functionality in bms_num_members() to do this, and there's
>> also __builtin_popcount() in newer version of GCC, which we could have
>> some wrapper around, perhaps.
>>
>
> I don't really see how this is relevant with the previous point. The
> number of 1s in the bitmap can tell you the false positive rate for the
> bloom filter, not what fraction of lookups will be answered with "true".
>
> So while this needs to be watched, so that we stop using the bloom filter
> when it gets too full (e.g. due to under-estimate), it's completely
> unrelated to the previous issue.


Why is it not related? This has got me confused. If a bloom filter has all
of the bits set to 1, then it will never filter any tuples, right?

If so, then a filter with all 1 bits set should be thrown away, as it'll
never help us, and the filter should generally become more worthwhile as it
contains a higher ratio of 0 bits vs 1 bits. Of course we don't have a
count of how many Tuples matched each bit, so this is based on the
assumption that each bit matches an equal number of Tuples. Are you saying
this is not an assumption that we should make?


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] request patch pg_recvlogical.c, pg_receivexlog.c for nls

2015-12-28 Thread Ioseph Kim
Hello,

at PostgreSQL version 9.5rc1,
at line 934 in pg_recvlogical.c 

set_pglocale_pgservice(argv[0],PG_TEXTDOMAIN("pg_recvlogical"));

This references an invalid catalog file, because the pg_basebackup,
pg_recvlogical, and pg_receivexlog commands all use the same catalog file,
pg_basebackup.mo.

pg_receivexlog.c has the same problem.

A patch, please.







[HACKERS] Patch to fix misspellings of the word "segment"

2015-12-28 Thread David Rowley
I noticed this while reading nodeGather.c. I checked to find other
misspellings of the word and found one more, so the attached fixes both.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


segment_spelling_fix.patch
Description: Binary data



Re: [HACKERS] Freeze avoidance of very large table.

2015-12-28 Thread Masahiko Sawada
On Mon, Dec 21, 2015 at 11:54 PM, Robert Haas  wrote:
> On Mon, Dec 21, 2015 at 3:27 AM, Kyotaro HORIGUCHI
>  wrote:
>> Hello,
>>
>> At Fri, 18 Dec 2015 12:09:43 -0500, Robert Haas  
>> wrote in 
>>> On Thu, Dec 17, 2015 at 1:17 AM, Michael Paquier
>>>  wrote:
>>> > I am not really getting the meaning of this sentence. Shouldn't this
>>> > be reworded something like:
>>> > "Freezing occurs on the whole table once all pages of this relation 
>>> > require it."
>>>
>>> That statement isn't remotely true, and I don't think this patch
>>> changes that.  Freezing occurs on the whole table once relfrozenxid is
>>> old enough that we think there might be at least one page in the table
>>> that requires it.
>>
>> I doubt I can explain this accurately, but I took the original
>> phrase as that if and only if all pages of the table are marked
>> as "requires freezing" by accident, all pages are frozen. It's
>> quite obvious but it is what I think "happen to require freezing"
>> means. Does this make sense?
>>
>> The phrase might not be necessary if this is correct.
>
> Maybe you are trying to say something like "only those pages which
> require freezing are frozen?".
>

I was thinking the same as what Horiguchi-san said.
That is, even if relfrozenxid is old enough, freezing the whole
table is not required if its pages are marked as "not requiring
freezing".
In other words, only those pages which are marked as "not frozen" are frozen.

Regards,

--
Masahiko Sawada




Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-28 Thread Shulgin, Oleksandr
On Thu, Dec 24, 2015 at 5:16 AM, Haribabu Kommi 
wrote:

> On Thu, Dec 24, 2015 at 2:37 AM, Tom Lane  wrote:
> > "Shulgin, Oleksandr"  writes:
> >> 1. Have you considered re-loading the HBA file upon call to this
> function
> >> in a local context instead of keeping it in the backends memory?
> >
> > Aside from the security questions, please consider that this feature
> should
> > work similarly to the current implementation of the pg_file_settings
> view,
> > namely it tells you about what is *currently* in the on-disk files, not
> > necessarily what is the active setting in the postmaster's memory.
> > A backend could not be entirely sure about the postmaster's state anyway;
> > and even if it could be, one of the major applications for features like
> > this is testing manual changes to the files before you SIGHUP the
> > postmaster.  So re-reading the files on each usage is a Good Thing, IMO,
> > even if it sounds inefficient.
> >
> >> 2. I also wonder why JSONB arrays for database/user instead of TEXT[]?
> >
> > Yes, that seems rather random to me too.
>
> Here I attached updated patch with the following changes,
> - Local loading of HBA file to show the authentication data
> - Changed database and user types are text[]
>

Still, this requires a revert of the memory context handling commit for
load_hba() and load_ident().  I think you can get around the problem by
changing these functions to work with CurrentMemoryContext and setting it
explicitly to the newly allocated PostmasterContext in
PerformAuthentication().  In your function you could then create a
temporary context to be discarded before leaving the function.

I still think you should not try to re-implement check_hba(), but extend
this function with means to report line skip reasons as per your
requirements.  Having an optional callback function might be a good fit (a
possible use case is logging the reasons line by line).
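
For illustration, the callback-based extension could have roughly this shape
(all of these names are hypothetical; nothing like this exists in the tree
yet):

/*
 * Called for every pg_hba.conf line considered; skip_reason is NULL for
 * the line that finally matched.
 */
typedef void (*hba_line_callback) (int line_number,
								   const char *raw_line,
								   const char *skip_reason,
								   void *context);

/* check_hba() extended to report, per line, why it was skipped. */
extern bool check_hba_ext(hbaPort *port,
						  hba_line_callback on_line,
						  void *callback_context);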

Thank you.
--
Alex


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2015-12-28 Thread Amit Kapila
On Thu, Dec 24, 2015 at 5:50 PM, Ildus Kurbangaliev <
i.kurbangal...@postgrespro.ru> wrote:
>
> On Tue, 15 Dec 2015 13:56:30 -0500
> Robert Haas  wrote:
>
> > On Sun, Dec 13, 2015 at 6:35 AM, and...@anarazel.de
> >  wrote:
> > > On 2015-12-12 21:15:52 -0500, Robert Haas wrote:
> > >> On Sat, Dec 12, 2015 at 1:17 PM, and...@anarazel.de
> > >>  wrote:
> > >> > Here's two patches doing that. The first is an adaption of your
> > >> > constants patch, using an enum and also converting xlog.c's
> > >> > locks. The second is the separation into distinct tranches.
> > >>
> > >> Personally, I prefer the #define approach to the enum, but I can
> > >> live with doing it this way.
> > >
> > > I think the lack needing to adjust the 'last defined' var is worth
> > > it...
> > >> Other than that, I think these patches look
> > >> good, although if it's OK with you I would like to make a pass over
> > >> the comments and the commit messages which seem to me that they
> > >> could benefit from a bit of editing (but not much substantive
> > >> change).
> > >
> > > Sounds good to me. You'll then commit that?
> >
> > Yes.  Done!
> >
> > In terms of this project overall, NumLWLocks() now knows about only
> > four categories of stuff: fixed lwlocks, backend locks (proc.c),
> > replication slot locks, and locks needed by extensions.  I think it'd
> > probably be fine to move the backend locks into PGPROC directly, and
> > the replication slot locks into ReplicationSlot.  I don't know if that
> > will improve performance but it doesn't seem like it should regress
> > anything, though we should probably test that.  I'm not sure what to
> > do about extension-requested locks - maybe give those their own
> > tranche somehow?
> >
> > I think we should also look at tranche-ifying the locks counted in
> > NUM_FIXED_LWLOCKS but not NUM_INDIVIDUAL_LWLOCKS.  That's basically
> > just the lock manager locks and the predicate lock manager locks.
> > That would get us to a place where every lock in the system has a
> > descriptive name, either via the tranche or because it's an
> > individually named lock, which sounds excellent.
> >
>
> There is a patch that moves backend LWLocks into PGPROC and to a
> separate tranche.
>

1.
@@ -437,6 +440,13 @@ InitProcessPhase2(void)
 {
  Assert(MyProc != NULL);

+ /* Register and initialize fields of ProcLWLockTranche */
+ ProcLWLockTranche.name = "proc";
+ ProcLWLockTranche.array_base = (char *) (ProcGlobal->allProcs) +
+ offsetof(PGPROC, backendLock);
+ ProcLWLockTranche.array_stride = sizeof(PGPROC);
+ LWLockRegisterTranche(LWTRANCHE_PROC, );
+

I think this will not work for auxiliary processes, as they won't
call InitProcessPhase2().  It is better to initialize it in
InitProcGlobal() and then propagate it to backends for EXEC_BACKEND
cases, as we do for ProcStructLock and AuxiliaryProcs.


2.
@@ -213,6 +213,7 @@ typedef enum BuiltinTrancheIds
  LWTRANCHE_WAL_INSERT,
  LWTRANCHE_BUFFER_CONTENT,
  LWTRANCHE_BUFFER_IO_IN_PROGRESS,
+ LWTRANCHE_PROC,
  LWTRANCHE_FIRST_USER_DEFINED
 } BuiltinTrancheIds;

Other tranche IDs are based on the name of their corresponding
LWLock; don't you think it would be better to name it
LWTRANCHE_BACKEND for the sake of consistency?  Also consider
changing the name at other places in the patch for this tranche.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread Tomas Vondra

On 12/28/2015 11:52 AM, David Rowley wrote:

On 28 December 2015 at 23:44, Tomas Vondra wrote:

On 12/28/2015 11:38 AM, David Rowley wrote:

If so, then a filter with all 1 bits set should be thrown away, as

it'll never help us, and the filter should generally become more
worthwhile as it contains a higher ratio of 0 bits vs 1 bits. Of
course we don't have a count of how many Tuples matched each bit, so
this is based on the assumption that each bit matches an equal
number
of Tuples. Are you saying this is not an assumption that we should
make?


Sure we should check that. All I'm saying is it has nothing to do
with the first problem described in the first part of the e-mail.


Okay. I was merely suggesting this method as an alternative to tracking
and checking the usefulness of the filter during the hash
probe. I assumed that tracking and checking the usefulness during the
hash probe won't be free, and that it may be better if we can estimate
or determine the expected usefulness of the filter before the probe
stage, and throw it away before we waste cycles using it.


Consider this example:

CREATE TABLE t (id INT);
INSERT INTO t SELECT i FROM generate_series(1,100) s(i);
ANALYZE;

SELECT * FROM t AS t1 JOIN t AS t2 ON (t1.id = t2.id)
 WHERE t1.id < 1 AND t2.id < 1;

This will be executed like this:

  QUERY PLAN
---
 Hash Join  (cost=17046.26..34008.58 rows=94 width=8)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t t1  (cost=0.00..16925.00 rows=9701 width=4)
 Filter: (id < 1)
   ->  Hash  (cost=16925.00..16925.00 rows=9701 width=4)
 ->  Seq Scan on t t2  (cost=0.00..16925.00 rows=9701 width=4)
   Filter: (id < 1)
(7 rows)

But of course the problem is that the two relations are (trivially) 
correlated, which means that in reality it works like this:


   QUERY PLAN
-
 Hash Join (actual rows= loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on t t1 (actual rows= loops=1)
 Filter: (id < 1)
 Rows Removed by Filter: 990001
   ->  Hash (actual rows= loops=1)
 Buckets: 16384  Batches: 1  Memory Usage: 480kB
 ->  Seq Scan on t t2 (actual rows= loops=1)
   Filter: (id < 1)
   Rows Removed by Filter: 990001
 Planning time: 0.316 ms
 Execution time: 176.283 ms
(12 rows)

So while we have very good estimates on the scan nodes, the final
estimate is off - we expect the bloom filter to eliminate ~99% of the
rows, while in reality 100% of the rows match (due to the correlation). And
that's even if the bloom filter is "perfect" in the sense that it has
a very low false positive probability etc.


This example illustrates that such cases can't really be solved before
actually doing the lookups. That does not make the checks you propose
pointless, but they simply address different cases (and I indeed planned
to implement them).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread Tomas Vondra

On 12/28/2015 11:38 AM, David Rowley wrote:

On 28 December 2015 at 23:23, Tomas Vondra wrote:

On 12/28/2015 03:15 AM, David Rowley wrote:

Maybe it would be better to, once the filter is built, simply
count the

number of 1 bits and only use the filter if there's less than
 1 bits compared to the size of the filter in bits.
There's
functionality in bms_num_members() to do this, and there's
also __builtin_popcount() in newer version of GCC, which we
could have
some wrapper around, perhaps.


I don't really see how this is relevant with the previous point. The
number of 1s in the bitmap can tell you the false positive rate for
the bloom filter, not what fraction of lookups will be answered with
"true".

So while this needs to be watched, so that we stop using the bloom
filter when it gets too full (e.g. due to under-estimate), it's
completely unrelated to the previous issue.


Why is it not related? This has got me confused. If a bloom filter has
all of the bits set to 1, then it will never filter any tuples, right?


Because the false positive rate can be computed without having to look
at the lookups. So it's inherently independent of the ordering of the outer
relation, and so on.



If so, then a filter with all 1 bits set should be thrown away, as
it'll never help us, and the filter should generally become more
worthwhile as it contains a higher ratio of 0 bits vs 1 bits. Of
course we don't have a count of how many Tuples matched each bit, so
this is based on the assumption that each bit matches an equal number
of Tuples. Are you saying this is not an assumption that we should
make?


Sure we should check that. All I'm saying is it has nothing to do with 
the first problem described in the first part of the e-mail.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread Tomas Vondra

On 12/28/2015 03:15 AM, David Rowley wrote:

On 18 December 2015 at 04:34, Tomas Vondra wrote:

I think ultimately we'll need to measure the false positive rate, so
that we can use it to dynamically disable the bloom filter if it
gets inefficient. Also maybe put some of that into EXPLAIN ANALYZE.


I'm not so convinced that will be a good idea. What if the filter does
not help much to start with, we disable it because of that, then we get
some different probe values later in the scan which the bloom filter
would have helped to eliminate earlier.


Yes, that's true. This might happen quite easily, for example when the
outer table is sorted - in that case what we'll see are (possibly long)
sequences of the same value, which might bias the decision quite easily.


I don't know how to fix this perfectly (if at all possible). But maybe 
we don't really need to do that. The overall goal is to improve the 
overall performance, and this check (disabling the bloom filter) is 
meant to limit the loss when the filter returns "true" for most values 
(making it somewhat pointless).


The costs however are asymmetric - once we build the hash table (and
build the filter), the cost of dropping the bloom filter is constant,
as we simply lose the time we spent building it. But the cost of
continuing to use an inefficient filter is O(N) where N is the number of
lookups (i.e. rows of the outer table). So it's probably better to drop the
filter a bit too early and sacrifice the small amount of time we spent
building it, so that we have a chance of not being much slower than the
current implementation.


That is not to say we can't come up with better solutions - for example
we can count (or estimate) the number of distinct hash values we've
seen, and only make the decision once we see enough of them. For example
we could use HyperLogLog to do that, or simply count the number of times a
value changed (hash value not equal to the previous value). That should
be better than simply waiting for X lookups, but I'm sure it's still
possible to come up with counter-examples.




Maybe it would be better to, once the filter is built, simply count the
number of 1 bits and only use the filter if there's less than
 1 bits compared to the size of the filter in bits. There's
functionality in bms_num_members() to do this, and there's
also __builtin_popcount() in newer version of GCC, which we could have
some wrapper around, perhaps.


I don't really see how this is relevant with the previous point. The 
number of 1s in the bitmap can tell you the false positive rate for the 
bloom filter, not what fraction of lookups will be answered with "true".


So while this needs to be watched, so that we stop using the bloom 
filter when it gets too full (e.g. due to under-estimate), it's 
completely unrelated to the previous issue.
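
For illustration (names are hypothetical, not from the posted patch): with
an nbits-bit filter and nhashes hash functions, the fill fraction translates
into an estimated false positive rate as (ones/nbits)^nhashes, since each of
the nhashes probes must independently hit a 1 bit:

#include <stdint.h>
#include <math.h>

/* Estimate the bloom filter's false positive rate from its bit array. */
static double
bloom_estimate_fpr(const uint64_t *words, int nwords, int nbits, int nhashes)
{
	uint64_t	ones = 0;
	int			i;

	for (i = 0; i < nwords; i++)
		ones += __builtin_popcountll(words[i]);		/* count of 1 bits */

	return pow((double) ones / nbits, nhashes);
}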


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

2015-12-28 Thread David Rowley
On 28 December 2015 at 23:44, Tomas Vondra 
wrote:

> On 12/28/2015 11:38 AM, David Rowley wrote:
>
>> If so, then a filter with all 1 bits set should be thrown away, as
>>
> it'll never help us, and the filter should generally become more
>> worthwhile as it contains a higher ratio of 0 bits vs 1 bits. Of
>> course we don't have a count of how many Tuples matched each bit, so
>> this is based on the assumption that each bit matches an equal number
>> of Tuples. Are you saying this is not an assumption that we should
>> make?
>>
>
> Sure we should check that. All I'm saying is it has nothing to do with the
> first problem described in the first part of the e-mail.


Okay. I was merely suggesting this method as an alternative to tracking
and checking the usefulness of the filter during the hash probe. I
assumed that tracking and checking the usefulness during the hash probe
won't be free, and that it may be better if we can estimate or determine
the expected usefulness of the filter before the probe stage, and throw it
away before we waste cycles using it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Performance degradation in commit ac1d794

2015-12-28 Thread Amit Kapila
On Sat, Dec 26, 2015 at 5:41 PM, Andres Freund  wrote:

> On 2015-12-26 12:22:48 +0100, Andres Freund wrote:
> > > > 3) Replace the postmaster_alive_fds socketpair by some other
> signalling
> > > >mechanism. E.g. sending a procsignal to each backend, which sets
> the
> > > >latch and a special flag in the latch structure.
> > >
> > > And what would send the signal?  The entire point here is to notice the
> > > situation where the postmaster has crashed.  It can *not* depend on the
> > > postmaster taking some action.
> >
> > Ahem. Um. Look, over there --->
> >
> > I blame it on all the food.
>
> A unportable and easy version of this, actually making sense this time,
> would be to use prctl(PR_SET_PDEATHSIG, SIGQUIT). That'd send SIGQUIT to
> backends whenever postmaster dies.  Obviously that's not portable
> either - doing this for linux only wouldn't be all that kludgey tho.
>
>
There is a way to make backends exit on Windows as well, by using
Job Objects and setting limitFlags to JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
in JOBOBJECT_BASIC_LIMIT_INFORMATION.
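
For the Linux-only approach, a minimal sketch (illustration only, not a
proposed patch) of what a freshly forked backend could do:

#include <signal.h>
#include <unistd.h>
#include <sys/prctl.h>

/*
 * Ask the kernel to send us SIGQUIT when the parent (postmaster) exits,
 * then re-check the parent to close the race where it died before the
 * prctl() call took effect.
 */
static void
arm_parent_death_signal(pid_t postmaster_pid)
{
	if (prctl(PR_SET_PDEATHSIG, SIGQUIT) < 0)
		return;
	if (getppid() != postmaster_pid)
		kill(getpid(), SIGQUIT);
}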




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] request patch pg_recvlogical.c, pg_receivexlog.c for nls

2015-12-28 Thread Alvaro Herrera
Michael Paquier wrote:
> On Mon, Dec 28, 2015 at 5:07 PM, Ioseph Kim  wrote:

> > at PostgreSQL version 9.5rc1,
> > at line 934 in pg_recvlogical.c
> >
> > set_pglocale_pgservice(argv[0],PG_TEXTDOMAIN("pg_recvlogical"));
> >
> > this references invalid catalog file, because pg_basebackup,
> > pg_recvlogical, pg_receivexlog commands use same catalog file
> > pg_basebackup.mo
> >
> > pg_receivexlog.c too.
> >
> > patch please.
> 
> Yes, you are right. Both values do not match CATALOG_NAME in nls.mk. I
> guess that something like the attached is fine then... The other
> binaries are fine.

Oops.  We should have noticed this long ago, since pg_receivexlog was
introduced in 9.2 and pg_recvlogical in 9.4 ... judging by the
misalignment in the Spanish translation for --help, I have never tested
the output of those programs.

Pushed and backpatched as appropriate.  Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] request patch pg_recvlogical.c, pg_receivexlog.c for nls

2015-12-28 Thread Michael Paquier
On Mon, Dec 28, 2015 at 5:07 PM, Ioseph Kim  wrote:
> Hello,
>
> at PostgreSQL version 9.5rc1,
> at line 934 in pg_recvlogical.c
>
> set_pglocale_pgservice(argv[0],PG_TEXTDOMAIN("pg_recvlogical"));
>
> this references invalid catalog file, because pg_basebackup,
> pg_recvlogical, pg_receivexlog commands use same catalog file
> pg_basebackup.mo
>
> pg_receivexlog.c too.
>
> patch please.

Yes, you are right. Both values do not match CATALOG_NAME in nls.mk. I
guess that something like the attached is fine then... The other
binaries are fine.
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 0c322d1..c4a6062 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -383,7 +383,7 @@ main(int argc, char **argv)
 	char	   *db_name;
 
 	progname = get_progname(argv[0]);
-	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_receivexlog"));
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_basebackup"));
 
 	if (argc > 1)
 	{
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 93f61c3..b568a78 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -645,7 +645,7 @@ main(int argc, char **argv)
 	char	   *db_name;
 
 	progname = get_progname(argv[0]);
-	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_recvlogical"));
+	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_basebackup"));
 
 	if (argc > 1)
 	{



Re: [HACKERS] [PROPOSAL] Backup and recovery of pg_statistic

2015-12-28 Thread Dmitry Ivanov
Here's the patch with an 'array_elemtype' procedure which returns an array's
element type as an OID. It should apply on top of commit fc995b. I wasn't sure
if I should start a new thread.


-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e08bf60..671c6d5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11695,6 +11695,9 @@ SELECT NULLIF(value, '(none)') ...
 array_dims
   
   
+array_elemtype
+  
+  
 array_fill
   
   
@@ -11794,6 +11797,17 @@ SELECT NULLIF(value, '(none)') ...

 
  
+  array_elemtype(anyarray)
+ 
+
+oid
+returns the element type of an array as oid
+array_elemtype(ARRAY[1,2,3])
+23
+   
+   
+
+ 
   array_fill(anyelement, int[],
   , int[])
  
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 72d308a..c2883e9 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -158,6 +158,18 @@ static int width_bucket_array_variable(Datum operand,
 			Oid collation,
 			TypeCacheEntry *typentry);
 
+/*
+ * array_elemtype :
+ *		returns the element type of the array
+ *		pointed to by "v" as an Oid.
+ */
+Datum
+array_elemtype(PG_FUNCTION_ARGS)
+{
+	AnyArrayType *v = PG_GETARG_ANY_ARRAY(0);
+	
+	return ObjectIdGetDatum(AARR_ELEMTYPE(v));
+}
 
 /*
  * array_in :
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index d8640db..de55c01 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -892,6 +892,8 @@ DATA(insert OID = 2092 (  array_upper	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s
 DESCR("array upper dimension");
 DATA(insert OID = 2176 (  array_length	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 23 "2277 23" _null_ _null_ _null_ _null_ _null_ array_length _null_ _null_ _null_ ));
 DESCR("array length");
+DATA(insert OID = 3317 (  array_elemtype   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 26 "2277" _null_ _null_ _null_ _null_ _null_	array_elemtype _null_ _null_ _null_ ));
+DESCR("array element type");
 DATA(insert OID = 3179 (  cardinality	   PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 23 "2277" _null_ _null_ _null_ _null_ _null_ array_cardinality _null_ _null_ _null_ ));
 DESCR("array cardinality");
 DATA(insert OID = 378 (  array_append	   PGNSP PGUID 12 1 0 0 0 f f f f f f i s 2 0 2277 "2277 2283" _null_ _null_ _null_ _null_ _null_ array_append _null_ _null_ _null_ ));
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 716e756..22c0fc9 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -328,6 +328,7 @@ extern bool Array_nulls;
 /*
  * prototypes for functions defined in arrayfuncs.c
  */
+extern Datum array_elemtype(PG_FUNCTION_ARGS);
 extern Datum array_in(PG_FUNCTION_ARGS);
 extern Datum array_out(PG_FUNCTION_ARGS);
 extern Datum array_recv(PG_FUNCTION_ARGS);



Re: [HACKERS] pg_upgrade tests running calling psql without --no-psqlrc

2015-12-28 Thread Pavel Stehule
2015-12-28 14:21 GMT+01:00 Michael Paquier :

> Hi all,
>
> Since commit d5563d7d, -c does not imply any more --no-psqlrc hence
> the tests of pg_upgrade may run with a custom psqlrc file loaded. I
> think that we had better add -X to avoid that, like in the patch
> attached.
> Thoughts?
>

+1

Pavel


> --
> Michael
>
>
>
>


Re: [HACKERS] Patch: pg_trgm: gin index scan performance for similarity search

2015-12-28 Thread Fornaroli Christophe
On Fri, Dec 25, 2015 at 11:10 AM, Teodor Sigaev  wrote:

> Good catch, committed.
>

Thank you!


Re: [HACKERS] pam auth - add rhost item

2015-12-28 Thread Grzegorz Sampolski
Hi.
New patch available.

The new status of this patch is: Needs review




Re: [HACKERS] pam auth - add rhost item

2015-12-28 Thread Grzegorz Sampolski
Hi.
I send new patch:
https://github.com/grzsmp/postgres/commit/3e3a1f187b71acef3f8dc0745da753fb5be821fa

On 12/27/2015 05:31 PM, Grzegorz Sampolski wrote:
> Hi there!
> I'm alive and working on new patch.
> So, I take into account all the suggestions from Tomas, and I'll
> add additional parameter `usedns' with `yes/no' values to pass
> resolved hostname or ip address through rhost_item.
>
> On 12/24/2015 03:35 AM, Michael Paquier wrote:
>> On Wed, Dec 16, 2015 at 2:53 AM, Tomas Vondra
>>  wrote:
>>> Actually, one more thing - the patch should probably update the docs too,
>>> because client-auth.sgml currently says this in the "auth-pam" section:
>>>
>>>
>>> ...
>>> PAM is used only to validate user name/password pairs.
>>> ...
>>>
>>>
>>> I believe that's no longer true, because the patch adds PAM_RHOST to the
>>> user/password fields.
>>>
>>> Regarding the other PAM_* fields, none of them strikes me as very useful for
>>> our use case.
>>>
>>> In a broader sense, I think this patch is quite desirable, despite being
>>> rather simple (which is good). I certainly don't agree with suggestions that
>>> we can already do things like this through pg_hba.conf. If we're providing
>>> PAM authentication, let's make it as complete/useful as possible. In some
>>> cases modifying PAM may not be feasible - e.g. some management systems rely
>>> on PAM as much as possible, and doing changes in other ways is a major
>>> hassle.
>> There is no input from the author for more than 1 month, I have marked
>> the patch as returned with feedback because of a lack of activity.





[HACKERS] pg_upgrade tests running calling psql without --no-psqlrc

2015-12-28 Thread Michael Paquier
Hi all,

Since commit d5563d7d, -c does not imply any more --no-psqlrc hence
the tests of pg_upgrade may run with a custom psqlrc file loaded. I
think that we had better add -X to avoid that, like in the patch
attached.
Thoughts?
-- 
Michael
diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index aa7f399..7d5a594 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -156,7 +156,7 @@ standard_initdb "$oldbindir"/initdb
 if "$MAKE" -C "$oldsrc" installcheck; then
 	pg_dumpall -f "$temp_root"/dump1.sql || pg_dumpall1_status=$?
 	if [ "$newsrc" != "$oldsrc" ]; then
-		oldpgversion=`psql -A -t -d regression -c "SHOW server_version_num"`
+		oldpgversion=`psql -X -A -t -d regression -c "SHOW server_version_num"`
 		fix_sql=""
 		case $oldpgversion in
 			804??)
@@ -169,7 +169,7 @@ if "$MAKE" -C "$oldsrc" installcheck; then
 fix_sql="UPDATE pg_proc SET probin = replace(probin, '$oldsrc', '$newsrc') WHERE probin LIKE '$oldsrc%';"
 ;;
 		esac
-		psql -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
+		psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
 
 		mv "$temp_root"/dump1.sql "$temp_root"/dump1.sql.orig
 		sed "s;$oldsrc;$newsrc;g" "$temp_root"/dump1.sql.orig >"$temp_root"/dump1.sql



Re: [HACKERS] 9.5rc1 brin_summarize_new_values

2015-12-28 Thread Tom Lane
Alvaro Herrera  writes:
> This creates a new sub-section at the bottom of "9.26 System
> Administration Functions" named "Indexing Helper Functions", containing
> a table with a single row which is for this function.

Perhaps call it "Index Maintenance Functions"?

> We can bikeshed whether the new subsection should be at the bottom, or
> some other placement.  Maybe it should become 9.26.3, for example.

Perhaps right after Database Object Management Functions.  I'm not
feeling especially picky about that though; bottom would be OK.

regards, tom lane




Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-12-28 Thread Shulgin, Oleksandr
On Tue, Dec 22, 2015 at 10:45 AM, Pavel Stehule 
wrote:

> Hi
>
> 2015-12-21 16:11 GMT+01:00 Robert Haas :
>
>> On Sun, Dec 20, 2015 at 4:54 AM, Pavel Stehule 
>> wrote:
>> > new update:
>> >
>> > 1. unit searching is case insensitive
>> >
>> > 2. initial support for binary byte prefixes - KiB, MiB, ..  (IEC
>> standard),
>> > change behave for SI units
>> >
>> > Second point is much more complex then it is looking - if pg_size_bytes
>> > should be consistent with pg_size_pretty.
>> >
>> > The current pg_size_pretty and transformations in guc.c are based on
>> JEDEC
>> > standard. Using this standard for GUC has sense - using it for object
>> sizes
>> > is probably unhappy.
>> >
>> > I tried to fix (and enhance) pg_size_pretty - now reports correct
>> units, and
>> > via second parameter it allows to specify base: 2 (binary, IEC  -
>> default)
>> > or 10 (SI).
>>
>> -1 from me.  I don't think we should muck with the way pg_size_pretty
>> works.
>>
>
> new update - I reverted changes in pg_size_pretty
>

Hi,

I didn't check out earlier versions of this patch, but the latest one still
changes pg_size_pretty() to emit a PB suffix.

I don't think it is worth it to throw a number of changes together like
that.  We should focus on adding pg_size_bytes() first and making it
compatible with both pg_size_pretty() and the existing GUC units: that is,
support suffixes up to TB and make sure they have the meaning of powers of
2^10, not 10^3.  Re-using the table present in guc.c would be a plus.
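
As a sketch of the direction suggested above (the table below is
illustrative, simply mirroring the GUC convention that kB/MB/GB/TB mean
powers of 1024; it is not the actual guc.c table):

#include <stdint.h>

typedef struct
{
	const char *suffix;			/* unit suffix accepted on input */
	int64_t		multiplier;		/* bytes per unit */
} size_unit;

static const size_unit size_units[] = {
	{"bytes", INT64_C(1)},
	{"kB", INT64_C(1024)},
	{"MB", INT64_C(1024) * 1024},
	{"GB", INT64_C(1024) * 1024 * 1024},
	{"TB", INT64_C(1024) * 1024 * 1024 * 1024},
};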

Next, we could think about adding handling of a PB suffix on input and
output, but I don't see a big problem if that is emitted as 1024TB or the
user has to specify it as 1024TB in a GUC or argument to pg_size_bytes():
a minor inconvenience only.

--
Alex


Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Joe Conway
On 12/28/2015 09:53 AM, Alvaro Herrera wrote:
>>> The name is just as misleading at the source-code level, maybe more so
>>> since they're all just numbers in C.  +1 for changing it everywhere
>>> before somebody makes a mistake based on the incorrect names.
>>
>> Ok, I'm on it now
> 
> Great, thanks.

I think the attached does the job. Note I settled on
(new|old)estCommitTsXid as that seemed most descriptive and not horribly
longer than before. It did mean, however, that I needed to add a single
space to all the output in pg_resetxlog and pg_controldata, which added
a fair amount to the patch size.

It is all fairly straightforward, but given the impending 9.5 release,
it'd be nice if someone has a chance to give this a quick review before
I commit.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 83cc9e8..5e210b9 100644
*** a/src/backend/access/rmgrdesc/xlogdesc.c
--- b/src/backend/access/rmgrdesc/xlogdesc.c
*** xlog_desc(StringInfo buf, XLogReaderStat
*** 59,66 
  		 checkpoint->oldestXidDB,
  		 checkpoint->oldestMulti,
  		 checkpoint->oldestMultiDB,
! 		 checkpoint->oldestCommitTs,
! 		 checkpoint->newestCommitTs,
  		 checkpoint->oldestActiveXid,
   (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online");
  	}
--- 59,66 
  		 checkpoint->oldestXidDB,
  		 checkpoint->oldestMulti,
  		 checkpoint->oldestMultiDB,
! 		 checkpoint->oldestCommitTsXid,
! 		 checkpoint->newestCommitTsXid,
  		 checkpoint->oldestActiveXid,
   (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online");
  	}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index 8f09dc8..9b106db 100644
*** a/src/backend/access/transam/commit_ts.c
--- b/src/backend/access/transam/commit_ts.c
*** TransactionTreeSetCommitTsData(Transacti
*** 217,224 
  	commitTsShared->dataLastCommit.nodeid = nodeid;
  
  	/* and move forwards our endpoint, if needed */
! 	if (TransactionIdPrecedes(ShmemVariableCache->newestCommitTs, newestXact))
! 		ShmemVariableCache->newestCommitTs = newestXact;
  	LWLockRelease(CommitTsLock);
  }
  
--- 217,224 
  	commitTsShared->dataLastCommit.nodeid = nodeid;
  
  	/* and move forwards our endpoint, if needed */
! 	if (TransactionIdPrecedes(ShmemVariableCache->newestCommitTsXid, newestXact))
! 		ShmemVariableCache->newestCommitTsXid = newestXact;
  	LWLockRelease(CommitTsLock);
  }
  
*** TransactionIdGetCommitTsData(Transaction
*** 285,292 
  	int			entryno = TransactionIdToCTsEntry(xid);
  	int			slotno;
  	CommitTimestampEntry entry;
! 	TransactionId oldestCommitTs;
! 	TransactionId newestCommitTs;
  
  	/* error if the given Xid doesn't normally commit */
  	if (!TransactionIdIsNormal(xid))
--- 285,292 
  	int			entryno = TransactionIdToCTsEntry(xid);
  	int			slotno;
  	CommitTimestampEntry entry;
! 	TransactionId oldestCommitTsXid;
! 	TransactionId newestCommitTsXid;
  
  	/* error if the given Xid doesn't normally commit */
  	if (!TransactionIdIsNormal(xid))
*** TransactionIdGetCommitTsData(Transaction
*** 314,331 
  		return *ts != 0;
  	}
  
! 	oldestCommitTs = ShmemVariableCache->oldestCommitTs;
! 	newestCommitTs = ShmemVariableCache->newestCommitTs;
  	/* neither is invalid, or both are */
! 	Assert(TransactionIdIsValid(oldestCommitTs) == TransactionIdIsValid(newestCommitTs));
  	LWLockRelease(CommitTsLock);
  
  	/*
  	 * Return empty if the requested value is outside our valid range.
  	 */
! 	if (!TransactionIdIsValid(oldestCommitTs) ||
! 		TransactionIdPrecedes(xid, oldestCommitTs) ||
! 		TransactionIdPrecedes(newestCommitTs, xid))
  	{
  		*ts = 0;
  		if (nodeid)
--- 314,331 
  		return *ts != 0;
  	}
  
! 	oldestCommitTsXid = ShmemVariableCache->oldestCommitTsXid;
! 	newestCommitTsXid = ShmemVariableCache->newestCommitTsXid;
  	/* neither is invalid, or both are */
! 	Assert(TransactionIdIsValid(oldestCommitTsXid) == TransactionIdIsValid(newestCommitTsXid));
  	LWLockRelease(CommitTsLock);
  
  	/*
  	 * Return empty if the requested value is outside our valid range.
  	 */
! 	if (!TransactionIdIsValid(oldestCommitTsXid) ||
! 		TransactionIdPrecedes(xid, oldestCommitTsXid) ||
! 		TransactionIdPrecedes(newestCommitTsXid, xid))
  	{
  		*ts = 0;
  		if (nodeid)
*** ActivateCommitTs(void)
*** 655,668 
  	 * enabled again?  It doesn't look like it does, because there should be a
  	 * checkpoint that sets the value to InvalidTransactionId at end of
  	 * recovery; and so any chance of injecting new transactions without
! 	 * CommitTs values would occur after the oldestCommitTs has been set to
  	 * Invalid temporarily.
  	 */
  	LWLockAcquire(CommitTsLock, LW_EXCLUSIVE);
! 	if (ShmemVariableCache->oldestCommitTs == 

Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Tom Lane
Joe Conway  writes:
> I think the attached does the job. Note I settled on
> (new|old)estCommitTsXid as that seemed most descriptive and not horribly
> longer than before. It did mean, however, that I needed to add a single
> space to all the output in pg_resetxlog and pg_controldata, which added
> a fair amount to the patch size.

Hm.  That will break all the translations for those messages, no?
Not sure it's worth it.

regards, tom lane




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Alvaro Herrera
Joe Conway wrote:
> On 12/28/2015 09:03 AM, Tom Lane wrote:
> > Joe Conway  writes:
> >> Ok, but now next question -- should we just change the user visible
> >> output to oldestCommitXID/newestCommitXID, or should we change the
> >> variable name everywhere it appears in source as well? Looks like each
> >> one appears about 25-30 times scattered across 9 or 10 files. Since they
> >> are new in 9.5, if we're going to change them, I'd think we ought to do
> >> it now or never.
> > 
> > The name is just as misleading at the source-code level, maybe more so
> > since they're all just numbers in C.  +1 for changing it everywhere
> > before somebody makes a mistake based on the incorrect names.
> 
> Ok, I'm on it now

Great, thanks.



-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] 9.5rc1 brin_summarize_new_values

2015-12-28 Thread Alvaro Herrera
Jeff Janes wrote:

> Another issue is:  it seems to have no SGML documentation.  Is that OK?

Here's a patch (which I had to write afresh, because I couldn't find
anything in my brin development tree.  Maybe I was just remembering
something I planned to do and never got around to.)

This creates a new sub-section at the bottom of "9.26 System
Administration Functions" named "Indexing Helper Functions", containing
a table with a single row which is for this function.  Also, in the BRIN
chapter it creates a subsection "62.1.1. Index Maintenance" which has
one paragraph mentioning that pages that aren't already summarized are
only processed by VACUUM or this function.

I thought about moving the last paragraph of the introduction (which
talks about pages_per_range) to the new subsection.  It's clearly of a
different spirit than the preceding paragraphs, but then that parameter
is not really "maintenance" of the index.  Perhaps I should name the
subsection something like "Operation and Maintenance" instead, and then
those two paragraphs fit in there.

Other opinions?


Regarding 9.26, this is its TOC:

9.26. System Administration Functions

9.26.1. Configuration Settings Functions
9.26.2. Server Signaling Functions
9.26.3. Backup Control Functions
9.26.4. Recovery Control Functions
9.26.5. Snapshot Synchronization Functions
9.26.6. Replication Functions
9.26.7. Database Object Management Functions
9.26.8. Generic File Access Functions
9.26.9. Advisory Lock Functions
9.26.10. Indexing Helper Functions

We can bikeshed whether the new subsection should be at the bottom, or
some other placement.  Maybe it should become 9.26.3, for example.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 2202b7a..c1b6e1a 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -58,6 +58,24 @@
   store more index entries), but at the same time the summary data stored can
   be more precise and more data blocks can be skipped during an index scan.
  
+
+ 
+  Index Maintenance
+
+  
+   At the time of creation, all existing index pages are scanned and a
+   summary is created for each range, including the possibly-incomplete
+   range at the end.  As new pages are filled with data, page ranges that
+   are already summarized will cause the summary information to be updated
+   with the new tuples.  When a new page is created that does not fall
+   into the last summarized range, that range does not automatically
+   acquire a summary tuple; those insertions remain unsummarized until
+   a summarization run is invoked later, which creates initial summaries
+   for all unsummarized ranges.  This process can be invoked manually
+   by the brin_summarize_new_values(regclass) function,
+   or automatically when VACUUM processes the table.
+  
+ 
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 81c1d3f..577d3bd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18281,6 +18281,48 @@ SELECT (pg_stat_file('filename')).modification;
 
   
 
+  
+   Indexing Helper Functions
+
+   
+brin_summarize_new_values
+   
+
+   
+ shows the functions
+available for index maintenance tasks.
+   
+
+   
+Indexing Helper Functions
+
+ 
+  Name Return Type Description
+ 
+
+ 
+  
+   
+brin_summarize_new_values(index_oid)
+   
+   integer
+   summarize page ranges not already summarized
+  
+ 
+
+   
+
+   
+brin_summarize_new_values receives a BRIN index OID as
+argument; it scans the table that the index is for, looking for pages
+that are not currently summarized.  Then the data in those pages is
+scanned and a new summary index tuple is constructed and inserted into
+the index.  It returns the number of new page range summaries that were
+inserted into the index.
+   
+
+  
+
   
 
   



Re: [HACKERS] pg_upgrade tests running calling psql without --no-psqlrc

2015-12-28 Thread Tom Lane
Michael Paquier  writes:
> Since commit d5563d7d, -c no longer implies --no-psqlrc, hence
> the tests of pg_upgrade may run with a custom psqlrc file loaded. I
> think that we had better add -X to avoid that, as in the attached
> patch.

Yeah.  "grep" found a few more, but I think I got them all now.

regards, tom lane




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Joe Conway
On 12/26/2015 06:32 PM, Michael Paquier wrote:
> On Sun, Dec 27, 2015 at 10:08 AM, Joe Conway  wrote:
>> In looking at the exposing pg_controldata as function patch again, it
>> struck me that the following output seems wrong:
>>
>> --
>> Latest checkpoint's oldestCommitTs:   20257
>> Latest checkpoint's newestCommitTs:   84159
>> --
>>
>> Those numbers are XIDs, not timestamps. Shouldn't we either emit the
>> actual timestamps, or else rename those to oldest/newestCommitXID?
> 
> I recall from the commit_ts thread that Alvaro had some real need to
> make those fields XIDs and not timestamps, so +1 for renaming that as
> suggested.

Ok, but now next question -- should we just change the user visible
output to oldestCommitXID/newestCommitXID, or should we change the
variable name everywhere it appears in source as well? Looks like each
one appears about 25-30 times scattered across 9 or 10 files. Since they
are new in 9.5, if we're going to change them, I'd think we ought to do
it now or never.

If there is consensus to make the change either way (output-only or
globally), I'll come up with a patch ASAP.

Opinions?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Tom Lane
Joe Conway  writes:
> Ok, but now next question -- should we just change the user visible
> output to oldestCommitXID/newestCommitXID, or should we change the
> variable name everywhere it appears in source as well? Looks like each
> one appears about 25-30 times scattered across 9 or 10 files. Since they
> are new in 9.5, if we're going to change them, I'd think we ought to do
> it now or never.

The name is just as misleading at the source-code level, maybe more so
since they're all just numbers in C.  +1 for changing it everywhere
before somebody makes a mistake based on the incorrect names.

regards, tom lane




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Joe Conway
On 12/28/2015 09:03 AM, Tom Lane wrote:
> Joe Conway  writes:
>> Ok, but now next question -- should we just change the user visible
>> output to oldestCommitXID/newestCommitXID, or should we change the
>> variable name everywhere it appears in source as well? Looks like each
>> one appears about 25-30 times scattered across 9 or 10 files. Since they
>> are new in 9.5, if we're going to change them, I'd think we ought to do
>> it now or never.
> 
> The name is just as misleading at the source-code level, maybe more so
> since they're all just numbers in C.  +1 for changing it everywhere
> before somebody makes a mistake based on the incorrect names.

Ok, I'm on it now

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] pgbench stats per script & other stuff

2015-12-28 Thread Michael Paquier
On Wed, Dec 16, 2015 at 4:09 PM, Michael Paquier
 wrote:
> Yeah, that's actually fine. I just had a look at Windows stuff, and
> things seem to be correct on this side for double:
> https://msdn.microsoft.com/en-us/library/aa691373%28v=vs.71%29.aspx
> And then I also had a look at src/port/snprintf.c, where things get
> actually weird when no transactions are run for a script (emulated
> with 2 scripts, one with @1 and the second with @1):
>  - 0 transactions (0.0% of total, tps = 0.00)
>  - latency average = -1.#IO ms
>  - latency stddev = -1.#IO ms
> And it seems that this is a bug in fmtfloat() because it does not
> handle nan values correctly. Others, any thoughts about that?
> It is possible to address things within your patch by using isnan()
> and print another message but I think that we had better patch
> snprintf.c if my analysis is right.

FWIW, I just had a closer look at this portion and came to the
conclusion that the sprintf implementation on Windows is just broken, as
it is not able to handle inf or nan appropriately as special cases.
fmtfloat@src/port/snprintf.c relies on the system's implementation of
sprintf to handle those cases; however, even calling sprintf directly
results in the same weird output, inf showing up as "1.#IO" and nan as
"-1.#IO". Anyone, feel free to disagree if I am missing something.

Still, it would be cool to have a better message when there is no
value to show to the user, like "no latency average" or "undefined
latency average". That would be more elegant, and the patches proposed
still lack that.
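
To illustrate the kind of guard I have in mind (this is just a sketch,
not code from any of the patches posted here; latency_avg is only a
placeholder name):

8<---
#include <math.h>
#include <stdio.h>

/*
 * Sketch only: skip printing NaN, which some platforms' sprintf()
 * renders as garbage (e.g. "-1.#IO" on Windows).
 */
static void
report_latency(double latency_avg)
{
	if (isnan(latency_avg))
		printf(" - latency average: undefined (no transactions)\n");
	else
		printf(" - latency average = %.3f ms\n", latency_avg);
}
8<---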
-- 
Michael




[HACKERS] Some 9.5beta2 backend processes not terminating properly?

2015-12-28 Thread Shay Rojansky
After setting up 9.5beta2 on the Npgsql build server and running the Npgsql
test suite against it, I've noticed some weird behavior.

The tests run for a couple of minutes and open and close some connections. With my
pre-9.5 backends, the moment the test runner exits I can see that all
backend processes exit immediately, and pg_stat_activity has no rows
(except the querying one). With 9.5beta2, however, some backend processes
continue to stay alive beyond the test runner, and pg_stat_activity
contains extra rows (state idle, waiting false). This situation persists
until I restart PostgreSQL.

This happens consistently on two machines, running Windows 7 and Windows
10. Both client and server are on the same machine and use TCP to
communicate. I can investigate further and try to produce a more isolated
repro but I thought I'd talk to you guys first.

Any thoughts or ideas on what might cause this? Any suggestions for
tracking this down?

Shay


Re: [HACKERS] missing "SPI_finish" that isn't missing

2015-12-28 Thread Chapman Flack
On 12/24/15 16:37, Tom Lane wrote:

> to make this coding pattern work is to set up a subtransaction, and either
> commit it in the success path or roll it back upon catching an error.
> (This is not terribly well documented, but the exception-block handling
> in plpgsql provides a working model to follow.)

Ok, I've just had a look at that in plpgsql, and I see what you mean about
the path of least resistance ... though it's not completely obvious to me
how many pieces of that are there for other plpgsql purposes, and how many
are inherently necessary just to be able to catch and recover from an
error thrown from SPI.

BeginInternalSubTransaction? CreateExecutorState? CreateExprContext?
xact callback? subxact callback?

> 99% of the time, the path of least resistance at the C-code level is to
> avoid expected error conditions in the first place, rather than try to
> catch and clean up.  In this example, there are plenty of ways you might
> test whether "mytable" exists before risking the SPI_execute call.

I can think of:

- another SPI query
  "select 1 from pg_catalog.pg_class join pg_catalog.pg_namespace n
  on relnamespace = n.oid
  where nspname = 'myschema' and relname = 'mytable'"

- InvalidOid != get_relname_relid("mytable",
GetSysCacheOid1(NAMESPACENAME, CStringGetDatum("myschema")))

- ... other variations on those ...

The first one strikes me funny because it's an even heavier SQL query
to plan and execute just to find out if I can execute the original
trivial one. (Which doesn't really matter, admittedly, since I need to
do it exactly once.) The second one strikes me funny for the mixture of API
levels, undocumented low level to check existence followed by documented
SPI level for the original query. Almost tempts me to ditch SPI entirely
and learn how to use heap_* methods for the single trivial row retrieval
I need, only by the time I've figured out all that, I'll have forgotten
what I was trying to do in the first place.

Is there a particular, reasonably tidy idiom that has emerged as the usual,
customary approach to a task like this?

Thanks,
-Chap




Re: [HACKERS] missing "SPI_finish" that isn't missing

2015-12-28 Thread Tom Lane
Chapman Flack  writes:
> Is there a particular, reasonably tidy idiom that has emerged as the usual,
> customary approach to a task like this?

You could construct a RangeVar node and then use RangeVarGetRelid,
but I dunno that that's really any cleaner than inquiring about it
using the syscaches.  The syscache APIs are really quite unlikely
to change in any meaningful way, if only because there's too much
code depending on them.
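
Something along these lines ought to do it (untested sketch; the schema
and relation names are only placeholders):

8<---
#include "postgres.h"

#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
#include "storage/lockdefs.h"

/* Sketch: does myschema.mytable exist right now? */
static bool
mytable_exists(void)
{
	RangeVar   *rv = makeRangeVar("myschema", "mytable", -1);

	/* missing_ok = true: return InvalidOid rather than erroring out */
	return OidIsValid(RangeVarGetRelid(rv, AccessShareLock, true));
}
8<---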

regards, tom lane




[HACKERS] Testing Postgresql 9.5 RC1 with Alfresco 5.0.d

2015-12-28 Thread Boriss Mejias

Hi there,

The following wiki page:
https://wiki.postgresql.org/wiki/HowToBetaTest#How_to_Test
suggests that I should send the following report to this mailing list.

Name: Boriss Mejías (Tchorix)

Release: Postgresql 9.5 RC1

Test Type: Compatibility with Alfresco Community Edition 5.0.d

Test Detail: New Alfresco installation with Postgresql 9.5 RC1. Testing updates 
into the database.


Platform: CentOS 6.5

Installation Method: Compiled from tarball. Default parameters.

Platform Detail:
- VMWare virtual machine
- RAM: 1GB
- CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
- Alfresco running on a different VM.

Test Procedure:
- Vanilla installation of Alfresco Community Edition 5.0.d
- Using the default jdbc driver shipped with Alfresco
- Test that Alfresco Shares works correctly with the basic functionality
- Script with multiple HTTP REST calls to create/read/delete users. Each 
operation/transaction involves several tables.


Failure? No

Test Results:
- Alfresco worked as expected.
- Scripts are run in parallel to provoke race conditions. Alfresco uses an
optimistic approach.
- Constraint violations on table keys are observed in the logs, which is
good, because data remains consistent and Alfresco handles the error messages
correctly.


Comments:
- I just wrote my own scripts to test the consistency of user data; I'm
not part of Alfresco Inc, so I couldn't use their official test bed.
- Alfresco uses iBatis to manage the database model, hence I wasn't
expecting any failure.
- I don't have control over the table indexes, and I was not able to compare
the test with a previous version of Postgresql to test performance.
- This is the first time I've run a test of a pre-release version of
Postgresql, so any comments or ideas are welcome so that I can contribute again next time.


Cheers
Boriss




Re: [HACKERS] Using quicksort for every external sort run

2015-12-28 Thread Peter Geoghegan
On Fri, Dec 18, 2015 at 11:57 AM, Peter Geoghegan  wrote:
> BTW, I'm not necessarily determined to make the new special-purpose
> allocator work exactly as proposed. It seemed useful to prioritize
> simplicity, so currently there is one big "huge palloc()" with
> which we blow our memory budget, and that's it. However, I could
> probably be more clever about "freeing ranges" initially preserved for
> a now-exhausted tape. That kind of thing.

Attached is a revision that significantly overhauls the memory patch,
with several smaller changes.

We can now grow memtuples to rebalance the size of the array
(memtupsize) against the need for memory for tuples. Doing this makes
a big difference with a 500MB work_mem setting in this datum sort
case, as my newly expanded trace_sort instrumentation shows:

LOG:  grew memtuples 1.40x from 9362286 (219429 KB) to 13107200
(307200 KB) for final merge
LOG:  tape 0 initially used 34110 KB of 34110 KB batch (1.000) and
13107200 slots remaining
LOG:  tape 1 initially used 34110 KB of 34110 KB batch (1.000) and has
1534 slots remaining
LOG:  tape 2 initially used 34110 KB of 34110 KB batch (1.000) and has
1535 slots remaining
LOG:  tape 3 initially used 34110 KB of 34110 KB batch (1.000) and has
1533 slots remaining
LOG:  tape 4 initially used 34110 KB of 34110 KB batch (1.000) and has
1534 slots remaining
LOG:  tape 5 initially used 34110 KB of 34110 KB batch (1.000) and has
1535 slots remaining

This is a big improvement. With the new batchmemtuples() call
commented out (i.e. no new grow_memtuples() call), the LOG output
around the same point is:

LOG:  tape 0 initially used 24381 KB of 48738 KB batch (0.500) and has
1 slots remaining
LOG:  tape 1 initially used 24381 KB of 48738 KB batch (0.500) and has
1 slots remaining
LOG:  tape 2 initially used 24381 KB of 48738 KB batch (0.500) and has
1 slots remaining
LOG:  tape 3 initially used 24381 KB of 48738 KB batch (0.500) and has
1 slots remaining
LOG:  tape 4 initially used 24381 KB of 48738 KB batch (0.500) and has
1 slots remaining
LOG:  tape 5 initially used 24381 KB of 48738 KB batch (0.500) and has
1 slots remaining

(I actually added a bit more detail to what you see here during final clean-up)

Obviously we're using memory a lot more efficiently here as compared
to my last revision (or the master branch -- it always has palloc()
overhead, of course). With no grow_memtuples, we're not wasting ~1530
slots per tape anymore (which is a tiny fraction of 1% of the total),
but we are wasting 50% of all batch memory, or almost 30% of all
work_mem.

Note that this improvement is possible despite the fact that memory is
still MAXALIGN()'d -- I'm mostly just clawing back what I can, having
avoided much STANDARDCHUNKHEADERSIZE overhead for the final on-the-fly
merge. I tend to think that the bigger problem here is that we use so
many memtuples when merging in the first place though (e.g. 60% in the
above case), because memtuples are much less useful than something
like a simple array of pointers when merging; I can certainly see why
you'd need 6 memtuples here, for the merge heap, but the other ~13
million seem mostly unnecessary. Anyway, what I have now is as far as
I want to go to accelerate merging for 9.6, since parallel CREATE
INDEX is where the next big win will come from. As wasteful as this
can be, I think it's of secondary importance.

With this revision, I've given up on the idea of trying to map
USEMEM()/FREEMEM() to "logical" allocations and deallocations that
consume from each tape's batch. The existing merge code in the master
branch is concerned exclusively with making each tape's use of memory
fair; each tape only gets so many "slots" (memtuples), and so much
memory, and that's it (there is never any shuffling of those resource
budgets between tapes). I get the same outcome from simply only
allowing tapes to get memory from their own batch allocation, which
isn't much complexity, because only READTUP() routines regularly need
memory. We detect when memory has been exhausted within
mergeprereadone() in a special way, not using LACKMEM() at all -- this
seems simpler. (Specifically, we use something called overflow
allocations for this purpose. This means that there are still a very
limited number of retail palloc() calls.)

This new version somewhat formalizes the idea that batch allocation
may one day have uses beyond the final on-the-fly merge phase, which
makes a lot of sense. We should really be saving a significant amount
of memory when initially sorting runs, too. This revision also
pfree()s tape memory early if the tape is exhausted early, which will
help a lot when there is a logical/physical correlation.

Overall, I'm far happier with how memory is managed in this revision,
mostly because it's easier to reason about. trace_sort now closely
monitors where memory goes, and I think that's a good idea in general.
That makes production performance problems a lot easier to reason
about -- 

Re: [HACKERS] Some 9.5beta2 backend processes not terminating properly?

2015-12-28 Thread Tom Lane
Shay Rojansky  writes:
> After setting up 9.5beta2 on the Npgsql build server and running the Npgsql
> test suite against it, I've noticed some weird behavior.

> The tests run for a couple of minutes and open and close some connections. With my
> pre-9.5 backends, the moment the test runner exits I can see that all
> backend processes exit immediately, and pg_stat_activity has no rows
> (except the querying one). With 9.5beta2, however, some backend processes
> continue to stay alive beyond the test runner, and pg_stat_activity
> contains extra rows (state idle, waiting false). This situation persists
> until I restart PostgreSQL.

No idea what's happening, but a couple of questions:

* Are you using SSL connections?

* Can you get stack traces from the seemingly-stuck backends?

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows

regards, tom lane




Re: [HACKERS] 9.5rc1 brin_summarize_new_values

2015-12-28 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > This creates a new sub-section at the bottom of "9.26 System
> > Administration Functions" named "Indexing Helper Functions", containing
> > a table with a single row which is for this function.
> 
> Perhaps call it "Index Maintenance Functions"?
> 
> > We can bikeshed whether the new subsection should be at the bottom, or
> > some other placement.  Maybe it should become 9.26.3, for example.
> 
> Perhaps right after Database Object Management Functions.  I'm not
> feeling especially picky about that though; bottom would be OK.

Picked up both suggestions (along with some additional rewording and
fixes to the sgml tags), and pushed.  Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Joe Conway
On 12/28/2015 10:35 AM, Joe Conway wrote:
> An alternative would be to not have a space at all for those two, e.g.
> 
>   "Latest checkpoint's oldestCommitTsXid:%u\n"
>   "Latest checkpoint's newestCommitTsXid:%u\n"
> 
> That isn't too bad and would leave everything aligned.

That seems like the best solution to me.

8<---
...
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:Thu 24 Dec 2015 08:55:27 AM PST
Fake LSN counter for unlogged rels:   0/1
...
8<---

Anyone parsing based on fixed length would be ok, and so would anyone
splitting on the colon.

I retract my earlier suggestion of doing HEAD different from
REL9_5_STABLE, at least for the moment. My patch for pg_controldata
related functions is going to impact all this anyway, so we might as
well not fuss about it now.

Any objections to the attached?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/access/rmgrdesc/xlogdesc.c b/src/backend/access/rmgrdesc/xlogdesc.c
index 83cc9e8..5e210b9 100644
*** a/src/backend/access/rmgrdesc/xlogdesc.c
--- b/src/backend/access/rmgrdesc/xlogdesc.c
*** xlog_desc(StringInfo buf, XLogReaderStat
*** 59,66 
  		 checkpoint->oldestXidDB,
  		 checkpoint->oldestMulti,
  		 checkpoint->oldestMultiDB,
! 		 checkpoint->oldestCommitTs,
! 		 checkpoint->newestCommitTs,
  		 checkpoint->oldestActiveXid,
   (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online");
  	}
--- 59,66 
  		 checkpoint->oldestXidDB,
  		 checkpoint->oldestMulti,
  		 checkpoint->oldestMultiDB,
! 		 checkpoint->oldestCommitTsXid,
! 		 checkpoint->newestCommitTsXid,
  		 checkpoint->oldestActiveXid,
   (info == XLOG_CHECKPOINT_SHUTDOWN) ? "shutdown" : "online");
  	}
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index ed8d98a..a284894 100644
*** a/src/backend/access/transam/commit_ts.c
--- b/src/backend/access/transam/commit_ts.c
*** TransactionTreeSetCommitTsData(Transacti
*** 217,224 
  	commitTsShared->dataLastCommit.nodeid = nodeid;
  
  	/* and move forwards our endpoint, if needed */
! 	if (TransactionIdPrecedes(ShmemVariableCache->newestCommitTs, newestXact))
! 		ShmemVariableCache->newestCommitTs = newestXact;
  	LWLockRelease(CommitTsLock);
  }
  
--- 217,224 
  	commitTsShared->dataLastCommit.nodeid = nodeid;
  
  	/* and move forwards our endpoint, if needed */
! 	if (TransactionIdPrecedes(ShmemVariableCache->newestCommitTsXid, newestXact))
! 		ShmemVariableCache->newestCommitTsXid = newestXact;
  	LWLockRelease(CommitTsLock);
  }
  
*** TransactionIdGetCommitTsData(Transaction
*** 285,292 
  	int			entryno = TransactionIdToCTsEntry(xid);
  	int			slotno;
  	CommitTimestampEntry entry;
! 	TransactionId oldestCommitTs;
! 	TransactionId newestCommitTs;
  
  	/* error if the given Xid doesn't normally commit */
  	if (!TransactionIdIsNormal(xid))
--- 285,292 
  	int			entryno = TransactionIdToCTsEntry(xid);
  	int			slotno;
  	CommitTimestampEntry entry;
! 	TransactionId oldestCommitTsXid;
! 	TransactionId newestCommitTsXid;
  
  	/* error if the given Xid doesn't normally commit */
  	if (!TransactionIdIsNormal(xid))
*** TransactionIdGetCommitTsData(Transaction
*** 314,331 
  		return *ts != 0;
  	}
  
! 	oldestCommitTs = ShmemVariableCache->oldestCommitTs;
! 	newestCommitTs = ShmemVariableCache->newestCommitTs;
  	/* neither is invalid, or both are */
! 	Assert(TransactionIdIsValid(oldestCommitTs) == TransactionIdIsValid(newestCommitTs));
  	LWLockRelease(CommitTsLock);
  
  	/*
  	 * Return empty if the requested value is outside our valid range.
  	 */
! 	if (!TransactionIdIsValid(oldestCommitTs) ||
! 		TransactionIdPrecedes(xid, oldestCommitTs) ||
! 		TransactionIdPrecedes(newestCommitTs, xid))
  	{
  		*ts = 0;
  		if (nodeid)
--- 314,331 
  		return *ts != 0;
  	}
  
! 	oldestCommitTsXid = ShmemVariableCache->oldestCommitTsXid;
! 	newestCommitTsXid = ShmemVariableCache->newestCommitTsXid;
  	/* neither is invalid, or both are */
! 	Assert(TransactionIdIsValid(oldestCommitTsXid) == TransactionIdIsValid(newestCommitTsXid));
  	LWLockRelease(CommitTsLock);
  
  	/*
  	 * Return empty if the requested value is outside our valid range.
  	 */
! 	if (!TransactionIdIsValid(oldestCommitTsXid) ||
! 		TransactionIdPrecedes(xid, oldestCommitTsXid) ||
! 		TransactionIdPrecedes(newestCommitTsXid, xid))
  	{
  		*ts = 0;
  		if (nodeid)
*** ActivateCommitTs(void)
*** 655,668 
  	 * enabled again?  It doesn't look like it does, because there should be a
  	 * checkpoint that sets the value to InvalidTransactionId at end of
  	 * recovery; and so any chance of 

Re: [HACKERS] On columnar storage (2)

2015-12-28 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

> 3. Transpose of data and role of CS.
> Let's look once again on Quote example above. Data is received in time
> ascending order. But most queries require grouping it by symbol.  So at some
> stage we have to "transpose"  data. To efficiently append data to timeseries
> we need to buffer it somewhere and then use append range of values. In
> Fujitsu approach two different representations of data are used: reader and
> writer optimized. In IMCS approach, CS is just temporary projection of
> normal PostgreSQL tables. So we do not need to worry about durability - it
> is enforced by PostgreSQL.
> 
> So the question is whether CS should be only storage for the data or just
> copy (may be transient) of normal table?

Our original plan was that a CS was the primary storage of data, not a
duplicate.  However, after some discussion it became apparent that there are
several use cases that are better served by allowing redundant storage,
i.e. having CSs that are just a reader-optimized copy of data that
exists elsewhere.  While I'm not a fan of that approach, I think it
would be good to leave the door open for a future implementation of
that.  However, I think it'll bring interesting challenges to the
optimizer side, so I'm not promising to work on it.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Tom Lane
Joe Conway  writes:
> I retract my earlier suggestion of doing HEAD different from
> REL9_5_STABLE, at least for the moment. My patch for pg_controldata
> related functions is going to impact all this anyway, so we might as
> well not fuss about it now.

Seems reasonable.

> Any objections to the attached?

Looks OK in a quick once-over.

regards, tom lane




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Joe Conway
On 12/28/2015 10:30 AM, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Joe Conway  writes:
>>> I think the attached does the job. Note I settled on
>>> (new|old)estCommitTsXid as that seemed most descriptive and not horribly
>>> longer than before. It did mean, however, that I needed to add a single
>>> space to all the output in pg_resetxlog and pg_controldata, which added
>>> a fair amount to the patch size.
>>
>> Hm.  That will break all the translations for those messages, no?
>> Not sure it's worth it.
> 
> Yeah, I would leave the others alone and just let this line have the
> value shifted one more column to the right.

Seems like the translation issue would be mostly 9.5. Maybe we should
add the space in master but not 9.5?

An alternative would be to not have a space at all for those two, e.g.

  "Latest checkpoint's oldestCommitTsXid:%u\n"
  "Latest checkpoint's newestCommitTsXid:%u\n"

That isn't too bad and would leave everything aligned.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Fix compiler warnings in Cube Extension

2015-12-28 Thread Tom Lane
David Rowley  writes:
> My compiler is complaining about cube_coord() and cube_coord_llur() not
> returning a value on all code paths.

Yeah, looking at that code, this would be expected from any compiler
that doesn't know that ereport() doesn't return.  On it now.  The
documentation aspect of that commit leaves much to be desired as well.

regards, tom lane




Re: [HACKERS] On columnar storage (2)

2015-12-28 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

Hi,

> May be you know, that I have implemented IMCS (in-memory-columnar-store) as
> PostgreSQL extension.
> It was not so successful, mostly because people prefer to use standard SQL
> rather than using some special functions for accessing columnar storage
> (CS). Now I am thinking about second reincarnation of IMCS, based on FDW and
> CSP (custom nodes). This is why I am very interested in your patch.

Great to hear.

> I have investigated previous version of the patch and have some
> questions.  I will be pleased if you can clarify them to me:
> 
> 1. CS API.
> I agree with you that FDW API seems to be not enough to efficiently support
> work with CS.
> At least we need batch insert.
> But may be it is better to extend FDW API rather than creating special API
> for CS?

The patch we have proposed thus far does not mess with executor
structure too much, so probably it would be possible to add some things
here and there to the FDW API and it might work.  But in the long term I
think the columnar storage project is more ambitious; for instance, I'm
sure we will want to be able to vectorise certain operations, and the
FDW API will become a bottleneck, so to speak.  I'm thinking of
vectorisation in two different ways: one is that some operations such as
computing aggregates over large data sets can work a lot faster if you
feed the value of one column for multiple tuples at a time in columnar
format; that way you can execute the operation directly in the CPU
(this requires specific support from the aggregate functions.)
For this to work, the executor needs to be rejigged so that multiple
values (tuples) can be passed at once.

The other aspect of vectorisation is that one input tuple might have
been split across several data origins, so that one half of the tuple is
in columnar format and the other half in row format; that lets you do
very fast updates on the row-formatted part, while allowing fast reads
from the columnar part, for instance.  (It's well known that
column-oriented storage does not go well with updates; some
implementations even disallow updates and deletes altogether.)
Currently within the executor
a tuple is a TupleTableSlot which contains one Datum array, which has
all the values coming out of the HeapTuple; but for split storage
tuples, we will need to have a TupleTableSlot that has multiple "Datum
arrays" (in a way --- because, actually, once we get to vectorise as in
the preceding paragraph, we no longer have a Datum array, but some more
complex representation).
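
Purely as an illustration of that last point (this is not proposed code,
just a sketch of the general shape such a structure could take):

8<---
#include "postgres.h"

#include "executor/tuptable.h"

/*
 * Hypothetical sketch: one column-major batch per data origin, instead
 * of a single row-major Datum array for the whole tuple.
 */
typedef struct ColumnBatch
{
	int			natts;			/* columns held by this batch */
	int			ntuples;		/* rows represented */
	Datum	  **values;			/* values[attno][rowno] */
	bool	  **isnull;			/* isnull[attno][rowno] */
} ColumnBatch;

typedef struct VectorSlot
{
	TupleTableSlot base;		/* row-oriented view, for unconverted code */
	int			nbatches;		/* one batch per storage origin */
	ColumnBatch **batches;
} VectorSlot;
8<---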

I think that if we try to make the FDW API address all these concerns,
while at the same time *also* serving the needs of external data
sources, insanity will ensue.

> 2. Horizontal<->Vertical data mapping. As far as I understand this patch,
> the model of CS assumes that some table columns are stored in horizontal
> format (in heap), some - in vertical format (in CS).  And there is
> one-to-one mapping between horizontal and vertical parts of row using CTID.

Yes, that part needs to go away.  We will deal with this eventually; the
patch I posted was just some very basic infrastructure.  In the future
we would like to be able to have real support for not having to
translate between column-oriented and row-oriented formats; at least for
some operations.  (I expect that we will leave most code as currently
and require translation, while other parts that have been optimized are
able to skip the translation step.  As things mature we make more things
understand the new format without translation.)  This is also dependent
on being able to vectorise the executor.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pam auth - add rhost item

2015-12-28 Thread David Fetter
On Mon, Dec 28, 2015 at 03:01:07PM +, Grzegorz Sampolski wrote:
> Hi.
> New patch available.

Please attach the patch or patch set to your email just like everyone
else does. :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Alvaro Herrera
Tom Lane wrote:
> Joe Conway  writes:
> > I think the attached does the job. Note I settled on
> > (new|old)estCommitTsXid as that seemed most descriptive and not horribly
> > longer than before. It did mean, however, that I needed to add a single
> > space to all the output in pg_resetxlog and pg_controldata, which added
> > a fair amount to the patch size.
> 
> Hm.  That will break all the translations for those messages, no?
> Not sure it's worth it.

Yeah, I would leave the others alone and just let this line have the
value shifted one more column to the right.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] oldest/newestCommitTs output by pg_controldata

2015-12-28 Thread Joe Conway
On 12/28/2015 11:48 AM, Tom Lane wrote:
> Joe Conway  writes:
>> I retract my earlier suggestion of doing HEAD different from
>> REL9_5_STABLE, at least for the moment. My patch for pg_controldata
>> related functions is going to impact all this anyway, so we might as
>> well not fuss about it now.
> 
> Seems reasonable.
> 
>> Any objections to the attached?
> 
> Looks OK in a quick once-over.

Pushed to HEAD and 9.5

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-28 Thread Haribabu Kommi
On Mon, Dec 28, 2015 at 9:09 PM, Shulgin, Oleksandr
 wrote:
> On Thu, Dec 24, 2015 at 5:16 AM, Haribabu Kommi 
> wrote:
>>
>> On Thu, Dec 24, 2015 at 2:37 AM, Tom Lane  wrote:
>> > "Shulgin, Oleksandr"  writes:
>> >> 1. Have you considered re-loading the HBA file upon call to this
>> >> function
>> >> in a local context instead of keeping it in the backends memory?
>> >
>> > Aside from the security questions, please consider that this feature
>> > should
>> > work similarly to the current implementation of the pg_file_settings
>> > view,
>> > namely it tells you about what is *currently* in the on-disk files, not
>> > necessarily what is the active setting in the postmaster's memory.
>> > A backend could not be entirely sure about the postmaster's state
>> > anyway;
>> > and even if it could be, one of the major applications for features like
>> > this is testing manual changes to the files before you SIGHUP the
>> > postmaster.  So re-reading the files on each usage is a Good Thing, IMO,
>> > even if it sounds inefficient.
>> >
>> >> 2. I also wonder why JSONB arrays for database/user instead of TEXT[]?
>> >
>> > Yes, that seems rather random to me too.
>>
>> Here I attached updated patch with the following changes,
>> - Local loading of HBA file to show the authentication data
>> - Changed database and user types are text[]
>
>
> Still this requires a revert of the memory context handling commit for
> load_hba() and load_ident().  I think you can get around the problem by
> changing these functions to work with CurrentMemoryContext and set it
> explicitly to the newly allocated PostmasterContext in
> PerformAuthentication().  In your function you could then create a temporary
> context to be discarded before leaving the function.

Thanks for the review. I didn't understand your point clearly.

In the attached patch, load_hba uses PostmasterContext if it is present,
otherwise CurrentMemoryContext. PostmasterContext is present only
in the backend start phase.

> I still think you should not try to re-implement check_hba(), but extend
> this function with means to report line skip reasons as per your
> requirements.  Having an optional callback function might be a good fit (a
> possible use case is logging the reasons line by line).

The check_hba function is enhanced to fill in the HBA line details with
the reason for the mismatch.
In check_hba, whenever a mismatch is found, the fill_hbaline function is
called to build the tuple and insert it into the tuple store.
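
For reference, the callback-style hook suggested above might look
roughly like this (a hypothetical sketch; the names are invented and
not part of the attached patch):

8<---
#include "postgres.h"

#include "libpq/hba.h"

/*
 * Hypothetical: check_hba() would call this for every pg_hba.conf line
 * it skips, so a caller such as pg_hba_lookup() -- or a logging hook --
 * could record the reason without duplicating the matching logic.
 */
typedef void (*hba_skip_reason_cb) (HbaLine *line, int line_num,
									const char *reason, void *cb_arg);
8<---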

Regards,
Hari Babu
Fujitsu Australia


pg_hba_lookup_poc_v9.patch
Description: Binary data



Re: [HACKERS] Testing Postgresql 9.5 RC1 with Alfresco 5.0.d

2015-12-28 Thread Simon Riggs
On 28 December 2015 at 22:56, Boriss Mejias  wrote:

> Hi there,
>
> The following wiki page:
> https://wiki.postgresql.org/wiki/HowToBetaTest#How_to_Test
> suggests that I should send the following report to this mailing list.
>

Thanks for testing.

Do you have any comments on performance testing?

Are there any opportunities to improve Alfresco using the new features in
PostgreSQL 9.5?

Thanks

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] pg_controldata/pg_resetxlog "Latest checkpoint's NextXID" format

2015-12-28 Thread Joe Conway
I wonder why "Latest checkpoint's NextXID" is formatted like this:

8<-
printf(_("Latest checkpoint's NextXID:  %u/%u\n"),
   ControlFile.checkPointCopy.nextXidEpoch,
   ControlFile.checkPointCopy.nextXid);
8<-

Shouldn't it use "%X/%X", same as e.g. "Prior checkpoint location" and
all the other XIDs?
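
Just to spell out the difference on a made-up value (84159 is arbitrary):

8<---
#include <stdio.h>

int
main(void)
{
	unsigned int epoch = 0;
	unsigned int xid = 84159;

	printf("%u/%u\n", epoch, xid);	/* prints 0/84159 (current NextXID style) */
	printf("%X/%X\n", epoch, xid);	/* prints 0/148BF (hex, like the LSN fields) */
	return 0;
}
8<---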

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Patch to fix misspellings of the word "segment"

2015-12-28 Thread David Rowley
On 28 December 2015 at 21:35, David Rowley 
wrote:

> I noticed this while reading nodeGather.c. I checked to find other
> misspellings of the word and found one more, so the attached fixes both.
>

Here's another tiny patch which fixes a spelling mistake in tqueue.c


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


associated_typo_fix.patch
Description: Binary data



Re: [HACKERS] pam auth - add rhost item

2015-12-28 Thread David Fetter
On Tue, Dec 29, 2015 at 12:46:40AM +0100, Grzegorz Sampolski wrote:
> Hi.
> I thought link on commitfest to github url was sufficient.
> Sorry. Attached new patch.

Thanks!

My understanding of the reason behind the policy is that it ensures
that patch submissions are all together in a widely distributed
archive.

While github may seem like it's gigantic and eternal today, it may not
seem so ten years hence.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] GIN pending list clean up exposure to SQL

2015-12-28 Thread Jeff Janes
On Wed, Nov 25, 2015 at 9:29 AM, Jeff Janes  wrote:
>
> I'll prepare a patch for core for the January commitfest, and see if
> it flies.  If not, there is always the extension to fall back to.

Here is an updated patch.  I've added type and permission checks,
docs, and some regression tests.

Cheers,

Jeff
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 8ef9fce..6cbe11b
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17919,17924 
--- 17919,17928 
  brin_summarize_new_values
 
  
+
+ gin_clean_pending_list
+
+ 
 
   shows the functions
  available for index maintenance tasks.
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17939,17944 
--- 17943,17955 
 integer
 summarize page ranges not already summarized

+   
+
+ gin_clean_pending_list(index_oid regclass)
+
+bigint
+move fast-update pending list entries into main index structure
+   
   
  
 
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17952,17957 
--- 17963,17975 
  into the index.
 
  
+
+gin_clean_pending_list accepts a GIN index OID argument 
+ and moves the fast-update entries from the pending list into the 
+ main index structure for that index.  It returns the number of pages
+ removed from the pending list.
+
+ 

  

diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
new file mode 100644
index 262f1e4..c6f19de
*** a/doc/src/sgml/gin.sgml
--- b/doc/src/sgml/gin.sgml
***
*** 728,734 
 from the indexed item). As of PostgreSQL 8.4,
 GIN is capable of postponing much of this work by inserting
 new tuples into a temporary, unsorted list of pending entries.
!When the table is vacuumed, or if the pending list becomes larger than
 , the entries are moved to the
 main GIN data structure using the same bulk insert
 techniques used during initial index creation.  This greatly improves
--- 728,736 
 from the indexed item). As of PostgreSQL 8.4,
 GIN is capable of postponing much of this work by inserting
 new tuples into a temporary, unsorted list of pending entries.
!When the table is vacuumed or autoanalyzed, or when 
!gin_clean_pending_list(regclass) is called, or if the 
!pending list becomes larger than
 , the entries are moved to the
 main GIN data structure using the same bulk insert
 techniques used during initial index creation.  This greatly improves
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
new file mode 100644
index 4bb22fe..373b52d
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***
*** 24,29 
--- 24,30 
  #include "miscadmin.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
+ #include "utils/acl.h"
  #include "storage/indexfsm.h"
  
  /* GUC parameter */
*** ginInsertCleanup(GinState *ginstate,
*** 962,964 
--- 963,999 
  	MemoryContextSwitchTo(oldCtx);
  	MemoryContextDelete(opCtx);
  }
+ 
+ /*
+  * SQL-callable function to clean the insert pending list
+  */
+ Datum
+ gin_clean_pending_list(PG_FUNCTION_ARGS)
+ {
+ 	Oid			indexoid = PG_GETARG_OID(0);
+ 	Relation	indexRel = index_open(indexoid, AccessShareLock);
+ 	IndexBulkDeleteResult stats;
+ 	GinState	ginstate;
+ 
+ 	/* Must be a GIN index */
+ 	if (indexRel->rd_rel->relkind != RELKIND_INDEX ||
+ 		indexRel->rd_rel->relam != GIN_AM_OID)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+  errmsg("\"%s\" is not a GIN index",
+ 		RelationGetRelationName(indexRel))));
+ 
+ 	/* User must own the index (comparable to privileges needed for VACUUM) */
+ 	if (!pg_class_ownercheck(indexoid, GetUserId()))
+ 		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
+ 	   RelationGetRelationName(indexRel));
+ 
+ 
+ 	memset(&stats, 0, sizeof(stats));
+ 	initGinState(&ginstate, indexRel);
+ 	ginInsertCleanup(&ginstate, false, true, &stats);
+ 
+ 	index_close(indexRel, AccessShareLock);
+ 
+ 	PG_RETURN_INT64((int64) stats.pages_deleted);
+ }
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
new file mode 100644
index 5021887..9373ef5
*** a/src/include/access/gin_private.h
--- b/src/include/access/gin_private.h
*** extern void ginFreeScanKeys(GinScanOpaqu
*** 878,883 
--- 878,886 
  /* ginget.c */
  extern Datum gingetbitmap(PG_FUNCTION_ARGS);
  
+ /* ginfast.c */
+ extern Datum gin_clean_pending_list(PG_FUNCTION_ARGS);
+ 
  /* ginlogic.c */
  extern void ginInitConsistentFunction(GinState *ginstate, GinScanKey key);
  
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index d8640db..4f76b84
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*** 

Re: [HACKERS] pam auth - add rhost item

2015-12-28 Thread Grzegorz Sampolski
Hi.
I thought link on commitfest to github url was sufficient.
Sorry. Attached new patch.

On 12/28/2015 09:07 PM, David Fetter wrote:
> Please attach the patch or patch set to your email just like everyone
> else does

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index cdc5bf1..d42cc76 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1735,6 +1735,20 @@ CheckPAMAuth(Port *port, char *user, char *password)
 {
 	int			retval;
 	pam_handle_t *pamh = NULL;
+	char hostinfo[NI_MAXHOST];
+
+	if (port->hba->pamusedns == true)
+		retval = pg_getnameinfo_all(&port->raddr.addr, port->raddr.salen,
+hostinfo, sizeof(hostinfo), NULL, 0, 0);
+	else
+		retval = pg_getnameinfo_all(&port->raddr.addr, port->raddr.salen,
+hostinfo, sizeof(hostinfo), NULL, 0, NI_NUMERICHOST);
+	if (retval) {
+		ereport(LOG,
+(errmsg("(pam) could not determine the remote host information (%s)",
+	gai_strerror(retval))));
+		return STATUS_ERROR;
+	}
 
 	/*
 	 * We can't entirely rely on PAM to pass through appdata --- it appears
@@ -1780,6 +1794,17 @@ CheckPAMAuth(Port *port, char *user, char *password)
 		return STATUS_ERROR;
 	}
 
+	retval = pam_set_item(pamh, PAM_RHOST, hostinfo);
+
+	if (retval != PAM_SUCCESS)
+	{
+		ereport(LOG,
+(errmsg("pam_set_item(PAM_RHOST) failed: %s",
+	pam_strerror(pamh, retval))));
+		pam_passwd = NULL;
+		return STATUS_ERROR;
+	}
+
	retval = pam_set_item(pamh, PAM_CONV, &pam_passw_conv);
 
 	if (retval != PAM_SUCCESS)
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 94f7cfa..db3fe3c 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1447,6 +1447,15 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline, int line_num)
 		REQUIRE_AUTH_OPTION(uaPAM, "pamservice", "pam");
 		hbaline->pamservice = pstrdup(val);
 	}
+	else if (strcmp(name, "pamusedns") == 0)
+	{
+		REQUIRE_AUTH_OPTION(uaPAM, "pamusedns", "pam");
+		if (strcmp(val, "1") == 0)
+			hbaline->pamusedns = true;
+		else
+			hbaline->pamusedns = false;
+
+	}
 	else if (strcmp(name, "ldapurl") == 0)
 	{
 #ifdef LDAP_API_FEATURE_X_OPENLDAP
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index 68a953a..f39240d 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -64,6 +64,7 @@ typedef struct HbaLine
 
 	char	   *usermap;
 	char	   *pamservice;
+	bool		pamusedns;
 	bool		ldaptls;
 	char	   *ldapserver;
 	int			ldapport;
