Re: [HACKERS] POC: Cache data in GetSnapshotData()

2015-12-19 Thread Mithun Cy
On Thu, Dec 17, 2015 at 3:15 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> I have rebased the patch and tried to run pgbench.

> I see memory corruptions, attaching the valgrind report for the same.
Sorry, I forgot to attach the rebased patch; adding the same now.
After some analysis I saw that the write to shared memory which stores the
shared snapshot is not protected by an exclusive write lock; this leads to
the memory corruption.
I think until this is fixed, measuring the performance will not be very
useful.
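For reference, the kind of guard I have in mind is roughly the following
(just a sketch; it assumes a dedicated lwlock, here called CachedSnapshotLock,
which the attached patch does not have):

    /*
     * Sketch: publish the just-computed snapshot into the shared cache only
     * if we can get the (hypothetical) CachedSnapshotLock right away, so the
     * cache is updated opportunistically and never written concurrently.
     */
    if (LWLockConditionalAcquire(CachedSnapshotLock, LW_EXCLUSIVE))
    {
        Snapshot    csnap = &ProcGlobal->cached_snapshot;

        memcpy(csnap->xip, snapshot->xip,
               sizeof(TransactionId) * snapshot->xcnt);
        memcpy(csnap->subxip, snapshot->subxip,
               sizeof(TransactionId) * snapshot->subxcnt);
        csnap->xmin = snapshot->xmin;
        csnap->xmax = snapshot->xmax;
        csnap->xcnt = snapshot->xcnt;
        csnap->subxcnt = snapshot->subxcnt;
        ProcGlobal->cached_snapshot_valid = true;

        LWLockRelease(CachedSnapshotLock);
    }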
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***
*** 1559,1564  finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
--- 1559,1565 
  			elog(ERROR, "cache lookup failed for relation %u", OIDOldHeap);
  		relform = (Form_pg_class) GETSTRUCT(reltup);
  
+ 		Assert(TransactionIdIsNormal(frozenXid));
  		relform->relfrozenxid = frozenXid;
  		relform->relminmxid = cutoffMulti;
  
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 410,415  ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
--- 410,416 
  		if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
  		{
  			ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
+ ProcGlobal->cached_snapshot_valid = false;
  			LWLockRelease(ProcArrayLock);
  		}
  		else
***
*** 558,563  ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
--- 559,566 
  		nextidx = pg_atomic_read_u32(&proc->nextClearXidElem);
  	}
  
+ ProcGlobal->cached_snapshot_valid = false;
+ 
  	/* We're done with the lock now. */
  	LWLockRelease(ProcArrayLock);
  
***
*** 1543,1548  GetSnapshotData(Snapshot snapshot)
--- 1546,1553 
  	 errmsg("out of memory")));
  	}
  
+ snapshot->takenDuringRecovery = RecoveryInProgress();
+ 
  	/*
  	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
  	 * going to set MyPgXact->xmin.
***
*** 1557,1565  GetSnapshotData(Snapshot snapshot)
  	/* initialize xmin calculation with xmax */
  	globalxmin = xmin = xmax;
  
! 	snapshot->takenDuringRecovery = RecoveryInProgress();
  
! 	if (!snapshot->takenDuringRecovery)
  	{
  		int		   *pgprocnos = arrayP->pgprocnos;
  		int			numProcs;
--- 1562,1592 
  	/* initialize xmin calculation with xmax */
  	globalxmin = xmin = xmax;
  
! 	if (!snapshot->takenDuringRecovery && ProcGlobal->cached_snapshot_valid)
! 	{
! 		Snapshot csnap = &ProcGlobal->cached_snapshot;
! 		TransactionId *saved_xip;
! 		TransactionId *saved_subxip;
  
! 		saved_xip = snapshot->xip;
! 		saved_subxip = snapshot->subxip;
! 
! 		memcpy(snapshot, csnap, sizeof(SnapshotData));
! 
! 		snapshot->xip = saved_xip;
! 		snapshot->subxip = saved_subxip;
! 
! 		memcpy(snapshot->xip, csnap->xip,
! 			   sizeof(TransactionId) * csnap->xcnt);
! 		memcpy(snapshot->subxip, csnap->subxip,
! 			   sizeof(TransactionId) * csnap->subxcnt);
! 		globalxmin = ProcGlobal->cached_snapshot_globalxmin;
! 		xmin = csnap->xmin;
! 
! 		Assert(TransactionIdIsValid(globalxmin));
! 		Assert(TransactionIdIsValid(xmin));
! 	}
! 	else if (!snapshot->takenDuringRecovery)
  	{
  		int		   *pgprocnos = arrayP->pgprocnos;
  		int			numProcs;
***
*** 1577,1591  GetSnapshotData(Snapshot snapshot)
  			TransactionId xid;
  
  			/*
! 			 * Backend is doing logical decoding which manages xmin
! 			 * separately, check below.
  			 */
! 			if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
! 				continue;
! 
! 			/* Ignore procs running LAZY VACUUM */
! 			if (pgxact->vacuumFlags & PROC_IN_VACUUM)
! 				continue;
  
  			/* Update globalxmin to be the smallest valid xmin */
  			xid = pgxact->xmin; /* fetch just once */
--- 1604,1615 
  			TransactionId xid;
  
  			/*
! 			 * Ignore procs running LAZY VACUUM (which don't need to retain
! 			 * rows) and backends doing logical decoding (which manages xmin
! 			 * separately, check below).
  			 */
! 			if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
! 				continue;
  
  			/* Update globalxmin to be the smallest valid xmin */
  			xid = pgxact->xmin; /* fetch just once */
***
*** 1653,1658  GetSnapshotData(Snapshot snapshot)
--- 1677,1706 
  }
  			}
  		}
+ 
+ 		/* update cache */
+ 		{
+ 			Snapshot csnap = &ProcGlobal->cached_snapshot;
+ 			TransactionId *saved_xip;
+ 			TransactionId *saved_subxip;
+ 
+ 			ProcGlobal->cached_snapshot_globalxmin = globalxmin;
+ 
+ 			saved_xip = csnap->xip;
+ 			saved_subxip = csnap->subxip;
+ 			memcpy(csnap, snapshot, sizeof(SnapshotData));
+ 			csnap->xip = saved_xip;
+ 			csnap->subxip = saved_subxip;
+ 
+ 			/* not yet stored. Yuck */
+ 			csn

Re: [HACKERS] Hash Indexes

2016-06-24 Thread Mithun Cy
On Thu, Jun 16, 2016 at 12:58 PM, Amit Kapila 
 wrote:

I have a question regarding code changes in *_hash_first*.

+    /*
+     * Conditionally get the lock on primary bucket page for search while
+     * holding lock on meta page. If we have to wait, then release the meta
+     * page lock and retry it in a hard way.
+     */
+    bucket = _hash_hashkey2bucket(hashkey,
+                                  metap->hashm_maxbucket,
+                                  metap->hashm_highmask,
+                                  metap->hashm_lowmask);
+
+    blkno = BUCKET_TO_BLKNO(metap, bucket);
+
+    /* Fetch the primary bucket page for the bucket */
+    buf = ReadBuffer(rel, blkno);
+    if (!ConditionalLockBufferShared(buf))

Here we try to take the lock on the bucket page, but I think if we succeed we
do not recheck whether any split happened before the lock was taken. Is this
not necessary now?

Also, the below "if" is always true, as we enter here only when the outer
"if (retry)" is true.
+    if (retry)
+    {
+        if (oldblkno == blkno)
+            break;
+        _hash_relbuf(rel, buf);
+    }
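In other words, the equivalent logic would simply be:

    /* 'retry' is necessarily true here, so the test can be dropped */
    if (oldblkno == blkno)
        break;
    _hash_relbuf(rel, buf);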

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Mithun Cy
On Fri, Jan 29, 2016 at 5:11 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote
>
> > I just ran some tests on above patch. Mainly to compare
> > how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for
> > sorting.
> > I have 8GB of ram and ssd storage.
>
> Key length 520
>
> Number of records    320         640          1280         2560
> Data size            1.7 GB      3.5 GB       7 GB         14 GB
>
> CASE 1  RS           23654.677   35172.811    44965.442    106420.155
>         Qsort        14100.362   40612.829    101068.107   334893.391
>
> CASE 2  RS           13427.378   36882.898    98492.644    310670.15
>         Qsort        12475.133   32559.074    100772.531   322080.602
>
> CASE 3  RS           17202.966   45163.234    122323.299   337058.856
>         Qsort        12530.726   23343.753    59431.315    152862.837
>
> With work_mem = 64MB, for 14 GB of data:
>
> CASE 1  RS           128822.735
>         Qsort        90857.496
> CASE 2  RS           105631.775
>         Qsort        105938.334
> CASE 3  RS           152301.054
>         Qsort        149649.347
>
Sorry, I forgot to mention that the data in the tables above is in units of
ms, as returned by the psql client.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Mithun Cy
On Tue, Dec 29, 2015 at 4:33 AM, Peter Geoghegan  wrote:
>Attached is a revision that significantly overhauls the memory patch,
>with several smaller changes.

I just ran some tests on the above patch, mainly to compare how "longer sort
keys" behave with the new (Qsort) and the old (RS) algorithm for sorting.
I have 8GB of RAM and SSD storage.

Settings and Results.

work_mem = DEFAULT (4MB).
key width = 520.


CASE 1. Data is pre-sorted as per  sort key order.

CASE 2. Data is sorted in opposite order of sort key.

CASE 3. Data is randomly distributed.


Key length 520

Number of records    320         640          1280         2560
Data size            1.7 GB      3.5 GB       7 GB         14 GB

CASE 1  RS           23654.677   35172.811    44965.442    106420.155
        Qsort        14100.362   40612.829    101068.107   334893.391

CASE 2  RS           13427.378   36882.898    98492.644    310670.15
        Qsort        12475.133   32559.074    100772.531   322080.602

CASE 3  RS           17202.966   45163.234    122323.299   337058.856
        Qsort        12530.726   23343.753    59431.315    152862.837


If data is pre-sorted in the same order as the sort key, the current code
performs better than the proposed patch as the sort size increases.

The new algorithm does not seem to have any major impact if rows are
presorted in the opposite order.

For randomly distributed data, quicksort performs well compared to the
current sort method (RS).


==
Now increase work_mem to 64MB, with 14 GB of data to sort.

CASE 1: We can see Qsort is able to catch up with the current sort method (RS).
CASE 2: No impact.
CASE 3: RS is able to catch up with Qsort.


CASE 1  RS      128822.735
        Qsort    90857.496
CASE 2  RS      105631.775
        Qsort   105938.334
CASE 3  RS      152301.054
        Qsort   149649.347

I think for long keys both the old (RS) and the new (Qsort) sort method have
their own characteristics based on data distribution. I think work_mem is the
key: if it is set properly, the new method (Qsort) will be able to fit most
of the cases. If work_mem is not tuned right, there are cases where it can
regress.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Test Queries.sql
Description: application/sql



Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-02-24 Thread Mithun Cy
On Sat, Jan 16, 2016 at 10:23 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:


> >On Fri, Jan 15, 2016 at 11:23 AM, Mithun Cy <mithun...@enterprisedb.com>
> wrote:
>
>> On Mon, Jan 4, 2016 at 2:28 PM, Andres Freund <and...@anarazel.de> wrote:
>>
>> >> I think at the very least the cache should be protected by a separate
>> >> lock, and that lock should be acquired with TryLock. I.e. the cache is
>> >> updated opportunistically. I'd go for an lwlock in the first iteration.
>>
>
> >I also think this observation of yours is right and currently that is
> >okay because we always first check TransactionIdIsCurrentTransactionId().
>
> >+ const uint32 snapshot_cached= 0;
>

I have fixed all of the issues reported by the regress tests. Also, now when a
backend tries to cache the snapshot we also store the self-xid and sub-xids,
so other backends can use them.

I also did some read-only perf tests.

Non-Default Settings.

scale_factor=300.

./postgres -c shared_buffers=16GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

./pgbench -c $clients -j $clients -T 300 -M prepared -S  postgres

Machine Detail:
cpu   : POWER8
cores: 24 (192 with HT).

Clients   Base             With cached snapshot
1         19653.914409     19926.884664
16        190764.519336    190040.299297
32        339327.881272    354467.445076
48        462632.024934    64767.917813
64        522642.515148    533271.556703
80        515262.813189    513353.962521

But I did not see any perf improvement. Will continue testing the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5cb28cf..a441ca0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1561,6 +1561,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 			elog(ERROR, "cache lookup failed for relation %u", OIDOldHeap);
 		relform = (Form_pg_class) GETSTRUCT(reltup);
 
+		Assert(TransactionIdIsNormal(frozenXid));
 		relform->relfrozenxid = frozenXid;
 		relform->relminmxid = cutoffMulti;
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 97e8962..8db028f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -372,11 +372,19 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
 	(arrayP->numProcs - index - 1) * sizeof(int));
 			arrayP->pgprocnos[arrayP->numProcs - 1] = -1;		/* for debugging */
 			arrayP->numProcs--;
+
+			pg_atomic_write_u32(>snapshot_cached, 0);
+
+			ProcGlobal->cached_snapshot_valid = false;
 			LWLockRelease(ProcArrayLock);
 			return;
 		}
 	}
 
+	pg_atomic_write_u32(>snapshot_cached, 0);
+
+	ProcGlobal->cached_snapshot_valid = false;
+
 	/* Ooops */
 	LWLockRelease(ProcArrayLock);
 
@@ -420,6 +428,8 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
 		if (LWLockConditionalAcquire(ProcArrayLock, LW_EXCLUSIVE))
 		{
 			ProcArrayEndTransactionInternal(proc, pgxact, latestXid);
+			pg_atomic_write_u32(>snapshot_cached, 0);
+			ProcGlobal->cached_snapshot_valid = false;
 			LWLockRelease(ProcArrayLock);
 		}
 		else
@@ -568,6 +578,9 @@ ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
 		nextidx = pg_atomic_read_u32(&proc->procArrayGroupNext);
 	}
 
+	pg_atomic_write_u32(>snapshot_cached, 0);
+	ProcGlobal->cached_snapshot_valid = false;
+
 	/* We're done with the lock now. */
 	LWLockRelease(ProcArrayLock);
 
@@ -1553,6 +1566,8 @@ GetSnapshotData(Snapshot snapshot)
 	 errmsg("out of memory")));
 	}
 
+	snapshot->takenDuringRecovery = RecoveryInProgress();
+
 	/*
 	 * It is sufficient to get shared lock on ProcArrayLock, even if we are
 	 * going to set MyPgXact->xmin.
@@ -1567,12 +1582,39 @@ GetSnapshotData(Snapshot snapshot)
 	/* initialize xmin calculation with xmax */
 	globalxmin = xmin = xmax;
 
-	snapshot->takenDuringRecovery = RecoveryInProgress();
+	if (!snapshot->takenDuringRecovery && ProcGlobal->cached_snapshot_valid)
+	{
+		Snapshot csnap = &ProcGlobal->cached_snapshot;
+		TransactionId *saved_xip;
+		TransactionId *saved_subxip;
 
-	if (!snapshot->takenDuringRecovery)
+		saved_xip = snapshot->xip;
+		saved_subxip = snapshot->subxip;
+
+		memcpy(snapshot, csnap, sizeof(SnapshotData));
+
+		snapshot->xip = saved_xip;
+		snapshot->subxip = saved_subxip;
+
+		memcpy(snapshot->xip, csnap->xip,
+			   sizeof(TransactionId) * csnap->xcnt);
+		memcpy(snapshot->subxip, csnap->subxip,
+			   sizeof(TransactionId) * csnap->subxcnt);
+		globalxmin = ProcGlobal->cached_snapshot_globalxmin;
+		xmin = csnap->xmin;
+
+		count = csnap->

Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-03-09 Thread Mithun Cy
On Thu, Mar 3, 2016 at 11:50 PM, Robert Haas  wrote:
> What if you apply both this and Amit's clog control log patch(es)?  Maybe
> the combination of the two helps substantially more than either one alone.


I did the above tests along with Amit's clog patch. Machine: 8 sockets, 64
cores, 128 hyperthreads.

Clients  BASE          Only CLOG changes  % Increase     Only save snapshot  % Increase    CLOG changes + save snapshot  % Increase
64       29247.658034  30855.728835       5.4981181711   29254.532186        0.0235032562  32691.832776                  11.7758992463
88       31214.305391  33313.393095       6.7247618606   32109.248609        2.8670931702  35433.655203                  13.5173592978
128      30896.673949  34015.362152       10.0939285832  ***                 ***           34947.296355                  13.110221549
256      27183.780921  31192.895437       14.7481857938  ***                 ***           32873.782735                  20.9316056164
With clog changes, perf of caching the snapshot patch improves.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-03-18 Thread Mithun Cy
On Thu, Mar 17, 2016 at 9:00 AM, Amit Kapila wrote:
> If you see, for the Base readings, there is a performance increase up till
> 64 clients and then there is a fall at 88 clients, which to me indicates
> that it hits very high-contention around CLogControlLock at 88 clients
> which CLog patch is able to control to a great degree (if required, I
> think the same can be verified by LWLock stats data).  One reason for
> hitting contention at 88 clients is that this machine seems to have
> 64-cores (it has 8 sockets and 8 Core(s) per socket) as per below
> information of lscpu command.

I am attaching the LWLock stats data for the above test setups (unlogged tables).

                              At 64 clients        At 88 clients
                              (block-time units)   (block-time units)
BASE
  ProcArrayLock               182946               117827
  ClogControlLock             107420               120266

clog patch
  ProcArrayLock               183663               121215
  ClogControlLock              72806                65220

clog patch + save snapshot
  ProcArrayLock               128260                83356
  ClogControlLock              78921                74011
This is for unlogged tables. I mainly see that ProcArrayLock has higher
contention at 64 clients, and at 88 clients the contention has slightly moved
to other entities.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


lwlock details.odt
Description: application/vnd.oasis.opendocument.text



Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-11 Thread Mithun Cy
On Thu, Mar 10, 2016 at 9:39 PM, Robert Haas  wrote:
>I guess there must not be an occurrence of this pattern in the
>regression tests, or previous force_parallel_mode testing would have
>found this problem.  Perhaps this patch should add one?

I have added the test to select_into.sql, adding an EXPLAIN SELECT INTO
statement. EXPLAIN ANALYZE produces planning time and execution time even
with TIMING OFF, so I am not adding that to the regress tests.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/test/regress/expected/select_into.out b/src/test/regress/expected/select_into.out
index 9d3f047..bb71260 100644
--- a/src/test/regress/expected/select_into.out
+++ b/src/test/regress/expected/select_into.out
@@ -94,3 +94,16 @@ INSERT INTO b SELECT 1 INTO f;
 ERROR:  SELECT ... INTO is not allowed here
 LINE 1: INSERT INTO b SELECT 1 INTO f;
 ^
+--
+-- EXPLAIN [ANALYZE] SELECT INTO should not use parallel scan.
+--
+CREATE TABLE mt1 (n INT);
+INSERT INTO mt1 VALUES (GENERATE_SERIES(1,500));
+ANALYZE mt1;
+EXPLAIN (COSTS OFF) SELECT INTO mt2 FROM mt1;
+   QUERY PLAN
+-
+ Seq Scan on mt1
+(1 row)
+
+DROP TABLE mt1;
diff --git a/src/test/regress/sql/select_into.sql b/src/test/regress/sql/select_into.sql
index 4d1cc86..6eb5e24 100644
--- a/src/test/regress/sql/select_into.sql
+++ b/src/test/regress/sql/select_into.sql
@@ -76,3 +76,13 @@ COPY (SELECT 1 INTO frak UNION SELECT 2) TO 'blob';
 SELECT * FROM (SELECT 1 INTO f) bar;
 CREATE VIEW foo AS SELECT 1 INTO b;
 INSERT INTO b SELECT 1 INTO f;
+
+--
+-- EXPLAIN [ANALYZE] SELECT INTO should not use parallel scan.
+--
+CREATE TABLE mt1 (n INT);
+INSERT INTO mt1 VALUES (GENERATE_SERIES(1,500));
+ANALYZE mt1;
+
+EXPLAIN (COSTS OFF) SELECT INTO mt2 FROM mt1;
+DROP TABLE mt1;



Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-03-11 Thread Mithun Cy
Thanks Amit,
I did a quick pgbench write test for unlogged tables at 88 clients, as it had
the peak performance in the previous test. There is a big jump in TPS due to
the clog changes.

Clients  BASE          Only CLOG changes  % Increase     Only save snapshot  % Increase    CLOG changes + save snapshot  % Increase
88       36055.425005  52796.618434       46.4318294034  37728.004118        4.6389111008  56025.454917                  55.3870323515

Clients  + With increase in CLOG buffers  % Increase
88       58217.924138                     61.4678626862

I will continue to benchmark the above tests with a much wider range of clients.


On Thu, Mar 10, 2016 at 1:36 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Thu, Mar 10, 2016 at 1:04 PM, Mithun Cy <mithun...@enterprisedb.com>
> wrote:
>
>>
>>
>> On Thu, Mar 3, 2016 at 11:50 PM, Robert Haas <robertmh...@gmail.com>
>> wrote:
>> >What if you apply both this and Amit's clog control log patch(es)?
>> Maybe the combination of the two helps substantially more than either >one
>> alone.
>>
>>
>> I did the above tests along with Amit's clog patch. Machine :8 socket, 64
>> core. 128 hyperthread.
>>
>> Clients  BASE          Only CLOG changes  % Increase     Only save snapshot  % Increase    CLOG changes + save snapshot  % Increase
>> 64       29247.658034  30855.728835       5.4981181711   29254.532186        0.0235032562  32691.832776                  11.7758992463
>> 88       31214.305391  33313.393095       6.7247618606   32109.248609        2.8670931702  35433.655203                  13.5173592978
>> 128      30896.673949  34015.362152       10.0939285832  ***                 ***           34947.296355                  13.110221549
>> 256      27183.780921  31192.895437       14.7481857938  ***                 ***           32873.782735                  20.9316056164
>> With clog changes, perf of caching the snapshot patch improves.
>>
>>
> This data looks promising to me and indicates that saving the snapshot has
> benefits and we can see noticeable performance improvement especially once
> the CLog contention gets reduced.  I wonder if we should try these tests
> with unlogged tables, because I suspect most of the contention after
> CLogControlLock and ProcArrayLock is for WAL related locks, so you might be
> able to see better gain with these patches.  Do you think it makes sense to
> check the performance by increasing CLOG buffers (patch for same is posted
> in Speed up Clog thread [1]) as that also relieves contention on CLOG as
> per the tests I have done?
>
>
> [1] -
> http://www.postgresql.org/message-id/caa4ek1lmmgnq439bum0lcs3p0sb8s9kc-cugu_thnqmwa8_...@mail.gmail.com
>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Mithun Cy
On Sat, Mar 12, 2016 at 2:32 PM, Amit Kapila 
wrote:
> With force_parallel_mode=on, I could see many other failures as well.  I
> think it is better to have test, which tests this functionality with
> force_parallel_mode=regress

As per the user manual:
"Setting this value to regress has all of the same effects as setting it to
on plus some additional effects that are intended to facilitate automated
regression testing. Normally, messages from a parallel worker are prefixed
with a context line, but a setting of regress suppresses this to guarantee
reproducible results. *Also, the Gather nodes added to plans by this setting
are hidden from the EXPLAIN output so that the output matches what would be
obtained if this setting were turned off.*"

And my test is for EXPLAIN statements. I think under regress mode it will
never fail even if a parallel scan is used, as per the above statement.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Mithun Cy
Sorry, there was some issue with my mail settings; the same mail got sent
more than once.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-12 Thread Mithun Cy
On Sat, Mar 12, 2016 at 12:28 PM, Amit Kapila wrote:
> I don't see how this test will fail with force_parallel_mode=regress and
> max_parallel_degree > 0 even without the patch proposed to fix the issue in
> hand.  In short, I don't think this test would have caught the issue, so I
> don't see much advantage in adding such a test.  Even if we want to add
> such a test case, I think as proposed this will substantially increase the
> timing for "Select Into" test which might not be an acceptable test case
> addition especially for testing one corner case.


Without the above patch, make installcheck fails for select_into.sql with the
below diff, when

force_parallel_mode = on
max_parallel_degree = 3

diff results/select_into.out expected/select_into.out

104,110c104,107
<      QUERY PLAN
< 
<  Gather
<    Number of Workers: 1
<    Single Copy: true
<    ->  Seq Scan on mt1
< (4 rows)
---
>    QUERY PLAN
> -
>  Seq Scan on mt1
> (1 row)


Again, with non-default postgresql.conf settings:

force_parallel_mode = on
max_parallel_degree = 3
parallel_tuple_cost = 0

[mithun@localhost regress]$ diff results/select_into.out expected/select_into.out

104,109c104,107
<      QUERY PLAN
< 
<  Gather
<    Number of Workers: 3
<    ->  Parallel Seq Scan on mt1
< (3 rows)
---
>    QUERY PLAN
> -
>  Seq Scan on mt1
> (1 row)

To reduce the execution time I can set the generate_series parameter to 500,
which is fast on my machine and also fails with the above diff, but this time
only one worker is assigned as per the plan.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-03-16 Thread Mithun Cy
On Thu, Mar 3, 2016 at 6:20 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:
> I will continue to benchmark above tests with much wider range of clients.

The latest benchmarking shows the following results for unlogged tables.

Clients  BASE          Only CLOG changes  % Increase     CLOG changes + save snapshot  % Increase
1        1198.326337   1328.069656        10.8270439357  1234.078342                   2.9834948875
32       37455.181727  38295.250519       2.2428640131   41023.126293                  9.5259037641
64       48838.016451  50675.845885       3.7631123611   51662.814319                  5.7840143259
88       36878.187766  53173.577363       44.1870671639  56025.454917                  51.9203038731
128      35901.537773  52026.024098       44.9130798434  53864.486733                  50.0339263281
256      28130.354402  46793.134156       66.3439197647  46817.04602                   66.4289235427
The performance diff at 1 client seems to be just run-to-run variance.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Explain [Analyze] produces parallel scan for select Into table statements.

2016-03-09 Thread Mithun Cy
Hi All,

EXPLAIN [ANALYZE] SELECT INTO table produces a plan which uses parallel scans.

*Test:*

create table table1 (n int);
insert into table1 values (generate_series(1,500));
analyze table1;

set parallel_tuple_cost=0;

set max_parallel_degree=3;

postgres=# explain select into table2 from table1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather  (cost=1000.00..39253.03 rows=500 width=0)
   Number of Workers: 3
   ->  Parallel Seq Scan on table1  (cost=0.00..38253.03 rows=1612903 width=0)
(3 rows)

-

*So Explain Analyze Fails.*

postgres=# explain analyze select into table2 from table1;
ERROR:  cannot insert tuples during a parallel operation
STATEMENT:  explain analyze select into table2 from table1;

*But actual execution is successful.*

postgres=# select into table2 from table1;
SELECT 500

The reason is that in ExplainOneQuery we unconditionally pass
CURSOR_OPT_PARALLEL_OK to pg_plan_query even if the query might be from
CreateTableAs / SelectInto, whereas in ExecCreateTableAs it is always 0.

*Possible Fix:*

I tried to make a patch to fix this. Now in ExplainOneQuery, if an INTO
clause is defined then parallel plans are disabled, similar to what is done
for their execution.
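Roughly, the change amounts to the following in ExplainOneQuery (a sketch of
the idea, not the exact patch text):

    int     cursorOptions = CURSOR_OPT_PARALLEL_OK;

    /* Mirror ExecCreateTableAs(): SELECT ... INTO cannot run in parallel */
    if (into != NULL)
        cursorOptions = 0;

    plan = pg_plan_query(query, cursorOptions, params);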
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Analyze_select_into_disable_parallel_scan.patch
Description: Binary data



Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-03-22 Thread Mithun Cy
On Thu, Mar 10, 2016 at 1:36 PM, Amit Kapila 
wrote:
> Do you think it makes sense to check the performance by increasing CLOG
> buffers (patch for same is posted in Speed up Clog thread [1]) as that
> also relieves contention on CLOG as per the tests I have done?

Along with the clog patches and save snapshot, I also applied Amit's increase
clog buffers patch. Below are the benchmark results.

Clients  BASE          + With increase in CLOG buffers patch  % Increase
1        1198.326337   1275.461955                            6.4369458985
32       37455.181727  41239.13593                            10.102618726
64       48838.016451  56362.163626                           15.4063324471
88       36878.187766  58217.924138                           57.8654691695
128      35901.537773  58239.783982                           62.22086182
256      28130.354402  56417.133939                           100.5560723934

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-03-03 Thread Mithun Cy
On Tue, Mar 1, 2016 at 12:59 PM, Amit Kapila 
wrote:
> Don't we need to add this only when the xid of current transaction is
> valid?  Also, I think it will be better if we can explain why we need to
> add our own transaction id while caching the snapshot.

I have fixed the same and the patch is attached.

Some more tests done after that

*pgbench write tests: on 8 socket, 64 core machine.*

/postgres -c shared_buffers=16GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

./pgbench -c $clients -j $clients -T 1800 -M prepared postgres


A small improvement in performance at 64 threads.

*LWLock_Stats data:*

ProcArrayLock: Base.

=

postgresql-2016-03-01_115252.log:PID 110019 lwlock main 4: shacq 1867601
exacq 35625 blk 134682 spindelay 128 dequeue self 28871

postgresql-2016-03-01_115253.log:PID 110115 lwlock main 4: shacq 2201613
exacq 43489 blk 155499 spindelay 127 dequeue self 32751

postgresql-2016-03-01_115253.log:PID 110122 lwlock main 4: shacq 2231327
exacq 44824 blk 159440 spindelay 128 dequeue self 6

postgresql-2016-03-01_115254.log:PID 110126 lwlock main 4: shacq 2247983
exacq 44632 blk 158669 spindelay 131 dequeue self 33365

postgresql-2016-03-01_115254.log:PID 110059 lwlock main 4: shacq 2036809
exacq 38607 blk 143538 spindelay 117 dequeue self 31008


ProcArrayLock: With Patch.

=

postgresql-2016-03-01_124747.log:PID 1789 lwlock main 4: shacq 2273958
exacq 61605 blk 79581 spindelay 307 dequeue self 66088

postgresql-2016-03-01_124748.log:PID 1880 lwlock main 4: shacq 2456388
exacq 65996 blk 82300 spindelay 470 dequeue self 68770

postgresql-2016-03-01_124748.log:PID 1765 lwlock main 4: shacq 2244083
exacq 60835 blk 79042 spindelay 336 dequeue self 65212

postgresql-2016-03-01_124749.log:PID 1882 lwlock main 4: shacq 2489271
exacq 67043 blk 85650 spindelay 463 dequeue self 68401

postgresql-2016-03-01_124749.log:PID 1753 lwlock main 4: shacq 2232791
exacq 60647 blk 78659 spindelay 364 dequeue self 65180

postgresql-2016-03-01_124750.log:PID 1849 lwlock main 4: shacq 2374922
exacq 64075 blk 81860 spindelay 339 dequeue self 67584

*-Block time of ProcArrayLock has reduced significantly.*


ClogControlLock : Base.

===

postgresql-2016-03-01_115302.log:PID 110040 lwlock main 11: shacq 586129
exacq 268808 blk 90570 spindelay 261 dequeue self 59619

postgresql-2016-03-01_115303.log:PID 110047 lwlock main 11: shacq 593492
exacq 271019 blk 89547 spindelay 268 dequeue self 5

postgresql-2016-03-01_115303.log:PID 110078 lwlock main 11: shacq 620830
exacq 285244 blk 92939 spindelay 262 dequeue self 61912

postgresql-2016-03-01_115304.log:PID 110083 lwlock main 11: shacq 633101
exacq 289983 blk 93485 spindelay 262 dequeue self 62394

postgresql-2016-03-01_115305.log:PID 110105 lwlock main 11: shacq 646584
exacq 297598 blk 93331 spindelay 312 dequeue self 63279


ClogControlLock : With Patch.

===

postgresql-2016-03-01_124730.log:PID 1865 lwlock main 11: shacq 722881
exacq 330273 blk 106163 spindelay 468 dequeue self 80316

postgresql-2016-03-01_124731.log:PID 1857 lwlock main 11: shacq 713720
exacq 327158 blk 106719 spindelay 439 dequeue self 79996

postgresql-2016-03-01_124732.log:PID 1826 lwlock main 11: shacq 696762
exacq 317239 blk 104523 spindelay 448 dequeue self 79374

postgresql-2016-03-01_124732.log:PID 1862 lwlock main 11: shacq 721272
exacq 330350 blk 105965 spindelay 492 dequeue self 81036

postgresql-2016-03-01_124733.log:PID 1879 lwlock main 11: shacq 737398
exacq 335357 blk 105424 spindelay 520 dequeue self 80977

*-Block time of ClogControlLock has increased slightly.*


Will continue with further tests on lower clients.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Cache_data_in_GetSnapshotData_POC_01.patch
Description: Binary data



[HACKERS] Perf Benchmarking and regression.

2016-05-06 Thread Mithun Cy
I tried to do some benchmarking on postgres master head
commit 72a98a639574d2e25ed94652848555900c81a799
Author: Andres Freund 
Date:   Tue Apr 26 20:32:51 2016 -0700

CASE : Read-Write Tests when data exceeds shared buffers.

Non Default settings and test
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9 &

./pgbench -i -s 1000 postgres

./pgbench -c $threads -j $threads -T 1800 -M prepared postgres


Machine : "cthulhu" 8 node numa machine with 128 hyper threads.
>numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 37885 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 31215 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 15331 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 36774 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 62 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 9653 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 50209 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 43966 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  21  21  21  21  21  21  21
  1:  21  10  21  21  21  21  21  21
  2:  21  21  10  21  21  21  21  21
  3:  21  21  21  10  21  21  21  21
  4:  21  21  21  21  10  21  21  21
  5:  21  21  21  21  21  10  21  21
  6:  21  21  21  21  21  21  10  21
  7:  21  21  21  21  21  21  21  10


I see some regression when compared to 9.5

Sessions  PostgreSQL-9.5 (scale 1000)  PostgreSQL-9.6 (scale 1000)  % diff
1         747.367249                   892.149891                   19.3723557185
8         5281.282799                  4941.905008                  -6.4260484416
16        9000.915419                  8695.396233                  -3.3943123758
24        11852.839627                 10843.328776                 -8.5170379653
32        14323.048334                 11977.505153                 -16.3760054864
40        16098.926583                 12195.447024                 -24.2468312336
48        16959.646965                 12639.951087                 -25.4704351271
56        17157.737762                 12543.212929                 -26.894715941
64        17201.914922                 12628.002422                 -26.5895542487
72        16956.994835                 11280.870599                 -33.4736448954
80        16775.954896                 11348.830603                 -32.3506132834
88        16609.137558                 10823.465121                 -34.834273705
96        16510.099404                 11091.757753                 -32.8183466278
104       16275.724927                 10665.743275                 -34.4683980416
112       16141.815128                 10977.84664                  -31.9912503461
120       15904.086614                 10716.17755                  -32.6199749153
128       15738.391503                 10962.333439                 -30.3465450271

When I run git bisect on master (and this is for 128 clients), there are two
commit ids which affected the performance:

1. first bad commit: [ac1d7945f866b1928c2554c0f80fd52d7f92] Make idle
backends exit if the postmaster dies.
This made performance drop from 15947.21546 (15K+) to 13409.758510
(around 13K+).

2. first bad commit: [428b1d6b29ca599c5700d4bc4f4ce4c5880369bf] Allow to
trigger kernel writeback after a configurable number of writes.
This made performance drop further to 10962.333439 (10K+).

I think it did not recover afterwards.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-06 Thread Mithun Cy
On Fri, May 6, 2016 at 8:35 PM, Andres Freund  wrote:
> Also, do you see read-only workloads to be affected too?
Thanks. I have not tested with the above specific commit id which reported
the performance issue, but at HEAD commit
72a98a639574d2e25ed94652848555900c81a799
(Author: Andres Freund, Date: Tue Apr 26 20:32:51 2016 -0700),
READ-only (prepared) tests (both when data fits into shared buffers and when
it exceeds shared_buffers=8GB) show that the performance of master has
improved over 9.5.
Sessions  PostgreSQL-9.5 (scale 300)  PostgreSQL-9.6 (scale 300)  % diff
1         5287.561594                 5213.723197                 -1.396454598
8         84265.389083                84871.305689                0.719057507
16        148330.4155                 158661.128315               6.9646624936
24        207062.803697               219958.12974                6.2277366155
32        265145.089888               290190.501443               9.4459269699
40        311688.752973               34.551772                   9.0833559212
48        327169.9673                 372408.073033               13.8270960829
56        274426.530496               390629.24948                42.3438356248
64        261777.692042               384613.9666                 46.9238893505
72        210747.55937                376390.162022               78.5976374517
80        220192.818648               398128.779329               80.8091570713
88        185176.91888                423906.711882               128.9198429512
96        161579.719039               421541.656474               160.8877271115
104       146935.568434               450672.740567               206.7145316618
112       136605.466232               432047.309248               216.2738074582
120       127687.175016               455458.086889               256.6983816753
128       120413.936453               428127.879242               255.5467845776

Sessions  PostgreSQL-9.5 (scale 1000)  PostgreSQL-9.6 (scale 1000)  % diff
1         5103.812202                  5155.434808                  1.01145191
8         47741.9041                   53117.805096                 11.2603405694
16        89722.57031                  86965.10079                  -3.0733287182
24        130914.537373                153849.634245                17.5191367836
32        197125.725706                212454.474264                7.7761279017
40        248489.551052                270304.093767                8.7788571482
48        291884.652232                317257.836746                8.6928806705
56        304526.216047                359676.785476                18.1102862489
64        301440.463174                388324.710185                28.8230206709
72        194239.941979                393676.628802                102.6754254511
80        144879.527847                383365.678053                164.6099719885
88        122894.325326                372905.436117                203.4358463076
96        109836.31148                 362208.867756                229.7715144249
104       103791.981583                352330.402278                239.4582094921
112       105189.206682                345722.499429                228.6672752217
120       108095.811432                342597.969088                216.939171416
128       113242.59492                 333821.98763                 194.7848270925

Even for READ-WRITE, when data fits into shared buffers (scale_factor=300 and
shared_buffers=8GB), performance has improved. The only case where I see some
regression is when data exceeds shared_buffers (scale_factor=1000 and
shared_buffers=8GB).

I will try to run the tests as you have suggested and will report the same.


Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] Avoid parallel full and right join paths.

2016-04-19 Thread Mithun Cy
Tests:
create table mytab(x int,x1 char(9),x2 varchar(9));
create table mytab1(y int,y1 char(9),y2 varchar(9));
insert into mytab values (generate_series(1,5),'aa','aaa');
insert into mytab1 values (generate_series(1,1),'aa','aaa');
insert into mytab values (generate_series(1,50),'aa','aaa');
insert into mytab values (generate_series(1,50),'aa','aaa');
analyze mytab;
analyze mytab1;
vacuum mytab;
vacuum mytab1;

set max_parallel_degree=0;
SET
df=# SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
ON mytab.x = mytab1.y;
count
---
3
(1 row)

# set max_parallel_degree=5;
SET
df=# SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
ON mytab.x = mytab1.y;
count
---
39089
(1 row)

Cause:
==
Normal plan
==
explain SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
    ON mytab.x = mytab1.y;
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=21682.71..21682.72 rows=1 width=8)
   ->  Hash Right Join  (cost=289.00..21629.07 rows=21457 width=0)
         Hash Cond: (mytab.x = mytab1.y)
         ->  Seq Scan on mytab  (cost=0.00..17188.00 rows=105 width=4)
         ->  Hash  (cost=164.00..164.00 rows=1 width=4)
               ->  Seq Scan on mytab1  (cost=0.00..164.00 rows=1 width=4)
=

Parallel plan.
==
explain SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
    ON mytab.x = mytab1.y;
                                QUERY PLAN
-----------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14135.88..14135.89 rows=1 width=8)
   ->  Gather  (cost=14135.67..14135.88 rows=2 width=8)
         Number of Workers: 2
         ->  Partial Aggregate  (cost=13135.67..13135.68 rows=1 width=8)
               ->  Hash Right Join  (cost=289.00..13082.02 rows=21457 width=0)
                     Hash Cond: (mytab.x = mytab1.y)
                     ->  Parallel Seq Scan on mytab  (cost=0.00..11063.00 rows=437500 width=4)
                     ->  Hash  (cost=164.00..164.00 rows=1 width=4)
                           ->  Seq Scan on mytab1  (cost=0.00..164.00 rows=1 width=4)


As shown above, Right and Full join paths cannot be made parallel: since the
outer table is a partial path and is not completely visible to each worker,
they can produce false NULL-extended rows. Attaching a patch to fix the same.
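The gist of the restriction is a check like the following in the planner's
partial-join path generation (an illustrative sketch; the attached patch is
the authoritative change):

    /*
     * A FULL or RIGHT join cannot use a partial path for the outer side:
     * a worker that sees only part of the outer relation could emit
     * NULL-extended rows for inner tuples that do have matches in the
     * portion scanned by some other worker.
     */
    if (jointype == JOIN_FULL || jointype == JOIN_RIGHT)
        return;     /* skip generating partial (parallel) join paths */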

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


avoid_parallel_full_right_join.patch
Description: Binary data



Re: [HACKERS] Hash Indexes

2016-08-04 Thread Mithun Cy
I did some basic testing of the same. In that, I found one issue with a cursor.

+BEGIN;
+SET enable_seqscan = OFF;
+SET enable_bitmapscan = OFF;
+CREATE FUNCTION declares_cursor(int)
+    RETURNS void
+    AS 'DECLARE c CURSOR FOR SELECT * from con_hash_index_table WHERE keycol = $1;'
+LANGUAGE SQL;
+
+SELECT declares_cursor(1);
+MOVE FORWARD ALL FROM c;
+MOVE BACKWARD 1 FROM c;
+CLOSE c;
+WARNING: buffer refcount leak: [5835] (rel=base/16384/30537, blockNum=327, flags=0x9380, refcount=1 1)
ROLLBACK;

Closing the cursor comes with a warning which says we forgot to unpin the
buffer.
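For reference, the invariant behind that warning (a generic sketch, not the
patch's code): every buffer pinned with ReadBuffer() must be unpinned again
before the scan/transaction ends:

    Buffer  buf = ReadBuffer(rel, blkno);

    LockBuffer(buf, BUFFER_LOCK_SHARE);
    /* ... examine the page ... */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    ReleaseBuffer(buf);     /* drops the pin; forgetting this is what
                             * produces "buffer refcount leak" at cleanup */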

I have also added tests [1] for coverage improvements.

[1] Some tests to cover hash_index.



On Thu, Jul 14, 2016 at 4:33 PM, Amit Kapila 
wrote:

> On Wed, Jun 22, 2016 at 8:48 PM, Robert Haas 
> wrote:
> > On Wed, Jun 22, 2016 at 5:14 AM, Amit Kapila 
> wrote:
> >> We can do it in the way as you are suggesting, but there is another
> thing
> >> which we need to consider here.  As of now, the patch tries to finish
> the
> >> split if it finds split-in-progress flag in either old or new bucket.
> We
> >> need to lock both old and new buckets to finish the split, so it is
> quite
> >> possible that two different backends try to lock them in opposite order
> >> leading to a deadlock.  I think the correct way to handle is to always
> try
> >> to lock the old bucket first and then new bucket.  To achieve that, if
> the
> >> insertion on new bucket finds that split-in-progress flag is set on a
> >> bucket, it needs to release the lock and then acquire the lock first on
> old
> >> bucket, ensure pincount is 1 and then lock new bucket again and ensure
> that
> >> pincount is 1. I have already maintained the order of locks in scan (old
> >> bucket first and then new bucket; refer changes in _hash_first()).
> >> Alternatively, we can try to  finish the splits only when someone tries
> to
> >> insert in old bucket.
> >
> > Yes, I think locking buckets in increasing order is a good solution.
> > I also think it's fine to only try to finish the split when the insert
> > targets the old bucket.  Finishing the split enables us to remove
> > tuples from the old bucket, which lets us reuse space instead of
> > accelerating more.  So there is at least some potential benefit to the
> > backend inserting into the old bucket.  On the other hand, a process
> > inserting into the new bucket derives no direct benefit from finishing
> > the split.
> >
>
> Okay, following this suggestion, I have updated the patch so that only
> insertion into old-bucket can try to finish the splits.  Apart from
> that, I have fixed the issue reported by Mithun upthread.  I have
> updated the README to explain the locking used in patch.   Also, I
> have changed the locking around vacuum, so that it can work with
> concurrent scans when ever possible.  In previous patch version,
> vacuum used to take cleanup lock on a bucket to remove the dead
> tuples, moved-due-to-split tuples and squeeze operation, also it holds
> the lock on bucket till end of cleanup.  Now, also it takes cleanup
> lock on a bucket to out-wait scans, but it releases the lock as it
> proceeds to clean the overflow pages.  The idea is first we need to
> lock the next bucket page and  then release the lock on current bucket
> page.  This ensures that any concurrent scan started after we start
> cleaning the bucket will always be behind the cleanup.  Allowing scans
> to cross vacuum will allow it to remove tuples required for sanctity
> of scan.  Also for squeeze-phase we are just checking if the pincount
> of buffer is one (we already have Exclusive lock on buffer of bucket
> by that time), then only proceed, else will try to squeeze next time
> the cleanup is required for that bucket.
>
> Thoughts/Suggestions?
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] "Some tests to cover hash_index"

2016-08-04 Thread Mithun Cy
I am attaching a patch to improve the coverage of the hash index code [1].
I have added some basic tests, which mainly cover overflow pages. It took
2 sec of extra time on my machine in the parallel schedule.




                                        Hit     Total   Coverage
Old tests            Line coverage      780     1478    52.7 %
                     Function coverage   63       85    74.1 %
After the new tests  Line coverage      1181    1478    79.9 %
                     Function coverage   78       85    91.8 %


[1] Concurrent Hash Index.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


commit-hash_coverage_test
Description: Binary data



[HACKERS] Cache Hash Index meta page.

2016-07-22 Thread Mithun Cy
I have created a patch to cache the meta page of a hash index in
backend-private memory. This is to avoid reading the meta page buffer every
time we want to find the bucket page. In the _hash_first call, we read the
meta page buffer twice just to make sure the bucket was not split after we
found the bucket page. With this patch the meta page buffer is not read if
the bucket has not been split after caching the meta page.

The idea is to cache the meta page data in rd_amcache and store the maxbucket
number in hasho_prevblkno of the bucket's primary page (which will always be
NULL otherwise, so it is reused here for this cause!). When we do a hash
lookup for the bucket page, if the locally cached maxbucket number is greater
than or equal to the bucket page's maxbucket number, then we can say the
given bucket was not split after we cached the meta page, and hence avoid
reading the meta page buffer.
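In essence, the validity check is just a comparison between the cached
maxbucket and the maxbucket value stamped in the bucket page's
hasho_prevblkno (an illustrative sketch with assumed variable names, not the
patch's exact code):

    static bool
    cached_metap_is_valid(uint32 cached_maxbucket, uint32 page_maxbucket)
    {
        /*
         * cached_maxbucket: hashm_maxbucket from the copy in rel->rd_amcache
         * page_maxbucket:   value stamped in the bucket page's hasho_prevblkno
         *
         * If the cache is at least as new as the bucket page's stamp, the
         * bucket cannot have been split since the meta page was cached, so
         * the meta page buffer need not be read again; otherwise refresh the
         * cache from the real meta page and recompute the bucket.
         */
        return cached_maxbucket >= page_maxbucket;
    }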

I have attached the benchmark results and perf stats (refer to
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
benchmark results]). There we can see improvements at higher client counts,
as lwlock contention due to the buffer read is higher at higher client
counts. If I apply the same patch on top of Amit's concurrent hash index
patch [1], we can see improvements at lower client counts also; Amit's patch
has removed a heavyweight page lock which was the bottleneck at lower client
counts.

[1] Concurrent Hash Indexes 


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


hash_index_perf_stat_and_benchmarking.odc
Description: application/vnd.oasis.opendocument.chart


cache_hash_index_metapage_base_01
Description: Binary data


cache_hash_index_metapage_onAmit_v3.patch
Description: Binary data



Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-31 Thread Mithun Cy
On Tue, Jan 31, 2017 at 9:47 PM, Robert Haas  wrote:
> Now, I assume that this patch sorts the I/O (although I haven't
> checked that) and therefore I expect that the prewarm completes really
> fast.  If that's not the case, then that's bad.  But if it is the
> case, then it's not really hurting you even if the workload changes
> completely.

-- JFYI: Yes, in the patch we load the sorted .

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-06 Thread Mithun Cy
Hi all,
Here is the new patch which fixes all of the above comments. I changed the
design a bit, as described below.

What is it?
===
A pair of bgworkers: one which automatically dumps the buffer pool's block
info at a given interval, and another which loads those blocks into the
buffer pool when the server restarts.

How does it work?
=
When the shared library pg_prewarm is preloaded during server startup, a
bgworker "auto pg_prewarm load" is launched immediately after the server is
started. The bgworker starts loading blocks obtained from the block info
entries in $PGDATA/AUTO_PG_PREWARM_FILE, as long as there is a free buffer in
the buffer pool. This way we do not replace any new blocks which were loaded
either by the recovery process or by the querying clients.

Once the "auto pg_prewarm load" bgworker has completed its job, it registers
a dynamic bgworker "auto pg_prewarm dump", which is launched when the server
reaches a consistent state. The new bgworker periodically scans the buffer
pool and then dumps the meta info of the blocks which are currently in the
buffer pool. The GUC pg_prewarm.dump_interval, if set > 0, indicates the
minimum time interval between two dumps. If pg_prewarm.dump_interval is set
to AT_PWARM_DUMP_AT_SHUTDOWN_ONLY, the bgworker will only dump at the time of
server shutdown. If it is set to AT_PWARM_LOAD_ONLY, we do not want the
bgworker to dump anymore, so it stops there.

To relaunch a stopped "auto pg_prewarm dump" bgworker we can use the
utility function "launch_pg_prewarm_dump".
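For reference, registering the dump worker dynamically looks roughly like
this (a sketch using the standard bgworker API; the entry-point name is an
assumption, not necessarily what the patch uses):

    BackgroundWorker worker;

    MemSet(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;
    snprintf(worker.bgw_name, BGW_MAXLEN, "auto pg_prewarm dump");
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "pg_prewarm");
    snprintf(worker.bgw_function_name, BGW_MAXLEN,
             "auto_pgprewarm_dump_main");    /* assumed entry-point name */

    RegisterDynamicBackgroundWorker(&worker, NULL);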

==
One problem I have left open is that multiple "auto pg_prewarm dump" workers
can be launched, even if a dump/load worker is already running, by calling
launch_pg_prewarm_dump. I can avoid this by dropping a lock file before
starting the bgworkers, but if there is a simpler method to avoid launching a
duplicate bgworker I can do the same. Any suggestion on this will be very
helpful.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


pg_auto_prewarm_03.patch
Description: Binary data



Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Mithun Cy
On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson  wrote:
> launched by other  applications. Also with max_worker_processes = 2 and
> restart, the system crashes when the 2nd worker is not launched:
> 2017-02-07 11:36:39.132 IST [20573] LOG:  auto pg_prewarm load : number of
> buffers actually tried to load 64
> 2017-02-07 11:36:39.143 IST [18014] LOG:  worker process: auto pg_prewarm
> load (PID 20573) was terminated by signal 11: Segmentation fault

The SEGFAULT was a coding mistake: I called the C-language function directly
without initializing the FunctionCallInfo. Thanks for raising it. The patch
below fixes the same.
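For reference, the usual way to invoke a V1 C function from C without
building a FunctionCallInfo by hand is the DirectFunctionCall* fmgr helpers,
which set up the call data properly (illustrative only; the argument shown
here is made up):

    /* fmgr initializes a valid FunctionCallInfo for us */
    Datum   result = DirectFunctionCall1(dump_now, BoolGetDatum(false));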

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


pg_auto_prewarm_04.patch
Description: Binary data



Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Mithun Cy
Thanks Beena,

On Tue, Feb 7, 2017 at 4:46 PM, Beena Emerson  wrote:
> Few more comments:
>
> = Background worker messages:
>
> - Workers when launched, show messages like: "logical replication launcher
> started”, "autovacuum launcher started”. We should probably have a similar
> message to show that the pg_prewarm load and dump bgworker has started.

-- Thanks, I will add startup and shutdown message.

> - "auto pg_prewarm load: number of buffers to load x”, other messages show
> space before and after “:”, we should keep it consistent through out.
>

-- I think you are testing patch 03. The latest patch_04 has corrected the
same. Can you please re-test it?

>
> = Action of -1.
> I thought we decided that interval value of -1 would mean that the auto
> prewarm worker will not be run at all. With current code, -1 is explained to
> mean it will not dump. I noticed that reloading with new option as -1 stops
> both the workers but restarting loads the data and then quits. Why does it
> allow loading with -1? Please explain this better in the documents.
>

-- '-1' means we do not want to dump at all, so we decide not to continue
with the launched bgworker and it exits. As per your first comment, if I
register the startup and shutdown messages for auto pg_prewarm, I think it
will look better. I will try to explain it in a better way in the
documentation. The "auto pg_prewarm load" worker is not affected by the
dump_interval value; it will always start, load (prewarm) and then exit.

>
> = launch_pg_prewarm_dump()

> =# SELECT launch_pg_prewarm_dump();
>  launch_pg_prewarm_dump
> 
>   53552
> (1 row)
>
> $ ps -ef | grep 53552
> b_emers+  53555   4391  0 16:21 pts/100:00:00 grep --color=auto 53552

-- If dump_interval = -1 "auto pg_prewarm dump" will exit immediately.

> = Function names
> - load_now could be better named as load_file, load_dumpfile or similar.
> - dump_now -> dump_buffer or  better?

I chose load_now and dump_now to indicate we are doing it immediately, as
invoking them was based on a timer/state. Probably we can improve that, but
dump_buffer and load_file may not be the right replacements.

>
> = Corrupt file
> if the dump file is corrupted, the system crashes and the prewarm bgworkers
> are not restarted. This needs to be handled better.
>
> WARNING:  terminating connection because of crash of another server process
> 2017-02-07 16:36:58.680 IST [54252] DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory

--- Can you please paste your autopgprewarm.save file? I edited the file
manually to contain some illegal entries but did not see any crash; we only
failed to load, as the block numbers were invalid. Please share your tests so
that I can reproduce the same.

> = Documentation
>
> I feel the documentation needs to be improved greatly.
>
> - The first para in pg_prewarm should mention the autoload feature too.
>
> - The new section should be named “The pg_prewarm autoload” or something
> better. "auto pg_prewarm bgworker” does not seem appropriate.  The
> configuration parameter should also be listed out clearly like in auth-delay
> page. The new function launch_pg_prewarm_dump() should be listed under
> Functions.

-- Thanks, I will try to improve the documentation and put more details there.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Mithun Cy
On Tue, Feb 7, 2017 at 12:24 PM, Amit Kapila  wrote:
> On Tue, Feb 7, 2017 at 11:53 AM, Beena Emerson  
> wrote:
>> Are 2 workers required?
>>
>
> I think in the new design there is a provision of launching the worker
> dynamically to dump the buffers, so there seems to be a need of
> separate workers for loading and dumping the buffers.  However, there
> is no explanation in the patch or otherwise when and why this needs a
> pair of workers.  Also, if the dump interval is greater than zero,
> then do we really need to separately register a dynamic worker?

We have introduced a new value, -1, for pg_prewarm.dump_interval, which
means we will not dump at all. In that state I thought the auto
pg_prewarm process need not run at all, so I coded it to exit
immediately. But if the user decides to start auto pg_prewarm for
dumping only, without restarting the server, the launcher function
"launch_pg_prewarm_dump" I introduced can restart auto pg_prewarm in
dump-only mode. Since we can now launch a worker only to dump, I
thought we could redistribute the code between two workers: one which
only prewarms (load only) and another which dumps periodically. This
helped me to modularize and reuse the code. So once the load worker has
finished its job, it registers a dump worker and then exits.
But if max_worker_processes is not enough to launch the "auto
pg_prewarm dump" bgworker, we throw an error:
2017-02-07 14:51:59.789 IST [50481] ERROR:  registering dynamic
bgworker "auto pg_prewarm dump" failed
2017-02-07 14:51:59.789 IST [50481] HINT:  Consider increasing
configuration parameter "max_worker_processes".

Thinking about it again, instead of raising such an error and then
correcting it by explicitly launching the auto pg_prewarm dump bgworker
through launch_pg_prewarm_dump(), I can go back to the original design
where there will be one worker which loads and then dumps periodically,
and launch_pg_prewarm_dump() will relaunch only the dump activity of
that worker. Does this sound good?

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-07 Thread Mithun Cy
On Tue, Feb 7, 2017 at 6:11 PM, Beena Emerson  wrote:
>  Yes it would be better to have only one pg_prewarm worker as the loader is
> idle for the entire server run time after the initial load activity of few
> secs.
Sorry, that is again a bug in the code. The code to handle SIGUSR1
somehow got deleted before I submitted patch_03 and I failed to notice
it. As the code stands, the loader bgworker waits on a latch to learn
the status of the dump bgworker; actually, the loader bgworker should
exit right after launching the dump bgworker. I will try to fix this
and the other comments given by you in my next patch.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2017-02-07 Thread Mithun Cy
On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers  wrote:
> On 2017-02-07 18:41, Robert Haas wrote:
>>
>> Committed with some changes (which I noted in the commit message).

Thanks, Robert and all who have reviewed the patch and given their
valuable comments.

> This has caused a warning with gcc 6.20:
>
> hashpage.c: In function ‘_hash_getcachedmetap’:
> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> rel->rd_amcache = cache;
> ^~~

Yes, I also see the warning.  I think the compiler is not able to see
that cache is always initialized and used only under the condition
if (rel->rd_amcache == NULL).
To make the compiler happy we can initialize cache to NULL where it is
defined.
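To illustrate, here is a minimal sketch of that fix using simplified stand-in
types; the real function works on PostgreSQL's Relation and HashMetaPage
structures, so this is not the actual patch code:

#include <stddef.h>

/* Simplified stand-ins for the real structures. */
typedef struct MetaData { int nbuckets; } MetaData;
typedef struct Rel { MetaData *rd_amcache; } Rel;

static MetaData *
get_cached_metap(Rel *rel)
{
    MetaData *cache = NULL;     /* initialized at definition, silencing -Wmaybe-uninitialized */

    if (rel->rd_amcache == NULL)
    {
        /* read the meta page and allocate/populate cache here */
    }
    else
        cache = rel->rd_amcache;

    rel->rd_amcache = cache;
    return cache;
}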

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_metap_compiler_warning01
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Mithun Cy
On Fri, Oct 28, 2016 at 6:36 AM, Tsunakawa, Takayuki
 wrote:
> I welcome this feature!  I remember pg_hibernate did this.   I wonder what 
> happened to pg_hibernate.  Did you check it?
Thanks. When I checked, there were two things people complained about
with pg_hibernate. One, buffer loading starts only after recovery has
finished and the database has reached a consistent state. Two, it can
evict existing buffers that were loaded by the recovery process or by
newly connected clients. This solution tries to address both.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Mithun Cy
On Tue, Jan 24, 2017 at 5:07 AM, Jim Nasby  wrote:
> I took a look at this again, and it doesn't appear to be working for me. The 
> library is being loaded during startup, but I don't see any further activity 
> in the log, and I don't see an autoprewarm file in $PGDATA.

Hi Jim,
Thanks for looking into this patch. I just downloaded the patch and
applied it to the latest code; I can see the file "autoprewarm.save" in
$PGDATA, which is created and dumped at shutdown time, and the activity
is logged as below:
2017-01-24 13:22:25.012 IST [91755] LOG:  Buffer Dump: saved metadata
of 59 blocks.

In my code, by default we only dump at shutdown time. If we want to
dump at a regular interval, then we need to set the GUC
pg_autoprewarm.buff_dump_interval to > 0. I think I am missing
something while trying to recreate the bug reported above. Can you
please let me know what exactly you mean by the library not working?
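For reference, a minimal configuration sketch of the setup described above;
the GUC names are taken from this version of the patch and may change in
later revisions:

# postgresql.conf (sketch)
shared_preload_libraries = 'pg_autoprewarm'
pg_autoprewarm.buff_dump_interval = 300   # dump every 300 seconds; by default we dump only at shutdown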

> There needs to be some kind of documentation change as part of this patch.
Thanks, I will add an sgml doc for the same.

I will try to address the remaining suggestions in my next patch, based
on their feasibility.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2017-01-26 Thread Mithun Cy
On Tue, Jan 24, 2017 at 3:10 PM, Amit Kapila  wrote:
> 1.
> @@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,

> In the above flow, do we really need an updated metapage, can't we use
> the cached one?  We are already taking care of bucket split down in
> that function.

Yes, we can use the old cached metap entry; the only reason I decided
to use the latest metapage content is that the old code used to do
that. Also, the cached metap is used to avoid ad-hoc local saving of
the same data and hence to unify the cached metap API. I did not intend
to save the metapage read here, which I thought would not be very
useful: if new buckets are added, we need to read the metapage at the
end anyway. I have taken your comment; now I only read the metap cache
which is already set.

> 2.
> The above two chunks of code look worse as compare to your previous
> patch.  I think what we can do is keep the patch ready with both the
> versions of _hash_getcachedmetap API (as you have in _v11 and _v12)
> and let committer take the final call.

The _v11 API was self-contained, but it does not hold a pin on the
metapage buffer, whereas in _v12 we hold the pin across two consecutive
reads of the metapage. I have taken your advice and am producing two
different patches.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_13_donotholdpin.patch
Description: Binary data


cache_hash_index_meta_page_13_holdpin.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-26 Thread Mithun Cy
On Thu, Jan 26, 2017 at 8:45 PM, Peter Eisentraut
 wrote:
> Just a thought with an additional use case:  If I want to set up a
> standby for offloading queries, could I take the dump file from the
> primary or another existing standby, copy it to the new standby, and
> have it be warmed up to the state of the other instance from that?
>
> In my experience, that kind of use is just as interesting as preserving
> the buffers across a restart.

Initially I did not think about this; thanks for asking. For now, we
dump the buffer pool information one entry per buffered block. If the
block numbers on the new standby correspond to the same set of rows as
on the server where the dump was produced, I think we can directly use
the dump file on the new standby. All we need to do is drop the ".save"
file into the data directory and preload the library; the buffer pool
will then be warmed with the blocks mentioned in ".save".

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-01-24 Thread Mithun Cy
On Thu, Jan 19, 2017 at 5:08 PM, Ashutosh Sharma  wrote:

Thanks, Ashutosh and Jesper. I have tested the patch and do not have
any more comments, so I am marking it ready for committer.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2017-01-29 Thread Mithun Cy
> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
> force_refresh);
>
> If the cache is initialized and force_refresh is not true, then this
> just returns the cached data, and the metabuf argument isn't used.
> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
> the metabuffer, pin and lock it, use it to set the cache, release the
> lock, and return with the pin still held.  If *metabuf !=
> InvalidBuffer, we assume it's pinned and return with it still pinned.

Thanks, Robert. I have made a new patch which tries to do the same;
now I think the code looks less complicated.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_14.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-02-21 Thread Mithun Cy
Thanks, Amit

On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila  wrote:
> How will high and lowmask calculations work in this new strategy?
> Till now they always work on doubling strategy and I don't see you
> have changed anything related to that code.  Check below places.

It is important that the masks stay of the form (2^x) - 1 if we have to
retain the same hash mapping function, so the mask variables take the
same values as before. The only place I think we need a change is
_hash_metapinit(); unfortunately, I had not tested the case where we
build the hash index on already existing tuples. I have now fixed that
in the latest patch.
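To illustrate why the masks must keep the (2^x) - 1 form, here is a small
sketch of how a hash key is mapped to a bucket under the existing doubling
scheme; it mirrors the current mapping logic and is not new code from this
patch:

#include <stdint.h>

/* Map a hash key to a bucket; highmask and lowmask are both of the form
 * (2^k - 1), lowmask covering the previous table size and highmask the
 * doubled one. */
static uint32_t
hashkey_to_bucket(uint32_t hashkey, uint32_t maxbucket,
                  uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket = hashkey & highmask;

    if (bucket > maxbucket)
        bucket &= lowmask;      /* bucket not allocated yet; wrap to smaller table */

    return bucket;
}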


> Till now, we have worked hard for not changing the page format in a
> backward incompatible way, so it will be better if we could find some
> way of doing this without changing the meta page format in a backward
> incompatible way.

We are not adding or deleting any variables; we increase the size of
hashm_spares, and hence the mapping functions have to be adjusted. The
problem is that block allocation and its management are based on the
fact that all of the buckets (2^x in number) belonging to a particular
split point are allocated at once and together. hashm_spares is used to
record those allocations, and it is then used by the mapping functions
to reach a particular block in the file. If we want to change the way
we allocate the buckets, then hashm_spares will change and hence the
mapping functions, so I do not think we can avoid the incompatibility
issue.

One thing I can think of is increasing the hashm_version of the hash
index; then for old indexes we can continue to use the doubling method
and its mapping, and for new indexes we can use the new way as above.

> Have you considered to store some information in
> shared memory based on which we can decide how much percentage of
> buckets are allocated in current table half?  I think we might be able
> to construct this information after crash recovery as well.

I think all of the above data has to be persistent. I am not able to
understand what should or can be stored in shared memory. Can you
please correct me if I am wrong?


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_02
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-02-21 Thread Mithun Cy
On Fri, Feb 10, 2017 at 1:06 AM, Robert Haas  wrote:

> Alright, committed with a little further hacking.

I pulled the latest code and tried the following.
Test:

create table t1(t int);
create index i1 on t1 using hash(t);
insert into t1 select generate_series(1, 1000);

postgres=# SELECT spares FROM hash_metapage_info(get_raw_page('i1', 0));
                                   spares
-----------------------------------------------------------------------------
 {0,0,0,1,9,17,33,65,-127,1,1,0,1,-1,-1,-4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}

The spares are showing negative numbers; I think the wrong output type
has been chosen, since the values appear to wrap around at 127. spares
is defined as
uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before each
splitpoint */
so it should always be positive.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-02-22 Thread Mithun Cy
Hi all, thanks. I have tried to address all of the comments given
above, along with some more code cleanups.

On Wed, Feb 22, 2017 at 6:28 AM, Robert Haas  wrote:
> I think it's OK to treat that as something of a corner case.  There's
> nothing to keep you from doing that today: just use pg_buffercache to
> dump a list of blocks on one server, and then pass those blocks to
> pg_prewarm on another server.  The point of this patch, AIUI, is to
> automate a particularly common case of that, which is to dump before
> shutdown and load upon startup.  It doesn't preclude other things that
> people want to do.
>
> I suppose we could have an SQL-callable function that does an
> immediate dump (without starting a worker).  That wouldn't hurt
> anything, and might be useful in a case like the one you mention.

In the latest patch I have moved things back to the old way: there will
be one bgworker, "auto pg_prewarm", which automatically records
information about the blocks present in the buffer pool before server
shutdown and then prewarms the buffer pool with those blocks upon
server restart. I have reverted the code which let us relaunch a
stopped "auto pg_prewarm" bgworker. The reason I introduced a launcher
SQL utility function is that the running auto pg_prewarm can be stopped
by the user by setting dump_interval to -1; if the user then wants to
restart the stopped auto pg_prewarm (this time to dump only, so we can
prewarm on the next restart), he can use that utility to launch it for
periodic dumping while the server is still running. If that was not the
concern, I think I misunderstood the comments and overdid the design,
so as a first patch I will keep things simple. Also, using a separate
process for the prewarm and dump activities was a bad design, hence I
reverted that as well. The auto pg_prewarm worker can only be launched
by preloading the library, and I can add additional utilities once we
formalize what is really needed out of it.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


pg_auto_prewarm_05.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [POC] A better way to expand hash indexes.

2017-02-17 Thread Mithun Cy
Hi all,

As of now, we expand the hash index by doubling the number of bucket
blocks, but unfortunately those blocks will not be used immediately. So
I thought that if we can defer bucket block allocation by some factor,
the hash index size can grow much more efficiently. I have written a
POC patch which does the following: say at overflow point 'i' we need
to add x = 2^(i-1) new buckets as per the old code; I think we can do
this addition of buckets in a controlled way. Instead of adding all 'x'
bucket blocks at a time, the patch adds x/4 blocks at a time, and once
those blocks are consumed it adds the next installment of x/4 blocks.
This continues until all 'x' blocks of overflow point 'i' are
allocated. My test results show the index size grows in a more
efficient way with the above patch.

Note: This patch is just a POC. It may have bugs, and I still have to
update the comments in the code and the README to reflect the new
changes.

Test:
create table t1(t int);
create index i1 on t1 using hash(t);
And then, Incrementally add rows as below.
insert into t1 select generate_series(1, 1000);


records      base index     new patch
(million)    size (MB)      index size (MB)
------------------------------------------
 10             384             384
 20             768             768
 30            1417            1161
 40            1531            1531
 50            2556            1788
 60            2785            2273
 70            2963            2709
 80            3060            3061
 90            5111            3575
100            5111            3575

To implement such an incremental addition of bucket blocks I have to
increase the size of the hashm_spares array in the meta page by four
times. The mapping methods which map a total number of buckets to a
split-point position in the hashm_spares array also need to be changed.
These changes create backward-compatibility issues.

Implementation Details in brief:
===
Each element of hashm_spares (which we call an overflow point) is
expanded into 4 slots {0, 1, 2, 3}. If 'x' (a power-of-2 number) is the
total number of buckets to be added before the overflow point, we add
only a quarter of it (x/4) to each slot; once we have consumed the
previously added blocks, we add the next quarter of buckets, and so on.
As in the old code, the new hashm_spares[i] stores the total overflow
pages allocated between those bucket allocations.
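As a small sketch of the allocation schedule described above: at overflow
point i the doubling scheme would add x = 2^(i-1) buckets at once, while the
proposal adds x/4 per slot. The helper name and the handling of very small x
below are my own illustration, not code from the patch:

#include <stdint.h>

/* Number of bucket blocks added in slot 'phase' (0..3) of overflow point i. */
static uint32_t
buckets_added_in_phase(uint32_t i, uint32_t phase)
{
    uint32_t x = (i == 0) ? 1 : (1U << (i - 1));

    if (x < 4)                          /* too small to split into quarters */
        return (phase == 0) ? x : 0;
    return x / 4;                       /* one quarter per slot */
}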


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


expand_hashbucket_efficiently_01
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2017-01-17 Thread Mithun Cy
On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila  wrote:
> 1.
> (a) I think you can retain the previous comment or modify it slightly.
> Just removing the whole comment and replacing it with a single line
> seems like a step backward.

-- Fixed. I have just modified it slightly.

> (b) Another somewhat bigger problem is that with this new change it
> won't retain the pin on meta page till the end which means we might
> need to perform an I/O again during operation to fetch the meta page.
> AFAICS, you have just changed it so that you can call new API
> _hash_getcachedmetap, if that's true, then I think you have to find
> some other way of doing it.  BTW, why can't you design your new API
> such that it always take pinned metapage?  You can always release the
> pin in the caller if required.  I understand that you don't always
> need a metapage in that API, but I think the current usage of that API
> is also not that good.


-- Yes, what you say is right. I wanted to make the _hash_getcachedmetap
API self-sufficient, but two possible consecutive reads of the metapage
are always connected, as we keep the page pinned in a buffer to avoid
I/O. I have now redesigned the API so that the caller passes a pinned
meta page when we want to set the metapage cache; this gives us the
flexibility to use the cached meta page data in different places.


> 2.
> + if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
> + bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
> + cachedmetap = _hash_getcachedmetap(rel, true);
>
> I don't understand the meaning of above if check.  It seems like you
> will update the metapage when previous block number is not a valid
> block number which will be true at the first split.  How will you
> ensure that there is a re-split and cached metapage is not relevant.
> I think if there is && in the above condition, then we can ensure it.
>

-- Oops, that was a mistake; corrected as you stated.
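For clarity, the corrected check is the fragment below (just the hunk quoted
above with the suggested &&, not a standalone example): the cached metapage
is refreshed only when the bucket has a cached prevblkno and that value shows
a split happened after our cache was taken.

if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber &&
    bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
    cachedmetap = _hash_getcachedmetap(rel, true);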

> 3.
> + Given a hashkey get the target bucket page with read lock, using cached
> + metapage. The getbucketbuf_from_hashkey method below explains the same.
> +
>
> All the sentences in algorithm start with small letters, then why do
> you need an exception for this sentence.  I think you don't need to
> add an empty line. Also, I think the usage of
> getbucketbuf_from_hashkey seems out of place.  How about writing it
> as:
>
> The usage of cached metapage is explained later.
>

-- Fixed as you have asked.

>
> 4.
> + If target bucket is split before metapage data was cached then we are
> + done.
> + Else first release the bucket page and then update the metapage cache
> + with latest metapage data.
>
> I think it is better to retain original text of readme and add about
> meta page update.
>

-- Fixed. Wherever it is meaningful I have kept the original wording.
But the way we get to the right target buffer after the latest split is
slightly different from what the original code did, so there is a small
modification to show that we use the metapage cache.

> 5.
> + Loop:
> ..
> ..
> + Loop again to reach the new target bucket.
>
> No need to write "Loop again ..", that is implicit.
>

-- Fixed as you have asked.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Tuple sort is broken. It crashes on simple test.

2017-01-16 Thread Mithun Cy
Test:
--
create table seq_tab(t int);
insert into seq_tab select generate_series(1, 1000);
select count(distinct t) from seq_tab;

#0  0x0094a9ad in pfree (pointer=0x0) at mcxt.c:1007
#1  0x00953be3 in mergeonerun (state=0x1611450) at tuplesort.c:2803
#2  0x00953824 in mergeruns (state=0x1611450) at tuplesort.c:2721
#3  0x009521bc in tuplesort_performsort (state=0x1611450) at
tuplesort.c:1813
#4  0x00662b85 in process_ordered_aggregate_single
(aggstate=0x160b540, pertrans=0x160d440, pergroupstate=0x160d3f0) at
nodeAgg.c:1178
#5  0x006636e0 in finalize_aggregates (aggstate=0x160b540,
peraggs=0x160c5e0, pergroup=0x160d3f0, currentSet=0) at nodeAgg.c:1600
#6  0x00664509 in agg_retrieve_direct (aggstate=0x160b540) at
nodeAgg.c:2266
#7  0x00663f66 in ExecAgg (node=0x160b540) at nodeAgg.c:1946
#8  0x00650ad2 in ExecProcNode (node=0x160b540) at execProcnode.c:503

Git bisect shows me the following commit has caused it.

commit Id e94568ecc10f2638e542ae34f2990b821bbf90ac

Author: Heikki Linnakangas   2016-10-03 16:07:49
Committer: Heikki Linnakangas   2016-10-03 16:07:49

Change the way pre-reading in external sort's merge phase works.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-01-16 Thread Mithun Cy
>
> 7. I think it is not your bug, but probably a bug in Hash index
> itself; page flag is set to 66 (for below test); So the page is both
> LH_BUCKET_PAGE and LH_BITMAP_PAGE. Is not this a bug in hash index?
>
> I have inserted 3000 records. Hash index is on integer column.
> select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1));
>  hasho_flag
> 
>  66
>

Here is the test for the same. After inserting 3000 records, I think at
the first split we can see that the bucket page flag is set with
LH_BITMAP_PAGE.

create table t1( ti int);
create index i1 on t1 using hash(ti);
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
          2
postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
          2
(1 row)

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
          2
(1 row)

postgres=# insert into t1 select generate_series(1, 1000);
INSERT 0 1000
postgres=# select hasho_flag from hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
         66
(1 row)

I think if this is a bug, then the example given in pageinspect.sgml
should be changed in your patch after the bug fix.
+hasho_prevblkno | -1
+hasho_nextblkno | 8474
+hasho_bucket| 0
+hasho_flag  | 66
+hasho_page_id   | 65408



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect: Hash index support

2017-01-17 Thread Mithun Cy
On Tue, Jan 17, 2017 at 3:06 PM, Amit Kapila  wrote:
>
> I think your calculation is not right.  66 indicates LH_BUCKET_PAGE |
> LH_BUCKET_NEEDS_SPLIT_CLEANUP which is a valid state after the split.
> This flag will be cleared either during next split or when vacuum
> operates on that index page.

Oops, I made an error in the decimal-to-binary conversion. Sorry, I was
wrong; what you said is correct. No need to change the .sgml file.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LWLocks in DSM memory

2016-08-19 Thread Mithun Cy
On Fri, Aug 19, 2016 at 4:44 PM, Amit Kapila 
wrote:
> Can you specify the m/c details as Andres wants tests to be conducted on
> some high socket m/c?

As I have specified in the last line of my mail, it is an 8-socket
Intel machine.

available: 8 nodes (0-7)
node 0 cpus: 0 65 66 67 68 69 70 71 96 97 98 99 100 101 102 103
node 0 size: 65498 MB
node 0 free: 50308 MB
node 1 cpus: 72 73 74 75 76 77 78 79 104 105 106 107 108 109 110 111
node 1 size: 65536 MB
node 1 free: 44594 MB
node 2 cpus: 80 81 82 83 84 85 86 87 112 113 114 115 116 117 118 119
node 2 size: 65536 MB
node 2 free: 61814 MB
node 3 cpus: 88 89 90 91 92 93 94 95 120 121 122 123 124 125 126 127
node 3 size: 65536 MB
node 3 free: 55258 MB
node 4 cpus: 1 2 3 4 5 6 7 8 33 34 35 36 37 38 39 40
node 4 size: 65536 MB
node 4 free: 46705 MB
node 5 cpus: 9 10 11 12 13 14 15 16 41 42 43 44 45 46 47 48
node 5 size: 65536 MB
node 5 free: 40948 MB
node 6 cpus: 17 18 19 20 21 22 23 24 49 50 51 52 53 54 55 56
node 6 size: 65536 MB
node 6 free: 47468 MB
node 7 cpus: 25 26 27 28 29 30 31 32 57 58 59 60 61 62 63 64
node 7 size: 65536 MB
node 7 free: 17285 MB



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] LWLocks in DSM memory

2016-08-19 Thread Mithun Cy
On Wed, Aug 17, 2016 at 9:12 PM, Andres Freund  wrote:

> >Yes. I want a long wait list, modified in bulk - which should be the
> >case with the above.
>

I ran some pgbench tests and I do not see much difference in
performance; the small variance can be attributed to variance in the
probability of drawing a particular built-in script.

Server configuration:
./postgres -c shared_buffers=8GB -N 200 -c
min_wal_size=15GB -c max_wal_size=20GB -c checkpoint_timeout=900 -c
maintenance_work_mem=1GB -c checkpoint_completion_target=0.9 &

pgbench configuration: initialized with scale_factor 300
./pgbench -c $threads -j $threads -T 1800 -M prepared -b simple-update@1
-b  select-only@20 postgres


Simple-update : select-only = 5:95

clients                     8              64             128
Current Code        38279.663784   258196.067342   283150.495921
After Patch revert   37316.09022   268285.506338   280077.913954
% diff             -2.5171944284    3.9076656356   -1.0851409449


Simple-update : select-only = 20:80

clients                     8              64             128
Current Code        23169.021469   134791.996882   154057.101004
After Patch revert   23892.91539   135091.645551    150402.80543
% diff              3.1244043775    0.2223044958   -2.3720396854

And this was done on our 8-socket Intel machine.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Cache Hash Index meta page.

2016-09-05 Thread Mithun Cy
On Sep 2, 2016 7:38 PM, "Jesper Pedersen" 
wrote:
> Could you provide a rebased patch based on Amit's v5 ?

Please find the patch, based on Amit's V5.

I have fixed the following things:

1. Now in _hash_first we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see whether the bucket is from an older-version
hash index that has been upgraded. Since InvalidBlockNumber is
currently one greater than the maximum value "metap->hashm_maxbucket"
can be set to (see _hash_expandtable), we can distinguish it from the
rest. I tested the upgrade issue reported by Amit; it is fixed now.

2. One case where a bucket's hasho_prevblkno is used is the backward
scan. So now, before testing the previous block number, I test whether
the current page is a bucket page; if so, we end the bucket scan (see
the changes in _hash_readprev). In the other places where
hasho_prevblkno is used it is not for a bucket page, so I have not put
in any extra check to verify whether it is a bucket page.


cache_hash_index_metapage_onAmit_v5_01.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-05 Thread Mithun Cy
On Mon, Sep 5, 2016 at 4:33 PM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:
>After a brief examination I've found following ways to improve the patch.
Adding to above comments.

1)
+ /*
+ * consult connection options and check if RO connection is OK
+ * RO connection is OK if readonly connection is explicitely
+ * requested or if replication option is set, or just one
+ * host is specified in the connect string.
+ */
+ if ((conn->pghost==NULL || strchr(conn->pghost,',')==NULL) ||
+ ((conn->read_only && conn->read_only[0] > '0'))
+ ||(conn->replication && conn->replication[0])
+ )
+ {

Now, if only one host is in the connection string and it asks for a
read-write connection (connect to master), i.e. read_only is set to 0
explicitly, with the above logic we will still allow it to connect to a
standby? I still think a psql connection to a standby should be handled
by changing the default value of readonly to 1 (which means connect to
any). Thinking further, replacing the readonly parameter with
targetServerType=any|master (with default being any) would probably
clear some confusion and bring consistency, since the same is used in
the JDBC multi-host connection string.

2)
@@ -1460,33 +1538,80 @@ connectDBStart(PGconn *conn)
   portstr,
   (int) (UNIXSOCK_PATH_BUFLEN - 1));
  conn->options_valid = false;
+ free(nodes->port);

nodes->port was not allocated at this point.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-04 Thread Mithun Cy
On Aug 31, 2016 1:44 PM, "Victor Wagner"  wrote:
> Thanks, I've added this to 7-th (yet unpublished here) version of my
> patch.
Hi Victor, I just wanted to know your plan for patch 07. Would you like
to submit it to the community? I have just signed up as a reviewer for
your patch.


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-30 Thread Mithun Cy
On Fri, Aug 26, 2016 at 10:10 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:
>
> >rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg
> -lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
> >../../../../../src/interfaces/libpq/libpq.so: undefined reference to
> `pg_usleep'
>

Similar to psql, I have added -lpq for compilation. Now the ecpg tests
compile and make check-world passes.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


libpq-failover-ecpg-make-01.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cache Hash Index meta page.

2016-09-13 Thread Mithun Cy
On Thu, Sep 8, 2016 at 11:21 PM, Jesper Pedersen  wrote:
>
> > For the archives, this patch conflicts with the WAL patch [1].
>
> > [1] https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFp
> nsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com
>

Updated the patch; it applies on top of Amit's concurrent hash index
patch [1] together with his WAL for hash index patch [2].

[1] Concurrent Hash index.

[2] Wal for hash index.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_metapage_onAmit_05_02_with_wall.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-08 Thread Mithun Cy
On Wed, Sep 7, 2016 at 7:26 PM, Victor Wagner  wrote:
> No, algorithm here is more complicated. It must ensure that there would
> not be second attempt to connect to host, for which unsuccessful
> connection attempt was done. So, there is list rearrangement.
>Algorithm for pick random list element by single pass is quite trivial.

If the concern is only about excluding an address which was tried
previously, then I think we can try it this way: randomly permute the
given address list (for example with a Fisher-Yates shuffle) before
trying any of the addresses in it. After that you can treat the new
list exactly like we do for sequential connections. I think this makes
the code less complex.
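A minimal sketch of that idea, assuming for illustration that the candidate
hosts are held in a plain array (the real libpq code keeps them in its own
structures and would use a better random source than rand()):

#include <stdlib.h>

/* Fisher-Yates shuffle over an array of host strings; afterwards the caller
 * simply tries hosts[0], hosts[1], ... in order, as in the sequential case. */
static void
shuffle_hosts(char **hosts, int n)
{
    for (int i = n - 1; i > 0; i--)
    {
        int   j = rand() % (i + 1);
        char *tmp = hosts[i];

        hosts[i] = hosts[j];
        hosts[j] = tmp;
    }
}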
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-29 Thread Mithun Cy
This patch does not apply on the latest code; it fails as follows:
libpq-failover-9.patch:176: trailing whitespace.
thread.o pgsleep.o
libpq-failover-9.patch:184: trailing whitespace.
check:
libpq-failover-9.patch:185: trailing whitespace.
$(prove_check)
libpq-failover-9.patch:186: trailing whitespace.

libpq-failover-9.patch:194: trailing whitespace.
rm -rf tmp_check
error: patch failed: doc/src/sgml/libpq.sgml:792
error: doc/src/sgml/libpq.sgml: patch does not apply
error: patch failed: src/interfaces/libpq/Makefile:36
error: src/interfaces/libpq/Makefile: patch does not apply
error: patch failed: src/interfaces/libpq/fe-connect.c:299
error: src/interfaces/libpq/fe-connect.c: patch does not apply
error: patch failed: src/interfaces/libpq/libpq-fe.h:62
error: src/interfaces/libpq/libpq-fe.h: patch does not apply
error: patch failed: src/interfaces/libpq/libpq-int.h:334
error: src/interfaces/libpq/libpq-int.h: patch does not apply
error: patch failed: src/interfaces/libpq/test/expected.out:1
error: src/interfaces/libpq/test/expected.out: patch does not apply
error: patch failed: src/test/perl/PostgresNode.pm:398
error: src/test/perl/PostgresNode.pm: patch does not apply


On Tue, Sep 27, 2016 at 2:49 PM, Victor Wagner  wrote:
1).
>* if there is more than one host in the connect string and
>* target_server_type is not specified explicitely, set
>* target_server_type to "master", because default mode of
>* operation is failover, and so, we need to connect to RW
>* host, and keep other nodes of the cluster in the connect
>* string just in case master would fail and one of standbys
>* would be promoted to master.

I am slightly confused. As per this, target_server_type=master means
the user is expecting failover. What if the user passes
target_server_type=any and requests sequential connections; isn't that
also a case of failover? I think by default it should be "any" for any
number of hosts, and the parameters "sequential" and "random" should
decide whether we want plain failover or failover with load balancing.

2).

>For some reason DNS resolving was concentrated in one place before my
>changes. So, I've tried to not change this decision.

My intention was not to have a replacement function for
"pg_getaddrinfo_all"; I just suggested having a local function which
calls pg_getaddrinfo_all for every host/port pair read earlier. That
way we need not maintain the nodes struct, and it also reduces the
complexity of connectDBStart.

FUNC(host, port, addrs)
{
    CALL pg_getaddrinfo_all(host, port, newaddrs);
    addrs->ai_next = newaddrs;
}

3).

I think you should run a spellcheck once. Also, there are some
formatting issues with respect to comments and the curly braces of
controlled blocks which need to be fixed.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Cache Hash Index meta page.

2016-10-05 Thread Mithun Cy
On Tue, Oct 4, 2016 at 11:55 PM, Jeff Janes  wrote:
> Can you describe your benchmarking machine?  Your benchmarking data went
> up to 128 clients.  But how many cores does the machine have?  Are
> you testing how well it can use the resources it has, or how well it can
> deal with oversubscription of the resources?

It is an IBM POWER machine with 192 hardware threads.

Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):192
On-line CPU(s) list:   0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s):  4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache:  512K
L3 cache:  8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191

> Also, was the file supposed to be named .ods?  I didn't find it to be
> openable as an .odc file.

Yes, .ods is right; it is a spreadsheet in ODF.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] "Some tests to cover hash_index"

2016-09-19 Thread Mithun Cy
On Sat, Aug 6, 2016 at 9:41 AM, Amit Kapila  wrote:
> I wonder why you have included a new file for these tests, why can't be
these added to existing hash_index.sql.
The tests in hash_index.sql did not cover overflow pages; the above
tests were added mainly for them. So I thought having a separate test
file could help with enabling/disabling them in the schedule files when
we do not want them running, since they take slightly more time. If you
think otherwise, I will reconsider and add the tests to hash_index.sql.

> The relation name con_hash_index* choosen in above tests doesn't seem to
> be appropriate, how about hash_split_heap* or something like that.
Fixed. I have renamed the relation, index, and test file name accordingly.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


commit-hash_coverage_test_v2_no_wal.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-08-25 Thread Mithun Cy
On Thu, Mar 17, 2016 at 4:47 AM, David Steele  wrote:
> Since there has been no response from the author I have marked this patch
> "returned with feedback".  Please feel free to resubmit for 9.7!
I have started to work on this patch and tried to fix some of the
issues discussed above. The most recent patch, 06, has fixed many of
the issues raised previously, including indefinite looping and crashes.
Some of the issues which remain pending are:

1. The connection status functions PQport and PQhost do not give the
corresponding values of the established connection. I have attached a
patch for this.

2. The default value of the readonly parameter is 0, which means
connect to the master only. So, in the absence of the parameter in the
connection string, simple cases like "./psql -p PORT database" fail to
connect to a hot standby server. Since readonly=1 means connect to any
server and readonly=0 means connect to the master only, I think we
should change the default value to 1 to handle the cases where the
parameter is not specified.

JFYI, the PostgreSQL JDBC driver has the options
targetServerType=any|master|slave|preferSlave for the same purpose.

I also did a little bit of patch clean-up.

I also found some minor build-related issues which I am working on:
1. make check-world @src/interfaces/ecpg/test/connect fails with the
following error:
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-g -O0 -pthread -D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS
test1.o -L../../ecpglib -L../../pgtypeslib
-L../../../../../src/interfaces/libpq
-L../../../../../src/port -L../../../../../src/common -Wl,--as-needed
-Wl,-rpath,'/home/mithun/edbsrc/patch6bin/lib',--enable-new-dtags  -lecpg
-lpgtypes -lpq -lpgcommon -lpgport -lz -lrt -lcrypt -ldl -lm   -o test1
../../../../../src/interfaces/libpq/libpq.so: undefined reference to
`pg_usleep'
collect2: error: ld returned 1 exit status
make[3]: *** [test1] Error 1
make[3]: Leaving directory `/home/mithun/mynewpost/p1/
src/interfaces/ecpg/test/connect'

The patch uses pg_usleep(), which is under -L../../../../../src/port; I
think the dependency is not captured correctly.

Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index f22e3da..835061d 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -792,7 +792,7 @@ host=localhost port=5432 dbname=mydb connect_timeout=10

The general form for a connection URI is:
 
-postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1...]
+postgresql://[user[:password]@][netloc][:port][,netloc[:port]...][/dbname][?param1=value1...]
 

 
@@ -809,6 +809,7 @@ postgresql://localhost/mydb
 postgresql://user@localhost
 postgresql://user:secret@localhost
 postgresql://other@localhost/otherdb?connect_timeout=10application_name=myapp
+postgresql://node1,node2:5433,node3:4432,node4/mydb?hostorder=randomreadonly=1
 
 Components of the hierarchical part of the URI can also
 be given as parameters.  For example:
@@ -831,7 +832,9 @@ postgresql:///mydb?host=localhostport=5433

 For improved compatibility with JDBC connection URIs,
 instances of parameter ssl=true are translated into
-sslmode=require.
+sslmode=require and
+loadBalanceHosts=true  into
+hostorder=random.

 

@@ -841,6 +844,10 @@ postgresql:///mydb?host=localhostport=5433
 postgresql://[2001:db8::1234]/database
 

+   
+   There can be serveral host specifications, optionally accompanied
+   with port, separated by comma.
+   
 

 The host component is interpreted as described for the parameter PostgreSQL was built). On machines without
 Unix-domain sockets, the default is to connect to localhost.

+   
+   There can be more than one host parameter in
+   the connect string. In this case these hosts would be considered
+   alternate entries into same database and if connect to first one
+   fails, library would try to connect second etc. This can be used
+   for high availability cluster or for load balancing. See
+parameter.
+   
+   
+   Network host name can be accompanied with port number, separated by
+   colon. If so, this port number is used only when connected to
+   this host. If there is no port number, port specified in the
+parameter would be used.
+   
   
  
 
@@ -942,8 +963,44 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname


   
-
+
+  
+  hostorder
+  
+  
+  Specifies how to choose host from list of alternate hosts,
+  specified in the  parameter.
+  
+  
+  If value of this argument is sequential (the
+  default) library connects to the hosts in order they specified,
+  and tries to connect second one only if 

Re: [HACKERS] Cache Hash Index meta page.

2016-09-28 Thread Mithun Cy
On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes  wrote:
> I think that this needs to be updated again for v8 of concurrent and v5
> of wal

Adding the rebased patch over [1] + [2]

[1] Concurrent Hash index.

[2] Wal for hash index.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_metapage_onAmit_05_03_with_wall.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-09-25 Thread Mithun Cy
I have some more comments on libpq-failover-8.patch

+ /* FIXME need to check that port is numeric */

Is this still applicable?

1)

+ /*
+ * if there is more than one host in the connect string and
+ * target_server_type is not specified explicitely, set
+ * target_server_type to "master"
+ */

I think we need comments explaining why we change the default value
based on the number of elements in the connection string: why is the
default not "any" when the node count is > 1?


2)

+ /* loop over all the host specs in the node variable */
+ for (node = nodes; node->host != NULL || node->port != NULL; node++)
  {

I think instead of the loop we can move this code into a separate
function, say pg_add_to_addresslist(host, port, ...); this helps in
removing temporary variables like the "struct node info" and several
heap calls around it.


3)

+static char *
+get_port_from_addrinfo(struct addrinfo * ai)
+{
+ char port[6];
+
+ sprintf(port, "%d", htons(((struct sockaddr_in *) ai->ai_addr)->sin_port));
+ return strdup(port);
+}

This function needs some comments.

We use strdup in many places, but nowhere do we handle memory
allocation failure.


4)

+ /*
+ * consult connection options and check if RO connection is OK RO
+ * connection is OK if readonly connection is explicitely
+ * requested or if replication option is set, or just one host is
+ * specified in the connect string.
+ */
+ if (conn->pghost == NULL || !conn->target_server_type ||
+ strcmp(conn->target_server_type, "any") == 0)

The comments are not in sync with the code.

On Wed, Sep 14, 2016 at 12:28 AM, Robert Haas  wrote:

> On Fri, Sep 9, 2016 at 4:49 AM, Victor Wagner  wrote:
> > Random permutation is much more computationally complex than random
> > picking.
>
> It really isn't.  The pseudocode given at
> https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle is all of 4
> lines long, and one of those lines is a comment.  The C implementation
> is longer, but not by much.
>
> Mind you, I haven't read the patch, so I don't know whether using
> something like this would make the code simpler or more complicated.
> However, if the two modes of operation are (1) try all hosts in random
> order and (2) try all hosts in the listed order, it's worth noting
> that the code for the second thing can be used to implement the first
> thing just by shuffling the list before you begin.
>
> > So, using random permutation instead  of random pick wouln't help.
> > And keeping somewhere number of elements in the list wouldn't help
> > either unless we change linked list to completely different data
> > structure which allows random access.
>
> Is there a good reason not to use an array rather than a linked list?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal : For Auto-Prewarm.

2016-10-27 Thread Mithun Cy
On Thu, Oct 27, 2016 at 5:09 PM, Mithun Cy <mithun...@enterprisedb.com>
wrote:
> This a PostgreSQL contrib module which automatically dump all of the
> blocknums present in buffer pool at the time of server shutdown (smart and
> fast mode only, to be enhanced to dump at regular interval.) and load these
> blocks when server restarts.

Sorry, I forgot to add the warm-up time performance measurements done
based on this patch. Adding them now.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


pg_autoprewarm_perf_results.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal : For Auto-Prewarm.

2016-10-27 Thread Mithun Cy
# pg_autoprewarm.

This is a PostgreSQL contrib module which automatically dumps all of
the block numbers present in the buffer pool at the time of server
shutdown (smart and fast mode only; to be enhanced to dump at a regular
interval) and loads these blocks back when the server restarts.

Design:
--
We have created a BGWorker, the auto pre-warmer, which during shutdown
dumps all of the block numbers in the buffer pool in sorted order; each
entry records enough information to identify the block. The auto
pre-warmer is started as soon as the postmaster is started; we do not
wait for recovery to finish and for the database to reach a consistent
state. If there is a dump file to load, we start loading each block
entry into the buffer pool until there is no longer a free buffer. This
way we do not replace any new blocks which were loaded either by the
recovery process or by querying clients. It then waits until it
receives SIGTERM to dump the block information in the buffer pool.

HOW TO USE:
---
Build the module and add pg_autoprewarm to shared_preload_libraries.
The auto pre-warmer process then automatically dumps the buffer pool's
block info and loads those blocks when the server is restarted.

TO DO:
--
Add functionality to dump at regular intervals based on a timer.
And some cleanups.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


pg_autoprewarm_01.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-09 Thread Mithun Cy
On Thu, Nov 3, 2016 at 7:16 PM, Robert Haas  wrote:
> Great, committed.  There's still potentially more work to be done
> here, because my patch omits some features that were present in
> Victor's original submission, like setting the failover timeout,
> optionally randomizing the order of the hosts, and distinguishing
> between master and standby servers; Victor, or anyone, please feel
> free to submit separate patches for those things.

Among the remaining things, I have worked on the failover-to-new-master
idea. The patch below implements it. This is taken from Victor's patch
but rewritten by me with some cleanup. As in Victor's patch, we have a
new connection parameter "target_server_type"; it can take 2 values,
1. "any" and 2. "master", with the DEFAULT being "any". If it has the
value "any", we can connect to any of the host servers (both
master(primary) and slave(standby)). If the value is "master", then we
try to connect to the master(primary) only.
NOTE: The parameter name is inspired by and taken from the PostgreSQL
JDBC driver.

The main difference between Victor's patch and this new one is the
default value of the parameter target_server_type. In Victor's patch,
if the number of hosts in the connection string is 1, the default value
is "any" (this was done to make sure old psql can still connect to a
standby as it does now); if it is greater than 1, the default is set to
"master". To me this appeared slightly inconsistent; having the default
value "any" for any number of hosts seems more appropriate and is also
backward compatible. If the user wants failover to the master, he
should ask for it by setting target_server_type=master in the
connection string.
This patch do not include load balancing feature by trying to connect to
host in random fashion.

This patch also do not have failover timeout feature. Victor's
implementation of same is not a strict one. i. e. 1 walk of all of the host
is allowed even if timer might have expired. Only on second walk we check
for timeout. I thought application can manage the failover timeout instead
libpq doing it. So avoided the same in this patch. If others find that I am
wrong to think that way, will add the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover_to_new_master-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-10 Thread Mithun Cy
On Thu, Nov 10, 2016 at 1:11 PM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:
> Why don't you add "standby" and "prefer_standby" as the
target_server_type value?  Are you thinking that those values are useful
with load balancing > feature?
Yes, this patch only addresses failover to a new master; the values
"master" and "any" appeared sufficient for that case.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-14 Thread Mithun Cy
On Mon, Nov 14, 2016 at 1:37 PM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:
 > No, there's no concern about compatibility.  Please look at this:
 > https://www.postgresql.org/docs/devel/static/protocol-
flow.html#PROTOCOL-ASYNC
Thanks. My concern is: suppose you have 3 servers in a cluster, A (new
version), B (new version), and C (old version). If we implement it as
above, only the new servers will send the ParameterStatus message to
indicate what type of server we are connected to; server C will not
send it. So we would not be able to use the new "failover to new
master" feature for such a cluster.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-13 Thread Mithun Cy
On Fri, Nov 11, 2016 at 7:33 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
> We tend to use the term "primary" instead of "master".

Thanks, I will use primary instead of master in my next patch.

>Will this work with logical replication?

I have not tested it with logical replication. Currently we identify
the primary to connect to based on the result of "SELECT
pg_is_in_recovery()", so I think it works. Do you want me to test a
particular setup?

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-13 Thread Mithun Cy
On Mon, Nov 14, 2016 at 11:31 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:
> PGconn->target_server_type is not freed in freePGconn().

Thanks, will fix in new patch.

>Could you add PGTARGETSERVERTYPE environment variable?  Like other
variables, it will ease testing, since users can change the behavior
>without altering the connection string here and there.

Okay, will add one.

> I think it would be better to expose the server state via ParameterStatus
> protocol message like standard_conforming_strings, instead of running
> "SELECT pg_is_in_recovery()".  We shouldn't want to add one round trip to
> check the server type (master, standby).  postmaster can return the server
> type based on its state (pmState); PM_RUN is master, and PM_HOT_STANDBY is
> standby.  In addition, as an impractical concern, DBA can revoke EXECUTE
> privilege on pg_is_in_recovery() from non-superusers.

If you are suggesting changing the protocol messages, I think that
would not be backward compatible with older-version servers, and I also
think that level of protocol change would not be allowed. With the
connection status CONNECTION_SETENV, used for protocol version 2.0
setup, we already send a query like "select
pg_catalog.pg_client_encoding()" for a similar purpose. So I think
using "SELECT pg_is_in_recovery()" should be fine.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-22 Thread Mithun Cy
An updated patch with some fixes for the bugs reported earlier:

A. failover_to_new_master_v4.patch
The default value "any" is added to the target_server_type parameter in
its definition.

B. libpq-failover-smallbugs_02.patch
Fixed the issue raised in the [PATCH] pgpassfile connection option
thread. Now the host and port are added to the pgpass-file error
message whenever authentication fails, and the same has been added to
the file libpq-failover-smallbugs.patch (bugs reported and fixed by
Tsunakawa). I reviewed the originally reported bugs and the fixes, and
they seem okay to me. I could not test GSS and SSPI authentication as
they appear to need Windows.


failover_to_new_master_v4.patch
Description: Binary data


libpq-failover-smallbugs_02.patch
Description: Binary data



Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-24 Thread Mithun Cy
On Wed, Nov 23, 2016 at 10:19 PM, Catalin Iacob 
wrote:
   On Tue, Nov 22, 2016 at 8:38 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:
 >> If you want to connect to a server where the transaction is read-only,
then shouldn't the connection parameter be something like
 >>"target_session_attrs=readonly"?  That represents exactly what the code
does.

 >FWIW I find this to be a reasonable compromise. To keep the analogy
 >with the current patch it would be more something like
"target_session_attrs=read_write|any".

I have taken this suggestion and renamed target_server_type to
target_session_attrs, with two possible values, "read-write" and "any".
Maybe we could expand it to "readonly" and "prefer-readonly" in a later patch
proposal. Attaching the patch for the same.
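
To illustrate the intended use from an application (host names and database
below are made up; target_session_attrs exists only with this patch applied):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * Sketch only: ask libpq to try the listed hosts in order and keep the
     * first one that accepts a read-write session.
     */
    PGconn *conn = PQconnectdb(
        "host=node1,node2 port=5432,5432 dbname=postgres "
        "target_session_attrs=read-write");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}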
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover-to-new-writable-session-05.patch
Description: Binary data



Re: [HACKERS] Broken SSL tests in master

2016-11-24 Thread Mithun Cy
On Fri, Nov 25, 2016 at 10:41 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:
> I agree that pg_conn_host should have hostaddr in addition to host, and
PQhost() return host when host is specified with/without hostaddr specified.

typedef struct pg_conn_host
+{
+ char   *host; /* host name or address, or socket path */
+ pg_conn_host_type type; /* type of host */
+ char   *port; /* port number for this host; if not NULL,
+ * overrides the PGConn's pgport */
+ char   *password; /* password for this host, read from the
+ * password file.  only set if the PGconn's
+ * pgpass field is NULL. */
+ struct addrinfo *addrlist; /* list of possible backend addresses */
+} pg_conn_host;

+typedef enum pg_conn_host_type
+{
+ CHT_HOST_NAME,
+ CHT_HOST_ADDRESS,
+ CHT_UNIX_SOCKET
+} pg_conn_host_type;

The host member stores both hostname and hostaddr, and we already have the
"type" member to tell them apart.
I think we should not use PQhost() directly in
verify_peer_name_matches_certificate_name (the same holds for GSS and SSPI);
instead we should proceed only if "conn->connhost[conn->whichhost]" is of
type CHT_HOST_NAME.
Also, the old PQhost() never returned a CHT_HOST_ADDRESS value, so we might
need to revert to the old behaviour.
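
A rough sketch of the guard I have in mind (it relies on the internal
libpq-int.h definitions quoted above; the helper name is mine, and the real
change would live inside fe-secure-openssl.c and the GSS/SSPI code):

#include "libpq-int.h"

/*
 * Sketch: only attempt host-name verification against the server certificate
 * when the connection target really is a host *name*, not a numeric address
 * or a unix-socket path.
 */
static bool
should_verify_peer_name(PGconn *conn)
{
    pg_conn_host *ch = &conn->connhost[conn->whichhost];

    if (ch->type != CHT_HOST_NAME)
        return false;
    return ch->host != NULL && ch->host[0] != '\0';
}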

> However, I wonder whether the hostaddr parameter should also accept
> multiple IP addresses.  Currently, it accepts only one address as follows.
> I asked Robert and Mithun about this, but I forgot about that.

As far as I know, only pghost is allowed to have multiple hosts; pghostaddr
takes only one numeric address.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Broken SSL tests in master

2016-11-25 Thread Mithun Cy
On Fri, Nov 25, 2016 at 12:03 PM, Andreas Karlsson 
wrote:
> Another thought about this code: should we not check if it is a unix
> socket first before splitting the host? While I doubt that it is common to
> have a unix socket in a directory with comma in the path it is a bit
> annoying that we no longer support this.

I think it is a bug.

Before this feature:

./psql postgres://%2fhome%2fmithun%2f%2c
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/home/mithun/,/.s.PGSQL.5444"?

After this feature:

./psql postgres://%2fhome%2fmithun%2f%2c
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/home/mithun//.s.PGSQL.5432"?
could not connect to server: Connection refused
 Is the server running on host "" (::1) and accepting
 TCP/IP connections on port 5432?
could not connect to server: Connection refused
 Is the server running on host "" (127.0.0.1) and accepting
 TCP/IP connections on port 5432?

So the comma (%2c) is misinterpreted as a separator rather than as part of the UDS path.

The reason is that we first decode the URI (percent-encoded characters) and
then try to split the string into multiple hosts, assuming they are separated
by ','. I think we need to change the order here; otherwise it is difficult
to say whether a ',' is part of the UDS path or a separator.
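
A self-contained sketch of that reordering (the decoding step is only
indicated by a comment; libpq's real URI parser is of course more involved):

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: split the still-encoded host list on ',' first, so an
 * encoded %2c inside a socket path can never be taken for a separator.
 * Percent-decoding is then applied to each piece individually.
 */
static void
split_host_list(const char *raw)
{
    const char *start = raw;

    for (;;)
    {
        const char *comma = strchr(start, ',');
        size_t      len = comma ? (size_t) (comma - start) : strlen(start);

        printf("raw host piece: %.*s\n", (int) len, start);
        /* percent-decode this piece here, after the split */

        if (comma == NULL)
            break;
        start = comma + 1;
    }
}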

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal : For Auto-Prewarm.

2016-11-28 Thread Mithun Cy
Sorry, I took some time on this; please find the latest patch with the review
comments addressed. Apart from those fixes, I have introduced a new GUC
variable for pg_autoprewarm, "buff_dump_interval", so now we dump the buffer
pool metadata every buff_dump_interval seconds. Currently buff_dump_interval
can be set only at startup time. I chose not to do the dumping at checkpoint
time, as the two things did not seem closely related and keeping them
independent seemed better for usability; it also avoids the overhead of any
communication between them.
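
For illustration, a startup-time GUC of that sort could be declared roughly as
below in the module's _PG_init(); the default and the limits here are my own
choices, not necessarily what the attached patch uses:

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void        _PG_init(void);

static int  buff_dump_interval = 300;   /* seconds; default is an assumption */

void
_PG_init(void)
{
    /*
     * PGC_POSTMASTER context: the value can be set in postgresql.conf or on
     * the command line at startup, but not changed while the server runs.
     */
    DefineCustomIntVariable("pg_autoprewarm.buff_dump_interval",
                            "Interval between buffer pool metadata dumps.",
                            NULL,
                            &buff_dump_interval,
                            300,        /* boot value */
                            10, 3600,   /* min, max */
                            PGC_POSTMASTER,
                            GUC_UNIT_S,
                            NULL, NULL, NULL);
}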

On Fri, Oct 28, 2016 at 1:45 AM, Jim Nasby  wrote:
 > IMO it would be better to add this functionality to pg_prewarm instead
of creating another contrib module. That would reduce confusion and reduce
 > the amount of code necessary.

I have merged the pg_autoprewarm module into pg_prewarm. This is only a
directory merge; a functional merge is not possible, because pg_prewarm is
just a utility function with a specific signature to load a given relation at
runtime, whereas pg_autoprewarm is a bgworker which dumps the current buffer
pool and loads it back at startup time.

> Presumably the first 4 numbers will vary far less than blocknum, so it's
probably worth reversing the order here (or at least put blocknum first).

The function sort_cmp_func is used by qsort, so an ordered comparison is
needed to say which entry is bigger or smaller; if we put blocknum first, we
cannot decide that.
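
To make that concrete, here is what such a comparator looks like when it
orders on every field, most significant first; the record layout is a
stand-in for the dump entry, not the patch's exact struct:

#include <stdlib.h>

/* Hypothetical dump record: one saved line of buffer-pool metadata. */
typedef struct DumpEntry
{
    unsigned int database;
    unsigned int tablespace;
    unsigned int relfilenode;
    int          forknum;
    unsigned int blocknum;
} DumpEntry;

/*
 * qsort comparator: compare field by field, most significant first, so that
 * entries for the same relation fork end up adjacent and ordered by block.
 */
static int
sort_cmp_func(const void *a, const void *b)
{
    const DumpEntry *x = (const DumpEntry *) a;
    const DumpEntry *y = (const DumpEntry *) b;

    if (x->database != y->database)
        return (x->database < y->database) ? -1 : 1;
    if (x->tablespace != y->tablespace)
        return (x->tablespace < y->tablespace) ? -1 : 1;
    if (x->relfilenode != y->relfilenode)
        return (x->relfilenode < y->relfilenode) ? -1 : 1;
    if (x->forknum != y->forknum)
        return (x->forknum < y->forknum) ? -1 : 1;
    if (x->blocknum != y->blocknum)
        return (x->blocknum < y->blocknum) ? -1 : 1;
    return 0;
}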

> AFAICT this isn't necessary since palloc will error itself if it fails.

Fixed.

> Since there's no method to change DEFAULT_DUMP_FILENAME, I would call it
what it is: DUMP_FILENAME.

Fixed.

> Also, maybe worth an assert to make sure there was enough room for the
complete filename. That'd need to be a real check if this was configurable
> anyway.

If we make it configurable, I can add that check.

 > + if (!avoid_dumping)
 > +   dump_now();
 > Perhaps that check should be moved into dump_now() itself...

 Fixed.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


pg_autoprewarm_02.patch
Description: Binary data



Re: [HACKERS] [PATCH] pgpassfile connection option

2016-11-21 Thread Mithun Cy
 > Independently of your patch, while testing I concluded that the
multi-host feature and documentation should be improved:
 > - If a connection fails, it does not say for which host/port.

Thanks, I will re-test the same.

> In fact they are tried in turn if the network connection fails, but not
> if the connection fails for some other reason such as the auth.
> The documentation is not precise enough.

If we fail due to authentication, then I think we should notify the client
instead of trying the next host. It is expected that a genuine user has the
right credentials for making the connection, so it is better to notify the
client immediately rather than silently ignoring the failure; otherwise the
host the client failed to connect to stays effectively unavailable until the
client corrects itself. But I agree that, as you said, the documentation and
the error message need improvement; I will try to correct them.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Improve OOM handling in pg_locale.c

2016-11-16 Thread Mithun Cy
On Thu, Oct 13, 2016 at 1:40 PM, Michael Paquier 
wrote:
> I am attaching that to the next CF.

I have tested this patch. Now we report an out-of-memory error instead of crashing.

postgres=# SELECT '12.34'::money;
ERROR:  out of memory
LINE 1: SELECT '12.34'::money;


One thing you might need to reconsider is the removal of the memory-leak
comments. There is still a leak if an error occurs during encoding conversion
in db_encoding_strdup, unless you want to catch those errors with a
PG_TRY()/PG_CATCH() block and then free the memory (a rough sketch follows
the quoted comment below).

-* localeconv()'s results.  Note that if we were to fail within this
-* sequence before reaching "CurrentLocaleConvAllocated = true", we 
could
-* leak some memory --- but not much, so it's not worth agonizing over.
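
If one did want to plug that leak, the usual backend pattern would look
roughly like this; a sketch only, assuming it lives inside pg_locale.c where
db_encoding_strdup() is visible, and using a hypothetical cleanup helper
rather than the file's actual code:

#include "postgres.h"
#include <locale.h>

/* Hypothetical cleanup helper that frees any already-duplicated strings. */
static void free_lconv_strings(struct lconv *lc);

/*
 * Sketch: wrap the conversions so that strings duplicated so far are freed
 * if db_encoding_strdup() errors out partway through.
 */
static void
fill_lconv_safely(struct lconv *work, struct lconv *ext, int encoding)
{
    PG_TRY();
    {
        work->decimal_point = db_encoding_strdup(encoding, ext->decimal_point);
        work->thousands_sep = db_encoding_strdup(encoding, ext->thousands_sep);
        /* ... remaining string fields ... */
    }
    PG_CATCH();
    {
        free_lconv_strings(work);
        PG_RE_THROW();
    }
    PG_END_TRY();
}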

Rest all LGTM.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-17 Thread Mithun Cy
On Thu, Nov 17, 2016 at 8:27 AM, Robert Haas  wrote:
>but SELECT pg_is_in_recovery() and SHOW transaction_read_only
>exist in older versions so if we pick either of those methods then it
>will just work.

I am attaching the next version of the patch; it has the following fixes.

Tsunakawa's comments:
1. PGconn->target_server_type is now freed in freePGconn().
2. Added PGTARGETSERVERTYPE.

Additional comments from others:
3. Moved from SELECT pg_is_in_recovery() to SHOW transaction_read_only; this
should now handle different kinds of replication, since we recognise the
server to which a writable connection can be made as the primary, exactly
like the JDBC driver. Also documented this.
4. Renamed words from master to primary.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover_to_new_master-v2.patch
Description: Binary data



Re: [HACKERS] possible optimizations - pushing filter before aggregation

2016-11-19 Thread Mithun Cy
On Sat, Nov 19, 2016 at 8:59 AM, Pavel Stehule 
wrote:

>
>
>
>> SELECT MIN(a) m FROM
>>(SELECT a FROM t WHERE a=2) AS v(a)
>>
>> The subquery is going to return an intermediate result of:
>>
>> V:A
>> ---
>> 2
>>
>> And the minimum of that is 2, which is the wrong answer.
>>
>
> yes, you have true,
>
> In above case wondering if we could do this

If Min(a) = 2 is the condition, generate the condition "a <= 2" and push it
down as a scan key. Since the pushed-down condition is lossy for us (it also
admits values < 2), finally do a recheck of "Min(a) = 2".

For Max(a) = 2 we can have "a >= 2".

If both are given, we can combine them appropriately.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-15 Thread Mithun Cy
>
> > So I am tempted to just
> > hold my nose and hard-code the SQL as JDBC is presumably already
> doing.


JDBC is sending "show transaction_read_only" to find whether it is master
or not.
Victor's patch also started with it, but later it was transformed into
pg_is_in_recovery
by him as it appeared more appropriate to identify the master / slave.

ConnectionFactoryImpl.java:685

+ private boolean isMaster(QueryExecutor queryExecutor, Logger logger)
+     throws SQLException, IOException {
+   byte[][] results = SetupQueryRunner.run(queryExecutor, "show transaction_read_only", true);
+   String value = queryExecutor.getEncoding().decode(results[0]);
+   return value.equalsIgnoreCase("off");
+ }

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-20 Thread Mithun Cy
On Fri, Nov 18, 2016 at 6:39 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:
 >Typo.   , and "observering" -> "observing".

Thanks, fixed.

> +   {"target_server_type", "PGTARGETSERVERTYPE",
DefaultTargetServerType, NULL,
> +   "Target-Server-Type", "", 6,

Thanks, fixed.

> Please avoid adding another round trip by using a GUC_REPORTed variable
> (ParameterStatus entry).  If you want to support this libpq failover with
> pre-10 servers, you can switch the method of determining the primary based
> on the server version.  But I don't think it's worth supporting older
> servers at the price of libpq code complexity.

Currently there is no consensus around this. For now, making this patch
address failover to the next primary, similar to JDBC, seems sufficient to me.
In the next patch proposal I think we can try to extend it as you have
proposed.

>Please consider supporting "standby" and "prefer_standby" like PgJDBC.
They are useful without load balancing when multiple standbys are used for
HA.

I think they become more relevant with load balancing, and making them usable
when we extend this feature to include load balancing makes sense to me.

> I haven't tracked the progress of logical replication, but will
target_server_type be likely to be usable with it?  How will
target_server_type fit logical   > replication?

I tried to check the logical replication WIP patch; I am not very sure how to
accommodate it.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover_to_new_master_v3.patch
Description: Binary data



Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-01 Thread Mithun Cy
On Thu, Oct 27, 2016 at 11:15 PM, Robert Haas  wrote:
> Thanks.  Here's a new version with a fix for that issue and also a fix
> for PQconnectionNeedsPassword(), which was busted in v1.
I did some more testing of the patch for both URI and (host, port) parameter
pairs. I tested IPv4, IPv6, and UDS socket types, and also tested different
passwords using .pgpass. I did not see any major issue; all seems to work as
defined. I also tested the behaviour of the APIs PQport, PQhost, PQpass, and
PQreset; all work as defined.

I also have the following observations, which can be considered if valid.

1. The URI syntax does not support UDS, whereas the (host, port) pair does.
But there is one problem with this: instead of reporting it, we discard the
URI input (in conninfo_uri_parse_options) and try to connect to the default
unix socket.

[mithun@localhost bin]$ ./psql postgres:///home/mithun:,
127.0.0.1:/postgres -U mithun
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
The server is running on both sockets.

2. If we fail to connect, we get an error message for each addrinfo we tried;
I wonder if we could simplify that. Just a suggestion.
[mithun@localhost bin]$ ./psql postgres://,,,/postgres
psql: could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "" (127.0.0.1) and accepting
TCP/IP connections on port 5432?

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-01 Thread Mithun Cy
On Tue, Nov 1, 2016 at 6:54 PM, Robert Haas  wrote:
>That's the wrong syntax.  If you look in
> https://www.postgresql.org/docs/devel/static/libpq-connect.html under
> "32.1.1.2. Connection URIs", it gives an example of how to include a
> slash in a pathname.  You have to use %2F, or else the URL parser will
> think you're starting a new section of the URI.
>I believe this works fine if you use the correct syntax.

Sorry, that was a mistake on my part. Now it appears fine.

>
> Starting program: /home/mithun/libpqbin/bin/./psql
> postgres://%2fhome%2fmithun:/postgres -U mithun1
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> psql: could not connect to server: No such file or directory
> Is the server running locally and accepting
> connections on Unix domain socket "/home/mithun/.s.PGSQL."?



While testing further along this line, I accidentally found a crash when an
invalid encoding "%2" (not recognised as hex) was used.
Test:

Starting program: /home/mithun/libpqbin/bin/./psql postgres://%2home%2mithun:/postgres -U mithun1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x77bb8d4f in PQpass (conn=0x68aaa0) at fe-connect.c:5582
5582 password = conn->connhost[conn->whichhost].password;

Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.el7_2.6.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 readline-6.2-9.el7.x86_64


There can be a PGconn which has no connhost. The exposed APIs PQpass and
PQreset (via connectDBStart) access conn->connhost without checking whether
it is set.
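
A minimal defensive sketch of what I mean (the function name is hypothetical;
the real PQpass also falls back to conn->pgpass, and the connhost field needs
libpq's internal header):

#include "libpq-int.h"

/*
 * Sketch: never dereference connhost when the (failed) PGconn has none.
 */
static char *
pqpass_guarded(const PGconn *conn)
{
    char   *password = NULL;

    if (conn == NULL)
        return NULL;
    if (conn->connhost != NULL)
        password = conn->connhost[conn->whichhost].password;
    if (password == NULL)
        password = conn->pgpass;    /* fall back to the global setting */
    return password;
}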

>That output seems fine to me.  In a real connection string, you're not
>likely to have so many duplicated addresses, and it's good for the
>error message to make clear which addresses were tried and what
>happened for each one.

Agree, Thanks.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-01 Thread Mithun Cy
On Tue, Nov 1, 2016 at 7:44 PM, Robert Haas  wrote:
>> Starting program: /home/mithun/libpqbin/bin/./psql
>> postgres://%2home%2mithun:/postgres -U mithun1
>Can you provide a concrete test scenario or some test code that fails?
>connhost is supposed to be getting set in connectOptions2(), which is
>run (AIUI) when the PGconn is first created.  So I think that it will
>be set in all scenarios of the type you mention, but I might be
>missing something.

Sorry if my sentence was confusing. If I give a proper hexadecimal encoding,
% followed by 2 hex digits (e.g. %2f, %2a), everything is fine. When I pass an
invalid hex encoding, e.g. %2h or %2m, in the host string, e.g.
"postgres://%2home%2mithun:/postgres", then PQconnectdbParams() fails before
calling connectOptions2(). In that case the failed PQconnectdbParams() still
returns a PGconn where connhost is not set. If we call PQpass() or PQreset()
on such a PGconn, we get a crash.

A simple test case which crashes is:
./psql  'postgres://%2hxxx:/postgres'

Call stack:
--
#0  0x77bb8d4f in PQpass (conn=0x68aa10) at fe-connect.c:5582
#1  0x77bb907a in PQconnectionNeedsPassword (conn=0x68aa10) at
fe-connect.c:5727
#2  0x004130aa in main (argc=2, argv=0x7fffdff8) at
startup.c:250

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-02 Thread Mithun Cy
On Tue, Nov 1, 2016 at 9:42 PM, Robert Haas  wrote:
> Ah, nuts.  Thanks, good catch.  Should be fixed in the attached version.

I repeated the tests on the new patch; it works fine now. I also did some more
negative tests, forcibly failing some internal calls, and all tests passed.
This patch works as described and looks good to me.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-27 Thread Mithun Cy
On Wed, Oct 26, 2016 at 8:49 PM, Robert Haas  wrote:
> Let me know your thoughts.

One small issue: I tried to run make installcheck after applying the patch,
and there seems to be an invalid write access in the code (resulting in a crash).

==118997== Invalid write of size 1
==118997==at 0x4E3DDF1: connectOptions2 (fe-connect.c:884)
==118997==by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608)
==118997==by 0x4E3D553: PQconnectdbParams (fe-connect.c:465)
==118997==by 0x413067: main (startup.c:245)
==118997==  Address 0x5dc4014 is 0 bytes after a block of size 4 alloc'd
==118997==at 0x4C29BFD: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==118997==by 0x4E3DD3C: connectOptions2 (fe-connect.c:880)
==118997==by 0x4E3D6FF: PQconnectStartParams (fe-connect.c:608)
==118997==by 0x4E3D553: PQconnectdbParams (fe-connect.c:465)
==118997==by 0x413067: main (startup.c:245)

After locally fixing this by allocating an extra byte for the terminating NUL
character, the make installcheck tests pass:
- (char *) malloc(sizeof(char) * (e - s));
+ (char *) malloc(sizeof(char) * (e - s + 1));
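
In other words, the copied substring needs room for its terminator. A
self-contained illustration of the corrected pattern (names are mine, not the
patch's):

#include <stdlib.h>
#include <string.h>

/*
 * Copy the range [s, e) into a freshly malloc'd C string, leaving one extra
 * byte for the '\0' terminator that the original allocation forgot.
 */
static char *
copy_range(const char *s, const char *e)
{
    size_t  len = (size_t) (e - s);
    char   *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}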

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] "Some tests to cover hash_index"

2016-10-11 Thread Mithun Cy
On Wed, Sep 28, 2016 at 9:56 PM, Robert Haas  wrote:
> something committable will come from it, but with 2 days left it's not
> going to happen this CF.
Adding a new patch. This one uses generate_series instead of INSERT INTO ...
SELECT and fixes the comments from Alvaro.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


commit-hash_coverage_test_v3_no_wal.patch
Description: Binary data



Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-10-13 Thread Mithun Cy
On Fri, Sep 30, 2016 at 2:14 PM, Victor Wagner  wrote:
>Ok, some trailing whitespace and mixing of tabs and spaces
>which git apply doesn't like really present in the patch.
>I'm attached hear version with these issues resolved.

There were some more trailing spaces and spaces used instead of tabs in
patch 09. I have fixed these, along with formatting issues related to comments
and to the placement of { in controlled blocks.

> But backward compatibility is more important thing, so I now assume, that
> user tries to connect just one node, and this node is read only, user knows
> what he is doing.
Okay, but for me consistency is also important. Since we agree to disagree on
some of the comments, and others have not expressed any problem, I am moving
it to the committer.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


libpq-failover-10.patch
Description: Binary data



Re: [HACKERS] Cache Hash Index meta page.

2017-01-12 Thread Mithun Cy
On Wed, Jan 11, 2017 at 12:46 AM, Robert Haas  wrote:
> Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
> this new logic?  Or at least update the comments?
I have introduced a new function, _hash_getcachedmetap, in patch 11 [1]; with
this, hashbulkdelete() can use the metapage cache instead of saving a copy
locally. A rough sketch of the resulting call pattern follows the reference
below.

[1] cache_hash_index_meta_page_11.patch
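
For context, the intended call pattern is roughly the following; this is a
sketch assuming the patch's _hash_getcachedmetap(rel, &metabuf, force_refresh)
signature and is not the exact hashbulkdelete() code:

#include "postgres.h"
#include "access/hash.h"

/* Sketch: read the bucket count from the cached metapage copy. */
static uint32
get_current_bucket_count(Relation rel)
{
    Buffer       metabuf = InvalidBuffer;
    HashMetaPage metap;
    uint32       num_buckets;

    /* The backend-local cached copy is fine when staleness is acceptable. */
    metap = _hash_getcachedmetap(rel, &metabuf, false);
    num_buckets = metap->hashm_maxbucket + 1;

    /* Force a fresh read (pins the metapage) when an up-to-date view matters. */
    metap = _hash_getcachedmetap(rel, &metabuf, true);
    num_buckets = metap->hashm_maxbucket + 1;

    if (BufferIsValid(metabuf))
        _hash_relbuf(rel, metabuf);

    return num_buckets;
}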

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Cache Hash Index meta page.

2017-01-12 Thread Mithun Cy
On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila  wrote:
> Few more comments:
> 1.
> no need to two extra lines, one is sufficient and matches the nearby
> coding pattern.

-- Fixed.

> 2.
> Do you see anywhere else in the code the pattern of using @ symbol in
> comments before function name?

-- Fixed.

> 3.
>
> after this change, i think you can directly use bucket in
> _hash_finish_split instead of pageopaque->hasho_bucket

-- Fixed.

> 4.
>
> Won't the check similar to the existing check (if (*bufp ==
> so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
> above this code will suffice the need?  If so, then you can check it
> once and use it in both places.
>

-- Fixed.

> 5. The reader and insertion algorithm needs to be updated in README.

-- Added info in README.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_11.patch
Description: Binary data



[HACKERS] Typo in hashsearch.c

2017-01-13 Thread Mithun Cy
There is a typo in the comments of the function _hash_first(); adding a fix for the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


hashsearch_typo01.patch
Description: Binary data



Re: [HACKERS] pageinspect: Hash index support

2017-01-14 Thread Mithun Cy
On Fri, Jan 13, 2017 at 12:49 AM, Jesper Pedersen
 wrote:
> Rebased, and removed the compile warn in hashfuncs.c

I did some testing and review of the patch. I did not see any major issues,
but there are a few minor cases for which I have some suggestions.

1. Source file: doc/src/sgml/pageinspect.sgml
In the example given,
SELECT * FROM hash_page_type(get_raw_page('con_hash_index', 0));
can be changed to
SELECT hash_page_type(get_raw_page('con_hash_index', 0));

2. @verify_hash_page
I can easily produce this error right after a split, so there will be pages
which are not initialized before they are used. An error saying the page is
unexpected might therefore be slightly misleading; I think the error message
should be improved.
postgres=# SELECT hash_page_type(get_raw_page('i1', 12));
ERROR:  index table contains unexpected zero page

3. @verify_hash_page

(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("page is not a HASH page"),
+ errdetail("Expected %08x, got %08x.",

In the error message, "HASH" can be downcased to "hash". That makes it
consistent with other error messages like "page is not a hash page of
expected type".


4. The condition is raw_page_size != BLCKSZ, but the error message is "input
page too small"; I think it should be changed to "invalid page size".

if (raw_page_size != BLCKSZ)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small"),
+ errdetail("Expected size %d, got %d",
+   BLCKSZ, raw_page_size)));


5. @hash_bitmap_info
metabuf can be released once bitmapblkno has been set; after that it is of no
further use.
_hash_relbuf(indexRel, metabuf);

Is a write lock required here for the bitmap page?
mapbuf = _hash_getbuf(indexRel, bitmapblkno, HASH_WRITE, LH_BITMAP_PAGE);


6. You have renamed convert_ovflblkno_to_bitno to _hash_ovflblkno_to_bitno.
But unfortunately, you also moved the function down, so the diff appears as
if you have completely replaced it. I think we should put it back in its old
place.

7. I think this is not your bug, but probably a bug in the hash index itself:
the page flag is set to 66 for the test below, so the page is marked as both
LH_BUCKET_PAGE and LH_BITMAP_PAGE. Isn't this a bug in the hash index?

I inserted 3000 records; the hash index is on an integer column.
select hasho_flag FROM hash_page_stats(get_raw_page('i1', 1));
 hasho_flag
------------
 66

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Broken SSL tests in master

2016-12-01 Thread Mithun Cy
On Fri, Dec 2, 2016 at 2:26 AM, Robert Haas  wrote:
>Yeah, we should change that.  Are you going to write a patch?

Thanks, I will work on this and produce a patch to fix it.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-11-29 Thread Mithun Cy
On Wed, Nov 30, 2016 at 1:44 AM, Robert Haas  wrote:
 >On Tue, Nov 29, 2016 at 2:19 PM, Kuntal Ghosh 
wrote:
 >> I was doing some testing with the patch and I found some inconsistency
 >> in the error message.

 > Hmm, maybe the query buffer is getting cleared someplace in there.  We
 > might need to save/restore it.

Thanks, PQsendQuery() resets the errorMessage; will fix the same.

PQsendQuery() -> PQsendQueryStart() -> resetPQExpBuffer(&conn->errorMessage);

Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Cache Hash Index meta page.

2016-12-05 Thread Mithun Cy
On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen 
wrote:
>As the concurrent hash index patch was committed in 6d46f4 this patch
needs a rebase.

Thanks Jesper,

Adding the rebased patch.

I have re-run the pgbench read-only tests with the modifications below.

"alter table pgbench_accounts drop constraint pgbench_accounts_pkey"
postgres
"create index pgbench_accounts_pkey on pgbench_accounts using hash(aid)"
postgres

Postgres server settings:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9

pgbench settings:
scale_factor = 300 (so database fits in shared_buffer)
Mode =  -M prepared -S (prepared readonly mode).

Machine used:
power2 with sufficient ram for above shared_buffer.

#lscpu
CPU(s):192
On-line CPU(s) list:   0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s):  4
Model: IBM,8286-42A

Clients | Cache Meta Page patch | Base code with Amit's changes |          %imp
      1 |          17062.513102 |                  17218.353817 | -0.9050848685
      8 |         138525.808342 |                 128149.381759 |  8.0971335488
     16 |          212278.44762 |                 205870.456661 |  3.1126326054
     32 |         369453.224112 |                 360423.566937 |  2.5052904425
     64 |         576090.293018 |                 510665.044842 | 12.8117733604
     96 |         686813.187117 |                 504950.885867 | 36.0158396272
    104 |          688932.67516 |                  498365.55841 | 38.2384202789
    128 |         730728.526322 |                 409011.008553 | 78.6574226711

There appears to be a good improvement at higher client counts.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_06.patch
Description: Binary data



Re: [HACKERS] Cache Hash Index meta page.

2016-12-05 Thread Mithun Cy
On Tue, Dec 6, 2016 at 1:28 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:

> Clients | Cache Meta Page patch | Base code with Amit's changes |          %imp
>       1 |          17062.513102 |                  17218.353817 | -0.9050848685
>       8 |         138525.808342 |                 128149.381759 |  8.0971335488
>      16 |          212278.44762 |                 205870.456661 |  3.1126326054
>      32 |         369453.224112 |                 360423.566937 |  2.5052904425
>      64 |         576090.293018 |                 510665.044842 | 12.8117733604
>      96 |         686813.187117 |                 504950.885867 | 36.0158396272
>     104 |          688932.67516 |                  498365.55841 | 38.2384202789
>     128 |         730728.526322 |                 409011.008553 | 78.6574226711

All the above readings are the median of 3 runs.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Broken SSL tests in master

2016-12-05 Thread Mithun Cy
On Fri, Dec 2, 2016 at 9:18 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:
 >On Fri, Dec 2, 2016 at 2:26 AM, Robert Haas <robertmh...@gmail.com> wrote:
 >>Yeah, we should change that.  Are you going to write a patch?
 > Thanks, I will work on this and produce a patch to fix it.

This is more complicated than I thought. I tried to fix it by delaying the
decoding of the URI-encoded hostname until connectOptions2() (patch attached),
but that has a side effect on PQconninfoParse and PQconninfo, which will still
show the encoded string for "host" (also note that %2x is a regular character
sequence in a non-URI connection string). So I think even with a URI
connection string we will not be able to use "," in a hostname; probably,
just as in a regular connection string, the comma can only be used as a
separator.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


libpq-multihost-comma-in-uri.patch
Description: Binary data



Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-12-05 Thread Mithun Cy
On Mon, Dec 5, 2016 at 11:23 PM, Robert Haas  wrote:
>I think that you need a restoreErrorMessage call here:
> /* Skip any remaining addresses for this host. */
> conn->addr_cur = NULL;
> if (conn->whichhost + 1 < conn->nconnhost)
> {
>     conn->status = CONNECTION_NEEDED;
>     restoreErrorMessage(conn, &savedMessage);
>     goto keep_going;
> }

Right after seeing that the transaction is read-only we have already restored
the saved message, so I think we do not need one more restore there.
  if (strncmp(val, "on", 2) == 0)
  {
      PQclear(res);
+     restoreErrorMessage(conn, &savedMessage);

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-12-04 Thread Mithun Cy
On Fri, Dec 2, 2016 at 8:54 PM, Robert Haas  wrote:
> Couldn't this just be a variable in PQconnectPoll(), instead of adding
> a new structure member?

I have fixed this with a local variable in PQconnectPoll. Initially I thought
savedMessage should have the same visibility as errorMessage, so that we
could restore the errorMessage even outside PQconnectPoll, but that does not
seem to be required. Attaching the new patch which fixes the same.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover-to-new-master-bug-fix-02.patch
Description: Binary data



Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-12-01 Thread Mithun Cy
On Wed, Nov 30, 2016 at 7:14 AM, Mithun Cy <mithun...@enterprisedb.com>
wrote:
> Thanks, PQsendQuery() resets the errorMessage; will fix the same.
>
> PQsendQuery() -> PQsendQueryStart() -> resetPQExpBuffer(&conn->errorMessage);

Please find the patch which fixes this bug. conn->errorMessage, as per its
definition, only stores the current error message; a new member,
conn->savedMessage, is introduced to save and restore error messages related
to the previous hosts to which we failed to connect.
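
The save/restore idea, in sketch form, using libpq's PQExpBuffer helpers (the
function below is illustrative; in the patch the logic sits inside
PQconnectPoll()'s handling of the writable-session check):

#include "libpq-int.h"

/*
 * Sketch: preserve the error text accumulated for earlier failed hosts across
 * a query that resets conn->errorMessage, then put it back afterwards.
 */
static void
check_writable_preserving_errors(PGconn *conn)
{
    PQExpBufferData saved;

    initPQExpBuffer(&saved);
    appendPQExpBufferStr(&saved, conn->errorMessage.data);      /* save */

    /* ... send "SHOW transaction_read_only" and inspect the result ... */

    resetPQExpBuffer(&conn->errorMessage);
    appendPQExpBufferStr(&conn->errorMessage, saved.data);      /* restore */
    termPQExpBuffer(&saved);
}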

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


failover-to-new-writable-session_bugfix_01.patch
Description: Binary data



Re: [HACKERS] Cache Hash Index meta page.

2017-01-02 Thread Mithun Cy
Thanks Amit for detailed review, and pointing out various issues in
the patch. I have tried to fix all of your comments as below.

On Mon, Jan 2, 2017 at 11:29 AM, Amit Kapila  wrote:

> 1.
> usage "into to .." in above comment seems to be wrong.

-- Fixed.

> 2.
> In the above comment, a reference to HashMetaCache is confusing, are
> you referring to any structure here?  If you change this, consider to
> change the similar usage at other places in the patch as well.

-- Fixed. Removed HashMetaCache everywhere in the code. Where ever
needed added HashMetaPageData.

> Also, it is not clear what you mean by ".. to indicate bucket
> has been initialized .."?  assigning maxbucket as a special value to
> prevblkno is not related to initializing a bucket page.

-- To be consistent with our definition of prevblkno, I am setting prevblkno
to the current hashm_maxbucket when we initialize the bucket page. I have
tried to correct the comments accordingly.

> 3.
> In above comment, where you are saying ".. caching the some of the
> meta page data .." is slightly confusing, because it appears to me
> that you are caching whole of the metapage not some part of it.

-- Fixed. Changed to caching the HashMetaPageData.

> 4.
> +_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
>
> Generally, for _hash_* API's, we use rel as the first parameter, so I
> think it is better to maintain the same with this API as well.

-- Fixed.

> 5.
>   _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
> -   maxbucket, highmask, lowmask);
> +   usedmetap->hashm_maxbucket,
> +   usedmetap->hashm_highmask,
> +   usedmetap->hashm_lowmask);

> I think we should add an Assert for the validity of usedmetap before using it.

-- Fixed. Added Assert(usedmetap != NULL);

> 6. Getting few compilation errors in assert-enabled build.
>

-- Fixed. Sorry, I missed handling the bucket number, which is needed in the
code below. I have fixed it now.

> 7.
> I can understand what you want to say in above comment, but I think
> you can write it in somewhat shorter form.  There is no need to
> explain the exact check.

-- Fixed. I have tried to compress it into a few lines.

> 8.
> @@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
>   _hash_relbuf(rel, *bufp);
>
>   *bufp = InvalidBuffer;
> +
> + /* If it is a bucket page there will not be a prevblkno. */
> + if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
> + return;
> +
>
> I don't think above check is right.  Even if it is a bucket page, we
> might need to scan bucket being populated, refer check else if
> (so->hashso_buc_populated && so->hashso_buc_split).  Apart from that,
> you can't access bucket page after releasing the lock on it.  Why have
> you added such a check?
>

-- Fixed. That was a mistake; now I have fixed it. Actually, if a bucket page
is passed to _hash_readprev then there will not be a prevblkno, but from this
patch onwards the prevblkno of a bucket page stores hashm_maxbucket, so a
plain BlockNumberIsValid(blkno) check will not be valid anymore. I have fixed
it by adding the following:
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
  {
+ Assert(BlockNumberIsValid(blkno));

There are 2 other places which do the same, in _hash_freeovflpage and
_hash_squeezebucket, but those will only be called for overflow pages, so I
did not make changes there. Still, I think we should also change them to keep
things consistent.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_09.patch
Description: Binary data



Re: [HACKERS] Cache Hash Index meta page.

2017-01-04 Thread Mithun Cy
On Wed, Jan 4, 2017 at 4:19 PM, Mithun Cy <mithun...@enterprisedb.com> wrote:
> As part of performance/space analysis of hash index on the varchar data type
> with this patch, I have run some tests for the same with a modified pgbench.
I forgot to mention that all these tests were run on the power2 machine, which
has 192 hyperthreads.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Cache Hash Index meta page.

2017-01-04 Thread Mithun Cy
On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy <mithun...@enterprisedb.com> wrote:
I have re-based the patch to fix one compilation warning in _hash_doinsert,
where the variable bucket was used only for an Assert.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


cache_hash_index_meta_page_10.patch
Description: Binary data



Re: [HACKERS] Cache Hash Index meta page.

2017-01-04 Thread Mithun Cy
On Tue, Jan 3, 2017 at 12:05 PM, Mithun Cy <mithun...@enterprisedb.com> wrote:

As part of the performance/space analysis of hash indexes on the varchar data
type with this patch, I have run some tests with a modified pgbench. The
modifications include:
1. Changed the aid column of the pgbench_accounts table from int to varchar(x).
2. To generate unique values as before, inserted the stringified integer
repeatedly to fill x.

I have mainly tested varchar(30) and varchar(90).
The attached document has the detailed report on the captured data. The new
hash index shows roughly 25% better performance than btree and, as expected,
is very space efficient.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


hash_index_sizes_experiment_01.ods
Description: application/vnd.oasis.opendocument.spreadsheet


