[HACKERS] dead or outdated URLs found in win32.h

2017-10-06 Thread Ashutosh Sharma
Hi All,

The following URLs in src/include/port/win32.h, which document dllimport
and dllexport usage on Windows, appear to be either dead or obsolete, and
I think they need to be updated.

http://support.microsoft.com/kb/132044 -- dead
http://msdn.microsoft.com/en-us/library/8fskxacy(v=vs.80).aspx -- outdated
http://msdn.microsoft.com/en-us/library/a90k134d(v=vs.80).aspx -- outdated

https://msdn.microsoft.com/en-gb/library/aa489609.aspx -- outdated
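
For reference, what those pages documented is the standard Windows DLL
linkage pattern. Below is a minimal sketch of that pattern with
hypothetical names (BUILDING_MYLIB and the MYLIB_* identifiers are
illustrative only, not PostgreSQL's actual PGDLLIMPORT/PGDLLEXPORT
definitions):

/*
 * Symbols must be declared dllexport while building the DLL itself and
 * dllimport while compiling code that consumes the DLL; a shared macro
 * keeps one header usable on both sides of the boundary.
 */
#ifdef BUILDING_MYLIB
#define MYLIB_API __declspec(dllexport)
#else
#define MYLIB_API __declspec(dllimport)
#endif

extern MYLIB_API int mylib_counter;	/* global variable crossing the DLL boundary */
extern MYLIB_API void mylib_init(void);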

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-22 Thread Ashutosh Sharma
On Fri, Sep 22, 2017 at 11:56 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 3:08 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> I have added a note on the handling of logged and unlogged tables to
>> the README file and also corrected the header comment for
>> hashbucketcleanup(). Please find attached the 0003*.patch with these
>> changes. Thanks.
>
> I committed 0001 and 0002 with some additional edits as
> 7c75ef571579a3ad7a1d3ee909f11dba5e0b9440.  I also rebased 0003 and
> edited it a bit; see attached hash-cleanup-changes.patch.
>

Thanks for the commit. I put a lot of effort into this and am very
happy that it got committed. Thanks to Amit too for the detailed review.

> I'm not entirely sold on 0003.  An alternative would be to rip the lsn
> stuff back out of HashScanPosData, and I think we ought to consider
> that.  Basically, 0003 is betting that getting rid of the
> lock-chaining in hash index vacuum is more valuable than being able to
> kill dead items more aggressively.  I bet that's a bad bet.
>
> In the case of btree indexes, since
> 2ed5b87f96d473962ec5230fd820abfeaccb2069, page-at-a-time scanning
> allows most btree index scans to avoid holding buffer pins when the
> scan is suspended, but we gain no such advantage here.  We always have
> to hold a pin on the primary bucket page anyway, so even with this
> patch cleanup is going to block when it hits a bucket containing a
> suspended scan.  0003 helps if (a) the relation is permanent, (b) the
> bucket has overflow pages, and (c) the scan is moving faster than
> vacuum and can overtake it instead of waiting.  But that doesn't seem
> like it will happen very often at all, whereas the LSN check will
> probably fail frequently and cause us to skip cleanup that we could
> usefully have done.  So I propose the attached hashscan-no-lsn.patch
> as an alternative.
>
> Thoughts?
>
> --

Yes, I too feel that the 0003 patch won't help much. The chances of a
scan overtaking vacuum are very rare, and hash indexes are normally used
for unique values (that is where a hash index dominates other index
types), which means overflow pages are unlikely to be common. Therefore,
I feel the 0003 patch won't be very beneficial. Honestly speaking, the
code changes in 0003 look a bit ugly as well. So, yes,
hashscan-no-lsn.patch looks like the better option to me. Thanks.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-21 Thread Ashutosh Sharma
On Thu, Sep 21, 2017 at 9:30 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Sep 20, 2017 at 11:43 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Attached are the patches with above changes. Thanks.
>
> Thanks.  I think that the comments and README changes in 0003 need
> significantly more work.  In several places, they fail to note the
> unlogged vs. logged differences, and the header comment for
> hashbucketcleanup still says that scans depend on increasing-TID order
> (really, 0001 should change that text somehow).
>

I have added a note on the handling of logged and unlogged tables to
the README file and also corrected the header comment for
hashbucketcleanup(). Please find attached the 0003*.patch with these
changes. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 485627178cfaf73a908498e79229e4db04a99648 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh.sha...@enterprisedb.com>
Date: Wed, 20 Sep 2017 20:53:06 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma.
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 437 +++
 src/backend/access/hash/hashutil.c   |  67 +-
 src/include/access/hash.h|  55 -
 6 files changed, 553 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..3b1f719 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+	the pin on the primary bucket throughout the scan)
+	save all the matching tuples from current index page into an items array
+	release pin and content lock (but if it is primary bucket page retain
+	its pin till the end of the scan)
+	get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned.  Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap tuple
+IDs into backend-local storage. The heap tuple IDs are then processed while not
+holding any page lock within the index thereby, allowing concurrent insertion
+to happen on the same index page without any requirement of re-finding the
+current scan position for the reader. We do continue to hold a pin on the
+bucket page, to protect against concurrent deletions and bucket split.
 
 To allow for scans during a bucket split, if at the start of the scan, the
 bucket is marked as bucket-being-populated, it scan all the tuples in that
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index d89c192..8550218 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,21 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
-	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (

Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-20 Thread Ashutosh Sharma
On Thu, Sep 21, 2017 at 9:30 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Sep 20, 2017 at 11:43 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Attached are the patches with above changes. Thanks.
>
> Thanks.  I think that the comments and README changes in 0003 need
> significantly more work.  In several places, they fail to note the
> unlogged vs. logged differences, and the header comment for
> hashbucketcleanup still says that scans depend on increasing-TID order
> (really, 0001 should change that text somehow).
>

Thanks for raising that point. I will correct the header comment for
hashbucketcleanup(), mention the handling done for logged and unlogged
tables in the README, and submit the updated patch as soon as possible.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-20 Thread Ashutosh Sharma
On Wed, Sep 20, 2017 at 8:05 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Sep 20, 2017 at 5:37 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Thanks for all your review comments. Please find my comments in-line.
>
> +        if (!BlockNumberIsValid(opaque->hasho_nextblkno))
> +        {
> +            if (so->currPos.buf == so->hashso_bucket_buf ||
> +                so->currPos.buf == so->hashso_split_bucket_buf)
> +                prev_blkno = InvalidBlockNumber;
> +            else
> +                prev_blkno = opaque->hasho_prevblkno;
> +        }
>
> 1. Why not remove the outer "if" statement?
>

Yes, the outer if statement is not required. I just missed removing it
in my earlier patch.

> 2. How about adding a comment, like /* If this is a primary bucket
> page, hasho_prevblkno is not a real block number. */
>

Added.

>> When _hash_readpage() doesn't find any qualifying tuples, i.e. when
>> _hash_readnext() returns an invalid buffer, we update only prevPage,
>> nextPage and buf in currPos (not currPage or lsn), because currPage
>> and lsn should still point to the last page in the hash bucket so that
>> we can mark the killed items as dead at the end of the scan (with the
>> help of _hash_kill_items). Hence, we keep currPage and lsn as they are
>> if no more valid hash pages are found.
>
> How about adding a comment about this, by extending this comment:
>
> + * Remember next and previous block numbers for scrollable
> + * cursors to know the start position and return FALSE
> + * indicating that no more matching tuples were found.
>
> e.g. (Don't reset currPage or lsn, because we expect _hash_kill_items
> to be called for the old page after this function returns.)
>
>

Added.

Attached are the patches with above changes. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 485627178cfaf73a908498e79229e4db04a99648 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh.sha...@enterprisedb.com>
Date: Wed, 20 Sep 2017 20:53:06 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma.
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 437 +++
 src/backend/access/hash/hashutil.c   |  67 +-
 src/include/access/hash.h|  55 -
 6 files changed, 553 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..3b1f719 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+	the pin on the primary bucket throughout the scan)
+	save all the matching tuples from current index page into an items array
+	release pin and content lock (but if it is primary bucket page retain
+	its pin till the end of the scan)
+	get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned.  Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap tuple
+IDs into backend-local storage. The heap tuple IDs are then processed while not
+holding any page lock within the index thereby, allowing concurrent insertion
+to happen on the same index page without any requirement of re-finding the
+current scan position for the reader. We do continue to hold a pin on the
+bucket page, to protect against concurrent deletions and bucket split.

Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-20 Thread Ashutosh Sharma
Thanks for all your review comments. Please find my comments in-line.

On Tue, Sep 19, 2017 at 9:49 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 11:26 AM, Jesper Pedersen
> <jesper.peder...@redhat.com> wrote:
>> Based on the feedback in this thread, I have moved the patch to "Ready for
>> Committer".
>
> Reviewing 0001:
>
> _hash_readpage gets the page LSN to see if we can apply LP_DEAD hints,
> but if the table is unlogged or temporary, the LSN will never change,
> so the test in _hash_kill_items will always think that it's OK to
> apply the hints.  (This seems like it might be a pretty serious
> problem, because I'm not sure what would be a viable workaround.)
>

Amit has already replied to this query up-thread.

> The logic that tries to ensure that so->currPos.{buf,currPage,lsn} get
> updated is not, to my eyes, obviously correct. Generally, the logic
> for this stuff seems unnaturally spread out to me.  For example,
> _hash_next() updates currPos.buf, but leaves it to _hash_readpage to
> set currPage and lsn.  That function also sets all three fields when
> it advances to the next page by calling _hash_readnext(), but when it
> tries to advance to the next page and doesn't find one it sets buf but
> not currPage or lsn.  It seems to me that this logic should be more
> centralized somehow.  Can we arrange things so that at least buf,
> currPage, and lsn, and maybe also nextPage and prevPage, get updated
> at the same time and as soon after reading the buffer as possible?
>

Okay, I have tried to update currPos.{buf, currPage, lsn} in
_hash_readpage() at the same time. Please have a look into the
attached 0001*.patch.

When _hash_readpage() doesn't find any qualifying tuples, i.e. when
_hash_readnext() returns an invalid buffer, we update only prevPage,
nextPage and buf in currPos (not currPage or lsn), because currPage and
lsn should still point to the last page in the hash bucket so that we
can mark the killed items as dead at the end of the scan (with the help
of _hash_kill_items). Hence, we keep currPage and lsn as they are if no
more valid hash pages are found.

> It would be bad if a primary bucket page's hasho_prevblkno field got
> copied into so->currPos.prevpage, because the value that appears for
> the primary bucket is not a real block number.  But _hash_readpage
> seems like it can bring about this exact situation, because of this
> code:
>
> +        if (!BlockNumberIsValid(opaque->hasho_nextblkno))
> +            prev_blkno = opaque->hasho_prevblkno;
> ...
> +        so->currPos.prevPage = prev_blkno;
>
> If we're reading the primary bucket page and there are no overflow
> pages, hasho_nextblkno will not be valid and hasho_prevblkno won't be
> a real block number.
>

Fixed. Thanks for pointing that out.

> Incidentally, the "if" statement in the above block of code is
> probably not saving anything; I would suggest for clarity that you do
> the assignment unconditionally (but this is just a matter of style, so
> I don't feel super-strongly about it).
>
> +return (so->currPos.firstItem <= so->currPos.lastItem);
>
> Can this ever return false?  It looks to me like we should never reach
> this code unless that condition holds, and that it would be a bug if
> we did.  If that's correct, maybe this should be an Assert() and the
> return statement should just return true.
>

No, it will never return false. I have changed it to an Assert statement.
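
For clarity, the resulting code amounts to something like the following
sketch (the attached patch has the exact text):

	/*
	 * The scan must have returned at least one matching item to reach
	 * this point, so assert the invariant instead of returning it.
	 */
	Assert(so->currPos.firstItem <= so->currPos.lastItem);
	return true;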

> +    buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
> +
> +    /* It might not exist anymore; in which case we can't hint it. */
> +    if (!BufferIsValid(buf))
> +        return;
>
> This is dead code, because _hash_getbuf always returns a valid buffer.
> If there's actually a risk of the buffer disappearing, then some other
> handling is needed for this case.  But I suspect that, since a scan
> always holds a pin on the primary bucket, there's actually no way for
> this to happen and this is just dead code.
>

Removed the redundant code.

> The comment in hashgetbitmap claims that _hash_first() or _hash_next()
> never returns dead tuples.  If that were true, it would be a bug,
> because then scans started during recovery would return wrong answers.
> A more accurate statement would be something like: _hash_first and
> _hash_next eliminate dead index entries whenever
> scan->ignore_killed_tuples is true.  Therefore, there's nothing to do
> here except add the results to the TIDBitmap.
>

Corrected.

> _hash_readpage contains unnecessary "continue" statements inside the
> loops.  The reason that they're unnecessary is that there's no code
> below that in the loop anyway, so the 

Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-09-14 Thread Ashutosh Sharma
Hi Craig,

On Thu, Aug 17, 2017 at 5:50 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 16 August 2017 at 23:14, Robert Haas <robertmh...@gmail.com> wrote:
>>
>> On Tue, Aug 15, 2017 at 4:36 PM, Tomas Vondra
>> <tomas.von...@2ndquadrant.com> wrote:
>> > You might say that people investigating issues in this area of code
>> > should
>> > be aware of how HEAP_XMIN_FROZEN is defined, and perhaps you're right
>> > ...
>>
>> Yes, I think that's what I would say.  I mean, if you happen to NOT
>> know that committed|invalid == frozen, but you DO know what committed
>> means and what invalid means, then you're going to be *really*
>> confused when you see committed and invalid set on the same tuple.
>> Showing you frozen has got to be clearer.
>>
>> Now, I agree with you that a test like (enumval_tup->t_data &
>> HEAP_XMIN_COMMITTED) could be confusing to someone who doesn't realize
>> that HEAP_XMIN_FROZEN & HEAP_XMIN_COMMITTED == HEAP_XMIN_COMMITTED,
>> but I think that's just one of those things that unfortunately is
>> going to require adequate knowledge for people investigating issues.
>> If there's an action item there, it might be to try to come up with a
>> way to make the source code clearer.
>>
>
> For other multi-purpose flags we have macros, and I think it'd make sense to
> use them here too.
>
> Eschew direct use of  HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID and
> HEAP_XMIN_FROZEN in tests. Instead, consistently use HeapXminIsFrozen(),
> HeapXminIsCommitted(), and HeapXminIsInvalid() or something like that.
>
> --

Are you planning to address the review comments from Robert and Moon
Insung and supply a new patch? I just had a quick glance at this mail
thread (after a long time) and could understand Robert's concern to some
extent. I think he is saying that if a tuple is frozen
(committed|invalid), it shouldn't be shown as COMMITTED and INVALID
together; instead it should just be displayed as a FROZEN tuple.
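
For what it's worth, here is a minimal sketch of the kind of helper
macros Craig proposed, assuming the usual htup_details.h definitions
where HEAP_XMIN_FROZEN is the combination of both bits; the macro names
are illustrative, not committed code:

/*
 * Hypothetical helpers: test the two xmin hint bits together, so that
 * frozen (committed|invalid) is never misread as plain committed or
 * plain invalid.
 */
#define HeapXminIsFrozen(tup) \
	(((tup)->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)
#define HeapXminIsCommitted(tup) \
	(((tup)->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_COMMITTED)
#define HeapXminIsInvalid(tup) \
	(((tup)->t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_INVALID)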

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Warnings "unrecognized node type" for some DDLs with log_statement = 'ddl'

2017-09-14 Thread Ashutosh Sharma
On Thu, Sep 14, 2017 at 9:24 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> Hi all,
>
> While reviewing another patch, I have bumped into a couple of failures
> when running installcheck if log_statement = 'ddl'. This pops
> regression failures for 4 tests: object_address, alter_generic,
> alter_operator and stats_ext involving commands CREATE STATISTICS and
> ALTER OPERATOR.
>
> You can as well reproduce the failures using simply that:
> =# create table aa (a int, b int);
> CREATE TABLE
> =# CREATE STATISTICS aa_stat ON a, b FROM aa;
> WARNING:  01000: unrecognized node type: 332
> LOCATION:  GetCommandLogLevel, utility.c:3357
> ERROR:  42P17: extended statistics require at least 2 columns
> LOCATION:  CreateStatistics, statscmds.c:220
> =# ALTER OPERATOR = (boolean, boolean) SET (RESTRICT = NONE);
> WARNING:  01000: unrecognized node type: 294
> LOCATION:  GetCommandLogLevel, utility.c:3357
> ALTER OPERATOR
>
> Attached is a patch to fix all the failures I have spotted. As CREATE
> STATISTICS is new in PG10, I am adding an open item as things come
> from 7b504eb2. The problems of ALTER OPERATOR are introduced by 9.6.
> Still I would suggest to fix everything at the same time. The problem
> comes from GetCommandLogLevel() which forgot to add T_CreateStatsStmt
> and T_AlterOperatorStmt. I have noticed as well that
> T_AlterCollationStmt was missing.
>
Hmm... I am also able to reproduce the failures you reported, and your
patch does solve the problem. To confirm that we are not missing tags
for any other statement nodes, I had a quick look through the list in
'nodes.h' but couldn't find any other missing nodes. So, yes, your patch
solves the problem completely and looks good to me. Thanks.
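
For readers following along, the shape of the fix is roughly the sketch
below: add the forgotten node tags to the switch in GetCommandLogLevel()
in utility.c so those statements are classified as DDL (Michael's
attached patch is authoritative):

		/* sketch: the missing tags, classified as DDL */
		case T_CreateStatsStmt:
		case T_AlterOperatorStmt:
		case T_AlterCollationStmt:
			lev = LOGSTMT_DDL;
			break;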

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Ashutosh Sharma
Hi,

On Wed, Sep 13, 2017 at 3:15 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> Hi Thomas,
>
> On Wed, Sep 13, 2017 at 2:57 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Sep 13, 2017 at 9:11 PM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>>> On Wed, Sep 13, 2017 at 8:58 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>>> wrote:
>>>> I am getting the following error message when trying to build latest
>>>> PG source on Windows,
>>>>
>>>> Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
>>>> C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468
>>>
>>> Googling around I see some indications that the macro may not be
>>> defined in all implementations and that some other projects test if
>>> it's defined:
>>
>> Does this work for you Ashutosh?
>>

Thanks for the patch. Yes, it works for me.
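
For the archives: the essence of the fix is a small compatibility guard,
roughly of this shape (a sketch, assuming the committed patch follows
Thomas's suggestion; "1.1" is the selector RFC 4511 reserves to request
no attributes):

/*
 * Some LDAP client libraries do not define LDAP_NO_ATTRS in their
 * headers, so define it ourselves when it is missing.
 */
#ifndef LDAP_NO_ATTRS
#define LDAP_NO_ATTRS "1.1"
#endif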

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Supporting huge pages on Windows

2017-09-13 Thread Ashutosh Sharma
On Wed, Sep 13, 2017 at 7:11 AM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
> Hi Thomas, Magnus
>
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
>> Since it only conflicts with c7b8998e because of pgindent whitespace
>> movement, I applied it with "patch -p1 --ignore-whitespace" and created
>> a new patch.  See attached.
>
> Thanks, Thomas.  I've added your name in the CF entry so that your name will 
> also be listed on the release note, because my patch is originally based on 
> your initial try.  Please remove your name just in case you mind it.  BTW, 
> your auto-reviewer looks very convenient.  Thank you again for your great 
> work.
>
> Magnus, I would be grateful if you could review and commit the patch while
> your memory is relatively fresh.
>
> I've been in a situation which keeps me from doing development recently, but 
> I think I can gradually rejoin the community activity soon.
>

I have once again tested the latest patch (v14) on Windows and the
results look fine to me. Basically, I repeated the test cases I had run
earlier on the v8 patch. For details on the tests I re-executed, please
refer to [1]. Thanks.

[1] https://www.postgresql.org/message-id/CAE9k0Pkz%2BtOiPmx2LrVePM7cZydTLNbQ6R3GqgeivurfsXyZ5w%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Ashutosh Sharma
Hi Thomas,

On Wed, Sep 13, 2017 at 2:57 PM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Wed, Sep 13, 2017 at 9:11 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Sep 13, 2017 at 8:58 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>> I am getting the following error message when trying to build latest
>>> PG source on Windows,
>>>
>>> Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
>>> C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468
>>
>> Googling around I see some indications that the macro may not be
>> defined in all implementations and that some other projects test if
>> it's defined:
>
> Does this work for you Ashutosh?
>
> --

I am currently stuck with another task on Windows. Is it okay if I let
you know the results in another 30-40 minutes? Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




[HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Ashutosh Sharma
Hi,

I am getting the following error message when trying to build the
latest PG source on Windows:

Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468

I think it was introduced by the following git commit:

commit 83aaac41c66959a3ebaec7daadc4885b5f98f561
Author: Peter Eisentraut <pete...@gmx.net>
Date:   Tue Sep 12 09:46:14 2017 -0400

 Allow custom search filters to be configured for LDAP auth.

Please ignore this email if this issue has already been reported.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-09-13 Thread Ashutosh Sharma
On Wed, Sep 13, 2017 at 1:15 PM, Alexey Chernyshov
<a.chernys...@postgrespro.ru> wrote:
> On Sat, 9 Sep 2017 13:53:35 +0530
> Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>
>>
>> Finally, I got some time to look into this patch, and surprisingly I
>> didn't find any function returning information at the page level; all
>> the SQL functions return information at the index level. Therefore, I
>> too feel that it should either be integrated with pgstattuple, which
>> could be a better match as Tomas also mentioned, or be added as a new
>> contrib module. I think adding a new contrib module looks like the
>> better option. The reason is that it doesn't just include functions
>> showing index-level statistics (e.g. index size, number of rows,
>> values, etc.) as pgstattuple does, but also displays information
>> contained in a page, e.g. the object stored in the page, its TID, etc.
>> So, more or less, I would say it's a mixture of the pageinspect and
>> pgstattuple modules, and therefore I feel adding it as a new extension
>> would be the better choice. Thoughts?
>
> Thank you for your interest, I will add a new contrib module named,
> say, indexstat. I think we can add statistics on other indexes in the
> future.
>

I think we should wait for an expert's opinion and then take a call. I
am no expert; I just gave my opinion since I worked in this area earlier
on the hash index. Thanks.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] pg_basebackup fails on Windows when using tablespace mapping

2017-09-09 Thread Ashutosh Sharma
On Tue, Jun 27, 2017 at 6:36 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Jun 27, 2017 at 7:46 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> I am still seeing the issue with the attached patch. I had a quick
>> look into the patch. It seems that you have canonicalized the
>> tablespace path to convert win32 slashes to unix-style slashes, but
>> the canonicalized path is not being passed to the strcmp() call, which
>> is probably why the issue still exists. Thanks.
>>
>> for (cell = tablespace_dirs.head; cell; cell = cell->next)
>> -   if (strcmp(dir, cell->old_dir) == 0)
>> +   if (strcmp(canon_dir, cell->old_dir) == 0)
>
> Thanks. I had the correct version on my Windows box actually, just
> messed up the attachment.
> --

Okay. I have once again reviewed your patch and tested it on both
Windows and Linux. The patch LGTM, so I am now marking it as Ready for
Committer. Thanks.
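
To summarize the fix for anyone skimming the thread, here is a minimal
sketch of the corrected lookup; the names follow the quoted hunk from
pg_basebackup's tablespace-mapping code, but the committed patch is
authoritative:

	/*
	 * Canonicalize the server-reported tablespace path (turning win32
	 * backslashes into forward slashes) before comparing it against the
	 * user-supplied -T old_dir=new_dir mappings.
	 */
	char		canon_dir[MAXPGPATH];
	TablespaceListCell *cell;

	strlcpy(canon_dir, dir, sizeof(canon_dir));
	canonicalize_path(canon_dir);

	for (cell = tablespace_dirs.head; cell; cell = cell->next)
		if (strcmp(canon_dir, cell->old_dir) == 0)
			return cell->new_dir;

	return dir;					/* no mapping found; use the path as-is */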

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-09-09 Thread Ashutosh Sharma
On Fri, Sep 8, 2017 at 3:38 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> Hi,
>
> On 07/21/2017 03:40 PM, Alexander Korotkov wrote:
>> Hi, Alexey!
>>
>> On Fri, Jul 21, 2017 at 3:05 PM, Alexey Chernyshov
>> <a.chernys...@postgrespro.ru <mailto:a.chernys...@postgrespro.ru>> wrote:
>>
>> the following patch transfers functionality from gevel module
>> (http://www.sai.msu.su/~megera/wiki/Gevel
>> <http://www.sai.msu.su/~megera/wiki/Gevel>) which provides functions for
>> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
>> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
>> GIN indexes.
>>
>>
>> It's very good that you've picked up this work!  pageinspect is lacking
>> of this functionality.
>>
>> Functions added:
>>  - gist_stat(text) - shows statistics on GiST Tree
>>  - gist_tree(text) - shows GiST tree
>>  - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
>>  - gist_print(text) - prints objects stored in GiST tree
>>  - spgist_stat(text) - shows statistics on SP-GiST
>>  - spgist_print(text) - prints objects stored in index
>>  - gin_value_count() - originally gin_stat(text) - prints estimated
>> counts
>> for index values
>>  - gin_stats() - originally gin_statpage(text) - shows statistics
>>  - gin_count_estimate(text, tsquery) - shows number of indexed rows
>> matched
>> query
>>
>> Tests also transferred, docs for new functions are added. I run pgindent
>> over the code, but the result is different from those I expected, so
>> I leave
>> pgindented one.
>> The patch is applicable to the commit
>> 866f4a7c210857aa342bf901558d170325094dde.
>>
>>
>> As we can see, gevel contains functions which analyze the whole index.
>>  pageinspect is written in another manner: it gives you functionality to
>> analyze individual pages, tuples and so on.
>> Thus, we probably shouldn't try to move gevel functions to pageinspect
>> "as is".  They should be rewritten in more granular manner as well as
>> other pageinspact functions are.  Any other opinions?
>>
>
> I agree with Alexander that pageinspect is written in a very different
> way - as the extension name suggests, it's used to inspect pages. The
> proposed patch uses a very different approach, reading the whole index,
> not individual pages. Why should it be part of pageinspect?
>
> For example we have pgstattuple extension, which seems like a better
> match. Or just create a new extension - if the code is valuable, surely
> we can live one more extension instead of smuggling it in inside
> pageinspect.
>

Finally, I got some time to look into this patch, and surprisingly I
didn't find any function returning information at the page level; all
the SQL functions return information at the index level. Therefore, I
too feel that it should either be integrated with pgstattuple, which
could be a better match as Tomas also mentioned, or be added as a new
contrib module. I think adding a new contrib module looks like the
better option. The reason is that it doesn't just include functions
showing index-level statistics (e.g. index size, number of rows, values,
etc.) as pgstattuple does, but also displays information contained in a
page, e.g. the object stored in the page, its TID, etc. So, more or
less, I would say it's a mixture of the pageinspect and pgstattuple
modules, and therefore I feel adding it as a new extension would be the
better choice. Thoughts?

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-23 Thread Ashutosh Sharma
On Wed, Aug 23, 2017 at 7:39 PM, Jesper Pedersen
<jesper.peder...@redhat.com> wrote:
> On 08/23/2017 07:38 AM, Amit Kapila wrote:
>>
>> Thanks for the new version.  I again looked at the patches and fixed
>> quite a few comments in the code and ReadMe.  You have forgotten to
>> update README for the changes in vacuum patch
>> (0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7).  I
>> don't have anything more to add.  If you are okay with changes, then
>> we can move it to Ready For Committer unless someone else has some
>> more comments.
>>
>
> Just some minor comments.

Thanks for the review.

>
> README:
> +   it's pin till the end of scan)
>
> its pin till the end of the scan)

Corrected.

>
> +To minimize lock/unlock traffic, hash index scan always searches entire
> hash
>
> To minimize lock/unlock traffic, hash index scan always searches the entire
> hash

Done.

>
> hashsearch.c:
>
> +static inline void _hash_saveitem(HashScanOpaque so, int itemIndex,
> +  OffsetNumber offnum, IndexTuple itup);
>
> There are other instances of "inline" in the code base, so I guess that this
> is ok.
>
> +* Advance to next tuple on current page; or if there's no more, try
> to
>
> Advance to the next tuple on the current page; or if done, try to
>

Done.

Attached are the patches with above changes.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 501d48ef3b566569c687a9a4ac4b239b2278c789 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh.sha...@enterprisedb.com>
Date: Thu, 24 Aug 2017 10:19:58 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma.
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 426 +++
 src/backend/access/hash/hashutil.c   |  74 +-
 src/include/access/hash.h|  55 -
 6 files changed, 549 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..3b1f719 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+	the pin on the primary bucket throughout the scan)
+	save all the matching tuples from current index page into an items array
+	release pin and content lock (but if it is primary bucket page retain
+	its pin till the end of the scan)
+	get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned.  Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap tuple
+IDs into backend-local storage. The heap tuple IDs are then processed while not
+holding any page lock within the index thereby, allowing concurrent insertion
+to happen on the same index page without any requirement of re-finding the
+current scan position for the reader. We do continue to hold a pin on the
+bucket page, to protect against concurrent deletions and bucket split.
 
 To allow for scans during a bucket split, if at the start of the scan, the
 bucket is marked as bucket-being-populated, it scan all the tuples in that
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index d89c192..45a3a5a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,21 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRe

Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-23 Thread Ashutosh Sharma
Hi Amit,

On Wed, Aug 23, 2017 at 5:08 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Aug 22, 2017 at 7:24 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> On Tue, Aug 22, 2017 at 3:55 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>
>> Okay, I get your point now. Currently in _hash_kill_items(), if an
>> overflow page is pinned we do not check whether it was modified since
>> the last read. Hence, if vacuum runs on an overflow page that is
>> pinned and has some dead tuples in it, it could create a problem for
>> the scan when the scan attempts to mark the killed items as dead. To
>> get rid of this problem, I think that even if an overflow page is
>> pinned we should check whether it was modified since the last read of
>> the page; if so, the scan must not mark the killed items as dead.
>> Attached is the newer version with these changes, along with the other
>> cosmetic changes mentioned in your earlier email. Thanks.
>>
>
> Thanks for the new version.  I again looked at the patches and fixed
> quite a few comments in the code and ReadMe.  You have forgotten to
> update README for the changes in vacuum patch
> (0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v7).  I
> don't have anything more to add.  If you are okay with changes, then
> we can move it to Ready For Committer unless someone else has some
> more comments.
>

Thanks for reviewing my patches. I've gone through the changes you made
in the README file and the few changes in the code comments, and they
look valid to me. But it seems there are some more minor review comments
from Jesper, which I will fix and then share the new set of patches
shortly.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-22 Thread Ashutosh Sharma
On Tue, Aug 22, 2017 at 3:55 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Tue, Aug 22, 2017 at 2:28 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> On Sat, Aug 19, 2017 at 11:37 AM, Amit Kapila <amit.kapil...@gmail.com> 
>> wrote:
>>> On Fri, Aug 11, 2017 at 6:51 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>>> wrote:
>>>>>
>>>>> 7.
>>>>> _hash_kill_items(IndexScanDesc scan)
>>>>> {
>>>>> ..
>>>>> + /*
>>>>> + * If page LSN differs it means that the page was modified since the
>>>>> + * last read. killedItems could be not valid so LP_DEAD hints apply-
>>>>> + * ing is not safe.
>>>>> + */
>>>>> + page = BufferGetPage(buf);
>>>>> + if (PageGetLSN(page) != so->currPos.lsn)
>>>>> + {
>>>>> + _hash_relbuf(rel, buf);
>>>>> + return;
>>>>> + }
>>>>> ..
>>>>> }
>>>>>
>>>>> How does this check cover the case of unlogged tables?
>>>>
>>>> Thanks for putting that point. It doesn't cover the case for unlogged
>>>> tables. As suggested by you in one of your email in this mailing list, i am
>>>> now not allowing vacuum to release lock on current page before acquiring
>>>> lock on next page for unlogged tables. This will ensure that scan is always
>>>> behind vacuum if they are running on the same bucket simultaneously.
>>>> Therefore, there is danger in marking tuples as dead for unlogged pages 
>>>> even
>>>> if they are not having any lsn.
>>>>
>>>
>>
>> Once again, Thank you for reviewing my patches.
>>
>>> In the last line, I guess you wanted to say "there is *no* danger
>>> ..."?
>>
>> Yes, I meant that, because it ensures that the scan will always be
>> following VACUUM.
>>
>>> Today, while again thinking about this part of the patch
>>> (_hash_kill_items) it occurred to me that we can't rely on a pin on an
>>> overflow page to guarantee that it is not modified by Vacuum.
>>> Consider a case where vacuum started vacuuming the bucket before the
>>> scan and then in-between scan overtakes it.  Now, it is possible that
>>> even if a scan has a pin on a page (and no lock), vacuum might clean
>>> that page, if that happens, then we can't prevent the reuse of TID.
>>> What do you think?
>>>
>>
>> I think you are talking about the non-MVCC scan case, because in the
>> case of MVCC scans, even if we have released both the pin and the lock
>> on a page, VACUUM can't remove tuples from a page if they are visible
>> to some concurrently running transaction (the MVCC scan in our case).
>>
>
> I am talking about tuples that are marked as dead in heap. It has
> nothing to do with the visibility of tuple or type of scan (mvcc or
> non-mvcc).
>

Okay, I get your point now. Currently in _hash_kill_items(), if an
overflow page is pinned we do not check whether it was modified since
the last read. Hence, if vacuum runs on an overflow page that is pinned
and has some dead tuples in it, it could create a problem for the scan
when the scan attempts to mark the killed items as dead. To get rid of
this problem, I think that even if an overflow page is pinned we should
check whether it was modified since the last read of the page; if so,
the scan must not mark the killed items as dead. Attached is the newer
version with these changes, along with the other cosmetic changes
mentioned in your earlier email. Thanks.
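
In sketch form, the rule being added is the following (illustrative
only; the attached patch is authoritative):

	/*
	 * Even when the page is still pinned, only apply LP_DEAD hints if
	 * the page is unchanged since we copied its items, i.e. its LSN
	 * still matches the one remembered at read time.
	 */
	Page		page = BufferGetPage(so->currPos.buf);

	if (PageGetLSN(page) != so->currPos.lsn)
		return;					/* page changed under us; offsets may be stale */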

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From cb724914ebb56c9b525e165b0117bfa70aac7692 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh.sha...@enterprisedb.com>
Date: Tue, 22 Aug 2017 18:53:47 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma <ashu.coe...@gmail.com>
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 426 +++
 src/backend/access/hash/hashutil.c   |  72 +-
 src/include/access/hash.h|  55 -
 6 files changed, 547 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire cont

Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-22 Thread Ashutosh Sharma
On Sat, Aug 19, 2017 at 11:37 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Aug 11, 2017 at 6:51 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>>>
>>> 7.
>>> _hash_kill_items(IndexScanDesc scan)
>>> {
>>> ..
>>> + /*
>>> + * If page LSN differs it means that the page was modified since the
>>> + * last read. killedItems could be not valid so LP_DEAD hints apply-
>>> + * ing is not safe.
>>> + */
>>> + page = BufferGetPage(buf);
>>> + if (PageGetLSN(page) != so->currPos.lsn)
>>> + {
>>> + _hash_relbuf(rel, buf);
>>> + return;
>>> + }
>>> ..
>>> }
>>>
>>> How does this check cover the case of unlogged tables?
>>
>> Thanks for putting that point. It doesn't cover the case for unlogged
>> tables. As suggested by you in one of your email in this mailing list, i am
>> now not allowing vacuum to release lock on current page before acquiring
>> lock on next page for unlogged tables. This will ensure that scan is always
>> behind vacuum if they are running on the same bucket simultaneously.
>> Therefore, there is danger in marking tuples as dead for unlogged pages even
>> if they are not having any lsn.
>>
>

Once again, thank you for reviewing my patches.

> In the last line, I guess you wanted to say "there is *no* danger
> ..."?

Yes, I meant that, because it ensures that the scan will always be following VACUUM.

> Today, while again thinking about this part of the patch
> (_hash_kill_items) it occurred to me that we can't rely on a pin on an
> overflow page to guarantee that it is not modified by Vacuum.
> Consider a case where vacuum started vacuuming the bucket before the
> scan and then in-between scan overtakes it.  Now, it is possible that
> even if a scan has a pin on a page (and no lock), vacuum might clean
> that page, if that happens, then we can't prevent the reuse of TID.
> What do you think?
>

I think you are talking about the non-MVCC scan case, because in the
case of MVCC scans, even if we have released both the pin and the lock
on a page, VACUUM can't remove tuples from a page if they are visible to
some concurrently running transaction (the MVCC scan in our case). So I
don't think it can happen for MVCC scans; however, it can happen for
non-MVCC scans, and to handle that I think it is better that we drop the
idea of allowing the scan to overtake VACUUM (done by
0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v5.patch).

For comparison, B-Tree handles this in _bt_drop_lock_and_maybe_pin(),
which releases both the pin and the lock on a page if the snapshot is
MVCC, and otherwise releases just the lock.
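
A simplified sketch of that btree behavior, for reference (the real
_bt_drop_lock_and_maybe_pin() also checks whether the index is
WAL-logged and whether an index-only scan still needs the page; this is
illustrative only):

	static void
	drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp)
	{
		/* always release the content lock when the scan pauses */
		LockBuffer(sp->buf, BUFFER_LOCK_UNLOCK);

		if (IsMVCCSnapshot(scan->xs_snapshot))
		{
			/*
			 * Heap vacuum cannot remove tuples an MVCC snapshot can
			 * still see, so the pin is not needed to protect the scan.
			 */
			ReleaseBuffer(sp->buf);
			sp->buf = InvalidBuffer;
		}
	}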

> Few other comments:
>
> 1.
> + * On failure exit (no more tuples), we return FALSE with pin
> + * pin held on bucket page but no pins or locks held on overflow
> + * page.
>   */
>  bool
>  _hash_next(IndexScanDesc scan, ScanDirection dir)
>
> In the above part of comment 'pin' is used twice.

Okay, I will remove the extra "pin" from the comment in my next version of the patch.

>
> 0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v5
>
> 2.
> - * not at all by the rearrangement we are performing here.  To prevent
> - * any concurrent scan to cross the squeeze scan we use lock chaining
> - * similar to hasbucketcleanup.  Refer comments atop hashbucketcleanup.
> + * not at all by the rearrangement we are performing here.
>
> In _hash_squeezebucket, we still use lock chaining, so removing the
> above comment doesn't seem like a good idea.  I think you should copy
> part of a comment from hasbucketcleanup starting from "There can't be
> any concurrent .."

Okay, I will correct it in my next version of the patch.

>
> 3.
> _hash_freeovflpage()
> {
> ..
>
> * Concurrency issues are avoided by using lock chaining as
> * described atop hashbucketcleanup.
> ..
> }
>
> After fixing #2, you need to change the function name in above comment.
>

Sure, I will correct that in my next version of the patch.

With Regards,
Ashutosh Sharma.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-11 Thread Ashutosh Sharma
> I think you are setting it in hashgettuple.  It is better
> to move that assignment from hashgettuple to _hash_first so as to be
> consistent with _hash_next.

Done that way.

>
> 4.
> + * On successful exit, scan->xs_ctup.t_self is set to the TID
> + * of the next heap tuple, and if requested, scan->xs_itup
> + * points to a copy of the index tuple.  so->currPos is updated
> + * as needed.
> + *
> + * On failure exit (no more tuples), we return FALSE with no
> + * pins or locks held.
>   */
>  bool
>  _hash_next(IndexScanDesc scan, ScanDirection dir)
>
> I don't see the usage of scan->xs_itup in this function.  It seems to
> me that you have copied it from btree code and forgot to remove part
> of the comment which is not relevant to a hash index.

Corrected.

>
> 5.
> @@ -514,19 +422,14 @@ hashendscan(IndexScanDesc scan)
> {
> ..
> + if (HashScanPosIsValid(so->currPos))
>   {
> - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
> - _hash_kill_items(scan);
> - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
> + /* Before leaving current page, deal with any killed items */
> + if (so->numKilled > 0)
> + _hash_kill_items(scan);
> + _hash_dropscanbuf(rel, so);
>   }
>
> - _hash_dropscanbuf(rel, so);
> -
> ..
> }
>
> I don't think it is a good idea to move _hash_dropscanbuf as that
> check just ensures if the current buffer is valid.  It doesn't say
> anything about other buffers saved in HashScanOpaque.  A similar
> change is required in hashrescan.

Done.

>
> 6.
> _hash_kill_items(IndexScanDesc scan)
> {
> ..
> + if (so->hashso_bucket_buf == so->currPos.buf ||
> + HashScanPosIsPinned(so->currPos))
> ..
> }
>
> Isn't second check enough?  It should indirectly cover the first test.

Yes, one check should be fine. Corrected it.

>
> 7.
> _hash_kill_items(IndexScanDesc scan)
> {
> ..
> + /*
> + * If page LSN differs it means that the page was modified since the
> + * last read. killedItems could be not valid so LP_DEAD hints apply-
> + * ing is not safe.
> + */
> + page = BufferGetPage(buf);
> + if (PageGetLSN(page) != so->currPos.lsn)
> + {
> + _hash_relbuf(rel, buf);
> + return;
> + }
> ..
> }
>
> How does this check cover the case of unlogged tables?

Thanks for putting that point. It doesn't cover the case for unlogged
tables. As suggested by you in one of your email in this mailing list, i am
now not allowing vacuum to release lock on current page before acquiring
lock on next page for unlogged tables. This will ensure that scan is always
behind vacuum if they are running on the same bucket simultaneously.
Therefore, there is danger in marking tuples as dead for unlogged pages
even if they are not having any lsn.

With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 818cb83a05ab114712e035f9d33bd3072352ff7d Mon Sep 17 00:00:00 2001
From: ashu <ashutosh1...@example.com>
Date: Fri, 11 Aug 2017 17:02:29 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma <ashu.coe...@gmail.com>
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 146 ++--
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 426 +++
 src/backend/access/hash/hashutil.c   |  69 +-
 src/include/access/hash.h|  55 -
 6 files changed, 544 insertions(+), 187 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
 the reader's current-tuple pointer from being invalidated by splits or
 compactions.  (Of course, other buckets can still be split or compacted.)
 
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page.  Since deletio

Re: [HACKERS] pl/perl extension fails on Windows

2017-08-10 Thread Ashutosh Sharma
On Thu, Aug 10, 2017 at 8:06 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Aug 10, 2017 at 8:30 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>> So, the question is should we allow the preprocessor option
>>> '_USE_32BIT_TIME_T' to be defined implicitly or not in postgresql
>>> provided we are using Visual C compiler version > 8.0. I think this
>>> should make plperl compatible with perl binaries.
>
>> We've got this code in MSBuildProject.pm and VCBuildProject.pm:
>
>> # We have to use this flag on 32 bit targets because the 32bit perls
>> # are built with it and sometimes crash if we don't.
>> my $use_32bit_time_t =
>>   $self->{platform} eq 'Win32' ? '_USE_32BIT_TIME_T;' : '';
>
>> Based on the discussion upthread, I think we now know that this was
>> not really the right approach.
>
> Yeah ... however, if that's there, then there's something wrong with
> Ashutosh's explanation, because that means we *are* building with
> _USE_32BIT_TIME_T in 32-bit builds.  It's just getting there in a
> roundabout way.  (Or, alternatively, this code is somehow not doing
> anything at all.)

I am extremely sorry if I communicated this wrongly. What I meant was
that we always define the _USE_32BIT_TIME_T flag when building the
plperl module on the Windows 32-bit platform, but unfortunately it is
not defined in the Perl code when Perl was built with a VC++ compiler
version greater than 8.0, and that's the reason for the binary mismatch
error on the 32-bit platform.

Here is what is mentioned in 'win32/GNUmakefile' in the Perl source
for version 5.24.1:

# In VS 2005 (VC++ 8.0) Microsoft changes time_t from 32-bit to
# 64-bit, even in 32-bit mode.  It also provides the _USE_32BIT_TIME_T
# preprocessor option to revert back to the old functionality for
# backward compatibility.  We define this symbol here for older 32-bit
# compilers only (which aren't using it at all) for the sole purpose
# of getting it into $Config{ccflags}.  That way if someone builds
# Perl itself with e.g. VC6 but later installs an XS module using VC8
# the time_t types will still be compatible.
ifeq ($(WIN64),undef)
ifeq ($(PREMSVC80),define)
BUILDOPT+= -D_USE_32BIT_TIME_T
endif
endif


>
>> The trouble with that is that _USE_32BIT_TIME_T also affects how
>> PostgreSQL code compiles.
>
> Really?  We try to avoid touching "time_t" at all in most of the code.
> I bet that we could drop the above-cited code, and compile only plperl
> with _USE_32BIT_TIME_T, taken (if present) from the Perl flags, and
> it'd be fine.  At least, that's my first instinct for what to try.
>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-08-10 Thread Ashutosh Sharma
On Thu, Aug 10, 2017 at 1:55 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Aug 8, 2017 at 12:15 PM, Sandeep Thakkar
> <sandeep.thak...@enterprisedb.com> wrote:
>> I copied and pasted that portion of the build log into file build.log
>> (attached) for Windows 32bit and Windows 64bit.
>
> Can you also share the output of 'perl -V' on each system?
>
> Comparing the 32-bit log and the 64-bit log, I see the following differences:
>
> 32-bit has extra options /IC:\pgBuild32\uuid\include /Oy- /analyze- /D
> _USE_32BIT_TIME_T
> 64-bit has extra options /D WIN64 /D CONSERVATIVE /D USE_SITECUSTOMIZE
> Both builds have several copies of /IC:\pgBuild32\include or
> /IC:\pgBuild64\include, but the 64-bit build has an extra one
>
> (I wrote that command on Linux, might need different quoting to work
> on Windows.)
>
> I'm suspicious about _USE_32BIT_TIME_T.  The contents of a
> PerlInterpreter are defined in Perl's intrpvar.h, and one of those
> lines is:
>
> PERLVAR(I, basetime,Time_t) /* $^T */
>
> Now, Time_t is defined as time_t elsewhere in the Perl headers, so if
> the plperl build and the Perl interpreter build had different ideas
> about whether that flag was set, the interpreter sizes would be
> different.  Unfortunately for this theory, if I'm interpreting the
> screenshot you posted correctly, the sizes are different by exactly 16
> bytes, and I can't see how a difference in the size of time_t could
> account for more than 8 bytes (4 bytes of actual size change + 4 bytes
> of padding).
>

Okay. I too had a look into this issue and my findings are as follows.

The size of the PerlInterpreter structure is not the same in the plperl
and perl modules. The size of PerlInterpreter in plperl is 2744 bytes
whereas in perl it is 2760 bytes, i.e. there is a difference of 16
bytes, and therefore the handshake keys generated in the two modules
are not the same, resulting in a binary mismatch error. To analyse this
problem, I generated the preprocessed output of the header file
'perl.h' in the plperl and perl modules, filtered the contents of
"struct interpreter" from the preprocessed output files and compared
them using a diff tool. The diff report is attached with this email.

As per the diff report, in the case of the plperl module the structure
Stat_t gets mapped to "_stat32i64", whereas in the case of the perl
module the same structure gets mapped to "_stat64", and this happens
for a couple of variables (statbuf & statcache) in the intrpvar.h file.
However, if the preprocessor option '_USE_32BIT_TIME_T' is explicitly
passed to the VC compiler when compiling the perl source, there is no
diff observed, as in both cases 'Stat_t' gets mapped to "_stat32i64".
So, now the question is: why is the '_USE_32BIT_TIME_T' flag not
implicitly defined when compiling the perl source on a 32-bit platform,
while for the postgresql/plperl build it is defined implicitly on the
32-bit platform? I just got a few hints from the Makefile in the perl
source code, where it is mentioned that the modern Visual C compiler
(VC++ 8.0 onwards) changes time_t from 32-bit to 64-bit, even in 32-bit
mode; hence '_USE_32BIT_TIME_T' is not being added to $Config{ccflags}.
Here is what I could see in GNUMakefile/Makefile.mk in the perl source
code.

# In VS 2005 (VC++ 8.0) Microsoft changes time_t from 32-bit to
# 64-bit, even in 32-bit mode.  It also provides the _USE_32BIT_TIME_T
# preprocessor option to revert back to the old functionality for
# backward compatibility.  We define this symbol here for older 32-bit
# compilers only (which aren't using it at all) for the sole purpose
# of getting it into $Config{ccflags}.  That way if someone builds
# Perl itself with e.g. VC6 but later installs an XS module using VC8
# the time_t types will still be compatible.

ifeq ($(WIN64),undef)
ifeq ($(PREMSVC80),define)
BUILDOPT += -D_USE_32BIT_TIME_T
endif
endif

So, the question is: should we allow the preprocessor option
'_USE_32BIT_TIME_T' to be defined implicitly or not in postgresql when
we are using a Visual C compiler version > 8.0? I think this should
make plperl compatible with the perl binaries.
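
For anyone who wants to verify the width change locally, a minimal
sketch (my own illustration, not part of any patch) is below: built
with a 32-bit MSVC toolchain once with /D_USE_32BIT_TIME_T and once
without, it prints different sizes, which is exactly the kind of layout
drift that ends up changing sizeof(PerlInterpreter).

/*
 * Compile on 32-bit MSVC with and without /D_USE_32BIT_TIME_T and
 * compare the output: with the flag, time_t is 32 bits; without it
 * (VC++ 8.0 and later), it is 64 bits, even in 32-bit mode.
 */
#include <stdio.h>
#include <time.h>

int
main(void)
{
    printf("sizeof(time_t) = %u\n", (unsigned) sizeof(time_t));
    return 0;
}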

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
Text Compare
Produced: 8/10/2017 1:44:37 PM
Mode: Differences

Left file:  C:\Users\ashu\perl-5.24.1\perl_interpreter.out
Right file: C:\Users\ashu\git-clone-postgresql\postgresql\plper_intrpreter.out

struct _stat64 Istatbuf;      <>  struct _stat32i64 Istatbuf;
struct _stat64 Istatcache;    <>  struct _stat32i64 Istatcache;






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] free space % calculation in pgstathashindex

2017-08-07 Thread Ashutosh Sharma
On Mon, Aug 7, 2017 at 7:19 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Mon, Aug 7, 2017 at 6:07 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>> Hi,
>>
> ..
>> In step #1, assuming '*' as an arithmetic operator, the left operand
>> i.e. 'stats.unused_pages' is of type uint32 whereas the right operand
>> i.e. 'stats.space_per_page' is of type int32 and arithmetic
>> conversions of the ANSI C standard states: 'if either operand has type
>> unsigned int, the other operand is converted to unsigned int' which
>> means stats.space_per_page would be converted to uint32 and then the
>> multiplication would be performed. But, if the result of
>> multiplication is greater than the max value accepted by 32 bits
>> number, we would get an incorrect result. Hence, it is important to do
>> proper typecasting of operands before performing any arithmetic
>> operation. In this case, i would typecast both stats.unused_pages and
>> stats.space_per_page to uint64 and the
>> perform multiplication on them. Similar handling needs to be done in
>> step #2 as well.
>>
>
> I think even if you have typecasted the first variable, it would have
> served the purpose.
>

Yes, that's right: typecasting just the first variable also prevents
the overflow problem because, if the operands are of two different
types, the compiler will promote the narrower one to the larger or more
precise type. Therefore, whether or not we explicitly typecast the
second operand, the compiler will eventually do that. I have now
typecast only the first value. Attached is the updated patch.

>   /* Count unused pages as free space. */
>
> - stats.free_space += stats.unused_pages * stats.space_per_page;
> + stats.free_space += ((uint64) stats.unused_pages *
> + (uint64) stats.space_per_page);
>
> Why an extra parenthesis in above case whereas not in below case?  I
> think the code will look consistent if you follow the same coding
> practice.  I suggest don't use it unless you need it.

That is because in the 1st case there are multiple operators (*, +)
whereas in the 2nd case we have just one (*). So, just to ensure that
'*' is performed before '+', I had used parentheses, though they are
not required as '*' has higher precedence than '+'. I have removed the
extra parentheses and attached is the new version of the patch. Thanks.

>
>   /*
>   * Total space available for tuples excludes the metapage and the bitmap
>   * pages.
>   */
> - total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
> + total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
> + (uint64) stats.space_per_page;
>

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 44e322d..9365ba7 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -687,13 +687,14 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	index_close(rel, AccessShareLock);
 
 	/* Count unused pages as free space. */
-	stats.free_space += stats.unused_pages * stats.space_per_page;
+	stats.free_space += (uint64) stats.unused_pages * stats.space_per_page;
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
 	 * pages.
 	 */
-	total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
+	total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
+		stats.space_per_page;
 
 	if (total_space == 0)
 		free_percent = 0.0;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] free space % calculation in pgstathashindex

2017-08-07 Thread Ashutosh Sharma
Hi,

While working on [1] (one of the bugs reported for hash index), I
noticed that the percentage of free space shown in the 'free_percent'
column of pgstathashindex() is incorrect for some of the boundary
cases. This is how the free space percentage is calculated by
pgstathashindex():

1)  /* Count unused pages as free space. */
stats.free_space += stats.unused_pages * stats.space_per_page;

2)  /* Total space available for tuples excludes the metapage and the
bitmap pages */
total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;

3)  Calculate the percentage of free space in hash index table.
free_percent = 100.0 * stats.free_space / total_space;

In step #1, with '*' the multiplication operator, the left operand,
'stats.unused_pages', is of type uint32 whereas the right operand,
'stats.space_per_page', is of type int32, and the arithmetic
conversions of the ANSI C standard state: 'if either operand has type
unsigned int, the other operand is converted to unsigned int'. This
means stats.space_per_page is converted to uint32 and then the
multiplication is performed. But if the result of the multiplication is
greater than the maximum value representable in 32 bits, we get an
incorrect result. Hence, it is important to do proper typecasting of
the operands before performing any arithmetic operation. In this case,
I would typecast both stats.unused_pages and stats.space_per_page to
uint64 and then perform the multiplication on them. Similar handling
needs to be done in step #2 as well.
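
A tiny standalone demo of the wraparound (my own illustration with
made-up counts, not part of the patch):

/*
 * uint32 * int32: the int32 operand is converted to uint32 and the
 * multiplication wraps modulo 2^32; widening the already-wrapped
 * result to 64 bits cannot recover it.  Casting an operand to uint64
 * first makes the multiplication happen in 64 bits.
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint32_t unused_pages = 600000;     /* hypothetical values */
    int32_t  space_per_page = 8152;

    uint64_t wrong = unused_pages * space_per_page;             /* wraps */
    uint64_t right = (uint64_t) unused_pages * space_per_page;

    printf("wrong: %llu\nright: %llu\n",
           (unsigned long long) wrong, (unsigned long long) right);
    return 0;
}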

Attached is the patch that fixes this.

[1] - 
https://www.postgresql.org/message-id/20170704105728.mwb72jebfmok2nm2%40zip.com.au
[2] - 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka9483.html
commit a34659959fe7385e68d196e57e90fe92b12764d4
Author: ashu <ashutosh1...@example.com>
Date:   Mon Aug 7 11:18:27 2017 +0530

    Add proper typecasting to the operands when doing arithmetic operations.

Patch by Ashutosh Sharma.

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 44e322d..4e363cd 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -687,13 +687,15 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	index_close(rel, AccessShareLock);
 
 	/* Count unused pages as free space. */
-	stats.free_space += stats.unused_pages * stats.space_per_page;
+	stats.free_space += ((uint64) stats.unused_pages *
+		 (uint64) stats.space_per_page);
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
 	 * pages.
 	 */
-	total_space = (nblocks - (stats.bitmap_pages + 1)) * stats.space_per_page;
+	total_space = (uint64) (nblocks - (stats.bitmap_pages + 1)) *
+		(uint64) stats.space_per_page;
 
 	if (total_space == 0)
 		free_percent = 0.0;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-08-07 Thread Ashutosh Sharma
On Fri, Aug 4, 2017 at 7:14 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sun, Jul 30, 2017 at 2:07 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Hi,
>>
>> On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>> While doing the code coverage testing of v7 patch shared with - [1], I
>>> found that there are few lines of code in _hash_next() that are
>>> redundant and needs to be removed. I basically came to know this while
>>> testing the scenario where a hash index scan starts when a split is in
>>> progress. I have removed those lines and attached is the newer version
>>> of patch.
>>>
>>
>> Please find the new version of patches for page at a time scan in hash
>> index rebased on top of latest commit in master branch. Also, i have
>> ran pgindent script with pg_bsd_indent version 2.0 on all the modified
>> files. Thanks.
>>
>
> Few comments:

Thanks for reviewing the patch.

> 1.
> +_hash_kill_items(IndexScanDesc scan, bool havePin)
>
> I think you can do without the second parameter.  Can't we detect
> inside _hash_kill_items whether the page is pinned or not as we do for
> btree?

Okay, done that way. Please check attached v10 patch.

>
> 2.
> + /*
> + * We remember prev and next block number along with current block
> + * number so that if fetching the tup- les using cursor we know
> + * the page from where to startwith. This is for the case where we
> + * have re- ached the end of bucket chain without finding any
> + * matching tuples.
>
> The spelling of tuples and reached contain some unwanted symbol.  Have
> space between "startwith" or just use "begin"

Corrected.

>
> 3.
> + if (!BlockNumberIsValid((opaque)->hasho_nextblkno))
> + {
> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
> + so->currPos.nextPage = InvalidBlockNumber;
> + }
>
> There is no need to use Parentheses around opaque.  I mean there is no
> problem with that, but it is redundant and makes code less readable.
> Also, there is similar usage at other places in the code, please
> change all another place as well.

Removed the parentheses around opaque.

> I think you can save the value of
> prevblkno in a local variable and use it in else part.

Okay, done that way.

>
> 4.
> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
> + _hash_checkqual(scan, itup))
> + {
> + /* tuple is qualified, so remember it */
> + _hash_saveitem(so, itemIndex, offnum, itup);
> + itemIndex++;
> + }
> + else
> +
> + /*
> + * No more matching tuples exist in this page. so, exit while
> + * loop.
> + */
> + break;
>
> It is better to have braces for the else part. It makes code look
> better.  The type of code exists few line down as well, change that as
> well.

Added braces in the else part.

>
> 5.
> + /*
> + * We save the LSN of the page as we read it, so that we know whether it
> + * safe to apply LP_DEAD hints to the page later.
> + */
>
> "whether it safe"/"whether it is safe"

Corrected.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 59e9a5f5afc31a3d14ae39bf5ae0cf21ee42f624 Mon Sep 17 00:00:00 2001
From: ashu <ashutosh1...@example.com>
Date: Mon, 7 Aug 2017 16:06:52 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.

Patch by Ashutosh Sharma <ashu.coe...@gmail.com>
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 153 +++-
 src/backend/access/hash/hashpage.c   |  10 +-
 src/backend/access/hash/hashsearch.c | 446 +++
 src/backend/access/hash/hashutil.c   |  71 +-
 src/include/access/hash.h|  55 -
 6 files changed, 570 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+its pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
@@ -270,15 +271,13 @@ Holding the buf

[HACKERS] Gettting warning message during PostgreSQL-9.5 installation on Windows

2017-08-01 Thread Ashutosh Sharma
Hi,

I am getting this warning message when trying to install
PostgreSQL-v9.5 on Windows with Perl-5.22 and above,

Unescaped left brace in regex is deprecated, passed through in regex;
marked by <-- HERE in m/Project\("{ <-- HERE 8BC9CEB8-8B4A-11D0-8D11
00A0C91BC942}"\) = "([^"]+)"/ at Install.pm line 220. Installing
version 9.5 for debug in
C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test

Attached is the patch that fixes this issue.

Please note that from perl-5.26 onwards, this is treated as a syntax
error instead of a warning.

Also, I could see that there is already a git commit in the
PostgreSQL-9.6 branch that fixes this, but it was not back-patched to
9.5 and possibly older branches.

commit 76a1c97bf21c301f61bb41345e0cdd0d365b2086
Author: Andrew Dunstan <and...@dunslane.net>
Date:   Fri Apr 8 12:50:30 2016 -0400

Silence warning from modern perl about unescaped braces


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Silence_warning_unescaped_braces.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-31 Thread Ashutosh Sharma
Hi Christoph,

On Mon, Jul 31, 2017 at 9:18 AM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> Hi Christoph,
>
> On Mon, Jul 31, 2017 at 2:44 AM, Christoph Berg <m...@debian.org> wrote:
>> Re: Tom Lane 2017-07-28 <3254.1501276...@sss.pgh.pa.us>
>>> Christoph Berg <m...@debian.org> writes:
>>> > The plperl segfault on Debian's kfreebsd port I reported back in 2013
>>> > is also still present:
>>> > https://www.postgresql.org/message-id/20130515064201.GC704%40msgid.df7cb.de
>>> > https://buildd.debian.org/status/fetch.php?pkg=postgresql-10=kfreebsd-amd64=10~beta2-1=1499947011=0
>>>
>>> So it'd be interesting to know if it's any better with HEAD ...
>>
>> Unfortunately not:
>>
>> == creating database "pl_regression"  ==
>> CREATE DATABASE
>> ALTER DATABASE
>> == installing plperl  ==
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> connection to server was lost
>>
>> The only interesting line in log/postmaster.log is a log_line_prefix-less
>> Util.c: loadable library and perl binaries are mismatched (got handshake key 
>> 0xd500080, needed 0xd600080)
>> ... which is unchanged from the beta2 output.
>

It seems like you are observing this crash on kFreeBSD. Neither I nor
any of my colleagues has that OS, hence I can't comment on it. But we
do have Ubuntu (another Debian-based OS) and I am not seeing any server
crash there either,

edb@ubuntu:~/PGsources/postgresql/src/pl/plperl$ make check
make -C ../../../src/test/regress pg_regress
.
/bin/mkdir -p '/home/edb/PGsources/postgresql'/tmp_install/log
make -C '../../..'
DESTDIR='/home/edb/PGsources/postgresql'/tmp_install install
>'/home/edb/PGsources/postgresql'/tmp_install/log/install.log 2>&1
PATH="/home/edb/PGsources/postgresql/tmp_install/home/edb/PGsources/postgresql/inst/bin:$PATH"
LD_LIBRARY_PATH="/home/edb/PGsources/postgresql/tmp_install/home/edb/PGsources/postgresql/inst/lib:$LD_LIBRARY_PATH"
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --dbname=pl_regression
--load-extension=plperl  --load-extension=plperlu plperl plperl_lc
plperl_trigger plperl_shared plperl_elog plperl_util plperl_init
plperlu plperl_array plperl_plperlu
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 50848 with PID 43441
== creating database "pl_regression"  ==
CREATE DATABASE
ALTER DATABASE
== installing plperl  ==
CREATE EXTENSION
== installing plperlu ==
CREATE EXTENSION
== running regression test queries==
test plperl   ... ok
test plperl_lc... ok
test plperl_trigger   ... ok
test plperl_shared... ok
test plperl_elog  ... ok
test plperl_util  ... ok
test plperl_init  ... ok
test plperlu  ... ok
test plperl_array ... ok
test plperl_plperlu   ... ok
== shutting down postmaster   ==
== removing temporary instance==


As I mentioned in my earlier email, could you please delete the Util.c
and SPI.c files from the src/pl/plperl directory, rebuild the source
code and then let us know the results. Thanks.

>
> I am not able to reproduce this issue on my Windows or Linux box. Have
> you deleted Util.c and SPI.c files before starting with the build? The
> point is, these files are generated during build time and if they
> already exist then i think. they are not re generated. I would suggest
> to delete both the .c files and rebuild your source and then give a
> try.
>
> Here are the results i got on Windows and Linux respectively on HEAD,
>
> On Windows:
>
> C:\Users\ashu\git-clone-postgresql\postgresql\src\tools\msvc>vcregress.bat
> PLCHECK
> 
> Checking plperl
> (using postmaster on localhost, default port)
> == dropping database "pl_regression"  ==
> NOTICE:  database "pl_regression" does not exist, skipping
> DROP DATABASE
> == creating database "pl_regression"  ==
> CREATE DATABASE
> ALTER DATABASE
> == installing plperl   

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-30 Thread Ashutosh Sharma
Hi Christoph,

On Mon, Jul 31, 2017 at 2:44 AM, Christoph Berg <m...@debian.org> wrote:
> Re: Tom Lane 2017-07-28 <3254.1501276...@sss.pgh.pa.us>
>> Christoph Berg <m...@debian.org> writes:
>> > The plperl segfault on Debian's kfreebsd port I reported back in 2013
>> > is also still present:
>> > https://www.postgresql.org/message-id/20130515064201.GC704%40msgid.df7cb.de
>> > https://buildd.debian.org/status/fetch.php?pkg=postgresql-10=kfreebsd-amd64=10~beta2-1=1499947011=0
>>
>> So it'd be interesting to know if it's any better with HEAD ...
>
> Unfortunately not:
>
> == creating database "pl_regression"  ==
> CREATE DATABASE
> ALTER DATABASE
> == installing plperl  ==
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
> The only interesting line in log/postmaster.log is a log_line_prefix-less
> Util.c: loadable library and perl binaries are mismatched (got handshake key 
> 0xd500080, needed 0xd600080)
> ... which is unchanged from the beta2 output.


I am not able to reproduce this issue on my Windows or Linux box. Have
you deleted the Util.c and SPI.c files before starting the build? The
point is, these files are generated at build time and, if they already
exist, I think they are not regenerated. I would suggest deleting both
the .c files, rebuilding your source and then giving it a try.

Here are the results i got on Windows and Linux respectively on HEAD,

On Windows:

C:\Users\ashu\git-clone-postgresql\postgresql\src\tools\msvc>vcregress.bat
PLCHECK

Checking plperl
(using postmaster on localhost, default port)
== dropping database "pl_regression"  ==
NOTICE:  database "pl_regression" does not exist, skipping
DROP DATABASE
== creating database "pl_regression"  ==
CREATE DATABASE
ALTER DATABASE
== installing plperl  ==
CREATE EXTENSION
== installing plperlu ==
CREATE EXTENSION
== running regression test queries==
test plperl   ... ok
test plperl_lc... ok
test plperl_trigger   ... ok
test plperl_shared... ok
test plperl_elog  ... ok
test plperl_util  ... ok
test plperl_init  ... ok
test plperlu  ... ok
test plperl_array ... ok
test plperl_plperlu   ... ok

==
 All 10 tests passed.
==

On Linux:

LD_LIBRARY_PATH="/home/ashu/git-clone-postgresql/postgresql/tmp_install/home/ashu/git-clone-postgresql/postgresql/TMP/postgres/lib"
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --dbname=pl_regression
--load-extension=plperl  --load-extension=plperlu plperl plperl_lc
plperl_trigger plperl_shared plperl_elog plperl_util plperl_init
plperlu plperl_array
== creating temporary instance==
== initializing database system   ==
== starting postmaster==
running on port 50848 with PID 18140
== creating database "pl_regression"  ==
CREATE DATABASE
ALTER DATABASE
== installing plperl  ==
CREATE EXTENSION
== installing plperlu ==
CREATE EXTENSION
== running regression test queries==
test plperl   ... ok
test plperl_lc... ok
test plperl_trigger   ... ok
test plperl_shared... ok
test plperl_elog  ... ok
test plperl_util  ... ok
test plperl_init  ... ok
test plperlu  ... ok
test plperl_array ... ok
== shutting down postmaster   ==
== removing temporary instance==

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Scan Mode in Hash Index

2017-07-30 Thread Ashutosh Sharma
Hi,

On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> While doing the code coverage testing of v7 patch shared with - [1], I
> found that there are few lines of code in _hash_next() that are
> redundant and needs to be removed. I basically came to know this while
> testing the scenario where a hash index scan starts when a split is in
> progress. I have removed those lines and attached is the newer version
> of patch.
>

Please find the new version of the patches for page-at-a-time scan in
hash index, rebased on top of the latest commit in the master branch.
Also, I have run the pgindent script with pg_bsd_indent version 2.0 on
all the modified files. Thanks.

> [1] - 
> https://www.postgresql.org/message-id/CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg%40mail.gmail.com
>
> On Mon, May 8, 2017 at 6:55 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>> Hi,
>>
>> On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>>> wrote:
>>>>
>>>> Please note that these patches needs to be applied on top of  [1].
>>>>
>>>
>>> Few more review comments:
>>>
>>> 1.
>>> - page = BufferGetPage(so->hashso_curbuf);
>>> + blkno = so->currPos.currPage;
>>> + if (so->hashso_bucket_buf == so->currPos.buf)
>>> + {
>>> + buf = so->currPos.buf;
>>> + LockBuffer(buf, BUFFER_LOCK_SHARE);
>>> + page = BufferGetPage(buf);
>>> + }
>>>
>>> Here, you are assuming that only bucket page can be pinned, but I
>>> think that assumption is not right.  When _hash_kill_items() is called
>>> before moving to next page, there could be a pin on the overflow page.
>>> You need some logic to check if the buffer is pinned, then just lock
>>> it.  I think once you do that, it might not be convinient to release
>>> the pin at the end of this function.
>>
>> Yes, there are few cases where we might have pin on overflow pages too
>> and we need to handle such cases in _hash_kill_items. I have taken
>> care of this in the attached v7 patch. Thanks.
>>
>>>
>>> Add some comments on top of _hash_kill_items to explain the new
>>> changes or say some thing like "See _bt_killitems for details"
>>
>> Added some more comments on the new changes.
>>
>>>
>>> 2.
>>> + /*
>>> + * We save the LSN of the page as we read it, so that we know whether it
>>> + * safe to apply LP_DEAD hints to the page later.  This allows us to drop
>>> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
>>> + */
>>> + so->currPos.lsn = PageGetLSN(page);
>>> +
>>>
>>> The second part of above comment doesn't make sense because vacuum's
>>> will still be blocked because we hold the pin on primary bucket page.
>>
>> That's right. It doesn't make sense because we won't allow vacuum to
>> start. I have removed it.
>>
>>>
>>> 3.
>>> + {
>>> + /*
>>> + * No more matching tuples were found. return FALSE
>>> + * indicating the same. Also, remember the prev and
>>> + * next block number so that if fetching tuples using
>>> + * cursor we remember the page from where to start the
>>> + * scan.
>>> + */
>>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>>
>>> You can't read opaque without having lock and by this time it has
>>> already been released.
>>
>> I have corrected it. Please refer to the attached v7 patch.
>>
>> Also, I think if you want to save position for
>>> cursor movement, then you need to save the position of last bucket
>>> when scan completes the overflow chain, however as you have written it
>>> will be always invalid block number. I think there is similar problem
>>> with prevblock number.
>>
>> Did you mean last bucket or last page. If it is last page, then I am
>> already storing it.
>>
>>>
>>> 4.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
>>> offnum,
>>> +   OffsetNumber maxoff, ScanDirection dir)
>>> +{
>>> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
>>> + IndexTuple  itup;
>>> + int itemIndex;
>>> +
>>> + if (ScanDirectionIsForward(dir))
>>> + {
>>> + /* load item

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-28 Thread Ashutosh Sharma
On Sat, Jul 29, 2017 at 12:05 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ashutosh Sharma <ashu.coe...@gmail.com> writes:
>> On Fri, Jul 28, 2017 at 10:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Uh-huh.  So the issue is indeed that they're injecting PERL_IMPLICIT_SYS
>>> via a command-line #define rather than putting it into perl's config.h,
>>> and that results in a change in the apparent size of the PerlInterp
>>> struct (because of IMem and friends).
>
>> Yes, That's right. We would have seen different result if the
>> PERL_IMPLICIT_SYS would not have been defined globally.
>
> I've pushed the changes to the build scripts now.  I had to revise the
> Mkvcbuild.pm part a bit, because you'd forgotten to propagate the extra
> defines into hstore_plperl.

Thanks for that.

> So that code is untested; please confirm
> that HEAD works in your problem environment now.
>

I quickly ran some basic test cases on my Windows machine (the machine
where I have been reproducing the issue) and they are all passing.
Also, all the automated test cases for the contrib module hstore_plperl
are passing.

> I notice that Mkvcbuild.pm is artificially injecting a define for
> PLPERL_HAVE_UID_GID.  I strongly suspect that that was a hack meant
> to work around the lack of this mechanism, and that we could now
> get rid of it or clean it up.  But there's no evidence in the commit
> log about the reason for it --- it seems to go back to the original
> addition of MSVC support.  Anybody remember?
>
> regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-28 Thread Ashutosh Sharma
On Fri, Jul 28, 2017 at 10:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ashutosh Sharma <ashu.coe...@gmail.com> writes:
>> On Fri, Jul 28, 2017 at 7:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Assuming that the Perl crew know what they're doing and this list is
>>> complete, I notice that not one of the symbols they show as relevant
>>> starts with an underscore.  So I'm thinking that my previous semi-
>>> joking idea of absorbing only -D switches for names that *don't*
>>> start with an underscore was actually a good solution.
>
>> Okay, as per your suggestion, I have modified my earlier patches that
>> imports the -D switches used by perl into plperl accordingly i.e. it
>> now ignores the switches whose name starts with underscore. Please
>> find plperl_win_v3 and plperl_linux_v2 patches attached with this
>> email.
>
> OK, thanks.  I've pushed the XSUB/dTHX patch after another round of
> code-reading and some minor comment improvements; we'll soon see
> what the buildfarm thinks of it.  In the meantime I'll work on these
> two patches.

Sure, Thanks a lot.

>
>>> (BTW, you never did tell us exactly what defines you're getting
>>> out of Perl's flags on the problem installation.)
>
>> I am really sorry about that. I just missed that in my earlier email.
>> Here are the defines used in the perl where i could reproduce the
>> issue,
>
>> C:\Users\ashu>perl -MConfig -e "print $Config{ccflags}"
>> -nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -GL -fp:precise -DWIN32
>> -D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE  -DUSE_SITECUSTOMIZE
>> -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS
>
> Uh-huh.  So the issue is indeed that they're injecting PERL_IMPLICIT_SYS
> via a command-line #define rather than putting it into perl's config.h,
> and that results in a change in the apparent size of the PerlInterp
> struct (because of IMem and friends).

Yes, that's right. We would have seen a different result if
PERL_IMPLICIT_SYS had not been defined globally.

> We'd expected as much, but it's
> good to have clear confirmation.

That's right. It is always good to have a clear confirmation. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-28 Thread Ashutosh Sharma
On Fri, Jul 28, 2017 at 7:22 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ashutosh Sharma <ashu.coe...@gmail.com> writes:
>> Thanks for the patch. I am seeing some compilation errors on Windows
>> with the patch. Below are the errors observed,
>> ...
>> I did spent some time to find reason for these compilation errors and
>> could eventually find that some of the Windows specific functions
>> inside plperl module are calling Perl APIs without fetching the perl
>> interpreter's context using dTHX.
>
> Ah, thanks.  I just stuck that in where compiler errors were telling
> me to, so I didn't realize there were Windows-specific functions to
> worry about.
>
>> Moreover, I have also tested this patch along with the patch to import
>> switches used by perl into plperl and together it fixes the server
>> crash issue. Also, now, the interpreter size in both perl and plperl
>> module are the same thereby generating the same key on both plperl and
>> perl module. Thanks.
>
> Excellent.  So this looks like the avenue to pursue.
>
> As far as the question of which symbols to import goes: on my own
> machines I'm seeing a lot of things like
>
> $ perl -MConfig -e 'print $Config{ccflags}'
> -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector 
> -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
>
> $ perl -MConfig -e 'print $Config{ccflags}'
>  -Ae -D_HPUX_SOURCE -Wl,+vnocompatwarnings -DDEBUGGING -I/usr/local/include 
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
>
> I'm really quite afraid to import symbols like _LARGEFILE_SOURCE and
> _FILE_OFFSET_BITS into plperl; those *will* break things if they
> are different from what core Postgres is built with.  Moreover,
> I came across a relevant data structure in perl.h:
>
> /* These are all the compile time options that affect binary compatibility.
>Other compile time options that are binary compatible are in perl.c
>Both are combined for the output of perl -V
>However, this string will be embedded in any shared perl library, which 
> will
>allow us add a comparison check in perlmain.c in the near future.  */
> #ifdef DOINIT
> EXTCONST char PL_bincompat_options[] =
> #  ifdef DEBUG_LEAKING_SCALARS
>  " DEBUG_LEAKING_SCALARS"
> #  endif
> #  ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
>  " DEBUG_LEAKING_SCALARS_FORK_DUMP"
> #  endif
> #  ifdef FAKE_THREADS
>  " FAKE_THREADS"
> #  endif
> #  ifdef MULTIPLICITY
>  " MULTIPLICITY"
> #  endif
> ... lots more ...

Thanks for sharing this information. I too had a look into the
'PL_bincompat_options' data structure in perl.h and I didn't see any
macro name starting with an underscore. Based on this information, we
can now confidently say that we do not need any -D switches starting
with an underscore for binary compatibility purposes.

>
> Assuming that the Perl crew know what they're doing and this list is
> complete, I notice that not one of the symbols they show as relevant
> starts with an underscore.  So I'm thinking that my previous semi-
> joking idea of absorbing only -D switches for names that *don't*
> start with an underscore was actually a good solution.

Yes, it was a good solution indeed.

> If that
> turns out to not be enough of a filter, we could consider looking
> into perl.h to extract the set of symbols tested in building
> PL_bincompat_options and then intersecting that with what we get
> from Perl's ccflags.  But that would be a lot more complex, so
> I'd rather go with the simpler filter rule for now.

Okay, as per your suggestion, I have modified my earlier patches that
import the -D switches used by perl into plperl accordingly, i.e. they
now ignore the switches whose names start with an underscore. Please
find the plperl_win_v3 and plperl_linux_v2 patches attached with this
email.
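
To make the filter rule concrete, here is a toy sketch of it (my own
illustration, not the patch code, which lives in the build scripts);
the sample flag string mimics the ccflags shown below:

/*
 * Absorb only -D switches whose macro names do not start with an
 * underscore, per the rule discussed above.
 */
#include <stdio.h>
#include <string.h>

int
main(void)
{
    char  flags[] = "-DNDEBUG -DWIN32 -D_CONSOLE -DNO_STRICT "
                    "-D_USE_32BIT_TIME_T -DPERL_IMPLICIT_SYS";
    char *tok = strtok(flags, " ");

    while (tok != NULL)
    {
        if (strncmp(tok, "-D", 2) == 0 && tok[2] != '_')
            printf("absorb: %s\n", tok + 2);    /* macro name sans -D */
        tok = strtok(NULL, " ");
    }
    return 0;
}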

>
> (BTW, you never did tell us exactly what defines you're getting
> out of Perl's flags on the problem installation.)

I am really sorry about that. I just missed it in my earlier email.
Here are the defines used in the Perl installation where I could
reproduce the issue,

C:\Users\ashu>perl -MConfig -e "print $Config{ccflags}"
-nologo -GF -W3 -O1 -MD -Zi -DNDEBUG -GL -fp:precise -DWIN32
-D_CONSOLE -DNO_STRICT -DWIN64 -DCONSERVATIVE  -DUSE_SITECUSTOMIZE
-DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index e0bf607..f0ada1b 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -518,6 +518,16 @@ sub mkvcbuil

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-28 Thread Ashutosh Sharma
Hi,

On Fri, Jul 28, 2017 at 4:20 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
> > On 07/27/2017 04:33 PM, Tom Lane wrote:
> >> So I was trying to figure a way to not include XSUB.h except in a very
> >> limited part of plperl, like ideally just the .xs files.  It's looking
> >> like that would take nontrivial refactoring though :-(.  Another problem
> >> is that this critical bit of the library API is in XSUB.h:
>
> > That's the sort of thing that prompted me to ask what was the minimal
> > set of defines required to fix the original problem (assuming such a
> > thing exists)
> > We haven't used PERL_IMPLICIT_CONTEXT to date, and without ill effect.
> > For example. it's in the ExtUtils::Embed::ccopts for the perl that
> > jacana and bowerbird happily build and test against.
>
> Well, actually, PERL_IMPLICIT_CONTEXT is turned on automatically in any
> MULTIPLICITY build, and since it changes all the Perl ABIs, we *are*
> relying on it.  However, after further study of the Perl docs I noticed
> that we could dispense with XSUB.h if we defined PERL_NO_GET_CONTEXT
> (which turns the quoted stanza into a no-op).  That results in needing to
> sprinkle plperl.c with "dTHX" declarations, as explained in perlguts.pod.
> They're slightly tricky to place correctly, because they load up a pointer
> to the current Perl interpreter, so you have to be wary of where to put
> them in functions that change interpreters.  But I seem to have it working
> in the attached patch.  (One benefit of doing this extra work is that it
> should be a bit more efficient, in that we load up a Perl interpreter
> pointer only once per function called, not once per usage therein.  We
> could remove many of those fetches too, if we wanted to sprinkle the
> plperl code with yet more THX droppings; but I left that for another day.)
>
> Armed with that, I got rid of XSUB.h in plperl.c and moved the
> PG_TRY-using functions in the .xs files to plperl.c.  I think this would
> fix Ashutosh's problem, though I am not in a position to try it with a
> PERL_IMPLICIT_SYS build here.

Thanks for the patch. I am seeing some compilation errors on Windows
with it. Below are the errors observed,

1>ClCompile:
1>  plperl.c
1>src/pl/plperl/plperl.c(4051): error C2065: 'my_perl' : undeclared identifier

2>ClCompile:
2>  hstore_plperl.c
2>contrib/hstore_plperl/hstore_plperl.c(77): error C2065: 'my_perl' :
undeclared identifier

I did spend some time finding the reason for these compilation errors
and eventually found that some of the Windows-specific functions inside
the plperl module are calling Perl APIs without fetching the Perl
interpreter's context using dTHX. Please note that we have now defined
the PERL_NO_GET_CONTEXT macro before including the Perl headers in the
plperl.h file, which means any plperl function calling Perl APIs should
fetch the Perl interpreter context variable 'my_perl' at the start of
the function using dTHX; but we were not doing that for the
'setlocale_perl', 'hstore_to_plperl' and 'plperl_to_hstore' functions.
I have corrected this and attached is the new version of the patch.
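
To make the context rule concrete, here is a minimal sketch (my own
illustration with a hypothetical helper name, not the actual patch) of
what a fixed function looks like under PERL_NO_GET_CONTEXT:

/*
 * Under PERL_NO_GET_CONTEXT, Perl API macros expand to forms that need
 * 'my_perl' in scope; dTHX declares and loads it once per function.
 */
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"

static void
example_call_into_perl(const char *name)    /* hypothetical helper */
{
    dTHX;                       /* fetch the interpreter context once */
    SV *sv;

    sv = get_sv(name, GV_ADD);  /* expands using my_perl */
    sv_setiv(sv, 42);
}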

Moreover, I have also tested this patch along with the patch to import
switches used by perl into plperl and together it fixes the server
crash issue. Also, now, the interpreter size in both perl and plperl
module are the same thereby generating the same key on both plperl and
perl module. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


avoid-XSUB.h-in-plperl.c-v2.patch
Description: Binary data


plperl_win_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-27 Thread Ashutosh Sharma
On Thu, Jul 27, 2017 at 7:51 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ashutosh Sharma <ashu.coe...@gmail.com> writes:
>> Anyways, attached is the patch that corrects this issue. The patch now
>> imports all the switches used by perl into plperl module but, after
>> doing so, i am seeing some compilation errors on Windows. Following is
>> the error observed,
>
>> SPI.obj : error LNK2019: unresolved external symbol PerlProc_setjmp
>> referenced in function do_plperl_return_next
>
> That's certainly a mess, but how come that wasn't happening before?

Earlier we were using Perl 5.20, which I think didn't have the
handshaking mechanism. From Perl 5.22 onwards, functions like
Perl_xs_handshake() and macros like HS_KEY were introduced for
handshaking purposes, and to ensure that the handshaking between plperl
and perl doesn't fail, we are now trying to import the switches used by
perl into plperl. As a result of this, macros like PERL_IMPLICIT_SYS
get defined in plperl, which eventually opens up the following
definitions from XSUB.h, resulting in the compilation error.

    499 #if defined(PERL_IMPLICIT_SYS) && !defined(PERL_CORE)
    518 #undef ioctl
    519 #undef getlogin
    520 #undef setjmp
    ...
    ...

    651 #define times   PerlProc_times
    652 #define wait    PerlProc_wait
    653 #define setjmp  PerlProc_setjmp
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
>
> regards, tom lane


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-27 Thread Ashutosh Sharma
Hi All,

On Wed, Jul 26, 2017 at 7:56 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> On Wed, Jul 26, 2017 at 8:51 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> Based on discussion downthread, it seems like what we actually need to
>>> do is update perl.m4 to extract CCFLAGS.  Turns out somebody proposed
>>> a patch for that back in 2002:
>>> https://www.postgresql.org/message-id/Pine.LNX.4.44.0211051045070.16317-20%40wotan.suse.de
>>> It seems to need a rebase.  :-)
>>
>> Ah-hah, I *thought* we had considered the question once upon a time.
>> There were some pretty substantial compatibility concerns raised in that
>> thread, which is doubtless why it's still like that.
>>
>> My beef about inter-compiler compatibility (if building PG with a
>> different compiler from that used for Perl) could probably be addressed by
>> absorbing only -D switches from the Perl flags.  But Peter seemed to feel
>> that even that could break things, and I worry that he's right for cases
>> like -D_FILE_OFFSET_BITS which affect libc APIs.  Usually we'd have made
>> the same decisions as Perl for that sort of thing, but if we didn't, it's
>> a mess.
>>
>> I wonder whether we could adopt some rule like "absorb -D switches
>> for macros whose names do not begin with an underscore".  That's
>> surely a hack and three-quarters, but it seems safer than just
>> absorbing everything willy-nilly.
>>
>
> Thanks Robert, Tom, Andrew and Amit for all your inputs. I have tried
> to work on the patch shared by Reinhard  long time back for Linux. I
> had to rebase the patch and also had to do add some more lines of code
> to make it work on Linux. For Windows, I had to prepare a separate
> patch to replicate similar behaviour.  I can see that both these
> patches are working as expected i.e they are able import the switches
> used by Perl into plperl module during build time.  However, on
> windows i am still seeing the crash and i am still working to find the
> reason for this.  Here, I attach the patches that i have prepared for
> linux and Windows platforms. Thanks.

There was a small problem with my patch for Windows that I had
submitted yesterday. It was reading the switches in '-D*' form and
including those as-is in the plperl build. In fact, it should be
stripping the '-D' option (from, say, -DPERL_CORE) and adding just the
macro name (PERL_CORE) to the define list using AddDefine('PERL_CORE').
Anyway, attached is the patch that corrects this issue. The patch now
imports all the switches used by perl into the plperl module but, after
doing so, I am seeing some compilation errors on Windows. Following is
the error observed,

SPI.obj : error LNK2019: unresolved external symbol PerlProc_setjmp
referenced in function do_plperl_return_next

I did some analysis in order to find the reason for this error and
could see that for Windows, sigsetjmp is defined as setjmp in PG. But,
setjmp is also defined by Perl. Hence, after including perl header
files, setjmp gets redefined as 'PerlProc_setjmp'.

In short, we have the 'src/pl/plperl/SPI.c' file including 'perl.h'
followed by 'XSUB.h'. perl.h defines PerlProc_setjmp() as,

#define PerlProc_setjmp(b, n) Sigsetjmp((b), (n))

and Sigsetjmp is defined as:

#define Sigsetjmp(buf,save_mask) setjmp((buf))

Then XSUB.h redefines setjmp as:

# define setjmp PerlProc_setjmp

which basically creates a loop in the preprocessor definitions.

Another problem with setjmp() redefinition is that setjmp takes one
argument whereas PerlProc_setjmp takes two arguments.

To fix this compilation error, I have removed the setjmp()
redefinition from XSUB.h in perl, and after doing that the build
succeeds and the 'create language plperl' command works fine, i.e.
there is no more server crash.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


plperl_win_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-26 Thread Ashutosh Sharma
On Wed, Jul 26, 2017 at 8:51 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Based on discussion downthread, it seems like what we actually need to
>> do is update perl.m4 to extract CCFLAGS.  Turns out somebody proposed
>> a patch for that back in 2002:
>> https://www.postgresql.org/message-id/Pine.LNX.4.44.0211051045070.16317-20%40wotan.suse.de
>> It seems to need a rebase.  :-)
>
> Ah-hah, I *thought* we had considered the question once upon a time.
> There were some pretty substantial compatibility concerns raised in that
> thread, which is doubtless why it's still like that.
>
> My beef about inter-compiler compatibility (if building PG with a
> different compiler from that used for Perl) could probably be addressed by
> absorbing only -D switches from the Perl flags.  But Peter seemed to feel
> that even that could break things, and I worry that he's right for cases
> like -D_FILE_OFFSET_BITS which affect libc APIs.  Usually we'd have made
> the same decisions as Perl for that sort of thing, but if we didn't, it's
> a mess.
>
> I wonder whether we could adopt some rule like "absorb -D switches
> for macros whose names do not begin with an underscore".  That's
> surely a hack and three-quarters, but it seems safer than just
> absorbing everything willy-nilly.
>

Thanks Robert, Tom, Andrew and Amit for all your inputs. I have tried
to work on the patch shared by Reinhard a long time back for Linux. I
had to rebase the patch and also add some more lines of code to make it
work on Linux. For Windows, I had to prepare a separate patch to
replicate similar behaviour. I can see that both these patches are
working as expected, i.e. they are able to import the switches used by
Perl into the plperl module at build time. However, on Windows I am
still seeing the crash and am still working to find the reason for it.
Here, I attach the patches that I have prepared for the Linux and
Windows platforms. Thanks.


plperl_linux.patch
Description: Binary data


plperl_win.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2017-07-20 Thread Ashutosh Sharma
I had a quick look into this patch and also tested it; the following
are my observations.

1) I am seeing a server crash when passing a non-meaningful value for
t_infomask2 to heap_infomask_flags().

postgres=# SELECT heap_infomask_flags(2816, 3);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

Following is the backtrace,

(gdb) bt
#0  0x00d9c55b in pg_detoast_datum (datum=0x0) at fmgr.c:1833
#1  0x00b87374 in construct_md_array (elems=0x2ad74c0,
nulls=0x0, ndims=1, dims=0x7ffc0b0bbcd0, lbs=0x7ffc0b0bbcc0,
elmtype=25, elmlen=-1,
elmbyval=0 '\000', elmalign=105 'i') at arrayfuncs.c:3382
#2  0x00b8709f in construct_array (elems=0x2ad74c0, nelems=10,
elmtype=25, elmlen=-1, elmbyval=0 '\000', elmalign=105 'i') at
arrayfuncs.c:3316
#3  0x7fb8001603a5 in heap_infomask_flags (fcinfo=0x2ad3b88) at
heapfuncs.c:597
#4  0x0082f4cd in ExecInterpExpr (state=0x2ad3aa0,
econtext=0x2ad3750, isnull=0x7ffc0b0bbf67 "") at execExprInterp.c:672
#5  0x0088b832 in ExecEvalExprSwitchContext (state=0x2ad3aa0,
econtext=0x2ad3750, isNull=0x7ffc0b0bbf67 "")
at ../../../src/include/executor/executor.h:290
#6  0x0088b8e3 in ExecProject (projInfo=0x2ad3a98) at
../../../src/include/executor/executor.h:324
#7  0x0088bb89 in ExecResult (node=0x2ad36b8) at nodeResult.c:132
#8  0x008494fe in ExecProcNode (node=0x2ad36b8) at execProcnode.c:416
#9  0x0084125d in ExecutePlan (estate=0x2ad34a0,
planstate=0x2ad36b8, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0x2ac0ae0,
execute_once=1 '\001') at execMain.c:1693
#10 0x0083d54b in standard_ExecutorRun (queryDesc=0x2a42880,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:362
#11 0x0083d253 in ExecutorRun (queryDesc=0x2a42880,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:305
#12 0x00b3dd8f in PortalRunSelect (portal=0x2ad1490, forward=1
'\001', count=0, dest=0x2ac0ae0) at pquery.c:932
#13 0x00b3d7e7 in PortalRun (portal=0x2ad1490,
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001',
dest=0x2ac0ae0, altdest=0x2ac0ae0,
completionTag=0x7ffc0b0bc2c0 "") at pquery.c:773
#14 0x00b31fe4 in exec_simple_query (query_string=0x2a9d9a0
"SELECT heap_infomask_flags(11008, , true);") at postgres.c:1099
#15 0x00b3a727 in PostgresMain (argc=1, argv=0x2a49eb0,
dbname=0x2a1d480 "postgres", username=0x2a49d18 "ashu") at
postgres.c:4090
#16 0x00a2cb3f in BackendRun (port=0x2a3e700) at postmaster.c:4357
#17 0x00a2bc63 in BackendStartup (port=0x2a3e700) at postmaster.c:4029
#18 0x00a248ab in ServerLoop () at postmaster.c:1753
#19 0x00a236a9 in PostmasterMain (argc=3, argv=0x2a1b2b0) at
postmaster.c:1361
#20 0x008d8054 in main (argc=3, argv=0x2a1b2b0) at main.c:228

2) I can see the documentation for heap_infomask(). But, I do not see
it being defined or used anywhere in the patch.

+ 
+  The heap_infomask function can be used to unpack the
+  recognised bits of the infomasks of heap tuples.
+ 

3) If the show_combined flag is set to its default value and a tuple
is frozen, then may I know the reason for not showing it as a frozen
tuple when t_infomask2 is passed as zero?

postgres=# SELECT heap_infomask_flags(2816, 0);
heap_infomask_flags
---
 {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID}
(1 row)

postgres=# SELECT heap_infomask_flags(2816, 1);
heap_infomask_flags

 {HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_XMIN_FROZEN}
(1 row)
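
For reference, HEAP_XMIN_FROZEN is not a separate bit but the
combination of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, all within
t_infomask, so detecting it should not depend on t_infomask2 at all.
A minimal sketch of that check (my own illustration; the flag values
are the ones from htup_details.h):

#include <stdio.h>
#include <stdint.h>

#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

int
main(void)
{
    uint16_t t_infomask = 2816;     /* the value used in the queries above */

    if ((t_infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN)
        printf("HEAP_XMIN_FROZEN\n");
    return 0;
}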


4) I think it would be better to use the same argument name for the
newly added function, i.e. heap_infomask_flags(), in both the
documentation and the sql file. I am basically referring to the
'include_combined' argument. If you see the function definition, the
argument name used is 'include_combined' whereas in the documentation
you have mentioned 'show_combined'.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Jul 20, 2017 at 11:56 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, Jul 20, 2017 at 3:13 PM, Julien Rouhaud <rjuju...@gmail.com> wrote:
>> On Thu, Jul 20, 2017 at 5:44 AM, Peter Geoghegan <p...@bowt.ie> wrote:
>>> On Wed, Jul 19, 2017 at 8:33 PM, Craig Ringer <cr...@2ndquadrant.com> wrote:
>>>> That's silly, so here's a patch to teach pageinspect how to decode 
>>>> infomasks
>>>> to a human reada

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Ashutosh Sharma
On Wed, Jul 19, 2017 at 9:42 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Ashutosh Sharma <ashu.coe...@gmail.com> writes:
> > Actually the function used for generation of handshake Key i.e HS_KEYp()
> > considers 'sizeof(PerlInterpreter)' to generate the key and somehow the
> > sizeof PerlInterpreter is not uniform in plperl and perl modules incase of
> > Windows but on Linux it remains same in both the modules.
>
> Yipes.  So actually, this is catching a live ABI problem, which presumably
> we've escaped seeing bad effects from only through sheer good luck.
> I suppose that the discrepancies in the struct contents only occur after
> the last field that plperl happens to touch directly --- but that is
> unlikely to be true forever.
>
> > typedef struct interpreter PerlInterpreter;
> > struct interpreter {
> > #  include "intrpvar.h"
> > };
> > where intrpvar.h has different variables defined inside it and most of the
> > variables definition are protected with various macros. And there are some
> > macros that are just defined in perl but not in plperl module which means
> > the sizeof(PerlInterpreter) on the two modules are going to be different
> > and thereby resulting in a  different key.
>
> I imagine the route to a solution is to fix things so that the relevant
> macros are all defined correctly in both cases.  But why they aren't
> already is certainly an interesting question.  Have you identified just
> which fields are added or missing relative to what libperl thinks?

Here is the list of macros and variables from the 'intrpvar.h' file that
are defined in the perl module but not in plperl on Windows:

#ifdef PERL_USES_PL_PIDSTATUS
PERLVAR(I, pidstatus,   HV *)   /* pid-to-status mappings for waitpid */
#endif

#ifdef PERL_SAWAMPERSAND
PERLVAR(I, sawampersand, U8)/* must save all match strings */
#endif

#ifdef FCRYPT
PERLVARI(I, cryptseen,  bool,   FALSE)  /* has fast crypt() been initialized? */
#else
/* One byte hole in the interpreter structure.  */
#endif

#ifdef USE_REENTRANT_API
PERLVAR(I, reentrant_buffer, REENTR *)  /* here we store the _r buffers */
#endif

#ifdef PERL_GLOBAL_STRUCT_PRIVATE
PERLVARI(I, my_cxt_keys, const char **, NULL) /* per-module array of
pointers to MY_CXT_KEY constants */
# endif

#ifdef DEBUG_LEAKING_SCALARS_FORK_DUMP
/* File descriptor to talk to the child which dumps scalars.  */
PERLVARI(I, dumper_fd,  int,-1)
#endif

#ifdef DEBUG_LEAKING_SCALARS
PERLVARI(I, sv_serial,  U32,0)  /* SV serial number, used in sv.c */
#endif

#ifdef PERL_TRACE_OPS
PERLVARA(I, op_exec_cnt, OP_max+2, UV)
#endif

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-19 Thread Ashutosh Sharma
On Thu, Jul 13, 2017 at 10:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andrew Dunstan <andrew.duns...@2ndquadrant.com> writes:
>> It would be nice to get to the bottom of why we're getting a version
>> mismatch on Windows, since we're clearly not getting one on Linux.
>
> Yeah, that's what's bothering me: as long as that remains unexplained,
> I don't have any confidence that we're fixing the right thing.

Okay, I tried to explore this a bit and my findings are as follows.

On Linux, the handshake key generated in the plperl code, i.e. inside
XS_EXTERNAL() in Util.c (generated from Util.xs at build time), and the one
generated in the perl code (inside the Perl_xs_handshake function) are the
same, which means the following condition inside Perl_xs_handshake() becomes
true and therefore there is no mismatch error.

    got = INT2PTR(void*, (UV)(key & HSm_KEY_MATCH));
    need = (void *)(HS_KEY(FALSE, FALSE, "", "") & HSm_KEY_MATCH);
    if (UNLIKELY(got != need))
        goto bad_handshake;


However, on Windows, the handshake keys generated in the plperl code inside
XS_EXTERNAL() and in the perl code, i.e. inside Perl_xs_handshake(), are
different, thereby resulting in a mismatch error.

Actually, the function used to generate the handshake key, i.e. HS_KEYp(),
factors 'sizeof(PerlInterpreter)' into the key, and somehow
sizeof(PerlInterpreter) is not uniform between the plperl and perl modules
in the case of Windows, whereas on Linux it remains the same in both modules.

This is how PerlInterpreter is defined in the Perl source:

typedef struct interpreter PerlInterpreter;
struct interpreter {
#  include "intrpvar.h"
};

where intrpvar.h has various variables defined inside it, and most of the
variable definitions are protected by various macros. There are some macros
that are defined in perl but not in the plperl module, which means the
sizeof(PerlInterpreter) in the two modules is going to be different, thereby
resulting in a different key. But then the point is: why is no such
difference observed on Linux? Well, as of now, I haven't found the reason
behind it and I am still investigating. A simplified sketch of why the
structure size matters is below.
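To make the failure mode concrete, here is an illustrative sketch only (this
is NOT perl's actual HS_KEYp()/HS_KEY() definition; the real macros live in
perl's handshake.h) of a key that folds the interpreter structure size into
its value, so that any divergence in the intrpvar.h macro set changes
sizeof(PerlInterpreter) and therefore the key:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for HS_KEYp(): pack the interpreter size into
 * the high bits and an API version into the low bits.  Two modules
 * compiled with different intrpvar.h macro sets will disagree on
 * interp_size and hence on the key, which is exactly the mismatch
 * reported by Perl_xs_handshake(). */
static uint32_t
handshake_key_sketch(size_t interp_size, uint32_t api_version)
{
    return ((uint32_t) (interp_size & 0xFFFF) << 16) | (api_version & 0xFFFF);
}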

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


>
> regards, tom lane


Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-14 Thread Ashutosh Sharma
On Sat, Jul 15, 2017 at 8:20 AM, Amit Kapila wrote:
> On Fri, Jul 14, 2017 at 6:02 PM, Michael Paquier wrote:
>> (catching up finally with this thread)
>>
>> On Mon, Jul 10, 2017 at 11:57 AM, Kyotaro HORIGUCHI wrote:
>>> At Mon, 10 Jul 2017 14:58:13 +0530, Amit Kapila wrote:

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-14 Thread Ashutosh Sharma
On Thu, Jul 13, 2017 at 6:04 PM, Andrew Dunstan
<andrew.duns...@2ndquadrant.com> wrote:
>
>
> On 07/13/2017 08:08 AM, Ashutosh Sharma wrote:
>>
>> After doing some study, I could understand that Util.c is generated
>> from Util.xs by xsubpp compiler at build time. This is being done in
>> Mkvcbuild.pm file in postgres. If I manually replace
>> 'dXSBOOTARGSAPIVERCHK' macro with 'dXSBOOTARGSNOVERCHK' macro in
>> src/pl/plperl/Util.c, the things work fine. The diff is as follows,
>>
>> XS_EXTERNAL(boot_PostgreSQL__InServer__Util)
>> {
>> #if PERL_VERSION_LE(5, 21, 5)
>> dVAR; dXSARGS;
>> #else
>> -dVAR; dXSBOOTARGSAPIVERCHK;
>> +dVAR; dXSBOOTARGSNOVERCHK;
>>
>> I need to further investigate, but let me know if you have any ideas.
>
>
>
>
> Good job hunting this down!
>
>
> One suggestion I saw in a little googling was that we add this to the XS
> file after the inclusion of XSUB.h:
>
> #undef dXSBOOTARGSAPIVERCHK
> #define dXSBOOTARGSAPIVERCHK dXSBOOTARGSNOVERCHK
>

Thanks for that suggestion. It was really helpful. I think the point is
that in XSUB.h the macro 'dXSBOOTARGSXSAPIVERCHK' has been redefined for
the no-version-check case, but surprisingly the same hasn't been done for
the 'dXSBOOTARGSAPIVERCHK' macro, and that could be the reason why we need
to redefine it in the XS file after including XSUB.h. Attached is the
patch that redefines 'dXSBOOTARGSAPIVERCHK' in the Util.xs and SPI.xs
files (a sketch of the idea is below). Please have a look at the attached
patch and let me know your comments. Thanks.
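A minimal sketch of that workaround, assuming the attached patch differs
only in placement: after XSUB.h is included in Util.xs/SPI.xs, redefine the
version-checking boot-arguments macro to its no-version-check counterpart so
that the generated boot function skips the API-version part of the handshake:

#include "XSUB.h"

/* dXSBOOTARGSAPIVERCHK and dXSBOOTARGSNOVERCHK are perl's own macros;
 * mapping the former onto the latter disables only the API-version
 * check in the generated boot function. */
#undef  dXSBOOTARGSAPIVERCHK
#define dXSBOOTARGSAPIVERCHK dXSBOOTARGSNOVERCHK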

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


redefine_dXSBOOTARGSAPIVERCHK_macro.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pl/perl extension fails on Windows

2017-07-13 Thread Ashutosh Sharma
On Thu, Jul 13, 2017 at 12:01 AM, Andrew Dunstan
<andrew.duns...@2ndquadrant.com> wrote:
> On 07/12/2017 11:49 AM, Dave Page wrote:
>>
>>
>> Well crake is a Fedora box - and we have no problems on Linux, only on
>> Windows.
>>
>>
>
>
> Yeah, I have this on one of my Windows boxes, and haven't had time to
> get to the bottom of it yet ;-(
>

I could also see this problem on my Windows machine and I have spent
some time analysing it. Here are my findings:

The stacktrace where the crash happens is:

  perl524.dll!Perl_xs_handshake(const unsigned long key, void
*v_my_perl, const char * file, ...)  Line 5569 C
  plperl.dll!boot_PostgreSQL__InServer__Util(interpreter * my_perl, cv
* cv)  Line 490 + 0x22 bytes C
  perl524.dll!Perl_pp_entersub(interpreter * my_perl)  Line 3987 + 0xc bytes C
  perl524.dll!Perl_runops_standard(interpreter * my_perl)  Line 41 + 0x6 bytes C
  perl524.dll!S_run_body(interpreter * my_perl, long oldscope)  Line 2485 C
  perl524.dll!perl_run(interpreter * my_perl)  Line 2406 + 0xc bytes C
  plperl.dll!plperl_init_interp()  Line 829 + 0xb bytes C
  plperl.dll!_PG_init()  Line 470 + 0x5 bytes C

I couldn't get much out of the above backtrace, but one thing I could
notice is that the input key passed to the 'Perl_xs_handshake()' function
does not match the key generated inside 'Perl_xs_handshake()'; hence, the
handshaking fails.

Please have a look at the following code snippets from 'perl 5.24'
and plperl's 'Util.c' file, respectively:

Perl-5.24:
===
got = INT2PTR(void*, (UV)(key & HSm_KEY_MATCH));
need = (void *)(HS_KEY(FALSE, FALSE, "", "") & HSm_KEY_MATCH);
if (UNLIKELY(got != need))
goto bad_handshake;

Util.c in plperl:
==
485 XS_EXTERNAL(boot_PostgreSQL__InServer__Util)
486 {
487 #if PERL_VERSION_LE(5, 21, 5)
488dVAR; dXSARGS;
489 #else
490dVAR; dXSBOOTARGSAPIVERCHK;

Actually the macro 'dXSBOOTARGSAPIVERCHK' in line #490 gets expanded to:

#define XS_APIVERSION_SETXSUBFN_POPMARK_BOOTCHECK                          \
    Perl_xs_handshake(HS_KEY(TRUE, TRUE, "v" PERL_API_VERSION_STRING, ""), \
                      HS_CXT, __FILE__, "v" PERL_API_VERSION_STRING)

The point to be noted is the way the key is generated in the above macro
and in Perl_xs_handshake(). As shown above, the
'XS_APIVERSION_SETXSUBFN_POPMARK_BOOTCHECK' macro passes
'PERL_API_VERSION_STRING' as input to HS_KEY() to generate the key, and
this key is passed to the Perl_xs_handshake function, whereas inside
Perl_xs_handshake() no such version string is used to generate the key.
Hence, the key mismatch is seen, thereby resulting in a bad_handshake
error.

After doing some study, I could understand that Util.c is generated
from Util.xs by xsubpp compiler at build time. This is being done in
Mkvcbuild.pm file in postgres. If I manually replace
'dXSBOOTARGSAPIVERCHK' macro with 'dXSBOOTARGSNOVERCHK' macro in
src/pl/plperl/Util.c, the things work fine. The diff is as follows,

XS_EXTERNAL(boot_PostgreSQL__InServer__Util)
{
#if PERL_VERSION_LE(5, 21, 5)
dVAR; dXSARGS;
#else
-dVAR; dXSBOOTARGSAPIVERCHK;
+dVAR; dXSBOOTARGSNOVERCHK;

I need to further investigate, but let me know if you have any ideas.

>
> Latest versions of ActivePerl don't ship with library descriptor files,
> either, which is unpleasant.
>

I too had a similar observation: surprisingly, I am seeing a 'libperl*.a'
file instead of a 'perl*.lib' file in most of the ActivePerl versions for
Windows.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition : Append node over a single SeqScan

2017-07-05 Thread Ashutosh Sharma
Hi,

On Wed, Jul 5, 2017 at 3:58 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Jul 5, 2017 at 3:48 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>> Hi All,
>>
>> Today, while exploring range table partitioning a bit, I could see that
>> even if a scan is performed on a single partition, the plan has an
>> Append node in it. Isn't that a bug?
>
> No. See following comment from create_append_plan()
> 1045 /*
> 1046  * XXX ideally, if there's just one child, we'd not bother to
> generate an
> 1047  * Append node but just return the single child.  At the
> moment this does
> 1048  * not work because the varno of the child scan plan won't match the
> 1049  * parent-rel Vars it'll be asked to emit.
> 1050  */
>

Thanks for the information.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Partition : Append node over a single SeqScan

2017-07-05 Thread Ashutosh Sharma
Hi All,

Today, while exploring range table partitioning a bit, I could see that
even if a scan is performed on a single partition, the plan has an Append
node in it. Isn't that a bug?

As per the usage of the Append node, it should only be seen in the query
plan when a scan is performed on multiple tables.

Following are the steps to reproduce the problem.

CREATE TABLE part_tab (c1 int, c2 int) PARTITION BY RANGE (c1);

CREATE TABLE part1 PARTITION OF part_tab FOR VALUES FROM (0) TO (100);

CREATE TABLE part2 PARTITION OF part_tab FOR VALUES FROM (100) TO (200);

postgres=# explain analyze select * from part_tab where c1 > 100;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Append  (cost=0.00..38.25 rows=753 width=8) (actual time=0.009..0.009
rows=0 loops=1)
   ->  Seq Scan on part2  (cost=0.00..38.25 rows=753 width=8) (actual
time=0.009..0.009 rows=0 loops=1)
 Filter: (c1 > 100)
 Planning time: 166698.973 ms
 Execution time: 0.043 ms
(5 rows)

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-03 Thread Ashutosh Sharma
Hi,

On Mon, Jul 3, 2017 at 4:10 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Jul 3, 2017 at 3:24 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> > On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> >> The basic issue was that the WAL logging for Create Index operation
> >> was oblivion of the fact that for unlogged tables only INIT forks need
> >> to be logged.  Another point which we need to consider is that while
> >> replaying the WAL for the create index operation, we need to flush the
> >> buffer if it is for init fork.  This needs to be done only for pages
> >> that can be part of init fork file (like metapage, bitmappage).
> >> Attached patch fixes the issue.
> >>
> >> Another approach to fix the issue could be that always log complete
> >> pages for the create index operation on unlogged tables (in
> >> hashbuildempty).  However, the logic for initial hash index pages
> >> needs us to perform certain other actions (like update metapage after
> >> the creation of bitmappage) which make it difficult to follow that
> >> approach.
> >>
> >
> > Thanks for sharing the steps to reproduce the issue in detail. I could
> > easily reproduce this issue. I also had a quick look into the patch and the
> > fix looks fine to me. However, it would be good if we can add at least one
> > test for unlogged table with hash index in the regression test suite.
> >
>
> There is already such a test, see create_index.sql.
> CREATE UNLOGGED TABLE unlogged_hash_table (id int4);
> CREATE INDEX unlogged_hash_index ON unlogged_hash_table USING hash (id
> int4_ops);
>
> Do you have something else in mind?

Yes, that's what I had in mind. But it's good to know that we already
have a test case with a hash index created on an unlogged table.

>
>
> I think the problem mentioned in this thread can be caught only if we
> promote the standby and restart it.  We might be able to write it
> using TAP framework, but I think such a test should exist for other
> index types as well or in general for unlogged tables.  I am not
> opposed to writing such a test if you and or others feel so.

I think it would be a good thing to add such test cases using the TAP
framework, but as you said, it would be good to take others' opinions on
this as well. A rough sketch of such a test is below. Thanks.
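Something along these lines could exercise the promote-and-restart scenario
from this thread (an untested sketch using the PostgresNode infrastructure;
the node and test names are made up):

use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;

# Set up a primary with a streaming standby.
my $master = get_new_node('master');
$master->init(allows_streaming => 1);
$master->start;
$master->backup('my_backup');

my $standby = get_new_node('standby');
$standby->init_from_backup($master, 'my_backup', has_streaming => 1);
$standby->start;

# Unlogged table with a hash index, as in the reported scenario.
$master->safe_psql('postgres',
	'CREATE UNLOGGED TABLE t1 (c1 int);'
	  . ' CREATE INDEX idx_t1_hash ON t1 USING hash (c1);');
$master->wait_for_catchup($standby, 'replay', $master->lsn('insert'));

# Promote the standby and make sure it comes up cleanly afterwards;
# before the fix this failed with "could not create file ... File exists".
$standby->promote;
$standby->restart;
is($standby->safe_psql('postgres', 'SELECT count(*) FROM t1'), '0',
	'unlogged table with hash index is empty but usable after promotion');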

>
> > Apart from that i have tested the patch and the patch seems to be working
> > fine.
>
> Thanks.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-03 Thread Ashutosh Sharma
Hi,

On Mon, Jul 3, 2017 at 9:12 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> While discussing the behavior of hash indexes with Bruce in the nearby
> thread [1], it has been noticed that hash index on unlogged tables
> doesn't behave as expected.  Prior to 10, it has the different set of
> problems (mainly because hash indexes are not WAL-logged) which were
> discussed on that thread [1], however when I checked, it doesn't work
> even for 10.  Below are steps to reproduce the problem.
>
> 1. Setup master and standby
> 2. On the master, create unlogged table and hash index.
>2A.  Create unlogged table t1(c1 int);
>2B.  Create hash index idx_t1_hash on t1 using hash(c1);
> 3. On Standby, try selecting data,
>select * from t1;
>ERROR:  cannot access temporary or unlogged relations during recovery
> ---Till here everything works as expected
> 4. Promote standby to master (I have just stopped the standby and
> master and removed recovery.conf file from the standby database
> location) and try starting the new master, it
> gives below error and doesn't get started.
> FATAL:  could not create file "base/12700/16387": File exists
>
> The basic issue was that the WAL logging for Create Index operation
> was oblivious to the fact that for unlogged tables only INIT forks need
> to be logged.  Another point which we need to consider is that while
> replaying the WAL for the create index operation, we need to flush the
> buffer if it is for init fork.  This needs to be done only for pages
> that can be part of init fork file (like metapage, bitmappage).
> Attached patch fixes the issue.
>
> Another approach to fix the issue could be that always log complete
> pages for the create index operation on unlogged tables (in
> hashbuildempty).  However, the logic for initial hash index pages
> needs us to perform certain other actions (like update metapage after
> the creation of bitmappage) which make it difficult to follow that
> approach.
>

Thanks for sharing the steps to reproduce the issue in detail. I could
easily reproduce this issue. I also had a quick look at the patch and the
fix looks fine to me. However, it would be good if we could add at least
one test for an unlogged table with a hash index in the regression test
suite.

Apart from that, I have tested the patch and it seems to be working
fine. Here are the steps that I followed to verify the fix.

With Patch:

postgres=# SELECT pg_relation_filepath('t1');
 pg_relation_filepath
--
 base/14915/16384
(1 row)

postgres=# SELECT pg_relation_filepath('idx_t1_hash');
 pg_relation_filepath
--
 base/14915/16387
(1 row)

Master:
=
[ashu@localhost bin]$ ls -l ../master/base/14915/1638*
-rw-------. 1 ashu ashu 18128896 Jul  3 14:29 ../master/base/14915/16384
-rw-------. 1 ashu ashu    24576 Jul  3 14:29 ../master/base/14915/16384_fsm
-rw-------. 1 ashu ashu        0 Jul  3 14:28 ../master/base/14915/16384_init
-rw-------. 1 ashu ashu 22339584 Jul  3 14:29 ../master/base/14915/16387
-rw-------. 1 ashu ashu    32768 Jul  3 14:28 ../master/base/14915/16387_init


Standby:
==
[ashu@localhost bin]$ ls -l ../standby/base/14915/1638*
-rw-------. 1 ashu ashu     0 Jul  3 14:28 ../standby/base/14915/16384_init
-rw-------. 1 ashu ashu 32768 Jul  3 14:28 ../standby/base/14915/16387_init


Without patch:
==
postgres=# SELECT pg_relation_filepath('t1');
 pg_relation_filepath
--
 base/14915/16384
(1 row)

postgres=# SELECT pg_relation_filepath('idx_t1_hash');
 pg_relation_filepath
--
 base/14915/16387
(1 row)

Master:
=
[ashu@localhost bin]$ ls -l ../master/base/14915/1638*
-rw-------. 1 ashu ashu 18128896 Jul  3 14:36 ../master/base/14915/16384
-rw-------. 1 ashu ashu    24576 Jul  3 14:36 ../master/base/14915/16384_fsm
-rw-------. 1 ashu ashu        0 Jul  3 14:35 ../master/base/14915/16384_init
-rw-------. 1 ashu ashu 22339584 Jul  3 14:36 ../master/base/14915/16387
-rw-------. 1 ashu ashu    32768 Jul  3 14:35 ../master/base/14915/16387_init

Standby:
==
[ashu@localhost bin]$ ls -l ../standby/base/14915/1638*
-rw-------. 1 ashu ashu     0 Jul  3 14:35 ../standby/base/14915/16384_init
-rw-------. 1 ashu ashu 73728 Jul  3 14:35 ../standby/base/14915/16387
-rw-------. 1 ashu ashu 24576 Jul  3 14:35 ../standby/base/14915/16387_init


As seen from the test results above, it is clear that without the patch
both the main fork and the init fork were getting WAL-logged, because the
WAL logging for the create-index operation was being done without checking
the fork number:

_hash_init()
{
    ...

    log_newpage(&rel->rd_node,
                forkNum,
                blkno,
                BufferGetPage(buf),
                true);
    _hash_relbuf(rel, buf);

    ...
}
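For context, the fix presumably gives that call a fork-aware guard along
these lines (a simplified sketch, not the committed patch):

/* Log the page for permanent relations; for unlogged relations, only
 * the INIT_FORKNUM pages must be WAL-logged, so that the standby can
 * recreate the index skeleton after promotion. */
if (RelationNeedsWAL(rel) || forkNum == INIT_FORKNUM)
    log_newpage(&rel->rd_node,
                forkNum,
                blkno,
                BufferGetPage(buf),
                true);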

I have als

Re: [HACKERS] pg_basebackup fails on Windows when using tablespace mapping

2017-06-27 Thread Ashutosh Sharma
Hi,

On Tue, Jun 27, 2017 at 10:13 AM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Tue, Jun 27, 2017 at 12:13 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
>> At quick glance, I think that this should definitely be a client-only
>> fix. One reason is that pg_basebackup should be able to work with past
>> servers. A second is that this impacts as well the backend code, and
>> readlink may not return an absolute path. At least that's true for
>> posix's version, Postgres uses its own wrapper with junction points..
>
> The problem is in pg_basebackup.c's get_tablespace_mapping(), which
> fails to provide a correct comparison for the directory given by
> caller. In your case the caller of get_tablespace_mapping() uses
> backslashes as directory value (path value received from backend), and
> the tablespace mapping uses slashes as canonicalize_path has been
> called once on it. Because of this inconsistency, the directory of the
> original tablespace is used, causing the backup to fail as it should.
> A simple fix is to make sure that the comparison between both things
> is consistent by using canonicalize_path on the directory value given
> by caller.
>
> Attached is a patch. I am parking that in the next CF so as this does
> not get forgotten as a bug fix. Perhaps a committer will show up
> before. Or not.

I am still seeing the issue with the attached patch. I had a quick look
at the patch. It seems to me that you have canonicalized the tablespace
path to convert Win32 slashes to Unix-style slashes, but the canonicalized
value is not being passed to the strcmp() function, and that is probably
why the issue still exists. Thanks.

for (cell = tablespace_dirs.head; cell; cell = cell->next)
-   if (strcmp(dir, cell->old_dir) == 0)
+   if (strcmp(canon_dir, cell->old_dir) == 0)
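For clarity, the whole function would then look roughly like this (a sketch
assuming the surrounding pg_basebackup.c declarations; the actual patch may
differ in detail):

/* Return the new directory for a tablespace, or the original directory
 * if no mapping was configured for it. */
static const char *
get_tablespace_mapping(const char *dir)
{
    TablespaceListCell *cell;
    char        canon_dir[MAXPGPATH];

    /* Canonicalize the caller's path so both sides of the comparison
     * use the same slash convention on Windows. */
    strlcpy(canon_dir, dir, sizeof(canon_dir));
    canonicalize_path(canon_dir);

    for (cell = tablespace_dirs.head; cell; cell = cell->next)
        if (strcmp(canon_dir, cell->old_dir) == 0)
            return cell->new_dir;

    return dir;
}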

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_filedump doesn't compile with v10 sources

2017-06-26 Thread Ashutosh Sharma
Hi,

On Mon, Jun 26, 2017 at 12:25 PM, tushar <tushar.ah...@enterprisedb.com> wrote:
> Hi,
>
> While trying to do a make of pg_filedump against v10 sources, I am getting
> errors:
>
> [centos@centos-cpula pg_filedump]$ make
> cc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> --param=ssp-buffer-size=4 -m64 -mtune=generic -DLINUX_OOM_ADJ=0 -Wall
> -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
> -Wendif-labels -fno-strict-aliasing -fwrapv
> -I/home/centos/pg10_/postgresql/src/include/ pg_filedump.c -c
> pg_filedump.c: In function ‘FormatControl’:
> pg_filedump.c:1650: error: ‘ControlFileData’ has no member named
> ‘enableIntTimes’
> make: *** [pg_filedump.o] Error 1
> [centos@centos-cpula pg_filedump]$
>

That's because the following git commit in v10 removed the
'enableIntTimes' member from the 'ControlFileData' structure:

commit d28aafb6dda326688e2f042c95c93ea57963c03c
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   Thu Feb 23 12:23:12 2017 -0500

Remove pg_control's enableIntTimes field.

We don't need it any more.

pg_controldata continues to report that date/time type storage is
"64-bit integers", but that's now a hard-wired behavior not something
it sees in the data.  This avoids breaking pg_upgrade, and perhaps other
utilities that inspect pg_control this way.  Ditto for pg_resetwal.

I chose to remove the "bigint_timestamps" output column of
pg_control_init(), though, as that function hasn't been around long
and probably doesn't have ossified users.

Discussion: https://postgr.es/m/26788.1487455...@sss.pgh.pa.us

I think we will have to change pg_filedump so that it no longer refers to
'enableIntTimes'. One possible shape of such a fix is sketched below.
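For instance, pg_filedump could guard the field access by server version,
something like this hypothetical sketch (the printed wording and the actual
fix may well differ):

/* enableIntTimes disappeared from ControlFileData in v10, where
 * 64-bit integer timestamps are the only option. */
#if PG_VERSION_NUM >= 100000
    printf("Date/time type storage: 64 bit integers\n");
#else
    printf("Date/time type storage: %s\n",
           controlData->enableIntTimes ?
           "64 bit integers" : "floating point numbers");
#endif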

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-22 Thread Ashutosh Sharma
Hi,

On Wed, Jun 21, 2017 at 9:50 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Jun 21, 2017 at 12:15 AM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>> On 6/20/17 09:23, Amit Kapila wrote:
>>> To avoid that why can't we use the same icu path for executing uconv
>>> as we are using for linking?
>>
>> Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or
>> something like that.
>>
>
> That's what I had in mind as well.
>

Okay, attached is the patch that first detects the platform type and then
runs the uconv command from the corresponding ICU bin path. A rough sketch
of the idea is below. Please have a look and let me know of any issues.
Thanks.
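Roughly, the detection could look like this (a sketch with assumed variable
names, simplified from the attached patch):

# Pick the bin directory matching the platform we are building for, so
# the uconv we run belongs to the ICU we will link against.
my $icubindir = $self->{options}->{icu}
  . ($self->{platform} eq 'x64' ? '\bin64' : '\bin');

my $output = `"$icubindir\\uconv" -V 2>&1`;
$? >> 8 == 0 or die "uconv command not found in $icubindir";

# Output looks like "uconv v2.1  ICU 53.1"; keep the major version.
my ($icuver) = $output =~ /ICU\s+(\d+)/;
$self->{ICUVersion} = $icuver;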

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


detect_ICU_version_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-19 Thread Ashutosh Sharma
Hi,

On Tue, Jun 20, 2017 at 2:36 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/19/17 00:42, Ashutosh Sharma wrote:
>>> If we don't find uconv, isn't it better to fall back to the non-UTF8
>>> version rather than saying command not found?
>> Well, if any ICU package is installed on our system then we will
>> certainly find the uconv command. The only case where we can see such an
>> error is when the user doesn't have any ICU package installed on their
>> system and is somehow trying to perform an ICU-enabled build, and in such
>> a case the build configuration has to fail, which I think is the expected
>> behaviour. Anyway, it is not uconv that decides whether we should fall
>> back to non-UTF8 or not; it's the ICU version that decides whether to
>> stick with UTF8 or fall back to the non-UTF8 version. Thanks.
>
> How do we know we're running the uconv command from the installation
> that we will compile against?

Okay, I think you are talking about the case where we may have multiple
ICU versions installed on our system and there might be a possibility
that the uconv command gets executed from an ICU version that we are not
trying to link with postgres. This can only happen when the user has set
the path for the ICU binaries in the system PATH variable incorrectly.
For example, if I have ICU versions 49 and 53 installed on my system, and
the library and include paths I am trying to add are from ICU version 53
whereas the system PATH points to the DLLs from ICU 49, then that is an
incorrect build setup. In this case, even though I am trying to use the
ICU 53 libraries, the DLLs I am using are from ICU 49 (as my system PATH
points to the ICU 49 bin path), so the 'uconv -V' output would be '49.1'
instead of '53.1'. In such a case, the postgres installation would fail,
because during installation it would search for libicu53.dll, not
libicu49.dll. Therefore, even if we run the wrong uconv command, we would
not succeed with the installation process at all. Hence, I think we
shouldn't be worrying about any such things. Please let me know whether
that answers your question. Thanks.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-06-19 Thread Ashutosh Sharma
Hi,

On Mon, Jun 19, 2017 at 12:25 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 11:42 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Hi,
>>
>> On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
>>> <kuntalghosh.2...@gmail.com> wrote:
>>>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
>>>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
>>>>> whenever we want compression on a plain-text dump file, we can set the
>>>>> stdout mode to O_BINARY. Is it a wrong approach?
>>>> With the help from Ashutosh Sharma, I tested this in Windows
>>>> environment. Sadly, it still doesn't work. :( IMHO, we should document
>>>> the issue somewhere.
>>>
>>> Why not?  I mean, if there's code there to force the output into
>>> binary mode, does that not work for the -Fc case?  And if it does work
>>> for the -Fc case, then why doesn't it also work for -Z9?
>>>
>>
>> I have re-verified the patch with the help of my colleague 'Neha
>> Sharma' on Windows server 2008 R2 machine and the fix looks to be
>> working fine. With the help of attached patch,
>>
>
> Did you intended to attach the patch to this e-mail or are you
> referring to Kuntal's patch up thread?  If later, then it is better to
> mention the link of mail which has a patch that you have verified.
>

I am referring to Kuntal's patch upthread. The link to the email
containing the patch is:

https://www.postgresql.org/message-id/CAGz5QCJPvbBjXAmJuGx1B_41yVCetAJhp7rtaDf7XQGWuB1GSw%40mail.gmail.com

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BUG: pg_dump generates corrupted gzip file in Windows

2017-06-19 Thread Ashutosh Sharma
Hi,

On Fri, Mar 24, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 6:44 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
>>> ASAICU, if we use binary mode, output is stored bit by bit. In ASCII
>>> mode, cmd pokes its nose and does CR / LF conversions on its own. So,
>>> whenever we want compression on a plain-text dump file, we can set the
>>> stdout mode to O_BINARY. Is it a wrong approach?
>> With the help from Ashutosh Sharma, I tested this in Windows
>> environment. Sadly, it still doesn't work. :( IMHO, we should document
>> the issue somewhere.
>
> Why not?  I mean, if there's code there to force the output into
> binary mode, does that not work for the -Fc case?  And if it does work
> for the -Fc case, then why doesn't it also work for -Z9?
>

I have re-verified the patch with the help of my colleague Neha Sharma
on a Windows Server 2008 R2 machine and the fix looks to be working fine.
With the help of the attached patch, I could see that when pg_dump is run
in the default mode (i.e. -Fp mode) with a compression level > 0 and the
output redirected to stdout, the generated dump file is not corrupted: it
can easily be decompressed with the gzip command and restored without any
issues. A sketch of the underlying idea is below. Thanks.

Please note that earlier I was not able to reproduce the issue on my
Windows setup and had to borrow Neha's machine to validate the fix.
Sorry for the delayed response.
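For readers following along, the core of the fix is presumably to force
stdout into binary mode on Windows before writing compressed output, so
that cmd.exe cannot apply CR/LF translation to the byte stream (a minimal
sketch, not the exact patch):

#ifdef WIN32
#include <stdio.h>
#include <io.h>
#include <fcntl.h>

/* Put stdout into binary mode so a compressed dump written to it is
 * not mangled by newline translation. */
static void
set_stdout_binary(void)
{
    _setmode(_fileno(stdout), _O_BINARY);
}
#endif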

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-18 Thread Ashutosh Sharma
Hi,

On Sun, Jun 18, 2017 at 6:39 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Jun 17, 2017 at 8:27 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Hi,
>>
>> On Sat, Jun 17, 2017 at 6:38 PM, Peter Eisentraut
>> <peter.eisentr...@2ndquadrant.com> wrote:
>>> On 6/16/17 23:46, Amit Kapila wrote:
>>>> I have just posted one way
>>>> to determine if icu library has support for ucol_strcollUTF8, see if
>>>> that sounds like a way forward to you.
>>>
>>> I'm not in a position to test such patches, so someone else will have to
>>> take that on.
>>
>
> I can verify the patch on Win7 setup to which I have access if that helps.
>
>> Well, I have tested it from my end. I've basically tried to test it by
>> running multi-byte tests as Amit suggested and by verifying the things
>> manually through the debugger. And, I had mistakenly attached the wrong
>> patch in my earlier email. Here, I attach the correct patch.
>>
>
> Is it possible for you to once verify this patch with icu library
> version where ucol_strcollUTF8 is not defined?

Okay, I have verified the attached patch with the following ICU versions
and didn't find any problems:

1) ICU 4.8.1.1
2) ICU 49.1.2
3) ICU 50.1.2
4) ICU 53.1

The first two ICU versions, i.e. 'ICU 4.8.1.1' and 'ICU 49.1.2', don't
have 'ucol_strcollUTF8' defined, whereas the last two versions, i.e.
'ICU 50.1.2' and 'ICU 53.1', include 'ucol_strcollUTF8'.

>
> + # get the icu version.
> + my $output = `uconv -V /? 2>&1`;
> + $? >> 8 == 0 or die "uconv command not found";
>
> If we don't find uconv, isn't it better to fall back to the non-UTF8
> version rather than saying command not found?

Well, if any ICU package is installed on our system then we will
certainly find the uconv command. The only case where we can see such an
error is when the user doesn't have any ICU package installed on their
system and is somehow trying to perform an ICU-enabled build, and in such
a case the build configuration has to fail, which I think is the expected
behaviour. Anyway, it is not uconv that decides whether we should fall
back to non-UTF8 or not; it's the ICU version that decides whether to
stick with UTF8 or fall back to the non-UTF8 version. Thanks.

>
> + print $o "#define HAVE_UCOL_STRCOLLUTF8 1\n"
> + if ($self->{ICUVersion} >= 50);
>
> Indentation looks slightly odd, see the other usage in the same file.

I have corrected it. Please have a look at the attached patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


detect_ICU_version_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-17 Thread Ashutosh Sharma
Hi,

On Sat, Jun 17, 2017 at 6:38 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/16/17 23:46, Amit Kapila wrote:
>> I have just posted one way
>> to determine if icu library has support for ucol_strcollUTF8, see if
>> that sounds like a way forward to you.
>
> I'm not in a position to test such patches, so someone else will have to
> take that on.

Well, I have tested it from my end. I've basically tried to test it by
running multi-byte tests as Amit suggested and by verifying things
manually through the debugger. Also, I had mistakenly attached the wrong
patch in my earlier email. Here, I attach the correct patch. Sorry about
that.

>
> It might not be worth bothering.  ICU 50 is already about 5 years old.
> If you're packaging for Windows, I suspect you have the option of
> bundling a version of your choice.  The support for older versions is
> mainly to support building on "stable" Linux distributions, and even
> there the window of usefulness is closing.  (CentOS 7 has 50, CentOS 6
> has 4.2 which is too old for other reasons, Debian stable has 52 (and it
> will become oldstable after today)).

Yes, it's hard to find any users having an ICU version < 50 installed on
their system. But, having said that, it's always good to have such
detection checks, especially considering that we already have such a
configure check on the Linux platform. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


detect_ICU_version_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-17 Thread Ashutosh Sharma
Hi,

Attached is the patch that detects the ICU version on Windows and, based
on that, decides whether the 'HAVE_UCOL_STRCOLLUTF8' flag needs to be set
or not. If this patch gets considered, then we may have to revert the
changes from the following git commit. Thanks.

commit e42645ad92687a2250ad17e1a045da73e54a5064
Author: Peter Eisentraut <pete...@gmx.net>
Date:   Fri Jun 16 21:23:22 2017 -0400

Define HAVE_UCOL_STRCOLLUTF8 on Windows

This should normally be determined by a configure check, but until
someone figures out how to do that on Windows, it's better that the code
uses the new function by default.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Sat, Jun 17, 2017 at 9:16 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sat, Jun 17, 2017 at 6:57 AM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>> On 6/16/17 10:12, Peter Eisentraut wrote:
>>> On 6/16/17 06:30, Amit Kapila wrote:
>>>> How will this compare UTF-8 strings in UTF-8 encoding?  It seems to me
>>>> that ideally, it should use ucol_strcollUTF8 to compare the same,
>>>> however, with patch, it will always ucol_strcoll as we never define
>>>> HAVE_UCOL_STRCOLLUTF8 flag on Windows.
>>>
>>> We have a configure check for that, but I don't know how to replicate
>>> that on Windows.
>>>
>>> If ucol_strcollUTF8 is not available, we have code to convert to UTF-16.
>>>  This is the same code that is used for non-Windows.
>>
>> After thinking about this some more, I have committed a change to define
>> HAVE_UCOL_STRCOLLUTF8 on Windows unconditionally.  Until someone figures
>> out a different way, I think it's better that users of newish versions
>> of ICU get the newer/better behavior, and users of older versions can
>> file a bug.
>>
>
> Yeah, we can take that stand or maybe document that as well but not
> sure that is the best way to deal with it.  I have just posted one way
> to determine if icu library has support for ucol_strcollUTF8, see if
> that sounds like a way forward to you.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com


detect_ICU_version.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-16 Thread Ashutosh Sharma
Hi,

On Sat, Jun 17, 2017 at 6:57 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/16/17 10:12, Peter Eisentraut wrote:
>> On 6/16/17 06:30, Amit Kapila wrote:
>>> How will this compare UTF-8 strings in UTF-8 encoding?  It seems to me
>>> that ideally, it should use ucol_strcollUTF8 to compare the same,
>>> however, with patch, it will always ucol_strcoll as we never define
>>> HAVE_UCOL_STRCOLLUTF8 flag on Windows.
>>
>> We have a configure check for that, but I don't know how to replicate
>> that on Windows.
>>
>> If ucol_strcollUTF8 is not available, we have code to convert to UTF-16.
>>  This is the same code that is used for non-Windows.
>
> After thinking about this some more, I have committed a change to define
> HAVE_UCOL_STRCOLLUTF8 on Windows unconditionally.

Surprisingly, I am not able to find this commit on the master branch.

> Until someone figures
> out a different way, I think it's better that users of newish versions
> of ICU get the newer/better behavior, and users of older versions can
> file a bug.  The alternative is that we forget about this and we keep
> using the old code path indefinitely.

Well, it will work for users who are using ICU version >= 50, but not for
the older ICU versions. So, we will have to figure out a way to either
detect the availability of ucol_strcollUTF8() on Windows, or maybe the ICU
version itself, and set the HAVE_UCOL_STRCOLLUTF8 flag accordingly.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-15 Thread Ashutosh Sharma
Hi,

On Thu, Jun 15, 2017 at 8:36 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/12/17 00:38, Ashutosh Sharma wrote:
>> PFA patch that fixes the issue described in the above thread. As mentioned
>> in the above thread, the crash is basically happening in the varstr_cmp()
>> function and it's only happening on Windows because in varstr_cmp(),
>> if the collation provider is ICU, we don't even think of calling ICU
>> functions to compare the strings. In fact, we directly attempt to call
>> the OS function wcscoll*(), which is not expected. Thanks.
>
> Maybe just
>
> diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
> index a0dd391f09..2506f4eeb8 100644
> --- a/src/backend/utils/adt/varlena.c
> +++ b/src/backend/utils/adt/varlena.c
> @@ -1433,7 +1433,7 @@ varstr_cmp(char *arg1, int len1, char *arg2, int len2, 
> Oid collid)
>
>  #ifdef WIN32
> /* Win32 does not have UTF-8, so we need to map to UTF-16 */
> -   if (GetDatabaseEncoding() == PG_UTF8)
> +   if (GetDatabaseEncoding() == PG_UTF8 && (!mylocale || 
> mylocale->provider == COLLPROVIDER_LIBC))
> {
> int a1len;
> int a2len;

Oh, yes, this looks like the simplest and possibly the ideal way to
fix the issue. Attached is the patch. Thanks for the inputs.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a0dd391..d24d1c5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1433,7 +1433,8 @@ varstr_cmp(char *arg1, int len1, char *arg2, int len2, Oid collid)
 
 #ifdef WIN32
 		/* Win32 does not have UTF-8, so we need to map to UTF-16 */
-		if (GetDatabaseEncoding() == PG_UTF8)
+		if (GetDatabaseEncoding() == PG_UTF8 &&
+			(!mylocale || mylocale->provider == COLLPROVIDER_LIBC))
 		{
 			int			a1len;
 			int			a2len;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-15 Thread Ashutosh Sharma
Hi,

On Thu, Jun 15, 2017 at 7:43 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Mon, Jun 12, 2017 at 10:08 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> PFA patch that fixes the issue described in the above thread. As mentioned
>> in the above thread, the crash is basically happening in the varstr_cmp()
>> function and it's only happening on Windows because in varstr_cmp(),
>> if the collation provider is ICU, we don't even think of calling ICU
>> functions to compare the strings. In fact, we directly attempt to call
>> the OS function wcscoll*(), which is not expected. Thanks.
>>
>
> Your analysis is right, basically, when ICU is enabled we need to use
> ICU related functions and use corresponding information in the
> pg_locale structure. However, I see few problems with your patch.
>
> + if (mylocale)
> + {
> + if (mylocale->provider == COLLPROVIDER_ICU)
> + {
> +#ifdef USE_ICU
> +#ifdef HAVE_UCOL_STRCOLLUTF8
> + if (GetDatabaseEncoding() == PG_UTF8)
> + {
> + UErrorCode status;
> +
> + status = U_ZERO_ERROR;
> + result = ucol_strcollUTF8(mylocale->info.icu.ucol,
> +  arg1, len1,
> +  arg2, len2,
> +  &status);
>
> On Windows, we need to map UTF-8 strings to UTF-16 as we are doing in
> this function for other Windows specific comparisons for UTF-8
> strings.  Also, we don't want to screw memcmp optimization we have in
> this function, so do this ICU specific checking after memcmp check.

Firstly, thanks for reviewing the patch. Okay, we do map UTF-8 strings
to UTF-16 on Windows. But I think we only do that when calling the OS
(wcscoll*) functions for string comparison. Please correct me if I am
wrong here. In my case, I am using ICU functions for comparing strings,
and I am not sure that we need to convert the UTF-8 strings to UTF-16.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU support on Windows

2017-06-13 Thread Ashutosh Sharma
On Tue, Jun 13, 2017 at 6:45 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/12/17 14:03, Ashutosh Sharma wrote:
>>> I noticed that this only works if you use the "Win32" download of ICU,
>>> because the "Win64" download uses "lib64" paths.  I'm not sure what the
>>> impact of this is in practice.
>>
>> Yes, that's right, the Win64 download uses lib64 paths, and in my case I
>> had renamed lib64 -> lib and bin64 -> bin, which I guess is not the right
>> thing to do. I think we should allow Solution.pm to detect the platform
>> and choose the library path accordingly. The attached patch does that.
>> Please have a look and let me know your thoughts on this. Thanks.
>
> committed

Thank you :)

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU support on Windows

2017-06-12 Thread Ashutosh Sharma
Hi,

On Jun 12, 2017 11:43 PM, "Alvaro Herrera" <alvhe...@2ndquadrant.com> wrote:

Ashutosh Sharma wrote:
> > I noticed that this only works if you use the "Win32" download of ICU,
> > because the "Win64" download uses "lib64" paths.  I'm not sure what the
> > impact of this is in practice.
>
> Yes, that's right, the Win64 download uses lib64 paths, and in my case I
> had renamed lib64 -> lib and bin64 -> bin, which I guess is not the right
> thing to do. I think we should allow Solution.pm to detect the platform
> and choose the library path accordingly. The attached patch does that.
> Please have a look and let me know your thoughts on this. Thanks.

Uh, that's pretty odd.  Is it something libicu-specific?  Because I
don't see any other occurrence of \lib64\ anywhere in the MSVC build
scripts.


Yes, it is specific to libicu.


Re: [HACKERS] ICU support on Windows

2017-06-12 Thread Ashutosh Sharma
Hi,

On Mon, Jun 12, 2017 at 8:39 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 6/10/17 05:40, Ashutosh Sharma wrote:
>> With the help of attached patch, we can use icu feature on Windows.
>> All we have to do is, download the ICU libraries from - [1] and add
>> the installation path for the ICU libraries in config.pl like,
>>
>> icu => 'E:\Users\pg\icu',
>>
>> [1]- http://site.icu-project.org/download/53
>
> Committed that.

Thanks for that.

>
> I noticed that this only works if you use the "Win32" download of ICU,
> because the "Win64" download uses "lib64" paths.  I'm not sure what the
> impact of this is in practice.

Yes, that's right, the Win64 download uses lib64 paths, and in my case I
had renamed lib64 -> lib and bin64 -> bin, which I guess is not the right
thing to do. I think we should allow Solution.pm to detect the platform
and choose the library path accordingly. The attached patch does that (a
sketch of the idea is below). Please have a look and let me know your
thoughts on this. Thanks.
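In outline, the change amounts to something like this (a sketch with
assumed variable and method names from the MSVC build scripts; the attached
patch is authoritative):

# Choose \lib64 for x64 builds and \lib for Win32 builds when adding
# the ICU include and library paths to the project.
if ($solution->{options}->{icu})
{
    my $libdir = ($solution->{platform} eq 'x64') ? '\lib64' : '\lib';

    $proj->AddIncludeDir($solution->{options}->{icu} . '\include');
    $proj->AddLibrary($solution->{options}->{icu} . $libdir . '\icuin.lib');
    $proj->AddLibrary($solution->{options}->{icu} . $libdir . '\icuuc.lib');
    $proj->AddLibrary($solution->{options}->{icu} . $libdir . '\icudt.lib');
}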

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


load_platform_specific_icu_lib.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU support on Windows

2017-06-12 Thread Ashutosh Sharma
Hi,

On Mon, Jun 12, 2017 at 12:18 PM, Michael Paquier
<michael.paqu...@gmail.com> wrote:
> On Sat, Jun 10, 2017 at 6:40 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Currently, we cannot perform ICU enabled build for postgres on Windows
>> platform. However, this can be done on Linux platforms using
>> '--with-icu' configuration parameter. Attached is the patch that
>> allows us to perform icu enabled build for postgres on Windows
>> platform provided that we have correct version of ICU libraries
>> installed on our system.
>>
>> With the help of attached patch, we can use icu feature on Windows.
>> All we have to do is, download the ICU libraries from - [1] and add
>> the installation path for the ICU libraries in config.pl like,
>
> I have looked at this patch and it looks good to me. Windows is able
> to compile with ICU support using it, which is a good step forward
> into moving on with things in this area.
>
> I have added an open item, it seems to me that it is a mistake to not
> be able to have that support as well on Windows in PG10.

Thanks for reviewing the patch and adding it to the PostgreSQL 10 open
items list.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-11 Thread Ashutosh Sharma
PFA patch that fixes the issue described in the above thread. As mentioned
in the above thread, the crash is basically happening in the varstr_cmp()
function and it's only happening on Windows because in varstr_cmp(),
if the collation provider is ICU, we don't even think of calling ICU
functions to compare the strings. In fact, we directly attempt to call
the OS function wcscoll*(), which is not expected. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Sat, Jun 10, 2017 at 3:25 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> Hi All,
>
> I am seeing a server crash when running queries using ICU collations on
> Windows. Following are the steps to reproduce the crash with the help of
> patch to enable icu feature on Windows - [1],
>
> 1) psql -d postgres
>
> 2) CREATE DATABASE icu_win_test
>  TEMPLATE template0
>  ENCODING 'UTF8'
> LC_CTYPE 'C'
>LC_COLLATE 'C';
>
> 3) \c icu_win_test;
>
> 4) icu_win_test=# select 'B' > 'a' COLLATE "de-u-co-standard-x-icu";
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> The backtrace observed in the core dump is,
>
>> postgres.exe!varstr_cmp(char * arg1, int len1, char * arg2, int len2,
>> unsigned int collid)  Line 1494 C
>   postgres.exe!text_cmp(varlena * arg1, varlena * arg2, unsigned int collid)
> Line 1627 C
>   postgres.exe!text_gt(FunctionCallInfoData * fcinfo)  Line 1738 + 0x12
> bytes C
>   postgres.exe!ExecInterpExpr(ExprState * state, ExprContext * econtext,
> char * isnull)  Line 650 + 0xa bytes C
>   postgres.exe!evaluate_expr(Expr * expr, unsigned int result_type, int
> result_typmod, unsigned int result_collation)  Line 4719 C
>   postgres.exe!evaluate_function(unsigned int funcid, unsigned int
> result_type, int result_typmod, unsigned int result_collid, unsigned int
> input_collid, List * args, char funcvariadic, HeapTupleData * func_tuple,
> eval_const_expressions_context * context)  Line 4272 + 0x50 bytes C
>   postgres.exe!simplify_function(unsigned int funcid, unsigned int
> result_type, int result_typmod, unsigned int result_collid, unsigned int
> input_collid, List * * args_p, char funcvariadic, char process_args, char
> allow_non_const, eval_const_expressions_context * context)  Line 3914 + 0x44
> bytes C
>   postgres.exe!eval_const_expressions_mutator(Node * node,
> eval_const_expressions_context * context)  Line 2627 C
>   postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator,
> void * context)  Line 2735 + 0x37 bytes C
>   postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator,
> void * context)  Line 2932 + 0x8 bytes C
>   postgres.exe!eval_const_expressions(PlannerInfo * root, Node * node)  Line
> 2413 C
>   postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse,
> PlannerInfo * parent_root, char hasRecursion, double tuple_fraction)  Line
> 623 + 0x2d bytes C
>
> RCA:
> 
> As seen in the backtrace, the crash is basically happening in varstr_cmp()
> function. AFAICU, this crash is only happening on Windows because in
> varstr_cmp(), if the encoding type is UTF8, we don't even think of calling
> ICU functions to compare the string. Infact, we directly attempt to call the
> OS function wsccoll*().
>
> The point is, if the collation provider is ICU, then we should try to call
> ICU functions for collation support instead of calling OS functions. This
> is taken care of inside varstr_cmp() for Linux but surprisingly it's not
> handled for Windows. Please have a look at the below code snippet in
> varstr_cmp() to see how it is taken care of for Linux:
>
> #endif   /* WIN32 */
>
> #ifdef USE_ICU
> #ifdef HAVE_UCOL_STRCOLLUTF8
>         if (GetDatabaseEncoding() == PG_UTF8)
>         {
>             UErrorCode  status;
>
>             status = U_ZERO_ERROR;
>             result = ucol_strcollUTF8(mylocale->info.icu.ucol,
>                                       arg1, len1,
>                                       arg2, len2,
>                                       &status);
>             if (U_FAILURE(status))
>                 ereport(ERROR,
>                         (errmsg("collation failed: %s", u_errorName(status))));
>         }
>         else
> #endif
>         {
>             int32_t     ulen1,
>                         ulen2;
>             UChar      *uchar1, *uchar2;
>
>             ulen1 = icu_to_uchar(&uchar1, arg1, len1);
>             ulen2 = icu_to_uchar(&uchar2, arg2, len2);
>
>             result = ucol_strcoll(mylocale->info.icu.ucol,
>                                   uchar1, ulen1,
>                                   uchar2, ulen2);

[HACKERS] Getting server crash on Windows when using ICU collation

2017-06-10 Thread Ashutosh Sharma
Hi All,

I am seeing a server crash when running queries using ICU collations on
Windows. Following are the steps to reproduce the crash, with the help of
the patch to enable the ICU feature on Windows [1]:

1) psql -d postgres

2) CREATE DATABASE icu_win_test
 TEMPLATE template0
 ENCODING 'UTF8'
LC_CTYPE 'C'
   LC_COLLATE 'C';

3) \c icu_win_test;

4) icu_win_test=# select 'B' > 'a' COLLATE "de-u-co-standard-x-icu";
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

The backtrace observed in the core dump is,

> postgres.exe!varstr_cmp(char * arg1, int len1, char * arg2, int len2,
unsigned int collid)  Line 1494 C
  postgres.exe!text_cmp(varlena * arg1, varlena * arg2, unsigned int
collid)  Line 1627 C
  postgres.exe!text_gt(FunctionCallInfoData * fcinfo)  Line 1738 + 0x12
bytes C
  postgres.exe!ExecInterpExpr(ExprState * state, ExprContext * econtext,
char * isnull)  Line 650 + 0xa bytes C
  postgres.exe!evaluate_expr(Expr * expr, unsigned int result_type, int
result_typmod, unsigned int result_collation)  Line 4719 C
  postgres.exe!evaluate_function(unsigned int funcid, unsigned int
result_type, int result_typmod, unsigned int result_collid, unsigned int
input_collid, List * args, char funcvariadic, HeapTupleData * func_tuple,
eval_const_expressions_context * context)  Line 4272 + 0x50 bytes C
  postgres.exe!simplify_function(unsigned int funcid, unsigned int
result_type, int result_typmod, unsigned int result_collid, unsigned int
input_collid, List * * args_p, char funcvariadic, char process_args, char
allow_non_const, eval_const_expressions_context * context)  Line 3914 +
0x44 bytes C
  postgres.exe!eval_const_expressions_mutator(Node * node,
eval_const_expressions_context * context)  Line 2627 C
  postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator,
void * context)  Line 2735 + 0x37 bytes C
  postgres.exe!expression_tree_mutator(Node * node, Node * (void)* mutator,
void * context)  Line 2932 + 0x8 bytes C
  postgres.exe!eval_const_expressions(PlannerInfo * root, Node * node)
 Line 2413 C
  postgres.exe!subquery_planner(PlannerGlobal * glob, Query * parse,
PlannerInfo * parent_root, char hasRecursion, double tuple_fraction)  Line
623 + 0x2d bytes C

RCA:

As seen in the backtrace, the crash is basically happening in the
varstr_cmp() function. AFAICU, this crash is only happening on Windows
because in varstr_cmp(), if the encoding type is UTF8, we don't even think
of calling ICU functions to compare the strings. In fact, we directly
attempt to call the OS function wcscoll*().

The point is, if the collation provider is ICU, then we should try to
call ICU functions for collation support instead of calling OS functions.
This is taken care of inside varstr_cmp() for Linux, but surprisingly it's
not handled for Windows. Please have a look at the code snippet below from
varstr_cmp() to see how it is taken care of for Linux:

#endif   /* WIN32 */

#ifdef USE_ICU
#ifdef HAVE_UCOL_STRCOLLUTF8
        if (GetDatabaseEncoding() == PG_UTF8)
        {
            UErrorCode  status;

            status = U_ZERO_ERROR;
            result = ucol_strcollUTF8(mylocale->info.icu.ucol,
                                      arg1, len1,
                                      arg2, len2,
                                      &status);
            if (U_FAILURE(status))
                ereport(ERROR,
                        (errmsg("collation failed: %s", u_errorName(status))));
        }
        else
#endif
        {
            int32_t     ulen1,
                        ulen2;
            UChar      *uchar1, *uchar2;

            ulen1 = icu_to_uchar(&uchar1, arg1, len1);
            ulen2 = icu_to_uchar(&uchar2, arg2, len2);

            result = ucol_strcoll(mylocale->info.icu.ucol,
                                  uchar1, ulen1,
                                  uchar2, ulen2);
        }
#else                           /* not USE_ICU */
        /* shouldn't happen */
        elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
#endif                          /* not USE_ICU */
    }
    else
    {
#ifdef HAVE_LOCALE_T
        result = strcoll_l(a1p, a2p, mylocale->info.lt);
#else
        /* shouldn't happen */
        elog(ERROR, "unsupported collprovider: %c", mylocale->provider);
#endif
    }

Fix:

I am currently working on this and will try to come up with the fix asap.
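
For illustration only, here is a minimal sketch of the kind of check the
Windows branch of varstr_cmp() is missing (not the actual fix, which is
still in progress; the names follow the snippet above, everything beyond
that is my assumption):

#ifdef WIN32
	/* Win32 does not have UTF-8, so we need to map to UTF-16 */
	if (GetDatabaseEncoding() == PG_UTF8)
	{
		/*
		 * Hypothetical sketch: if the collation uses the ICU provider,
		 * compare via ICU (as the non-Windows branch above does) instead
		 * of falling through to the OS wcscoll*() path.
		 */
		if (mylocale && mylocale->provider == COLLPROVIDER_ICU)
		{
			int32_t		ulen1,
						ulen2;
			UChar	   *uchar1,
					   *uchar2;

			ulen1 = icu_to_uchar(&uchar1, arg1, len1);
			ulen2 = icu_to_uchar(&uchar2, arg2, len2);

			return ucol_strcoll(mylocale->info.icu.ucol,
								uchar1, ulen1,
								uchar2, ulen2);
		}

		/* ... existing MultiByteToWideChar() + wcscoll*() code ... */
	}
#endif							/* WIN32 */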

[1] -
https://www.postgresql.org/message-id/CAE9k0P%3DQRjtS1a0rgTyKag_%2Bs6XCs7vovV%2BgSkUfYVASog0P8w%40mail.gmail.com


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


[HACKERS] ICU support on Windows

2017-06-10 Thread Ashutosh Sharma
Hi All,

Currently, we cannot perform an ICU-enabled build of postgres on the
Windows platform. However, this can be done on Linux platforms using
the '--with-icu' configuration parameter. Attached is a patch that
allows us to perform an ICU-enabled build of postgres on Windows,
provided that the correct version of the ICU libraries is installed
on the system.

With the help of the attached patch, we can use the ICU feature on
Windows. All we have to do is download the ICU libraries from [1] and
add their installation path to config.pl like,

icu => 'E:\Users\pg\icu',

[1]- http://site.icu-project.org/download/53

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


ICU_support_windows.patch
Description: Binary data



Re: [HACKERS] Page Scan Mode in Hash Index

2017-05-10 Thread Ashutosh Sharma
While doing code coverage testing of the v7 patch shared in [1], I
found that there are a few lines of code in _hash_next() that are
redundant and need to be removed. I basically came to know this while
testing the scenario where a hash index scan starts when a split is in
progress. I have removed those lines and attached is the newer version
of the patch.

[1] - 
https://www.postgresql.org/message-id/CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg%40mail.gmail.com

On Mon, May 8, 2017 at 6:55 PM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> Hi,
>
> On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>>
>>> Please note that these patches needs to be applied on top of  [1].
>>>
>>
>> Few more review comments:
>>
>> 1.
>> - page = BufferGetPage(so->hashso_curbuf);
>> + blkno = so->currPos.currPage;
>> + if (so->hashso_bucket_buf == so->currPos.buf)
>> + {
>> + buf = so->currPos.buf;
>> + LockBuffer(buf, BUFFER_LOCK_SHARE);
>> + page = BufferGetPage(buf);
>> + }
>>
>> Here, you are assuming that only bucket page can be pinned, but I
>> think that assumption is not right.  When _hash_kill_items() is called
>> before moving to next page, there could be a pin on the overflow page.
>> You need some logic to check if the buffer is pinned, then just lock
>> it.  I think once you do that, it might not be convinient to release
>> the pin at the end of this function.
>
> Yes, there are few cases where we might have pin on overflow pages too
> and we need to handle such cases in _hash_kill_items. I have taken
> care of this in the attached v7 patch. Thanks.
>
>>
>> Add some comments on top of _hash_kill_items to explain the new
>> changes or say some thing like "See _bt_killitems for details"
>
> Added some more comments on the new changes.
>
>>
>> 2.
>> + /*
>> + * We save the LSN of the page as we read it, so that we know whether it
>> + * safe to apply LP_DEAD hints to the page later.  This allows us to drop
>> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
>> + */
>> + so->currPos.lsn = PageGetLSN(page);
>> +
>>
>> The second part of above comment doesn't make sense because vacuum's
>> will still be blocked because we hold the pin on primary bucket page.
>
> That's right. It doesn't make sense because we won't allow vacuum to
> start. I have removed it.
>
>>
>> 3.
>> + {
>> + /*
>> + * No more matching tuples were found. return FALSE
>> + * indicating the same. Also, remember the prev and
>> + * next block number so that if fetching tuples using
>> + * cursor we remember the page from where to start the
>> + * scan.
>> + */
>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>
>> You can't read opaque without having lock and by this time it has
>> already been released.
>
> I have corrected it. Please refer to the attached v7 patch.
>
> Also, I think if you want to save position for
>> cursor movement, then you need to save the position of last bucket
>> when scan completes the overflow chain, however as you have written it
>> will be always invalid block number. I think there is similar problem
>> with prevblock number.
>
> Did you mean last bucket or last page. If it is last page, then I am
> already storing it.
>
>>
>> 4.
>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
>> offnum,
>> +   OffsetNumber maxoff, ScanDirection dir)
>> +{
>> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
>> + IndexTuple  itup;
>> + int itemIndex;
>> +
>> + if (ScanDirectionIsForward(dir))
>> + {
>> + /* load items[] in ascending order */
>> + itemIndex = 0;
>> +
>> + /* new page, relocate starting position by binary search */
>> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
>>
>> What is the need to find offset number when this function already has
>> that as an input parameter?
>
> It's not required. I have removed it.
>
>>
>> 5.
>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
>> offnum,
>> +   OffsetNumber maxoff, ScanDirection dir)
>>
>> I think maxoff is not required to be passed a parameter to this
>> function as you need it only for forward scan. You can compute it
>> locally.
>
> It is require

Re: [HACKERS] Page Scan Mode in Hash Index

2017-05-08 Thread Ashutosh Sharma
Hi,

On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
>>
>> Please note that these patches needs to be applied on top of  [1].
>>
>
> Few more review comments:
>
> 1.
> - page = BufferGetPage(so->hashso_curbuf);
> + blkno = so->currPos.currPage;
> + if (so->hashso_bucket_buf == so->currPos.buf)
> + {
> + buf = so->currPos.buf;
> + LockBuffer(buf, BUFFER_LOCK_SHARE);
> + page = BufferGetPage(buf);
> + }
>
> Here, you are assuming that only bucket page can be pinned, but I
> think that assumption is not right.  When _hash_kill_items() is called
> before moving to next page, there could be a pin on the overflow page.
> You need some logic to check if the buffer is pinned, then just lock
> it.  I think once you do that, it might not be convinient to release
> the pin at the end of this function.

Yes, there are a few cases where we might have a pin on overflow pages
too, and we need to handle such cases in _hash_kill_items. I have taken
care of this in the attached v7 patch. Thanks.

>
> Add some comments on top of _hash_kill_items to explain the new
> changes or say some thing like "See _bt_killitems for details"

Added some more comments on the new changes.

>
> 2.
> + /*
> + * We save the LSN of the page as we read it, so that we know whether it
> + * safe to apply LP_DEAD hints to the page later.  This allows us to drop
> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
> + */
> + so->currPos.lsn = PageGetLSN(page);
> +
>
> The second part of above comment doesn't make sense because vacuum's
> will still be blocked because we hold the pin on primary bucket page.

That's right. It doesn't make sense because we won't allow vacuum to
start. I have removed it.

>
> 3.
> + {
> + /*
> + * No more matching tuples were found. return FALSE
> + * indicating the same. Also, remember the prev and
> + * next block number so that if fetching tuples using
> + * cursor we remember the page from where to start the
> + * scan.
> + */
> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>
> You can't read opaque without having lock and by this time it has
> already been released.

I have corrected it. Please refer to the attached v7 patch.

Also, I think if you want to save position for
> cursor movement, then you need to save the position of last bucket
> when scan completes the overflow chain, however as you have written it
> will be always invalid block number. I think there is similar problem
> with prevblock number.

Did you mean the last bucket or the last page? If it is the last page,
then I am already storing it.

>
> 4.
> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
> offnum,
> +   OffsetNumber maxoff, ScanDirection dir)
> +{
> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
> + IndexTuple  itup;
> + int itemIndex;
> +
> + if (ScanDirectionIsForward(dir))
> + {
> + /* load items[] in ascending order */
> + itemIndex = 0;
> +
> + /* new page, relocate starting position by binary search */
> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
>
> What is the need to find offset number when this function already has
> that as an input parameter?

It's not required. I have removed it.

>
> 5.
> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
> offnum,
> +   OffsetNumber maxoff, ScanDirection dir)
>
> I think maxoff is not required to be passed a parameter to this
> function as you need it only for forward scan. You can compute it
> locally.

It is required for both forward and backward scans. However, I am not
passing it to _hash_load_qualified_items() now.

>
> 6.
> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber 
> offnum,
> +   OffsetNumber maxoff, ScanDirection dir)
> {
> ..
> + if (ScanDirectionIsForward(dir))
> + {
> ..
> + while (offnum <= maxoff)
> {
> ..
> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
> + _hash_checkqual(scan, itup))
> + {
> + /* tuple is qualified, so remember it */
> + _hash_saveitem(so, itemIndex, offnum, itup);
> + itemIndex++;
> + }
> +
> + offnum = OffsetNumberNext(offnum);
> ..
>
> Why are you traversing the whole page even when there is no match?
> There is a similar problem in backward scan. I think this is blunder.

Fixed. Please check the attached
'0001-Rewrite-hash-index-scans-to-work-a-page-at-a-timev7.patch'

>
> 7.
> + if (so->currPos.buf == so->hashso_bucket_buf ||
> + so->cur

Re: [HACKERS] Re: [BUGS] BUG #14634: On Windows pg_basebackup should write tar to stdout in binary mode

2017-05-03 Thread Ashutosh Sharma
Hi Craig,

On Wed, May 3, 2017 at 10:50 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 3 May 2017 at 12:32, Haribabu Kommi <kommi.harib...@gmail.com> wrote:
>> [Adding -hackers mailing list]
>>
>> On Fri, Apr 28, 2017 at 6:28 PM, <henry_boehl...@agilent.com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:  14634
>>> Logged by:  Henry Boehlert
>>> Email address:  henry_boehl...@agilent.com
>>> PostgreSQL version: 9.6.2
>>> Operating system:   Windows Server 2012 R2 6.3.9600
>>> Description:
>>>
>>> Executing command pg_basebackup -D -F t writes its output to stdout, which
>>> is open in text mode, causing LF to be converted to CR LF thus corrupting
>>> the resulting archive.
>>>
>>> To write the tar to stdout, on Windows stdout's mode should be temporarily
>>> switched to binary.
>>>
>>> https://msdn.microsoft.com/en-us/library/tw4k6df8.aspx
>>
>>
>> Thanks for reporting the issue.
>> With the attached patch, I was able to extract the tar file that gets
>> generated when the tar file is written into stdout. I tested the
>> the compressed tar also.
>>
>> This bug needs to be fixed in back branches also.
>
> We should do the same for pg_dump in -Fc mode.

Did you mean -Fp mode? I think we are already setting stdout to binary
mode if the output format is custom. Please refer to the following
code in parseArchiveFormat() and _allocAH() respectively,

static ArchiveFormat
parseArchiveFormat(const char *format, ArchiveMode *mode)
{
...
...
else if (pg_strcasecmp(format, "c") == 0)
archiveFormat = archCustom;
else if (pg_strcasecmp(format, "custom") == 0)
archiveFormat = archCustom;

else if (pg_strcasecmp(format, "p") == 0)
archiveFormat = archNull;
else if (pg_strcasecmp(format, "plain") == 0)
archiveFormat = archNull;
...
...
}

static ArchiveHandle *
_allocAH(const char *FileSpec, const ArchiveFormat fmt,
 const int compression, bool dosync, ArchiveMode mode,
 SetupWorkerPtrType setupWorkerPtr)
{

...
...
#ifdef WIN32
if (fmt != archNull &&
(AH->fSpec == NULL || strcmp(AH->fSpec, "") == 0))
{
if (mode == archModeWrite)
setmode(fileno(stdout), O_BINARY);
else
setmode(fileno(stdin), O_BINARY);
}
#endif
..
..
}

Please confirm.
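
For reference, the change being tested below amounts to something like
this in pg_basebackup's tar-to-stdout path (a sketch based on the bug
report; the attached patch from Hari is authoritative and the exact
placement may differ):

#ifdef WIN32
	/*
	 * When the tar archive is written to stdout, switch stdout to binary
	 * mode so that LF bytes inside the archive are not expanded to CR LF,
	 * which would corrupt the tar stream.
	 */
	if (strcmp(basedir, "-") == 0)
		_setmode(fileno(stdout), _O_BINARY);
#endif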

Meanwhile, I have unit tested the patch submitted by Hari upthread on
PostgreSQL v10 and it works fine. Below are the steps that I followed
to test Hari's patch.

Without patch:
==
C:\Users\ashu\postgresql\TMP\test\bin>.\pg_basebackup.exe -D - -F t -X
none > base.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL seg
ments are copied through other means to complete the backup

C:\Users\ashu\postgresql\TMP\test\bin>tar -xf base.tar
tar: Skipping to next header
tar: Exiting with failure status due to previous errors


With patch:
===
C:\Users\ashu\git-clone-postgresql\postgresql\TMP\test\bin>.\pg_basebackup.exe
-D - -F t -X none > base.tar
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL seg
ments are copied through other means to complete the backup

C:\Users\ashu\postgresql\TMP\test\bin>cp base.tar ..\basebakup

C:\Users\ashu\postgresql\TMP\test\basebakup>tar -xf base.tar

C:\Users\ashu\postgresql\TMP\test\basebakup>ls
PG_VERSIONpg_commit_ts   pg_multixact  pg_stat  pg_wal
backup_label  pg_dynshmempg_notify pg_stat_tmp  pg_xact
base  pg_hba.confpg_replslot   pg_subtrans  postgresql.auto.conf
base.tar  pg_ident.conf  pg_serial pg_tblspcpostgresql.conf
globalpg_logical pg_snapshots  pg_twophase  tablespace_map

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-04-05 Thread Ashutosh Sharma
Hi,

>>
>> Based on the earlier discussions, I have prepared a patch that would
>> allow pgstathashindex() to show the number of unused pages in hash
>> index. Please find the attached patch for the same. Thanks.
>
> My idea is that we shouldn't end up with both a zero_pages column and
> an unused_pages column.  Instead, we should end up with just an
> unused_pages column, which will include both pages that are all-zeroes
> and pages that have a valid special space marked as LH_UNUSED.
>
> Also, I don't see why it's correct to test PageIsEmpty() here instead
> of just testing the page type as we do in pageinspect.
>
> Attached is a revised patch that shows what I have in mind; please
> review.  Along the way I made the code for examining the page type
> more similar to what pageinspect does, because I see no reason for
> those things to be different, and I think the pageinspect code is
> better.

I have reviewed the patch and it looks good to me. Also, the idea of
including both zero and unused pages in a single 'unused' column looks
better. Thanks.
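
For readers following along, the agreed classification amounts to
roughly the following (a condensed sketch of the idea, not the
committed pgstathashindex() code):

	/* Sketch: count both all-zero pages and LH_UNUSED pages as unused. */
	if (PageIsNew(page))
		stats.unused_pages++;
	else
	{
		HashPageOpaque opaque = (HashPageOpaque) PageGetSpecialPointer(page);
		int			pagetype = opaque->hasho_flag & LH_PAGE_TYPE;

		if (pagetype == LH_UNUSED_PAGE)
			stats.unused_pages++;
		/* ... bucket, overflow and bitmap pages counted separately ... */
	}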

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread Ashutosh Sharma
Hi,

As I am not seeing any response from Tomas for the last 2-3 days, and
since the commit-fest is coming towards its end, I have decided to work
on the review comments that I gave a few days back and submit the
updated patch. PFA the new version of the patch that takes care of all
the review comments given by me. I have also run pgindent on
btreefuncs.c and have run some basic test cases. All looks fine to me
now!

Please note that this patch still belongs to Tomas, not me. I still
remain the reviewer of this patch. The same thing has been very clearly
mentioned in the attached patch. The only intention behind Ashutosh
(the reviewer) working on this patch is to ensure that we do not miss
things that could easily get committed in this commit-fest. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Tue, Apr 4, 2017 at 7:23 PM, David Steele <da...@pgmasters.net> wrote:
> On 4/4/17 9:43 AM, Robert Haas wrote:
>> On Tue, Apr 4, 2017 at 9:32 AM, David Steele <da...@pgmasters.net> wrote:
>>> My goal is to help people focus on patches that have a chance.  At this
>>> point I think that includes poking authors who are not being responsive
>>> using the limited means at my disposal.
>>
>> +1.  Pings on specific threads can help clear things up when, for
>> example, the author and reviewer are each waiting for the other.  And,
>> as you say, they also help avoid the situation where a patch just
>> falls off the radar and misses the release for no especially good
>> reason, which naturally causes people frustration.
>>
>> I think your pings have been quite helpful, although I think it would
>> have been better in some cases if you'd done them sooner.  Pinging
>> after a week, with a 3 day deadline, when there are only a few days
>> left in the CommitFest isn't really leaving a lot of room to maneuver.
>
> Thanks for the feedback!  My thinking is that I don't want to bug people
> too soon, but there's a maximum number of days a thread should be idle.
> Over the course of the CF that has gone from 10 days, to 7, 5, and 3.
>
> I don't look at all patches every day so it can be a bit uneven, i.e.,
> all patches are allowed certain amount of idle time, but some may get a
> bit more depending on when I check up on them.
>
> Thanks,
> --
> -David
> da...@pgmasters.net
From 6d9814182eb03521f093d687eb8024c9df1c11d5 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Tue, 4 Apr 2017 21:38:42 +0530
Subject: [PATCH] pageinspect: Add bt_page_items function with bytea v6

Author: Tomas Vondra <tomas.von...@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coe...@gmail.com>
---
 contrib/pageinspect/btreefuncs.c  | 198 +++---
 contrib/pageinspect/expected/btree.out|  13 ++
 contrib/pageinspect/pageinspect--1.5--1.6.sql |  14 ++
 contrib/pageinspect/sql/btree.sql |   4 +
 doc/src/sgml/pageinspect.sgml |  31 
 src/include/access/nbtree.h   |   1 +
 6 files changed, 210 insertions(+), 51 deletions(-)

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 3d69aa9..02440ec 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
 
 PG_FUNCTION_INFO_V1(bt_metap);
 PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -235,14 +236,6 @@ bt_page_stats(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
-/*---
- * bt_page_items()
- *
- * Get IndexTupleData set in a btree page
- *
- * Usage: SELECT * FROM bt_page_items('t1_pkey', 1);
- *---
- */
 
 /*
  * cross-call data structure for SRF
@@ -253,14 +246,72 @@ struct user_args
 	OffsetNumber offset;
 };
 
+/*---
+ * bt_page_print_tuples()
+ *
+ * Form a tuple describing index tuple at a given offset
+ * --
+ */
+static Datum
+bt_page_print_tuples(FuncCallContext *fctx, Page page, OffsetNumber offset)
+{
+	char	   *values[6];
+	HeapTuple	tuple;
+	ItemId		id;
+	IndexTuple	itup;
+	int			j;
+	int			off;
+	int			dlen;
+	char	   *dump;
+	char	   *ptr;
+
+	id = PageGetItemId(page, offset);
+
+	if (!ItemIdIsValid(id))
+		elog(ERROR, "invalid ItemId");
+
+	itup = (IndexTuple) PageGetItem(page, id);
+
+	j = 0;
+	values[j++] = psprintf("%d", offset);
+	values[j++] = psprintf("(%u,%u)",
+		   ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
+		   ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
+	values[j++] = psprintf("%d", (int) IndexTupleSize(

Re: [HACKERS] Page Scan Mode in Hash Index

2017-04-01 Thread Ashutosh Sharma
Hi,

Thanks for the review.

>>> I am suspicious that _hash_kill_items() is going to have problems if
>>> the overflow page is freed before it reacquires the lock.
>>> _btkillitems() contains safeguards against similar cases.
>>
>> I have added similar check for hash_kill_items() as well.
>>
>
> I think hash_kill_items has a much bigger problem which is that we
> can't kill the items if the page has been modified after re-reading
> it.  Consider a case where Vacuum starts before the Scan on the bucket
> and then Scan went ahead (which is possible after your 0003 patch) and
> noted the killed items in killed item array and before it could kill
> all the items, Vacuum remove those items.  Now it is quite possible
> that before scan tries to kill those items, the corresponding itemids
> have been used by different tuples.  I think what we need to do here
> is to store the LSN of page first time when we have read the page and
> then compare it with current page lsn after re-reading the page in
> hash_kill_tems.

Okay, understood. I have done the changes to handle this type of
scenario. Please refer to the attached patches. Thanks.
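
The guard looks roughly like this inside _hash_kill_items() (an
illustrative sketch; the attached patches are authoritative):

	/*
	 * If the page has been modified since we originally read it, VACUUM
	 * may have removed our killed tuples and their itemids may now point
	 * at unrelated entries, so it is no longer safe to set LP_DEAD bits.
	 */
	page = BufferGetPage(buf);
	if (PageGetLSN(page) != so->currPos.lsn)
	{
		/* give up on killing the items */
		so->numKilled = 0;
		return;
	}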

>
> *
> + HashScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST */
> +} HashScanPosData;
> ..
>
> HashScanPosItem *killedItems; /* tids and offset numbers of killed items */
>   int numKilled; /* number of currently stored items */
> +
> + /*
> + * Identify all the matching items on a page and save them
> + * in HashScanPosData
> + */
> + HashScanPosData currPos; /* current position data */
>  } HashScanOpaqueData;
>
>
> After having array of HashScanPosItems as currPos.items, I think you
> don't need array of HashScanPosItem for killedItems, just an integer
> array of index in currPos.items should be sufficient.

Corrected.

>
>
> *
> I think below para in README is not valid after this patch.
>
> "To keep concurrency reasonably good, we require readers to cope with
> concurrent insertions, which means that they have to be able to
> re-find
> their current scan position after re-acquiring the buffer content lock
> on page.  Since deletion is not possible while a reader holds the pin
> on bucket, and we assume that heap tuple TIDs are unique, this can be
> implemented by searching for the same heap tuple TID previously
> returned.  Insertion does not move index entries across pages, so the
> previously-returned index entry should always be on the same page, at
> the same or higher offset number,
> as it was before."

I have modified above paragraph in README file.

>
> *
> - next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
> -   LH_OVERFLOW_PAGE,
> -   bstrategy);
> -
>
>   /*
> - * release the lock on previous page after acquiring the lock on next
> - * page
> + * As the hash index scan work in page at a time mode,
> + * vacuum can release the lock on previous page before
> + * acquiring lock on the next page.
>   */
> ..
> + next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
> +   LH_OVERFLOW_PAGE,
> +   bstrategy);
> +
>
> After this change, you need to modify comments on top of function
> hashbucketcleanup() and _hash_squeezebucket().
>

Done.

Please note that these patches needs to be applied on top of  [1].

[1] - 
https://www.postgresql.org/message-id/CAE9k0P%3D3rtgUtxopG%2BXQVxgASFzAnGd9sNmYUaj_%3DKeVsKGUdA%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 52429bb8b8ecbacd499de51235c0396ab09b17d8 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Sun, 2 Apr 2017 03:38:00 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev6

Patch by Ashutosh
---
 src/backend/access/hash/README   |  25 +-
 src/backend/access/hash/hash.c   | 150 +++-
 src/backend/access/hash/hashpage.c   |  17 +-
 src/backend/access/hash/hashsearch.c | 450 +++
 src/backend/access/hash/hashutil.c   |  51 +++-
 src/include/access/hash.h|  48 +++-
 6 files changed, 552 insertions(+), 189 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..063656d 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+relea

Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Ashutosh Sharma
On Apr 1, 2017 18:11, "Amit Kapila" <amit.kapil...@gmail.com> wrote:

On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coe...@gmail.com>
wrote:
>> Well, That is another option but the main idea was to be inline with
>> the btree code.
>
> That's not a bad goal in principal, but _bt_killitems() doesn't have
> any similar argument.
>
> (Also, speaking of consistency, why did we end up with
> _hash_kill_items, with an underscore between kill and items, and
> _bt_killitems, without one?)
>
>> Moreover, I am not sure if acquiring lwlock inside
>> hashendscan (basically the place where we are trying to close down the
>> things) would look good.
>
> Well, if that's not a good thing to do, hiding it inside some other
> function doesn't make it better.  I think it's fine, though.
>

One thing to note here is that this fix won't be required if we get
the page-scan-mode patch [1] in this CF.  I think if we fix this with
the patch proposed by Ashutosh, then we anyway needs to again change
the related code (kind of revert the fix) after page-scan-mode patch.
Now, if you think we have negligible chance of getting that patch,
then it is good to proceed with this fix.


Yes, I had already mentioned this point in my very first mail. Thanks for
highlighting this once again.

[1] - https://commitfest.postgresql.org/13/999/


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-04-01 Thread Ashutosh Sharma
Hi Andreas,

On Apr 1, 2017 16:15, "Andreas Seltenreich" <seltenre...@gmx.de> wrote:

Andreas Seltenreich writes:
>>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)
(&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
> I got about one TRAP per hour when testing on 20 nodes with one postgres
> and 5 sqlsmithes on each.

> Ashutosh Sharma writes:
>> [2. reacquire_lock_hashkillitems_if_required.patch]

I've done about 12 hours of testing since applying this patch and no
failed assertions so far.



Thanks for testing my patch. Just FYI, I have slightly changed my patch
this morning as per Robert's suggestions, but the logic still remains
more or less the same. Thank you once again.


Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-01 Thread Ashutosh Sharma
Hi Tomas,

On Fri, Mar 31, 2017 at 11:05 PM, Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:
> On 03/31/2017 06:01 PM, Ashutosh Sharma wrote:
>>
>>
>> It seems like the latest patch(v4) shared by Tomas upthread is an
>> empty patch. If I am not wrong, please share the correct patch.
>> Thanks.
>>
>
> OMG, this is the second time I managed to generate an empty patch. I
really
> need to learn not to do that ..
>
> Attached is v5 of the patch, hopefully correct this time ;-)

Yes, it was correct. I have spent some time reviewing your patch today,
and my review comments are as follows.

1. It would be good to have a blank line between the following set of
lines describing bt_page_print_tuples.

+/*
+ * bt_page_print_tuples
+ * form a tuple describing index tuple at a given offset
+ */


Something like this,

+/* -
+ * bt_page_print_tuples()
+ *
+ * Form a tuple describing index tuple at a given offset
+ * -
+ */

Please note that, if you do not agree with my suggestion, you may ignore it
:)

2. I think the following comments for bt_page_items need to be moved to
the right place in the code.

/*-
 * bt_page_items()
 *
 * Get IndexTupleData set in a btree page
 *
 * Usage: SELECT * FROM bt_page_items('t1_pkey', 1);
 *-
 */

/*
 * cross-call data structure for SRF
 */

struct user_args
{
	Page		page;
	OffsetNumber offset;
};

With your patch, the above structure definition is followed by the
definition of bt_page_print_tuples(), not bt_page_items(); hence you
may need to put the comments for bt_page_items just above its
definition.

3. I am seeing a server crash when running the sql statement 'SELECT * FROM
bt_page_items('bt_i4_index', 1);'. Here is the backtrace.

I think this crash is happening because in bt_page_items, you have
added the following if statement without reading the opaque region of
the page first,

if (P_ISMETA(opaque))
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("block is a meta page")));

Please move the above if statement below the following line to get rid
of the crash.

opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);

OR, the best thing to do would be to retain the earlier if statement
for checking the meta page, because that way you can avoid reading a
buffer unnecessarily.

#0  0x7f592975030c in bt_page_items (fcinfo=0x7fffe5fdb860) at
btreefuncs.c:352
352 if (P_ISMETA(opaque))
Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.el7.x86_64
(gdb) bt
#0  0x7f592975030c in bt_page_items (fcinfo=0x7fffe5fdb860) at
btreefuncs.c:352
#1  0x006797e0 in ExecMakeTableFunctionResult (setexpr=0x21269f8,
econtext=0x2126738, argContext=0x2136c98, expectedDesc=0x2155178,
randomAccess=0 '\000') at execSRF.c:230
#2  0x00689dda in FunctionNext (node=0x2126628) at
nodeFunctionscan.c:94
#3  0x00678f0e in ExecScanFetch (node=0x2126628, accessMtd=0x689d23
<FunctionNext>, recheckMtd=0x68a108 <FunctionRecheck>) at execScan.c:95
#4  0x00678f7d in ExecScan (node=0x2126628, accessMtd=0x689d23
<FunctionNext>, recheckMtd=0x68a108 <FunctionRecheck>) at execScan.c:143

4. If you agree with the reason mentioned by me in comment #3, please
remove the following if statement from bt_page_items(),

if (P_ISMETA(opaque))
	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("block is a meta page")));

Apart from the above comments, your patch looks good to me. I have also
marked this patch as 'Waiting for Author' in the commitfest. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


Re: [HACKERS] Page Scan Mode in Hash Index

2017-04-01 Thread Ashutosh Sharma
Hi,

>
> On 03/29/2017 09:16 PM, Ashutosh Sharma wrote:
>>>
>>> This patch needs a rebase.
>>
>>
>> Please try applying these patches on top of [1]. I think you should be
>> able
>> to apply it cleanly. Sorry, I think I forgot to mention this point in my
>> earlier mail.
>>
>> [1] -
>>
>> https://www.postgresql.org/message-id/CAE9k0P%3DV2LhtyeMXd295fhisp%3DNWUhRVJ9EZQCDowWiY9rSohQ%40mail.gmail.com
>>
>
> Thanks, that works !
>
> As you have provided a patch for Robert's comments, and no other review have
> been posted I'm moving this patch to "Ready for Committer" for additional
> committer feedback.

Please find attached the new version of the patches for page scan mode
in hash index, rebased on top of [1].

[1] - 
https://www.postgresql.org/message-id/CAE9k0P%3D3rtgUtxopG%2BXQVxgASFzAnGd9sNmYUaj_%3DKeVsKGUdA%40mail.gmail.com
From 498723199f4b14ff9917aca13abf30f9ea261ca7 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Sat, 1 Apr 2017 12:09:46 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev5

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 140 ++-
 src/backend/access/hash/hashpage.c   |  17 +-
 src/backend/access/hash/hashsearch.c | 441 +++
 src/backend/access/hash/hashutil.c   |  29 ++-
 src/include/access/hash.h|  44 
 6 files changed, 509 insertions(+), 171 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index b835f77..bd2827a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		ItemPointerSetOffsetNumber(current, offnum);
-
-		/*
 		 * Check to see if we should kill the previously-fetched tuple.
 		 */
 		if (scan->kill_prior_tuple)
@@ 

Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
Hi,

On Sat, Apr 1, 2017 at 7:06 AM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>> Well, That is another option but the main idea was to be inline with
>>> the btree code.
>>
>> That's not a bad goal in principal, but _bt_killitems() doesn't have
>> any similar argument.
>
> It was there but later got removed with some patch (may be the patch
> for reducing pinning and buffer content lock for btree scans). The
> following commit has this changes.
>
> commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
> Author: Tom Lane <t...@sss.pgh.pa.us>
> Date:   Sun May 7 01:21:30 2006 +
>
>>
>> (Also, speaking of consistency, why did we end up with
>> _hash_kill_items, with an underscore between kill and items, and
>> _bt_killitems, without one?)
>
> That is just to follow the naming convention in hash.h Please check
> the review comments for this at - [1].
>
>>
>>> Moreover, I am not sure if acquiring lwlock inside
>>> hashendscan (basically the place where we are trying to close down the
>>> things) would look good.
>>
>> Well, if that's not a good thing to do, hiding it inside some other
>> function doesn't make it better.
>
> okay, agreed. I will submit the patch very shortly.

As suggested, I am now acquiring the lock inside the caller function.
Attached is the patch with these changes. Thanks.

>
>
> [1] - 
> https://www.postgresql.org/message-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522%40redhat.com

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..b835f77 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -476,9 +476,17 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 
@@ -507,9 +515,17 @@ hashendscan(IndexScanDesc scan)
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 



Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Well, That is another option but the main idea was to be inline with
>> the btree code.
>
> That's not a bad goal in principal, but _bt_killitems() doesn't have
> any similar argument.

It was there but later got removed by some patch (maybe the patch for
reducing pinning and buffer content locking for btree scans). The
following commit has these changes.

commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   Sun May 7 01:21:30 2006 +

>
> (Also, speaking of consistency, why did we end up with
> _hash_kill_items, with an underscore between kill and items, and
> _bt_killitems, without one?)

That is just to follow the naming convention in hash.h. Please check
the review comments for this at [1].

>
>> Moreover, I am not sure if acquiring lwlock inside
>> hashendscan (basically the place where we are trying to close down the
>> things) would look good.
>
> Well, if that's not a good thing to do, hiding it inside some other
> function doesn't make it better.

okay, agreed. I will submit the patch very shortly.


[1] - 
https://www.postgresql.org/message-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522%40redhat.com




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
On Sat, Apr 1, 2017 at 1:08 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Thanks for reporting this problem. Could you please let me know on for
>> how long did you run sqlsmith to get this crash. However, I have found
>> the reason for this crash. This is basically happening when trying to
>> retrieve the tuples using cursor. Basically the current hash index
>> scan work tuple-at-a-time which means once it finds tuple on page, it
>> releases lock from the page but keeps pin on it and finally returns
>> the tuple. When the requested number of tuples are processed there is
>> no lock on the page that was being scanned but yes there is a pin on
>> it. Finally, when trying to close a cursor at the end of scan, if any
>> killed tuples has been identified we try to first mark these items as
>> dead with the help of _hash_kill_items(). But, since we only have pin
>> on this page, the assert check 'LWLockHeldByMe()' fails.
>
> Instead of adding bool haveLock to _hash_kill_items, how about just
> having the caller acquire the lock before calling the function and
> release it afterwards?  Since the acquire is at the beginning of the
> function and the release at the end, there seems to be no point in
> putting the acquire/release inside the function rather than in the
> caller.

Well, that is another option, but the main idea was to be in line with
the btree code. Moreover, I am not sure if acquiring an LWLock inside
hashendscan() (basically the place where we are trying to close things
down) would look good.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-31 Thread Ashutosh Sharma
Hi,

On Wed, Mar 29, 2017 at 8:38 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>
>
> On 03/24/2017 04:27 AM, Peter Eisentraut wrote:
>>
>> On 3/17/17 18:35, Tomas Vondra wrote:
>>>
>>> On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
>>>>
>>>> I'm struggling to find a good way to share code between
>>>> bt_page_items(text, int4) and bt_page_items(bytea).
>>>>
>>>> If we do it via the SQL route, as I had suggested, it makes the
>>>> extension non-relocatable, and it will also create a bit of a mess
>>>> during upgrades.
>>>>
>>>> If doing it in C, it will be a bit tricky to pass the SRF context
>>>> around.  There is no "DirectFunctionCall within SRF context", AFAICT.
>>>
>>>
>>> Not sure what it has to do with DirectFunctionCall? You want to call the
>>> bytea variant from the existing one? Wouldn't it be easier to simply
>>> define a static function with the shared parts, and pass around the
>>> fctx/fcinfo? Not quite pretty, but should work.
>>
>>
>> Perhaps what was added in
>>
>> <http://git.postgresql.org/pg/commitdiff/29bf5016835a2c2c23786f7cda347716c083d95f>
>> would actually work here.
>>
>
> I've tried to refactor the code using this, but the result was rather ugly
> because (a) it really is quite tricky to pass around the contexts and (b)
> the sanity checks are quite different for the two input types, so mixing
> those parts (essentially the SRF_IS_FIRSTCALL bits) does not make much sense
> IMHO.
>
> The attached patch is the best I came up with - it essentially shares just
> the tuple-forming part, which is exactly the same in both cases.
>
> It also adds the P_ISMETA(opaque) check to the original function, which
> seems like a useful defense against page written to a different place (which
> is essentially the issue I was originally investigating).
>

It seems like the latest patch (v4) shared by Tomas upthread is an
empty patch. If I am not wrong, please share the correct patch.
Thanks.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




[HACKERS] inconsistent page found on STANDBY server

2017-03-30 Thread Ashutosh Sharma
Hi All,

When running make installcheck on a master with WAL consistency
checking enabled, an inconsistent page is detected on the standby. I
can see the following FATAL message in the standby server's logfile,

2017-03-30 07:31:10.101 BST [27994] LOG:  entering standby mode
2017-03-30 07:31:10.106 BST [27994] LOG:  redo starts at 0/224
2017-03-30 07:31:10.108 BST [27994] LOG:  consistent recovery state
reached at 0/2E4
2017-03-30 07:31:10.108 BST [27992] LOG:  database system is ready to
accept read only connections
2017-03-30 07:31:10.113 BST [27998] LOG:  started streaming WAL from
primary at 0/300 on timeline 1
2017-03-30 07:33:19.040 BST [27994] FATAL:  inconsistent page found,
rel 1663/13157/16391, forknum 0, blkno 0
2017-03-30 07:33:19.040 BST [27994] CONTEXT:  WAL redo at 0/351CF03C
for Hash/UPDATE_META_PAGE: ntuples -nan
2017-03-30 07:33:19.041 BST [27992] LOG:  startup process (PID 27994)
exited with exit code 1

Steps to reproduce:
===
1) PG v10 sources
2) Set up Master/SLAVE replication
3) Run make installcheck on the Master
4) Check the database logs generated in the SLAVE directory

Please note that the above issue is observed only on a 32-bit Linux
machine and was reported to me offlist by Tushar Ahuja. Tushar also
allowed me to use his 32-bit Linux machine to analyse this problem. I
also had a small offlist discussion with Amit (included in this mail)
while analysing it.

RCA:

After debugging the hash index deletion code, I found that while
registering data for the xlog record 'XLOG_HASH_UPDATE_META_PAGE' we
are not passing the correct length of the data being registered, and
therefore the data (xl_hash_update_meta_page) is not completely
recorded into the WAL record.

Fix:
===
Attached patch fixes this issue.
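
To make the bug concrete (my reading of the hunk below):
SizeOfHashUpdateMetaPage already expands to the record length in bytes,
so wrapping it in sizeof() yields the size of the expression's type
(size_t) instead. On a 32-bit machine that is 4 bytes, which truncates
the registered xl_hash_update_meta_page data and explains why only
32-bit standbys replay a garbage ntuples; on 64-bit machines
sizeof(size_t) happens to equal the record length, so the bug stays
hidden.

	/* wrong: registers sizeof(size_t) bytes, i.e. only 4 on 32-bit */
	XLogRegisterData((char *) &xlrec, sizeof(SizeOfHashUpdateMetaPage));

	/* right: registers the full xl_hash_update_meta_page record */
	XLogRegisterData((char *) &xlrec, SizeOfHashUpdateMetaPage);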

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..a5798bd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -691,7 +691,7 @@ loop_top:
 		xlrec.ntuples = metap->hashm_ntuples;
 
 		XLogBeginInsert();
-		XLogRegisterData((char *) &xlrec, sizeof(SizeOfHashUpdateMetaPage));
+		XLogRegisterData((char *) &xlrec, SizeOfHashUpdateMetaPage);
 
 		XLogRegisterBuffer(0, metabuf, REGBUF_STANDARD);
 



Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-29 Thread Ashutosh Sharma
>> I think you should consider refactoring this so that it doesn't need
>> to use goto.  Maybe move the while (offnum <= maxoff) logic into a
>> helper function and have it return itemIndex.  If itemIndex == 0, you
>> can call it again.
>>
>
> okay, Added a helper function for _hash_readpage(). Please check v4
> patch attached with this mail.
>
> >>

> This is not a full review, but I'm out of time for the moment.
>>
>
> No worries. I will be ready for your further review comments any time.
> Thanks for the review.
>
>
This patch needs a rebase.


Please try applying these patches on top of [1]. I think you should be
able to apply them cleanly. Sorry, I think I forgot to mention this
point in my earlier mail.

[1] -
https://www.postgresql.org/message-id/CAE9k0P%3DV2LhtyeMXd295fhisp%3DNWUhRVJ9EZQCDowWiY9rSohQ%40mail.gmail.com


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Ashutosh Sharma
On Mar 28, 2017 00:00, "Andreas Seltenreich" <seltenre...@gmx.de> wrote:

Ashutosh Sharma writes:
>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)
(&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.  Nodes are tiny consumer machines with
low-power 4-core sandy bridges.


Okay. Thanks for sharing this information. I tried running sqlsmith on
a single node. I ran it for an hour but didn't find any crash. Let me
also try running multiple sqlsmith instances at a time on a single
node... maybe 5, like you are doing.


> [2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.


Sure, Thanks a lot.

With Regards,
Ashutosh Sharma


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Ashutosh Sharma
On Mar 27, 2017 22:25, "Robert Haas" <robertmh...@gmail.com> wrote:

On Fri, Mar 24, 2017 at 3:49 AM, Amit Kapila <amit.kapil...@gmail.com>
wrote:
> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma <ashu.coe...@gmail.com>
wrote:
>>>
>>> I think it would have been probably okay to use *int* for ntuples as
>>> that matches with what you are actually assigning in the function.
>>
>> okay, corrected it. Attached is newer version of patch.
>
> Thanks, this version looks good to me.

Committed.


Thank you.


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Ashutosh Sharma
>> >>
>> >> I think it would have been probably okay to use *int* for ntuples as
>> >> that matches with what you are actually assigning in the function.
>> >
>> > okay, corrected it. Attached is newer version of patch.
>> >
>>
>> Thanks, this version looks good to me.
>
>
> It solves the problem for me.

Great. Thanks for confirming.

> I'd like to test that I get the right answer
> on the standby, not just the absence of a crash, but I don't know how to do
> that effectively.  Has anyone used the new wal replay block consistency tool
> on hash indexes since this microvacuum code was committed?

Yes, I have used it for hash index, but I used it after the below commit,

commit 9abbf4727de746222ad8fc15b17348065389ae43
Author: Robert Haas <rh...@postgresql.org>
Date:   Mon Mar 20 15:55:27 2017 -0400

Another fix for single-page hash index vacuum.

The WAL consistency checking code needed to be updated for the new
page status bit, but that didn't get done previously.

All I did was set wal_consistency_checking = 'hash' in postgresql.conf
and run the test cases on the primary server. If there is any
inconsistent block on the standby, the tool would terminate the
recovery process and you would see the following message in the server
logfile.

"inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u"

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-27 Thread Ashutosh Sharma
Hi,

> I think you should consider refactoring this so that it doesn't need
> to use goto.  Maybe move the while (offnum <= maxoff) logic into a
> helper function and have it return itemIndex.  If itemIndex == 0, you
> can call it again.

Okay, I added a helper function for _hash_readpage(). Please check the
v4 patch attached to this mail.
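
As a rough illustration of the refactoring (a hypothetical shape only;
the attached v4 patch is authoritative), the goto is replaced by a loop
around the helper:

	/*
	 * _hash_load_qualified_items() scans one page and returns the number
	 * of matching items saved into so->currPos.items[]; if nothing
	 * matched, step to the next page in the bucket chain and try again.
	 * (Forward-scan case shown; backward scans step with _hash_readprev.)
	 */
	for (;;)
	{
		itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
		if (itemIndex != 0)
			break;				/* got at least one match on this page */

		_hash_readnext(scan, &buf, &page, &opaque);
		if (!BufferIsValid(buf))
			return false;		/* reached the end of the bucket chain */
		offnum = _hash_binsearch(page, so->hashso_sk_hash);
	}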

> Notice that the code in the "else" branch of the
> if (itemIndex == 0) stuff could actually be removed from the else
> block without changing anything, because the "if" block always either
> returns or does a goto.  That's worthy of a little more work to try to
> make things simple and clear.

Corrected.

>
> + *  We find the first item(or, if backward scan, the last item) in
>
> Missing space.
>

Corrected.

> In _hash_dropscanbuf(), the handling of hashso_bucket_buf is now
> inconsistent with the handling of hashso_split_bucket_buf, which seems
> suspicious.

I have corrected it.

> Suppose we enter this function with so->hashso_bucket_buf
> and so->currPos.buf both being valid buffers, but not the same one.
> It looks to me as if we'll release the first pin but not the second
> one.

Yes, that is because we do not need to release the pin on currPos.buf
if it is an overflow page. In page scan mode, once we have saved all
the qualified tuples from the current page (which could be an overflow
page) into the items array, we release both the pin and the lock on an
overflow page. This was not true earlier: consider a case where we are
supposed to fetch only a fixed number of tuples from a page using a
cursor. In such a case, once the requested number of tuples has been
fetched, we simply return without releasing the pin on the page being
scanned. If this page is an overflow page, then by the end of the scan
we would end up with pins held on two buffers, i.e. the bucket buf and
the current buf, which is basically the overflow buf.

> My guess (which could be wrong) is that so->hashso_bucket_buf =
> InvalidBuffer should be moved back up higher in the function where it
> was before, just after the first if statement, and that the new
> condition so->hashso_bucket_buf == so->currPos.buf on the last
> _hash_dropbuf() should be removed.  If that's not right, then I think
> I need somebody to explain why not.

Okay. As I mentioned above, in page scan mode we only keep a pin on the
bucket buf. There is no case where we are still holding a pin on an
overflow buf at the end of the scan. So basically _hash_dropscanbuf()
should only attempt to release the pin on a bucket buf, and an attempt
to release the pin on so->currPos.buf should only happen when
'so->hashso_bucket_buf == so->currPos.buf' or when
'so->hashso_split_bucket_buf == so->currPos.buf'.
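
Spelled out, that rule gives _hash_dropscanbuf() roughly this shape (an
illustrative condensation, not necessarily the exact patch):

	/* release pin we hold on the primary bucket page */
	if (BufferIsValid(so->hashso_bucket_buf) &&
		so->hashso_bucket_buf != so->currPos.buf)
		_hash_dropbuf(rel, so->hashso_bucket_buf);

	/* likewise for the bucket being split, if any */
	if (BufferIsValid(so->hashso_split_bucket_buf) &&
		so->hashso_split_bucket_buf != so->currPos.buf)
		_hash_dropbuf(rel, so->hashso_split_bucket_buf);

	/*
	 * currPos.buf can only still be pinned here if it *is* one of the
	 * bucket pages; overflow pages were already unpinned during the scan.
	 */
	if (BufferIsValid(so->currPos.buf) &&
		(so->hashso_bucket_buf == so->currPos.buf ||
		 so->hashso_split_bucket_buf == so->currPos.buf))
		_hash_dropbuf(rel, so->currPos.buf);

	so->hashso_bucket_buf = InvalidBuffer;
	so->hashso_split_bucket_buf = InvalidBuffer;
	so->currPos.buf = InvalidBuffer;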

>
> I am suspicious that _hash_kill_items() is going to have problems if
> the overflow page is freed before it reacquires the lock.
> _btkillitems() contains safeguards against similar cases.

I have added a similar check to _hash_kill_items() as well.

>
> This is not a full review, but I'm out of time for the moment.

No worries. I will be ready for your further review comments any time.
Thanks for the review.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 3d0273f503d1645d6289bda78946a0af4b9e9f3a Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Mon, 27 Mar 2017 18:22:15 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev4

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 124 ++
 src/backend/access/hash/hashpage.c   |  17 +-
 src/backend/access/hash/hashsearch.c | 445 +++
 src/backend/access/hash/hashutil.c   |  42 +++-
 src/include/access/hash.h|  46 +++-
 6 files changed, 515 insertions(+), 168 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+it's pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 0bacef8..bd2827a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ b

Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Ashutosh Sharma
Hi,

> testing with master as of cf366e97ff, sqlsmith occasionally triggers the
> following assertion:
>
> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
>
> Backtraces always look like the one below.  It is reproducible on a
> cluster once it happens.  I could provide a tarball if needed.
>
> regards,
> Andreas
>
> #2  0x008324b1 in ExceptionalCondition 
> (conditionName=conditionName@entry=0x9e4e28 "!(LWLockHeldByMe(((LWLock*) 
> (&(bufHdr)->content_lock))))", errorType=errorType@entry=0x87b03d 
> "FailedAssertion", fileName=fileName@entry=0x9e5856 "bufmgr.c", 
> lineNumber=lineNumber@entry=3397) at assert.c:54
> #3  0x00706971 in MarkBufferDirtyHint (buffer=2844, 
> buffer_std=buffer_std@entry=1 '\001') at bufmgr.c:3397
> #4  0x004b3ecd in _hash_kill_items (scan=scan@entry=0x66dcf70) at 
> hashutil.c:514
> #5  0x004a9c1b in hashendscan (scan=0x66dcf70) at hash.c:512
> #6  0x004cf17a in index_endscan (scan=0x66dcf70) at indexam.c:353
> #7  0x0061fa51 in ExecEndIndexScan (node=0x3093f30) at 
> nodeIndexscan.c:852
> #8  0x00608e59 in ExecEndNode (node=<optimized out>) at 
> execProcnode.c:715
> #9  0x006045b8 in ExecEndPlan (estate=0x3064000, planstate=<optimized out>) at execMain.c:1540
> #10 standard_ExecutorEnd (queryDesc=0x30cb880) at execMain.c:487
> #11 0x005c87b0 in PortalCleanup (portal=0x1a60060) at portalcmds.c:302
> #12 0x0085cbb3 in PortalDrop (portal=0x1a60060, 
> isTopCommit=<optimized out>) at portalmem.c:489
> #13 0x00736ed2 in exec_simple_query (query_string=0x315b7a0 "...") at 
> postgres.c:
> #14 0x00738b51 in PostgresMain (argc=<optimized out>, 
> argv=argv@entry=0x1a6c6c8, dbname=<optimized out>, username=<optimized out>) 
> at postgres.c:4071
> #15 0x00475fef in BackendRun (port=0x1a65b90) at postmaster.c:4317
> #16 BackendStartup (port=0x1a65b90) at postmaster.c:3989
> #17 ServerLoop () at postmaster.c:1729
> #18 0x006c8662 in PostmasterMain (argc=argc@entry=4, 
> argv=argv@entry=0x1a3f540) at postmaster.c:1337
> #19 0x0047729d in main (argc=4, argv=0x1a3f540) at main.c:228
>


Hi,

Thanks for reporting this problem. Could you please let me know how
long you had to run sqlsmith to get this crash? In any case, I have
found the reason for it. The crash basically happens when retrieving
tuples using a cursor. The current hash index scan works
tuple-at-a-time, which means that once it finds a tuple on a page, it
releases the lock on that page but keeps the pin, and then returns the
tuple. When the requested number of tuples has been processed, there is
no lock on the page that was being scanned, but there is still a pin on
it. Finally, when closing the cursor at the end of the scan, if any
killed tuples have been identified, we first try to mark those items as
dead with the help of _hash_kill_items(). But since we only hold a pin
on the page, the assertion check 'LWLockHeldByMe()' fails.

When scanning tuples with a normal SELECT statement, before moving to
the next page in a bucket we first deal with all the killed items, and
we do this without releasing the lock and pin on the current page.
Hence, this crash is not visible with plain SELECT queries.

The attached patch fixes this. But please note that all these changes
will be removed again by the patch for page scan mode - [1].

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobYTvexcjqMhXoNCyEUHChzmdC_2xVGgj7eqaYVgoJA%2Bg%40mail.gmail.com
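
For readers following along, here is a minimal sketch of the fix. The
attached patch is authoritative; the marking loop is elided and the
comments are mine. _hash_kill_items() now takes a flag telling it
whether the caller already holds the content lock, and re-acquires the
lock when needed before dirtying the page:

void
_hash_kill_items(IndexScanDesc scan, bool haveLock)
{
	HashScanOpaque so = (HashScanOpaque) scan->opaque;
	Buffer		buf = so->hashso_curbuf;

	/* cursor shutdown reaches here holding only a pin, so lock first */
	if (!haveLock)
		LockBuffer(buf, BUFFER_LOCK_SHARE);

	/* ... walk so->killedItems[], marking matching items LP_DEAD ... */
	/* ... if anything was killed: MarkBufferDirtyHint(buf, true) ... */

	if (!haveLock)
		LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}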

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..0bacef8 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -478,7 +478,7 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
@@ -509,7 +509,7 @@ hashendscan(IndexScanDesc scan)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, false);
 
 	_hash_dropscanbuf(rel, so);
 
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 2d92049..414cc6a 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -467,7 +467,7 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
 
 	/* Before leaving current page, deal with any killed items */
 	if (so->numKilled > 0)
-		_hash_kill_items(scan);
+		_hash_kill_items(scan, true);
 
 	/*
 	 * ran off the end of this page, try the next
@@ -524,7 

Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-25 Thread Ashutosh Sharma
>>>>>>
>>>>>> +1.  If we consider some more names for that column then probably one
>>>>>> alternative could be "empty pages".
>>>>>
>>>>> Yeah, but I think "unused" might be better.  Because a page could be
>>>>> in use (as an overflow page or primary bucket page) and still be
>>>>> empty.
>>>>>
>>>>
>>>> Based on the earlier discussions, I have prepared a patch that would
>>>> allow pgstathashindex() to show the number of unused pages in hash
>>>> index. Please find the attached patch for the same. Thanks.
>>>>
>>>
>>>   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
>>>   stats.bitmap_pages++;
>>> + else if (PageIsEmpty(page))
>>> + stats.unused_pages++;
>>>
>>> I think having this check after PageIsNew() makes more sense then
>>> having at the place where you currently have,
>>
>> I don't think having it just below PageIsNew() check would be good.
>> The reason being, there can be a bucket page which is in use but still
>> be empty. So, if we add a check just below PageIsNew check then even
>> though a page is marked as bucket page and is empty it will be shown
>> as unused page which i feel is not correct. Here is one simple example
>> that illustrates this point,
>>
>
> oh, is it a page where all the items have been deleted and no new
> items have been inserted?

Yes, it is a page from which items have been removed and no new
insertions have happened.

> The reason why I have told that place is
> not appropriate is because all the other checks in near by code are on
> the page type and this check looks odd at that place, but we might
> need to do this way if there is no other clean solution.

I got your point, but I think that is the only solution we have as of
now.
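
In other words, the classification order being discussed looks roughly
like this (a condensed sketch of the attached patch, for illustration;
the error-message text here is mine):

	if (opaque->hasho_flag & LH_BUCKET_PAGE)
		stats.bucket_pages++;
	else if (opaque->hasho_flag & LH_OVERFLOW_PAGE)
		stats.overflow_pages++;
	else if (opaque->hasho_flag & LH_BITMAP_PAGE)
		stats.bitmap_pages++;
	else if (PageIsEmpty(page))
		stats.unused_pages++;
	else
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("unexpected page type 0x%04X in hash index",
						opaque->hasho_flag)));

The explicit page-type flags win, so an empty-but-live bucket page is
still reported as a bucket page; only a page carrying no type flag at
all falls through to the unused-page count.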

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] comments in hash_alloc_buckets

2017-03-25 Thread Ashutosh Sharma
>> While working on - [1], I realised that the following comments in
>> _hash_alloc_buckets() needs to be corrected.
>>
>> /*
>>  * Initialize the freed overflow page.  Just zeroing the page won't work,
>>  * See _hash_freeovflpage for similar usage.
>>  */
>> _hash_pageinit(page, BLCKSZ);
>>
>> Attached is the patch that corrects above comment. Thanks.
>>
>
> - * Initialize the freed overflow page.  Just zeroing the page won't work,
> + * Initialize the last page in hash index.
>
> I think saying ".. last page in hash index" sounds slightly awkward as
> this is the last page for current split point, how about just
> "Initialize the page. ..."

Yes, just adding "Initialize the page. ..." looks simpler and correct.
Attached is the patch with that comment.
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 622cc4b..4fd9cbc 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -1002,7 +1002,7 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
 	page = (Page) zerobuf;
 
 	/*
-	 * Initialize the freed overflow page.  Just zeroing the page won't work,
+	 * Initialize the page.  Just zeroing the page won't work,
 	 * See _hash_freeovflpage for similar usage.
 	 */
 	_hash_pageinit(page, BLCKSZ);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-25 Thread Ashutosh Sharma
On Sat, Mar 25, 2017 at 11:02 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 11:24 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>> Hi,
>>
>> On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapil...@gmail.com> 
>>> wrote:
>>>>> Maybe we should call them "unused pages".
>>>>
>>>> +1.  If we consider some more names for that column then probably one
>>>> alternative could be "empty pages".
>>>
>>> Yeah, but I think "unused" might be better.  Because a page could be
>>> in use (as an overflow page or primary bucket page) and still be
>>> empty.
>>>
>>
>> Based on the earlier discussions, I have prepared a patch that would
>> allow pgstathashindex() to show the number of unused pages in hash
>> index. Please find the attached patch for the same. Thanks.
>>
>
>   else if (opaque->hasho_flag & LH_BITMAP_PAGE)
>   stats.bitmap_pages++;
> + else if (PageIsEmpty(page))
> + stats.unused_pages++;
>
> I think having this check after PageIsNew() makes more sense then
> having at the place where you currently have,

I don't think having it just below the PageIsNew() check would be good.
The reason being, there can be a bucket page which is in use but still
empty. So, if we add the check just below the PageIsNew() check, then
even though a page is marked as a bucket page and is empty, it will be
shown as an unused page, which I feel is not correct. Here is one
simple example that illustrates this point:

Query:
======
select hash_page_type(get_raw_page('con_hash_index', 17));

gdb shows the following info for block number 17:

(gdb) p *(PageHeader)page
$1 = {pd_lsn = {xlogid = 0, xrecoff = 122297104}, pd_checksum = 0,
pd_flags = 0, pd_lower = 24, pd_upper = 8176, pd_special = 8176,
  pd_pagesize_version = 8196, pd_prune_xid = 0, pd_linp = 0x22a82d8}

(gdb) p ((HashPageOpaque) PageGetSpecialPointer(page))->hasho_flag
$2 = 66

(gdb) p (((HashPageOpaque) PageGetSpecialPointer(page))->hasho_flag
& LH_BUCKET_PAGE)
$3 = 2

From the above information, it is clear that this page is a bucket page
and is empty. I think it should be shown as a bucket page rather than
an unused page. Also, this has already been discussed in [1].

> other than that
> code-wise your patch looks okay, although I haven't tested it.
>

I have tested it properly and didn't find any issues.

> I think this should also be tracked under PostgreSQL 10 open items,
> but as this is not directly a bug, so not sure, what do others think?

Well, even I am not sure whether this has to be added to the open items
list or not. Thanks.

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmobwLKpKe99qnTJCBzFB4UaVGKrLNX3hwp8wcxObMDx7pA%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pageinspect and hash indexes

2017-03-24 Thread Ashutosh Sharma
 Thanks for reviewing my patch. I have removed the extra white space.
 Attached are both the patches.
>>>
>>> Sorry, I have mistakenly attached wrong patch. Here are the correct
>>> set of patches.
>>
>> The latest version of patches looks fine to me.
>
> I don't really like 0002.  What about this, instead?
>
> --- a/contrib/pageinspect/hashfuncs.c
> +++ b/contrib/pageinspect/hashfuncs.c
> @@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
>  /* Check that page type is sane. */
>  pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
>  if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
> -pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
> +pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
> +pagetype != LH_UNUSED_PAGE)
>  ereport(ERROR,
>  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>   errmsg("invalid hash page type %08x", pagetype)));
>
> The advantage of that is (1) it won't get confused if in the future we
> have an unused page that has some flag bit not in LH_PAGE_TYPE set,
> and (2) if in the future we want to add any other checks to this
> function which should apply to unused pages also, they won't get
> bypassed by an early return statement.

Agreed. Moreover, the previous approach might even have changed the
current behaviour of functions like hash_page_items() and
hash_page_stats() when UNUSED pages are passed to them. Attached is the
newer version of the patch.
From 5c54fa58895cd279ff44424a3eb1feafdaca69d5 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Sat, 25 Mar 2017 01:02:35 +0530
Subject: [PATCH] Allow pageinspect to handle UNUSED hash pages v3

---
 contrib/pageinspect/hashfuncs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c
index 812a03f..2632287 100644
--- a/contrib/pageinspect/hashfuncs.c
+++ b/contrib/pageinspect/hashfuncs.c
@@ -80,7 +80,8 @@ verify_hash_page(bytea *raw_page, int flags)
 	/* Check that page type is sane. */
 	pagetype = pageopaque->hasho_flag & LH_PAGE_TYPE;
 	if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
-		pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
+		pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE &&
+		pagetype != LH_UNUSED_PAGE)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid hash page type %08x", pagetype)));
-- 
1.8.3.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-03-24 Thread Ashutosh Sharma
Hi,

On Fri, Mar 24, 2017 at 9:00 PM, David Steele <da...@pgmasters.net> wrote:
> Hi Ashutosh,
>
> On 3/22/17 8:52 AM, Amit Kapila wrote:
>>
>> On Wed, Mar 22, 2017 at 12:07 AM, David Steele <da...@pgmasters.net>
>> wrote:
>>>
>>>
>>> Amit, Magnus, you are signed up as reviewers for this patch.  Do you know
>>> when you'll have a chance to take a look?
>>>
>>
>> I have provided my feedback and I could not test it on my machine.  I
>> think Ashutosh Sharma has tested it.  I can give it another look, but
>> not sure if it helps.
>
>
> I know you are not signed up as a reviewer, but have you take a look at this
> patch?

Well, I had a quick look at the patch just because I wanted to test it,
as I have a Windows setup. But I never looked at the patch from a
reviewer's perspective. The intention was to re-verify the test results
shared by Tsunakawa.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-24 Thread Ashutosh Sharma
>>> > I think this will work, but not sure if there is a merit to deviate
>>> > from what btree does to handle this case.   One thing I find slightly
>>> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
>>> > using a number of tuples registered as part of fixed data
>>> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
>>> > I think it will be better if we register offsets also in fixed part of
>>> > data as we are doing btree case.
>>
>> Agreed. I have made the changes accordingly. Please check attached v2 patch.
>>
>
> Changes look good to me.   I think you can modify the comments in
> structure xl_hash_vacuum_one_page to mention "TARGET OFFSET NUMBERS
> FOLLOW AT THE END"
>

I have added the comment to the xl_hash_vacuum_one_page structure.
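
For reference, here is a sketch of the resulting WAL record layout; the
attached patch and hash_xlog.h are authoritative, and this is only an
illustration:

typedef struct xl_hash_vacuum_one_page
{
	TransactionId latestRemovedXid;
	int			ntuples;

	/* TARGET OFFSET NUMBERS FOLLOW AT THE END */
} xl_hash_vacuum_one_page;

#define SizeOfHashVacuumOnePage \
	(offsetof(xl_hash_vacuum_one_page, ntuples) + sizeof(int))

/*
 * The deleted offsets now live in the main record data, right after
 * the fixed-size struct, so the redo side locates them without reading
 * the registered buffer data:
 */
unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);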

>>>
>>> >
>>> >
>>>
>>> Also another small point in this regard, do we need two separate
>>> variables to track number of deleted items in below code?  I think one
>>> variable is sufficient.
>>>
>>> _hash_vacuum_one_page()
>>> {
>>> ..
>>> deletable[ndeletable++] = offnum;
>>> tuples_removed += 1;
>>>
>>
>> Yes, I think 'ndeletable' alone should be fine.
>>
>
> I think it would have been probably okay to use *int* for ntuples as
> that matches with what you are actually assigning in the function.

Okay, corrected it. Attached is the newer version of the patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index da4c2c5..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -966,8 +966,6 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	OffsetNumber	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	int		i;
-	char *ptr;
-	Size len;
 
 	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
 
@@ -986,12 +984,20 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 		return latestRemovedXid;
 
 	/*
+	 * Check if WAL replay has reached a consistent database state. If not,
+	 * we must PANIC. See the definition of btree_xlog_delete_get_latestRemovedXid
+	 * for more details.
+	 */
+	if (!reachedConsistency)
+		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
+
+	/*
 	 * Get index page.  If the DB is consistent, this should not fail, nor
 	 * should any of the heap page fetches below.  If one does, we return
 	 * InvalidTransactionId to cancel all HS transactions.  That's probably
 	 * overkill, but it's safe, and certainly better than panicking here.
 	 */
-	XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
+	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
 	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
 
 	if (!BufferIsValid(ibuffer))
@@ -1003,9 +1009,7 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	 * Loop through the deleted index items to obtain the TransactionId from
 	 * the heap items they point to.
 	 */
-	ptr = XLogRecGetBlockData(record, 1, &len);
-
-	unused = (OffsetNumber *) ptr;
+	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
 
 	for (i = 0; i < xlrec->ntuples; i++)
 	{
@@ -1130,23 +1134,15 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		char *ptr;
-		Size len;
-
-		ptr = XLogRecGetBlockData(record, 0, &len);
-
 		page = (Page) BufferGetPage(buffer);
 
-		if (len > 0)
+		if (XLogRecGetDataLen(record) > SizeOfHashVacuumOnePage)
 		{
 			OffsetNumber *unused;
-			OffsetNumber *unend;
 
-			unused = (OffsetNumber *) ptr;
-			unend = (OffsetNumber *) ((char *) ptr + len);
+			unused = (OffsetNumber *) ((char *) xldata + SizeOfHashVacuumOnePage);
 
-			if ((unend - unused) > 0)
-PageIndexMultiDelete(page, unused, unend - unused);
+			PageIndexMultiDelete(page, unused, xldata->ntuples);
 		}
 
 		/*
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 8640e85..8699b5b 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -344,7 +344,6 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 	Page	page = BufferGetPage(buf);
 	HashPageOpaque	pageopaque;
 	HashMetaPage	metap;
-	double tuples_removed = 0;
 
 	/* Scan each tuple in page to see if it is marked as LP_DEAD */
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -355,10 +354,7 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 		ItemId	itemId = PageGetItemId(page, offnum);
 
 		if (ItemIdIsDead(itemId))
-		{
 			deletable[ndeletable++] = offnum;
-			tuples_removed += 1;
-		}
 	}
 
 	if (ndeletable > 0)
@@ 

Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
On Fri, Mar 24, 2017 at 9:46 AM, Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
>> wrote:
>>>>
>>>> Oh, okay, but my main objection was that we should not check hash page
>>>> type (hasho_flag) without ensuring whether it is a hash page.  Can you
>>>> try to adjust the above code so that this check can be moved after
>>>> hasho_page_id check?
>>>
>>> Yes, I got your point. I have done that but then i had to remove the
>>> check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
>>> only be true for one page in entire hash index table and can be
>>> ignored. If you wish, I could mention about it in the documentation.
>>>
>>
>> Yeah, I think it is worth adding a Note in docs, but we can do that
>> separately if required.
>>
>>>>
>>>>> To avoid
>>>>> this, at the start of verify_hash_page function itself if we recognise
>>>>> page as UNUSED page we return immediately.
>>>>>
>>>>>>
>>>>>> 2.
>>>>>> + /* Check if it is an empty hash page. */
>>>>>> + if (PageIsEmpty(page))
>>>>>> + ereport(ERROR,
>>>>>> + (errcode(ERRCODE_INDEX_CORRUPTED),
>>>>>> + errmsg("index table contains empty page")));
>>>>>>
>>>>>>
>>>>>> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
>>>>>> it better that the same error message can be given for both of them as
>>>>>> from user perspective there is not much difference between both the
>>>>>> messages?
>>>>>
>>>>> I think we should show separate message because they are two different
>>>>> type of pages. In the sense like, one is initialised whereas other is
>>>>> completely zero.
>>>>>
>>>>
>>>> I understand your point, but not sure if it makes any difference to user.
>>>>
>>>
>>> okay, I have now anyways removed the check for PageIsEmpty. Please
>>> refer to the attached '0002
>>> allow_pageinspect_handle_UNUSED_hash_pages.patch'
>>>
>>
>> +
>> if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
>>     ereport(ERROR,
>>
>> spurious white space.
>>
>>>
>>> Also, I have attached
>>> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
>>> handles your comment mentioned in [1].
>>>
>>
>> In general, we have to initialize prevblock with max_bucket, but here
>> it is okay, because we anyway initialize it after this page is
>> allocated as overflow page.
>>
>> Your patches look good to me except for small white space change.
>
> Thanks for reviewing my patch. I have removed the extra white space.
> Attached are both the patches.

Sorry, I mistakenly attached the wrong patch. Here is the correct set
of patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@

Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
On Fri, Mar 24, 2017 at 9:21 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 6:46 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> wrote:
>>>
>>> Oh, okay, but my main objection was that we should not check hash page
>>> type (hasho_flag) without ensuring whether it is a hash page.  Can you
>>> try to adjust the above code so that this check can be moved after
>>> hasho_page_id check?
>>
>> Yes, I got your point. I have done that but then i had to remove the
>> check for PageIsEmpty(). Anyways, I think PageIsEmpty() condition will
>> only be true for one page in entire hash index table and can be
>> ignored. If you wish, I could mention about it in the documentation.
>>
>
> Yeah, I think it is worth adding a Note in docs, but we can do that
> separately if required.
>
>>>
>>>> To avoid
>>>> this, at the start of verify_hash_page function itself if we recognise
>>>> page as UNUSED page we return immediately.
>>>>
>>>>>
>>>>> 2.
>>>>> + /* Check if it is an empty hash page. */
>>>>> + if (PageIsEmpty(page))
>>>>> + ereport(ERROR,
>>>>> + (errcode(ERRCODE_INDEX_CORRUPTED),
>>>>> + errmsg("index table contains empty page")));
>>>>>
>>>>>
>>>>> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
>>>>> it better that the same error message can be given for both of them as
>>>>> from user perspective there is not much difference between both the
>>>>> messages?
>>>>
>>>> I think we should show separate message because they are two different
>>>> type of pages. In the sense like, one is initialised whereas other is
>>>> completely zero.
>>>>
>>>
>>> I understand your point, but not sure if it makes any difference to user.
>>>
>>
>> okay, I have now anyways removed the check for PageIsEmpty. Please
>> refer to the attached '0002
>> allow_pageinspect_handle_UNUSED_hash_pages.patch'
>>
>
> +
> if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
> ereport(ERROR,
>
> spurious white space.
>
>>
>> Also, I have attached
>> '0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch' that
>> handles your comment mentioned in [1].
>>
>
> In general, we have to initialize prevblock with max_bucket, but here
> it is okay, because we anyway initialize it after this page is
> allocated as overflow page.
>
> Your patches look good to me except for small white space change.

Thanks for reviewing my patch. I have removed the extra white space.
Attached are both the patches.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain its page id. This allows the tools
+	

Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
> Hi,
>
> On 03/23/2017 02:11 PM, Ashutosh Sharma wrote:
>>
>> On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
>> <jesper.peder...@redhat.com> wrote:
>>>
>>> 0001v2:
>>>
>>> In hashgettuple() you can remove the 'currItem' and 'offnum' from the
>>> 'else'
>>> part, and do the assignment inside
>>>
>>>   if (so->numKilled < MaxIndexTuplesPerPage)
>>>
>>> instead.
>>>
>>
>> Done. Please have a look into the attached v3 patch.
>>
>>>
>>> No new comments for 0002 and 0003.
>>
>>
>> okay. Thanks.
>>
>
> I'll keep the entry in 'Needs Review' if Alexander, or others, want to add
> their feedback.

okay. Thanks.

>
> (Best to post the entire patch series each time)

I'll take your suggestion. Please find the attachments.
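
For anyone skimming the attachments, here is a rough, heavily condensed
sketch of the page-at-a-time reader that 0001 introduces. Names follow
the patch, but qual checking and backward-scan handling are elided, and
the sketch_ name is mine:

static bool
sketch_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
{
	HashScanOpaque so = (HashScanOpaque) scan->opaque;
	Page		page = BufferGetPage(*bufP);
	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
	OffsetNumber offnum;
	int			itemIndex = 0;

	for (offnum = FirstOffsetNumber; offnum <= maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		IndexTuple	itup = (IndexTuple)
		PageGetItem(page, PageGetItemId(page, offnum));

		/* _hash_checkqual() elided: save only tuples matching the scan */
		so->currPos.items[itemIndex].heapTid = itup->t_tid;
		so->currPos.items[itemIndex].indexOffset = offnum;
		itemIndex++;
	}

	/*
	 * All matches are saved, so the caller may now release this page's
	 * pin and lock (the primary bucket page keeps its pin until the
	 * scan ends).
	 */
	so->currPos.firstItem = 0;
	so->currPos.lastItem = itemIndex - 1;
	so->currPos.itemIndex = 0;

	return (itemIndex > 0);
}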

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+its pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-			if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
-break;
-		}
-		if (offnum > maxoffnum)
-			elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
-		ItemPointerSetOffsetNumber(current, offnum);
-
-		/*
 		 * Check to see if we should kill the previously-fetc

Re: [HACKERS] Page Scan Mode in Hash Index

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 8:29 PM, Jesper Pedersen
<jesper.peder...@redhat.com> wrote:
> Hi,
>
> On 03/22/2017 09:32 AM, Ashutosh Sharma wrote:
>>
>> Done. Please refer to the attached v2 version of patch.
>>
>
> Thanks.
>
>>>> 1) 0001-Rewrite-hash-index-scans-to-work-a-page-at-a-time.patch: this
>>>> patch rewrites the hash index scan module to work in page-at-a-time
>>>> mode. It basically introduces two new functions-- _hash_readpage() and
>>>> _hash_saveitem(). The former is used to load all the qualifying tuples
>>>> from a target bucket or overflow page into an items array. The latter
>>>> one is used by _hash_readpage to save all the qualifying tuples found
>>>> in a page into an items array. Apart from that, this patch bascially
>>>> cleans _hash_first(), _hash_next and hashgettuple().
>>>>
>
> 0001v2:
>
> In hashgettuple() you can remove the 'currItem' and 'offnum' from the 'else'
> part, and do the assignment inside
>
>   if (so->numKilled < MaxIndexTuplesPerPage)
>
> instead.
>

Done. Please have a look at the attached v3 patch.

>
> No new comments for 0002 and 0003.

okay. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 4e953c35da2274165b00d763500b83e0f3f9e2a9 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 23:36:05 +0530
Subject: [PATCH] Rewrite hash index scans to work a page at a timev3

Patch by Ashutosh Sharma
---
 src/backend/access/hash/README   |   9 +-
 src/backend/access/hash/hash.c   | 121 +++--
 src/backend/access/hash/hashpage.c   |  14 +-
 src/backend/access/hash/hashsearch.c | 330 ++-
 src/backend/access/hash/hashutil.c   |  23 ++-
 src/include/access/hash.h|  44 +
 6 files changed, 385 insertions(+), 156 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 1541438..f0a7bdf 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -243,10 +243,11 @@ The reader algorithm is:
 -- then, per read request:
 	reacquire content lock on current page
 	step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
-	get tuple
-	release content lock
+ the pin on the primary bucket throughout the scan)
+save all the matching tuples from current index page into an items array
+release pin and content lock (but if it is primary bucket page retain
+its pin till the end of scan)
+get tuple from an item array
 -- at scan shutdown:
 	release all pins still held
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..8c28fbd 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,23 @@ bool
 hashgettuple(IndexScanDesc scan, ScanDirection dir)
 {
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
-	Relation	rel = scan->indexRelation;
-	Buffer		buf;
-	Page		page;
 	OffsetNumber offnum;
-	ItemPointer current;
 	bool		res;
+	HashScanPosItem	*currItem;
 
 	/* Hash indexes are always lossy since we store only the hash code */
 	scan->xs_recheck = true;
 
 	/*
-	 * We hold pin but not lock on current buffer while outside the hash AM.
-	 * Reacquire the read lock here.
-	 */
-	if (BufferIsValid(so->hashso_curbuf))
-		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
-	/*
 	 * If we've already initialized this scan, we can just advance it in the
 	 * appropriate direction.  If we haven't done so yet, we call a routine to
 	 * get the first item in the scan.
 	 */
-	current = &(so->hashso_curpos);
-	if (ItemPointerIsValid(current))
+	if (!HashScanPosIsValid(so->currPos))
+		res = _hash_first(scan, dir);
+	else
 	{
 		/*
-		 * An insertion into the current index page could have happened while
-		 * we didn't have read lock on it.  Re-find our position by looking
-		 * for the TID we previously returned.  (Because we hold a pin on the
-		 * primary bucket page, no deletions or splits could have occurred;
-		 * therefore we can expect that the TID still exists in the current
-		 * index page, at an offset >= where we were.)
-		 */
-		OffsetNumber maxoffnum;
-
-		buf = so->hashso_curbuf;
-		Assert(BufferIsValid(buf));
-		page = BufferGetPage(buf);
-
-		/*
-		 * We don't need test for old snapshot here as the current buffer is
-		 * pinned, so vacuum can't clean the page.
-		 */
-		maxoffnum = PageGetMaxOffsetNumber(page);
-		for (offnum = ItemPointerGetOffsetNumber(current);
-			 offnum <= maxoffnum;
-			 offnum = OffsetNumberNext(offnum))
-		{
-			IndexTuple	itup;
-
-			itup = (IndexTuple) PageGetItem(page, PageGetItemI

Re: [HACKERS] Add pgstathashindex() to get hash index table statistics.

2017-03-23 Thread Ashutosh Sharma
Hi,

On Tue, Feb 7, 2017 at 9:23 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 10:40 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> Maybe we should call them "unused pages".
>>
>> +1.  If we consider some more names for that column then probably one
>> alternative could be "empty pages".
>
> Yeah, but I think "unused" might be better.  Because a page could be
> in use (as an overflow page or primary bucket page) and still be
> empty.
>

Based on the earlier discussions, I have prepared a patch that allows
pgstathashindex() to show the number of unused pages in a hash index.
Please find the attached patch. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From e3b59fa85f16d6d15be5360e85b7faf63e8683a9 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 23:02:26 +0530
Subject: [PATCH] Allow pgstathashindex to show unused pages v1

---
 contrib/pgstattuple/expected/pgstattuple.out  | 12 ++--
 contrib/pgstattuple/pgstatindex.c | 19 ---
 contrib/pgstattuple/pgstattuple--1.4--1.5.sql |  1 +
 doc/src/sgml/pgstattuple.sgml | 17 -
 4 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index 2c3515b..1f1ff46 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -132,9 +132,9 @@ select * from pgstatginindex('test_ginidx');
 
 create index test_hashidx on test using hash (b);
 select * from pgstathashindex('test_hashidx');
- version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | live_items | dead_items | free_percent 
----------+--------------+----------------+--------------+------------+------------+------------+--------------
-       2 |            4 |              0 |            1 |          0 |          0 |          0 |          100
+ version | bucket_pages | overflow_pages | bitmap_pages | zero_pages | unused_pages | live_items | dead_items | free_percent 
+---------+--------------+----------------+--------------+------------+--------------+------------+------------+--------------
+       2 |            4 |              0 |            1 |          0 |            0 |          0 |          0 |          100
 (1 row)
 
 -- these should error with the wrong type
@@ -233,9 +233,9 @@ select pgstatindex('test_partition_idx');
 (1 row)
 
 select pgstathashindex('test_partition_hash_idx');
-   pgstathashindex   
----------------------
- (2,8,0,1,0,0,0,100)
+    pgstathashindex    
+-----------------------
+ (2,8,0,1,0,0,0,0,100)
 (1 row)
 
 drop table test_partitioned;
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index d448e9e..6fc41d6 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -120,6 +120,7 @@ typedef struct HashIndexStat
 	BlockNumber overflow_pages;
 	BlockNumber bitmap_pages;
 	BlockNumber zero_pages;
+	BlockNumber unused_pages;
 
 	int64	live_items;
 	int64	dead_items;
@@ -588,8 +589,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	BufferAccessStrategy bstrategy;
 	HeapTuple	tuple;
 	TupleDesc	tupleDesc;
-	Datum		values[8];
-	bool		nulls[8];
+	Datum		values[9];
+	bool		nulls[9];
 	Buffer		metabuf;
 	HashMetaPage	metap;
 	float8		free_percent;
@@ -667,6 +668,8 @@ pgstathashindex(PG_FUNCTION_ARGS)
 			}
 			else if (opaque->hasho_flag & LH_BITMAP_PAGE)
 stats.bitmap_pages++;
+			else if (PageIsEmpty(page))
+stats.unused_pages++;
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INDEX_CORRUPTED),
@@ -680,8 +683,9 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	/* Done accessing the index */
 	index_close(rel, AccessShareLock);
 
-	/* Count zero pages as free space. */
-	stats.free_space += stats.zero_pages * stats.space_per_page;
+	/* Count zero and unused pages as free space. */
+	stats.free_space += (stats.zero_pages + stats.unused_pages) *
+		 stats.space_per_page;
 
 	/*
 	 * Total space available for tuples excludes the metapage and the bitmap
@@ -711,9 +715,10 @@ pgstathashindex(PG_FUNCTION_ARGS)
 	values[2] = Int64GetDatum((int64) stats.overflow_pages);
 	values[3] = Int64GetDatum((int64) stats.bitmap_pages);
 	values[4] = Int64GetDatum((int64) stats.zero_pages);
-	values[5] = Int64GetDatum(stats.live_items);
-	values[6] = Int64GetDatum(stats.dead_items);
-	values[7] = Float8GetDatum(free_percent);
+	values[5] = Int64GetDatum((int64) stats.unused_pages);
+	values[6] = Int64GetDatum(stats.live_items);
+	values[7] = Int64GetDatum(stats.dead_items);
+	values[8] = Float8GetDatum(free_percent);
 	tuple = heap_form_tuple(tupleDesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
diff --git a/contrib/pgstattuple/pgstattuple--1.4--1.5.sql b/contrib/pgstattuple/pgstattuple--

Re: [HACKERS] pageinspect and hash indexes

2017-03-23 Thread Ashutosh Sharma
Hi Amit,

>>> +
>>> + /* Check if it is an unused hash page. */
>>> + if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
>>> + return page;
>>>
>>>
>>> I don't see we need to have a separate check for unused page, why
>>> can't it be checked with other page types in below check.
>>>
>>> if (pagetype != LH_OVERFLOW_PAGE && pagetype != LH_BUCKET_PAGE &&
>>> pagetype != LH_BITMAP_PAGE && pagetype != LH_META_PAGE)
>>
>> That is because UNUSED page is as good as an empty page except that
>> empty page doesn't have any pagetype. If we add condition for checking
>> UNUSED page in above if check it will never show unused page as an
>> unsed page rather it will always show it as an empty page.
>>
>
> Oh, okay, but my main objection was that we should not check hash page
> type (hasho_flag) without ensuring whether it is a hash page.  Can you
> try to adjust the above code so that this check can be moved after
> hasho_page_id check?

Yes, I got your point. I have done that, but then I had to remove the
check for PageIsEmpty(). Anyway, I think the PageIsEmpty() condition
will only be true for one page in the entire hash index and can be
ignored. If you wish, I could mention it in the documentation.
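
For reference, a sketch of the reordering just described; the attached
0002 patch is authoritative and the comments here are mine. The hash
page id is verified first, and only then is hasho_flag consulted, so an
UNUSED page is recognized as a valid hash page before its type is
examined:

	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);

	if (pageopaque->hasho_page_id != HASHO_PAGE_ID)
		ereport(ERROR,
				(errcode(ERRCODE_INDEX_CORRUPTED),
				 errmsg("page is not a hash page")));

	/* an unused page needs no further type verification */
	if (pageopaque->hasho_flag == LH_UNUSED_PAGE)
		return page;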

>
>> To avoid
>> this, at the start of verify_hash_page function itself if we recognise
>> page as UNUSED page we return immediately.
>>
>>>
>>> 2.
>>> + /* Check if it is an empty hash page. */
>>> + if (PageIsEmpty(page))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INDEX_CORRUPTED),
>>> + errmsg("index table contains empty page")));
>>>
>>>
>>> Do we want to give a separate message for EMPTY and NEW pages?  Isn't
>>> it better that the same error message can be given for both of them as
>>> from user perspective there is not much difference between both the
>>> messages?
>>
>> I think we should show separate message because they are two different
>> type of pages. In the sense like, one is initialised whereas other is
>> completely zero.
>>
>
> I understand your point, but not sure if it makes any difference to user.
>

Okay, I have now removed the check for PageIsEmpty anyway. Please
refer to the attached
'0002-allow_pageinspect_handle_UNUSED_hash_pages.patch'.

Also, I have attached
'0001-Mark-freed-overflow-page-as-UNUSED-pagetype-v2.patch', which
handles your comment mentioned in [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BVE_TDRLWpyeOf%2B7%2B6if68kgPNwO4guKo060rm_t3O5w%40mail.gmail.com

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 86eb711ea80b495c9d2f5228558682d0e95c41eb Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Thu, 23 Mar 2017 16:47:17 +0530
Subject: [PATCH] Mark freed overflow page as UNUSED pagetype v2

---
 src/backend/access/hash/hash_xlog.c |  9 +
 src/backend/access/hash/hashovfl.c  | 13 -
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index d9ac42c..2ccaf46 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -697,11 +697,20 @@ hash_xlog_squeeze_page(XLogReaderState *record)
 	if (XLogReadBufferForRedo(record, 2, &ovflbuf) == BLK_NEEDS_REDO)
 	{
 		Page		ovflpage;
+		HashPageOpaque ovflopaque;
 
 		ovflpage = BufferGetPage(ovflbuf);
 
 		_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
 
+		ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+		ovflopaque->hasho_prevblkno = InvalidBlockNumber;
+		ovflopaque->hasho_nextblkno = InvalidBlockNumber;
+		ovflopaque->hasho_bucket = -1;
+		ovflopaque->hasho_flag = LH_UNUSED_PAGE;
+		ovflopaque->hasho_page_id = HASHO_PAGE_ID;
+
 		PageSetLSN(ovflpage, lsn);
 		MarkBufferDirty(ovflbuf);
 	}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index a3cae21..d3eaee0 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -591,9 +591,20 @@ _hash_freeovflpage(Relation rel, Buffer bucketbuf, Buffer ovflbuf,
 	/*
 	 * Initialize the freed overflow page.  Just zeroing the page won't work,
 	 * because WAL replay routines expect pages to be initialized. See
-	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended.
+	 * explanation of RBM_NORMAL mode atop XLogReadBufferExtended. Also, mark
+	 * the page as UNUSED type and retain its page id. This allows the tools
+	 * like pageinspect to consider it as a hash page.
 	 */
 	_hash_pageinit(ovflpage, BufferGetPageSize(ovflbuf));
+
+	ovflopaque = (HashPageOpaque) PageGetSpecialPointer(ovflpage);
+
+	ovflopaque->hasho_prevblkn

Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-23 Thread Ashutosh Sharma
On Thu, Mar 23, 2017 at 9:17 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Mar 23, 2017 at 8:43 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > On Wed, Mar 22, 2017 at 3:39 PM, Ashutosh Sharma <ashu.coe...@gmail.com> 
> > wrote:
> >> Hi,
> >>
> >> On Wed, Mar 22, 2017 at 8:41 AM, Amit Kapila <amit.kapil...@gmail.com> 
> >> wrote:
> >>
> >> To fix this, I think we should pass 'REGBUF_KEEP_DATA' while
> >> registering the buffer. Something like this,
> >>
> >> -   XLogRegisterBuffer(0, buf, REGBUF_STANDARD);
> >> +   XLogRegisterBuffer(0, buf, REGBUF_STANDARD |
> >> REGBUF_KEEP_DATA);
> >>
> >> Attached is the patch that fixes this issue.
> >>
> >
> > I think this will work, but not sure if there is a merit to deviate
> > from what btree does to handle this case.   One thing I find slightly
> > awkward in hash_xlog_vacuum_get_latestRemovedXid() is that you are
> > using a number of tuples registered as part of fixed data
> > (xl_hash_vacuum_one_page) to traverse the data registered as buf data.
> > I think it will be better if we register offsets also in fixed part of
> > data as we are doing btree case.

Agreed. I have made the changes accordingly. Please check the attached v2 patch.

>
> >
> >
>
> Also another small point in this regard, do we need two separate
> variables to track number of deleted items in below code?  I think one
> variable is sufficient.
>
> _hash_vacuum_one_page()
> {
> ..
> deletable[ndeletable++] = offnum;
> tuples_removed += 1;
> ..
> }
>

Yes, I think 'ndeletable' alone should be fine.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash_xlog.c b/src/backend/access/hash/hash_xlog.c
index de7522e..d9ac42c 100644
--- a/src/backend/access/hash/hash_xlog.c
+++ b/src/backend/access/hash/hash_xlog.c
@@ -957,8 +957,6 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	OffsetNumber	hoffnum;
 	TransactionId	latestRemovedXid = InvalidTransactionId;
 	int		i;
-	char *ptr;
-	Size len;
 
 	xlrec = (xl_hash_vacuum_one_page *) XLogRecGetData(record);
 
@@ -977,12 +975,20 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 		return latestRemovedXid;
 
 	/*
+	 * Check if WAL replay has reached a consistent database state. If not,
+	 * we must PANIC. See the definition of btree_xlog_delete_get_latestRemovedXid
+	 * for more details.
+	 */
+	if (!reachedConsistency)
+		elog(PANIC, "hash_xlog_vacuum_get_latestRemovedXid: cannot operate with inconsistent data");
+
+	/*
 	 * Get index page.  If the DB is consistent, this should not fail, nor
 	 * should any of the heap page fetches below.  If one does, we return
 	 * InvalidTransactionId to cancel all HS transactions.  That's probably
 	 * overkill, but it's safe, and certainly better than panicking here.
 	 */
-	XLogRecGetBlockTag(record, 1, &rnode, NULL, &blkno);
+	XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
 	ibuffer = XLogReadBufferExtended(rnode, MAIN_FORKNUM, blkno, RBM_NORMAL);
 
 	if (!BufferIsValid(ibuffer))
@@ -994,9 +1000,7 @@ hash_xlog_vacuum_get_latestRemovedXid(XLogReaderState *record)
 	 * Loop through the deleted index items to obtain the TransactionId from
 	 * the heap items they point to.
 	 */
-	ptr = XLogRecGetBlockData(record, 1, &len);
-
-	unused = (OffsetNumber *) ptr;
+	unused = (OffsetNumber *) ((char *) xlrec + SizeOfHashVacuumOnePage);
 
 	for (i = 0; i < xlrec->ntuples; i++)
 	{
@@ -1121,23 +1125,15 @@ hash_xlog_vacuum_one_page(XLogReaderState *record)
 
 	if (action == BLK_NEEDS_REDO)
 	{
-		char *ptr;
-		Size len;
-
-		ptr = XLogRecGetBlockData(record, 0, &len);
-
 		page = (Page) BufferGetPage(buffer);
 
-		if (len > 0)
+		if (XLogRecGetDataLen(record) > SizeOfHashVacuumOnePage)
 		{
 			OffsetNumber *unused;
-			OffsetNumber *unend;
 
-			unused = (OffsetNumber *) ptr;
-			unend = (OffsetNumber *) ((char *) ptr + len);
+			unused = (OffsetNumber *) ((char *) xldata + SizeOfHashVacuumOnePage);
 
-			if ((unend - unused) > 0)
-PageIndexMultiDelete(page, unused, unend - unused);
+			PageIndexMultiDelete(page, unused, xldata->ntuples);
 		}
 
 		/*
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 8640e85..8699b5b 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -344,7 +344,6 @@ _hash_vacuum_one_page(Relation rel, Buffer metabuf, Buffer buf,
 	Page	page = BufferGetPage(buf);
 	HashPageOpaque	pageopaque;
 	HashMetaPage	metap;
-	double tuples_removed = 0;
 
 	/* Scan each tuple in page to see if it is marked as LP_DEAD */
 	maxoff = PageGe

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-23 Thread Ashutosh Sharma
Hi All,

I have tried to test 'group_update_clog_v11.1.patch', shared upthread
by Amit, on a high-end machine. I have tested the patch with varying
numbers of savepoints in my test script. The machine details, along
with the test script and the test results, are shown below:

Machine details:

24 sockets, 192 CPU(s)
RAM - 500GB

test script:


\set aid random (1,3000)
\set tid random (1,3000)

BEGIN;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s1;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s2;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s3;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
SAVEPOINT s4;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
SAVEPOINT s5;
SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
END;

Non-default parameters
======================
max_connections = 200
shared_buffers=8GB
min_wal_size=10GB
max_wal_size=15GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
checkpoint_timeout=900
synchronous_commit=off


pgbench -M prepared -c $thread -j $thread -T $time_for_reading postgres -f
~/test_script.sql

where time_for_reading = 10 mins

Test Results:
=============

With 3 savepoints
=================

CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             50275         53704           6.82048732
64              62860         66561           5.887686923
8               18464         18752           1.559792028

With 5 savepoints
=================

CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             46559         47715           2.482871196
64              52306         52082          -0.4282491492
8               12289         12852           4.581332899


With 7 savepoints
=================

CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             41367         41500           0.3215123166
64              42996         41473          -3.542189971
8               9665          9657           -0.0827728919

With 10 savepoints
==================

CLIENT COUNT    TPS (HEAD)    TPS (PATCH)    % IMPROVEMENT
128             34513         34597           0.24338655
64              32581         32035          -1.67582
8               7293          7622            4.511175099
Conclusion:
As seen from the test results above, there is some performance
improvement with 3 savepoints; with 5 savepoints the results with the
patch are slightly better than HEAD; and with 7 and 10 savepoints we do
see a regression with the patch. Therefore, I think the threshold value
of 4 for the number of subtransactions considered in the patch looks
fine to me.


--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Tue, Mar 21, 2017 at 6:19 PM, Amit Kapila <amit.kapil...@gmail.com>
wrote:

> On Mon, Mar 20, 2017 at 8:27 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Fri, Mar 17, 2017 at 2:30 AM, Amit Kapila <amit.kapil...@gmail.com>
> wrote:
> >>> I was wondering about doing an explicit test: if the XID being
> >>> committed matches the one in the PGPROC, and nsubxids matches, and the
> >>> actual list of XIDs matches, then apply the optimization.  That could
> >>> replace the logic that you've proposed to exclude non-commit cases,
> >>> gxact cases, etc. and it seems fundamentally safer.  But it might be a
> >>> more expensive test, too, so I'm not sure.
> >>
> >> I think if the number of subxids is very small let us say under 5 or
> >> so, then such a check might not matter, but otherwise it could be
> >> expensive.
> >
> > We could find out by testing it.  We could also restrict the
> > optimization to cases with just a few subxids, because if you've got a
> > large number of subxids this optimization probably isn't buying much
> > anyway.
> >
>
> Yes, and I have modified the patch to compare xids and subxids for
> group update.  In the initial short tests (with few client counts), it
> seems like till 3 savepoints we can win and 10 savepoints onwards
> there is some regression or at the very least there doesn't appear to
> be any benefit.  We need more tests to identify what is the safe
> number, but I thought it is better to share the patch to see if we
> agree on the changes because if not, then the whole testing needs to
> be repeated.  Let me know what do you think about attached?
>
>
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

