Re: range_agg

2019-07-04 Thread Pavel Stehule
Hi

On Tue, Jul 2, 2019 at 0:38, Jeff Davis  wrote:

> On Fri, 2019-05-03 at 15:56 -0700, Paul Jungwirth wrote:
> > Hello,
> >
> > I wrote an extension to add a range_agg function with similar
> > behavior
> > to existing *_agg functions, and I'm wondering if folks would like
> > to
> > have it in core? Here is the repo:
> > https://github.com/pjungwir/range_agg
>
> This seems like a very useful extension, thank you.
>
> For getting into core though, it should be a more complete set of
> related operations. The patch is implicitly introducing the concept of
> a "multirange" (in this case, an array of ranges), but it's not making
> the concept whole.
>
> What else should return a multirange? This patch implements the union-
> agg of ranges, but we also might want an intersection-agg of ranges
> (that is, the set of ranges that are subranges of every input). Given
> that there are other options here, the name "range_agg" is too generic
> to mean union-agg specifically.
>
> What can we do with a multirange? A lot of range operators still make
> sense, like "contains" or "overlaps"; but "adjacent" doesn't quite
> work. What about new operations like inverting a multirange to get the
> gaps?
>
> Do we want to continue with the array-of-ranges implementation of a
> multirange, or do we want a first-class multirange concept that might
> eliminate the boilerplate around defining all of these operations?
>
> If we have a more complete set of operators here, the flags for
> handling overlapping ranges and gaps will be unnecessary.
>

I think the scope of this patch is correct. Introducing a set-of-ranges
data type, whether based on an array or not, should be a separate topic.

The question is naming - should this aggregate function be named "range_agg",
and a multirange aggregate "multirange_agg"? Personally, I have no problem
with range_agg, and no problem with it being based on the union operation. It
is true that only the result of a union can be represented simply as a range.
When I thought about a multirange result, there is a really large set of
possible algorithms for doing some operations over two or three values. So
personally, I am satisfied with the scope of a simple range_agg function,
because I see the benefits, and I don't think this implementation blocks any
more complex designs in the future. How to implement a multirange is a really
big question, and right now I think a dedicated data type will be better than
possibly unordered arrays.
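
For illustration, the union-style aggregation discussed above reduces to
merging sorted intervals; a minimal standalone sketch in plain C over
[lo, hi) integer pairs (not the extension's code, which works on PostgreSQL
range types) could look like this:

/*
 * Minimal sketch of the union-style merge a range_agg performs, using
 * [lo, hi) integer intervals sorted by lower bound.  Illustration only.
 */
#include <stdio.h>

typedef struct
{
    int         lo;
    int         hi;             /* exclusive upper bound */
} Range;

/* Merge sorted input ranges in place, returning the number of merged ranges. */
static int
merge_ranges(Range *r, int n)
{
    int         out = 0;

    for (int i = 1; i < n; i++)
    {
        if (r[i].lo <= r[out].hi)       /* overlaps or is adjacent: extend */
        {
            if (r[i].hi > r[out].hi)
                r[out].hi = r[i].hi;
        }
        else
            r[++out] = r[i];            /* gap: start a new output range */
    }
    return n > 0 ? out + 1 : 0;
}

int
main(void)
{
    Range       in[] = {{1, 4}, {3, 6}, {8, 10}};
    int         n = merge_ranges(in, 3);

    for (int i = 0; i < n; i++)
        printf("[%d,%d)\n", in[i].lo, in[i].hi);    /* prints [1,6) and [8,10) */
    return 0;
}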

Regards

Pavel



Regards,
> Jeff Davis
>
>
>
>
>


Declared but not defined functions

2019-07-04 Thread Masahiko Sawada
Hi,

I realized that TransactionIdAbort is declared in transam.h but its
function body does not exist anywhere. As far as I can tell there are
three such functions in total, found by the following script.

for func in `git ls-files | egrep "\w+\.h$" | xargs cat | egrep
"extern \w+ \w+\(.*\);" | sed -e "s/.* \(.*\)(.*);/\1(/g"`
do
if [ `git grep "$func" -- "*.c" | wc -l` -lt 1 ];then
echo $func
fi
done

I think the following functions were mistakenly left in the header
files, so the attached patch removes them.

dsa_startup()
TransactionIdAbort()
renameatt_type()

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 6cbb0c82c7..33fd052156 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -209,7 +209,6 @@ extern PGDLLIMPORT VariableCache ShmemVariableCache;
 extern bool TransactionIdDidCommit(TransactionId transactionId);
 extern bool TransactionIdDidAbort(TransactionId transactionId);
 extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
-extern void TransactionIdAbort(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
diff --git a/src/include/commands/tablecmds.h b/src/include/commands/tablecmds.h
index b09afa2775..ac2bfaff1e 100644
--- a/src/include/commands/tablecmds.h
+++ b/src/include/commands/tablecmds.h
@@ -60,8 +60,6 @@ extern void SetRelationHasSubclass(Oid relationId, bool relhassubclass);
 
 extern ObjectAddress renameatt(RenameStmt *stmt);
 
-extern ObjectAddress renameatt_type(RenameStmt *stmt);
-
 extern ObjectAddress RenameConstraint(RenameStmt *stmt);
 
 extern ObjectAddress RenameRelation(RenameStmt *stmt);
diff --git a/src/include/utils/dsa.h b/src/include/utils/dsa.h
index ddd3cef8c2..991b62d28c 100644
--- a/src/include/utils/dsa.h
+++ b/src/include/utils/dsa.h
@@ -99,8 +99,6 @@ typedef pg_atomic_uint64 dsa_pointer_atomic;
  */
 typedef dsm_handle dsa_handle;
 
-extern void dsa_startup(void);
-
 extern dsa_area *dsa_create(int tranche_id);
 extern dsa_area *dsa_create_in_place(void *place, size_t size,
 	 int tranche_id, dsm_segment *segment);


Re: Add client connection check during the execution of the query

2019-07-04 Thread Thomas Munro
On Sat, Feb 9, 2019 at 6:16 AM  wrote:
> The purpose of this patch is to stop the execution of continuous
> requests in case of a disconnection from the client. In most cases, the
> client must wait for a response from the server before sending new data
> - which means there should not remain unread data on the socket and we
> will be able to determine a broken connection.
> Perhaps exceptions are possible, but I could not think of such a use
> case (except COPY). I would be grateful if someone could offer such
> cases or their solutions.
> I added a test for the GUC variable when the client connects via SSL,
> but I'm not sure that this test is really necessary.

Hello Sergey,

This seems like a reasonable idea to me.  There is no point in running
a monster 24 hour OLAP query if your client has gone away.  It's using
MSG_PEEK which is POSIX, and I can't immediately think of any reason
why it's not safe to try to peek at a byte in that socket at any time.
Could you add a comment to explain why you have !PqCommReadingMsg &&
!PqCommBusy?  The tests pass on a couple of different Unixoid OSes I
tried.  Is it really necessary to do a bunch of IO and busy CPU work
in $long_query?  pg_sleep(60) can do the job, because it includes a
standard CHECK_FOR_INTERRUPTS/latch loop that will loop around on
SIGALRM.  I just tested that your patch correctly interrupts
pg_sleep() if I kill -9 my psql process.  Why did you call the timeout
SKIP_CLIENT_CHECK_TIMEOUT (I don't understand the "SKIP" part)?  Why
not CLIENT_CONNECTION_CHECK_TIMEOUT or something?
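
For anyone wondering what the check amounts to, here is a standalone sketch
of a MSG_PEEK-based probe (illustration only, not the patch's code, which
goes through the backend's pq machinery; MSG_DONTWAIT isn't strictly POSIX
but is widely available, and a non-blocking socket achieves the same):

/*
 * Standalone sketch of a MSG_PEEK-based liveness probe.  A return of 0 from
 * recv() means the peer closed the connection; EAGAIN/EWOULDBLOCK means no
 * data is pending but the socket is still alive.
 */
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

static bool
client_connection_seems_alive(int sock)
{
	char		c;
	ssize_t		r;

	do
	{
		r = recv(sock, &c, 1, MSG_PEEK | MSG_DONTWAIT);
	} while (r < 0 && errno == EINTR);

	if (r == 0)
		return false;			/* orderly shutdown by the client */
	if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return true;			/* nothing to read, connection still open */
	if (r < 0)
		return false;			/* other errors: treat as gone */
	return true;				/* unread data is pending; still connected */
}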

I wonder if it's confusing to users that you get "connection to client
lost" if the connection goes away while running a query, but nothing
if the connection goes away without saying goodbye ("X") while idle.

The build fails on Windows.  I think it's because
src/tools/msvc/Mkvcbuild.pm is trying to find something to compile
under src/test/modules/connection, and I think the solution is to add
that to the variable @contrib_excludes.  (I wonder if that script
should assume there is nothing to build instead of dying with "Could
not determine contrib module type for connection", otherwise every
Unix hacker is bound to get this wrong every time.)

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.45820

Aside from that problem, my Windows CI building thing isn't smart
enough to actually run those extra tests yet, so I don't know whether
the patch actually works on that platform (I really need to teach that
thing to use the full buildfarm scripts...).


--
Thomas Munro
https://enterprisedb.com




Re: [proposal] de-TOAST'ing using an iterator

2019-07-04 Thread Thomas Munro
On Thu, Jun 20, 2019 at 1:51 AM Binguo Bao  wrote:
> Hi hackers!
> This proposal aims to provide the ability to de-TOAST a fully TOAST'd and 
> compressed field using an iterator and then update the appropriate parts of 
> the code to use the iterator where possible instead of de-TOAST'ing and 
> de-compressing the entire value. Examples where this can be helpful include 
> using position() from the beginning of the value, or doing a pattern or 
> substring match.
>
> de-TOAST iterator overview:
> 1. The caller requests a slice of the attribute value from the de-TOAST
> iterator.
> 2. The de-TOAST iterator checks whether the slice is available in the output
> buffer; if it is, return the result directly, otherwise go to step 3.
> 3. The de-TOAST iterator checks whether the slice is available in the input
> buffer; if it is, go to step 4. Otherwise, call fetch_datum_iterator to fetch
> datums from disk into the input buffer.
> 4. If the data in the input buffer is compressed, extract data from the
> input buffer into the output buffer until the caller's needs are met.
>
> I've implemented a prototype and applied it to the position() function to
> test performance.

Hi Binguo,

Interesting work, and nice performance improvements so far.  Just by
the way, the patch currently generates warnings:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554345719

-- 
Thomas Munro
https://enterprisedb.com




Re: Conflict handling for COPY FROM

2019-07-04 Thread Thomas Munro
On Fri, Jun 28, 2019 at 10:57 PM Surafel Temesgen  wrote:
> In addition to the above change in the attached patch i also change
> the syntax to ERROR LIMIT because it is no longer only skip
> unique and exclusion constrain violation

Hi Surafel,

FYI copy.sgml has some DTD validity problems.

https://travis-ci.org/postgresql-cfbot/postgresql/builds/554350168

-- 
Thomas Munro
https://enterprisedb.com




Re: check_recovery_target_lsn() does a PG_CATCH without a throw

2019-07-04 Thread Michael Paquier
On Sun, Jun 30, 2019 at 09:35:52PM +0900, Michael Paquier wrote:
> The refactoring looks good to me (including what you have just fixed
> with PG_RETURN_LSN).  Thanks for considering it.

This issue was still listed as an open item for v12, so I have removed
it.
--
Michael




Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-07-04 Thread Michael Paquier
On Thu, Jul 04, 2019 at 07:44:42PM +0100, Dent John wrote:
> I see that your patch removed that particular type, so I guess that
> feature in vacuum.c has been added in the meantime.
> 
> Would you have a more recent patch?

I have switched the patch to waiting on author.
--
Michael




Re: errbacktrace

2019-07-04 Thread Thomas Munro
On Tue, Jun 25, 2019 at 11:08 PM Peter Eisentraut
 wrote:
> For the implementation, I support both backtrace() provided by the OS and
> libunwind.  The former seems to be supported by a number of platforms,
> including glibc, macOS, and FreeBSD, so maybe we don't need the libunwind
> support.  I haven't found any difference in quality in the backtraces
> between the two approaches, but surely that is highly dependent on the
> exact configuration.
>
> I would welcome testing in all directions with this, to see how well it
> works in different circumstances.

I like it.

Works out of the box on my macOS machine, but for FreeBSD I had to add
-lexecinfo to LIBS.
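
For reference, the OS facility in question is just backtrace() and
backtrace_symbols() from <execinfo.h>; a minimal standalone illustration
(not the patch itself) looks like this:

/*
 * Minimal standalone use of the OS-provided backtrace() facility.
 * On FreeBSD link with -lexecinfo, as noted above; glibc and macOS
 * provide it in libc.
 */
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void
print_backtrace(void)
{
	void	   *frames[32];
	int			nframes = backtrace(frames, 32);
	char	  **symbols = backtrace_symbols(frames, nframes);

	if (symbols == NULL)
		return;
	for (int i = 0; i < nframes; i++)
		fprintf(stderr, "%s\n", symbols[i]);
	free(symbols);
}

int
main(void)
{
	print_backtrace();
	return 0;
}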

-- 
Thomas Munro
https://enterprisedb.com




Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-04 Thread Michael Paquier
On Thu, Jul 04, 2019 at 01:48:24PM -0300, Fabrízio de Royes Mello wrote:
> On Thu, Jul 4, 2019 at 10:57 AM Robert Haas  wrote:
>> It would be pretty silly to have one and not the other, regardless of
>> whether we can think of an immediate use case.
> 
> +1

OK, applied with a catalog version bump.  This is cool to have.
--
Michael




Re: base backup client as auxiliary backend process

2019-07-04 Thread Thomas Munro
On Sun, Jun 30, 2019 at 8:05 AM Peter Eisentraut
 wrote:
> Attached is a very hackish patch to implement this.  It works like this:
>
> # (assuming you have a primary already running somewhere)
> initdb -D data2 --minimal
> $EDITOR data2/postgresql.conf  # set primary_conninfo
> pg_ctl -D data2 start

+1, very nice.  How about --replica?

FYI Windows doesn't like your patch:

src/backend/postmaster/postmaster.c(1396): warning C4013: 'sleep'
undefined; assuming extern returning int
[C:\projects\postgresql\postgres.vcxproj]

https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.45930

-- 
Thomas Munro
https://enterprisedb.com




RE: [PATCH] Speedup truncates of relation forks

2019-07-04 Thread Jamison, Kirk
Hi,

> I updated the patch based from comments, but it still fails the regression
> test as indicated in (5) above.
> Kindly verify if I correctly addressed the other parts as what you intended.
> 
> Thanks again for the review!
> I'll update the patch again after further comments.

I updated the patch; it is similar to V3, but addresses my problem in (5)
in the previous email regarding FreeSpaceMapVacuumRange.
It seems to pass the regression test now. Kindly check it for validation.
Thank you!

Regards,
Kirk Jamison


v4-0001-Speedup-truncates-of-relation-forks.patch
Description: v4-0001-Speedup-truncates-of-relation-forks.patch


Re: Replacing the EDH SKIP primes

2019-07-04 Thread Michael Paquier
On Thu, Jul 04, 2019 at 08:24:13AM +0200, Daniel Gustafsson wrote:
> LGTM, thanks.

Okay, done, after rechecking the shape of the key.  Thanks!
--
Michael




Re: Introduce timeout capability for ConditionVariableSleep

2019-07-04 Thread Thomas Munro
On Fri, Mar 22, 2019 at 7:21 AM Shawn Debnath  wrote:
>
> On Sat, Mar 16, 2019 at 03:27:17PM -0700, Shawn Debnath wrote:
> > > + * Track the current time so that we can calculate the
> > > remaining timeout
> > > + * if we are woken up spuriously.
> > >
> > > I think that "track" means chasing a moving object. So it might
> > > be better to say "record" or something?
> > >
> > > >   * Wait for the given condition variable to be signaled or till 
> > > > timeout.
> > > >   *
> > > >   * Returns -1 when timeout expires, otherwise returns 0.
> > > >   *
> > > >   * See ConditionVariableSleep() for general usage.
> > > >
> > > > > +ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
> > > > >
> > > > > Counldn't the two-state return value be a boolean?
> >
> > I will change it to Record in the next iteration of the patch.
>
> Posting rebased and updated patch. Changed the word 'Track' to 'Record'
> and also changed variable name rem_timeout to cur_timeout to match
> naming in other use cases.

Hi Shawn,

I think this is looking pretty good and I'm planning to commit it
soon.  The convention for CHECK_FOR_INTERRUPTS() in latch wait loops
seems to be to put it after the ResetLatch(), so I've moved it in the
attached version (though I don't think it was wrong where it was).
Also pgindented and with my proposed commit message.  I've also
attached a throw-away test module that gives you CALL poke() and
SELECT wait_for_poke(timeout) using a CV.

Observations that I'm not planning to do anything about:
1.  I don't really like the data type "long", but it's already
established that we use that for latches so maybe it's too late for me
to complain about that.
2.  I don't really like the fact that we have to do floating point
stuff in INSTR_TIME_GET_MILLISEC().  That's not really your patch's
fault and you've copied the timeout adjustment code from latch.c,
which seems reasonable.
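
For readers following along, a caller loop for the timed variant looks
roughly like the sketch below. It assumes the patch's current convention
(timeout in milliseconds, following latch.c, and a return of -1 on timeout),
which may still change, e.g. to a boolean return as suggested upthread; the
wait_event_info argument is passed as 0 here purely as a placeholder:

/*
 * Sketch of waiting for a shared flag with a timeout, using the timed
 * condition-variable sleep from this patch.  Illustration only.
 */
#include "postgres.h"

#include "storage/condition_variable.h"

static bool
wait_for_flag(ConditionVariable *cv, volatile bool *flag, long timeout_ms)
{
	bool		satisfied = false;

	ConditionVariablePrepareToSleep(cv);
	while (!(satisfied = *flag))
	{
		/* 0 is a placeholder; a real caller passes a proper wait_event_info */
		if (ConditionVariableTimedSleep(cv, timeout_ms, 0) == -1)
			break;				/* timed out without the condition becoming true */
	}
	ConditionVariableCancelSleep();
	return satisfied;
}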

-- 
Thomas Munro
https://enterprisedb.com


0001-Introduce-timed-waits-for-condition-variables-v5.patch
Description: Binary data


0002-Simple-module-for-waiting-for-other-sessions-v5.patch
Description: Binary data


Re: mcvstats serialization code is still shy of a load

2019-07-04 Thread Tom Lane
Tomas Vondra  writes:
> I was about to push into REL_12_STABLE, when I realized that maybe we
> need to do something about the catversion first. REL_12_STABLE is still
> on 201906161, while master got to 201907041 thanks to commit
> 7b925e12703.  Simply cherry-picking the commits would get us to
> 201907052 in both branches, but that'd be wrong as the catalogs do
> differ. I suppose this is what you meant by "catversion differences to
> cope with".

Yeah, exactly.

My recommendation is to use 201907051 on v12 and 201907052
on master (or whatever is $today for you).  They need to be
different now that the branches' catalog histories have diverged,
and it seems to me that the back branch should be "older".

regards, tom lane




Re: mcvstats serialization code is still shy of a load

2019-07-04 Thread Tomas Vondra

On Tue, Jul 02, 2019 at 10:38:29AM +0200, Tomas Vondra wrote:
>On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:
>>Tomas Vondra  writes:
>>>Attached is a slightly improved version of the serialization patch.
>>
>>I reviewed this patch, and tested it on hppa and ppc.  I found one
>>serious bug: in the deserialization varlena case, you need
>>
>>-   dataptr += MAXALIGN(len);
>>+   dataptr += MAXALIGN(len + VARHDRSZ);
>>
>>(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
>>plus the Assert a few lines further down about dataptr having been
>>advanced by the correct amount fires.  (On one machine I tested on,
>>that happened during the core regression tests.  The other machine
>>got through regression, but trying to do "select * from pg_stats_ext;"
>>afterwards exhibited the crash.  I didn't investigate closely, but
>>I suspect the difference has to do with different MAXALIGN values,
>>4 and 8 respectively.)
>>
>>The attached patch (a delta atop your v2) corrects that plus some
>>cosmetic issues.
>
>Thanks.
>
>>If we're going to push this, it would be considerably less complicated
>>to do so before v12 gets branched --- not long after that, there will be
>>catversion differences to cope with.  I'm planning to make the branch
>>tomorrow (Monday), probably ~1500 UTC.  Just sayin'.
>
>Unfortunately, I was travelling on Sunday and was quite busy on Monday, so
>I've been unable to push this before the branching :-(
>
>I'll push by the end of this week, once I get home.



I've pushed the fix (along with the pg_mcv_list_item fix) into master,
hopefully the buildfarm won't be upset about it.

I was about to push into REL_12_STABLE, when I realized that maybe we
need to do something about the catversion first. REL_12_STABLE is still
on 201906161, while master got to 201907041 thanks to commit
7b925e12703.  Simply cherry-picking the commits would get us to
201907052 in both branches, but that'd be wrong as the catalogs do
differ. I suppose this is what you meant by "catversion differences to
cope with".

I suppose this is not the first time this happened - how did we deal
with it in the past? I guess we could use some "past" non-conflicting
catversion number in the REL_12_STABLE branch (say, 201907030x) but
maybe that'd be wrong?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-07-04 Thread Peter Geoghegan
On Thu, Jul 4, 2019 at 10:38 AM Peter Geoghegan  wrote:
> I tried this on my own "UK land registry" test data [1], which was
> originally used for the v12 nbtree work. My test case has a low
> cardinality, multi-column text index. I chose this test case because
> it was convenient for me.
>
> On v12/master, the index is 1100MB. Whereas with your patch, it ends
> up being 196MB -- over 5.5x smaller!

I also see a huge and consistent space saving for TPC-H. All 9 indexes
are significantly smaller. The lineitem orderkey index is "just" 1/3
smaller, which is the smallest improvement among TPC-H indexes in my
index bloat test case. The two largest indexes after the initial bulk
load are *much* smaller: the lineitem parts supplier index is ~2.7x
smaller, while the lineitem ship date index is a massive ~4.2x
smaller. Also, the orders customer key index is ~2.8x smaller, and the
order date index is ~2.43x smaller. Note that the test involved retail
insertions, not CREATE INDEX.

I haven't seen any regression in the size of any index so far,
including when the number of internal pages is all that we measure.
Actually, there seems to be cases where there is a noticeably larger
reduction in internal pages than in leaf pages, probably because of
interactions with suffix truncation.

This result is very impressive. We'll need to revisit what the right
trade-off is for the compression scheme, which Heikki had some
thoughts on when we left off 3 years ago, but that should be a lot
easier now. I am very encouraged by the fact that this relatively
simple approach already works quite nicely. It's also great to see
that bulk insertions with lots of compression are very clearly faster
with this latest revision of your patch, unlike earlier versions from
2016 that made those cases slower (though I haven't tested indexes
that don't really use compression). I think that this is because you
now do the compression lazily, at the point where it looks like we may
need to split the page. Previous versions of the patch had to perform
compression eagerly, just like GIN, which is not really appropriate
for nbtree.

--
Peter Geoghegan




Re: Fix runtime errors from -fsanitize=undefined

2019-07-04 Thread Noah Misch
On Mon, Jun 03, 2019 at 09:21:48PM +0200, Peter Eisentraut wrote:
> After many years of trying, it seems the -fsanitize=undefined checking
> in gcc is now working somewhat reliably.  Attached is a patch that fixes
> all errors of the kind
> 
> runtime error: null pointer passed as argument N, which is declared to
> never be null
> 
> Most of the cases are calls to memcpy(), memcmp(), etc. with a length of
> zero, so one appears to get away with passing a null pointer.

I just saw this proposal.  The undefined behavior in question is strictly
academic.  These changes do remove the need for new users to discover
-fno-sanitize=nonnull-attribute, but they make the code longer and no clearer.
Given the variety of code this touches, I expect future commits will
reintroduce the complained-of usage patterns, prompting yet more commits to
restore the invariant achieved here.  Hence, I'm -0 for this change.
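
For reference, the complained-of pattern and the guard-style fix being
discussed look roughly like this (an illustrative sketch, not a hunk from
the patch):

/*
 * memcpy(NULL, NULL, 0) is technically undefined per the C standard even
 * though the length is zero, so the fixes guard such calls.
 */
#include <string.h>

static void
copy_bytes(char *dst, const char *src, size_t nbytes)
{
	/* avoid passing a null pointer to memcpy() when there is nothing to copy */
	if (nbytes > 0)
		memcpy(dst, src, nbytes);
}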




Re: [PATCH] Implement uuid_version()

2019-07-04 Thread Jose Luis Tallon

On 4/7/19 17:30, Alvaro Herrera wrote:
> On 2019-Jul-04, Tom Lane wrote:
>
>> A possible option 3 is to keep the function in pgcrypto but change
>> its C code to call the core code.
>
> This seems most reasonable, and is what José Luis proposed upthread.  We
> don't have to bump the pgcrypto extension version, as nothing changes
> for pgcrypto externally.


Yes, indeed.

...which means I get another todo item if nobody else volunteers :)


Thanks!

    / J.L.






Proposal to add GUC_REPORT to lc_monetary, lc_numeric and search_path

2019-07-04 Thread Dave Cramer
Hello,

The first two, lc_monetary and lc_numeric, are useful if the client for some
reason executes SET lc_*. We don't get a report, and in many cases we can no
longer parse numerics or money correctly. It is possible to get these at
startup by issuing SHOW or querying the catalog, but it seems much cleaner to
just send them.

The last one, search_path, is important for similar reasons. JDBC caches
prepared statements internally, and if the user changes the search path
without using setSchema, or uses a function to change it, then internally it
would be necessary to invalidate the cache. Currently, if this occurs, these
statements fail.

This seems like a rather innocuous change, as the protocol itself is not
changed; rather, the amount of information returned on startup increases
marginally.

I've included the authors of the npgsql and node drivers in the email
for their input.

Dave Cramer


RE: duplicate key entries for primary key -- need urgent help

2019-07-04 Thread Kumar, Pawan (Nokia - IN/Bangalore)
Thanks a lot Tomas for the reply.

Which version are you running, exactly? Which minor version?
[Pawan]: Its (PostgreSQL) 9.5.9

sai=> select version();
 version
--
 PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 
(Red Hat 4.8.5-11), 64-bit
(1 row)

Why do you think it's the issue you linked?

[Pawan]: Because the thread which I shared also has problem statement like 
"Duplicate entries of Primary key" .
If this is also known to this version, I will be appreciating a lot if we have 
some Workaround or config change.

In our production: See below entries, proc_id is primary key and we can see 
duplicate entries. How it is possible?

sai=> select ctid,proc_id from etl_status where proc_id='2993229';
   ctid   | proc_id
--+-
 (381,20) | 2993229
 (388,28) | 2993229
(2 rows)

Any idea, how it happened?

I will waiting for your reply

WBR,
-Pawan

-Original Message-
From: Tomas Vondra  
Sent: Thursday, July 04, 2019 10:18 PM
To: Kumar, Pawan (Nokia - IN/Bangalore) 
Cc: and...@2ndquadrant.com; and...@dunslane.net; j...@agliodbs.com; 
pgsql-hack...@postgresql.org
Subject: Re: duplicate key entries for primary key -- need urgent help

On Thu, Jul 04, 2019 at 01:37:01PM +, Kumar, Pawan (Nokia - IN/Bangalore) 
wrote:
>Hi Guys,
>
>Can you please help here?
>
>Below reported issue in past about duplicate key entries for primary key.
>https://www.postgresql.org/message-id/534c8b33.9050...@pgexperts.com
>
>the solution was provided in 9.3 version of postgres but it seems issue is 
>still there in 9.5 version which I am running currently.
>
>Can you please let me know if this is also known in 9.5? any fix or Workaround 
>please?
>

Which version are you running, exactly? Which minor version?

Why do you think it's the issue you linked?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: duplicate key entries for primary key -- need urgent help

2019-07-04 Thread Kumar, Pawan (Nokia - IN/Bangalore)
Thanks for reply.

This has happened very often and at different production system.
There is no version change. System running with same version since 1 year but 
duplicate key issue came quiet a time.
And impact is big because of that and only way to fix is to delete the 
duplicate primary key.
Any suggestions to check which logs? Any command to run to get more info during 
the issue?

Any potential configuration to check?
Plz suggest

Wbr,
Pk

From: Tomas Vondra 
Sent: Thursday, July 4, 2019 11:31:48 PM
To: Kumar, Pawan (Nokia - IN/Bangalore)
Cc: and...@2ndquadrant.com; and...@dunslane.net; j...@agliodbs.com; 
pgsql-hack...@postgresql.org
Subject: Re: duplicate key entries for primary key -- need urgent help

On Thu, Jul 04, 2019 at 05:34:21PM +, Kumar, Pawan (Nokia - IN/Bangalore) 
wrote:
>Thanks a lot Tomas for the reply.
>
>Which version are you running, exactly? Which minor version?
>[Pawan]: Its (PostgreSQL) 9.5.9
>

You're missing 2 years of bugfixes, some of which are addressing data
corruption issues and might have caused this.

>sai=> select version();
> version
>--
> PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 
> (Red Hat 4.8.5-11), 64-bit
>(1 row)
>
>Why do you think it's the issue you linked?
>
>[Pawan]: Because the thread which I shared also has problem statement like 
>"Duplicate entries of Primary key" .
>If this is also known to this version, I will be appreciating a lot if we have 
>some Workaround or config change.
>

Duplicate entries are clearly some sort of data corruption, but that
might have happened in various ways - it does not mean it's the same
issue. And yes, 9.5.9 has a fix for the issue in the thread you linked.

>In our production: See below entries, proc_id is primary key and we can see 
>duplicate entries. How it is possible?
>
>sai=> select ctid,proc_id from etl_status where proc_id='2993229';
>   ctid   | proc_id
>--+-
> (381,20) | 2993229
> (388,28) | 2993229
>(2 rows)
>
>Any idea, how it happened?
>

No, that's impossible to say without you doing some more investigation.
We need to know when those rows were created, on which version that
happened (the system might have been updated and the corruption predates
might have happened on the previous version), and so on. For example, if
the system crashed or had any significant issues, that might be related
to data corruption issues.

We know nothing about your system, so you'll have to do a bit of
investigation, look for suspicious things, etc.

FWIW it might be a good idea to look for other cases of data corruption.
Both to know the extent of the problem, and to gain insight.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Periods

2019-07-04 Thread Alvaro Herrera
Hello Vik, Paul,

On 2019-Jul-04, Vik Fearing wrote:

> I've been working on this as an extension instead.  It allows me to do some 
> stuff in plpgsql which is much easier for me.
> 
> https://github.com/xocolatl/periods/

Hmm, okay.

> I would love to have all this in core (especially since MariaDB 10.4
> just became the first open source database to get all of this
> functionality), but something is better than nothing.

Agreed on all accounts.

I think that the functionality in your patch is already integrated in
Paul's patch for temporal PK/FK elsewhere ... is that correct, or is
this patch orthogonal to that work?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-07-04 Thread Dent John
Hi Nikolay,

I have had a crack at re-basing the current patch against 12b2, but I didn’t 
trivially succeed.

It’s probably more my bodged fixing of the rejections than a fundamental
problem. But I now get compile failures in — and seemingly only in — vacuum.c.

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement 
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security 
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O2 
-I../../../src/include  -isysroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
-c -o vacuum.o vacuum.c
vacuum.c:1761:20: error: expected expression
((StdRdOptions *) 
onerel->rd_options)->vacuum_index_cleanup)
^
vacuum.c:1761:6: error: use of undeclared identifier 'StdRdOptions'
((StdRdOptions *) 
onerel->rd_options)->vacuum_index_cleanup)
  ^
vacuum.c:1771:20: error: expected expression
((StdRdOptions *) onerel->rd_options)->vacuum_truncate)
^
vacuum.c:1771:6: error: use of undeclared identifier 'StdRdOptions'
((StdRdOptions *) onerel->rd_options)->vacuum_truncate)
  ^
4 errors generated.

I see that your patch removed that particular type, so I guess that feature in 
vacuum.c has been added in the meantime.

Would you have a more recent patch?

For what it is worth, I had a play at getting it to work, and only made things 
much worse!

denty.

> On 1 Jul 2019, at 12:52, Thomas Munro  wrote:
> 
> On Mon, Apr 1, 2019 at 4:36 AM Nikolay Shaplov  wrote:
>> In a message of Monday, March 18, 2019 3:03:04 MSK, Iwata, Aya wrote:
>>> This patch does not apply.
>> Oh... Sorry... here goes new version
> 
> Hi Nikolay,
> 
> Could we please have a new rebase?
> 
> Thanks,
> 
> -- 
> Thomas Munro
> https://enterprisedb.com



Re: duplicate key entries for primary key -- need urgent help

2019-07-04 Thread Tomas Vondra

On Thu, Jul 04, 2019 at 06:28:03PM +, Kumar, Pawan (Nokia - IN/Bangalore) 
wrote:

>Thanks for reply.
>
>This has happened very often and at different production system.
>There is no version change. System running with same version since 1 year but
>duplicate key issue came quiet a time.
>And impact is big because of that and only way to fix is to delete the
>duplicate primary key.
>Any suggestions to check which logs? Any command to run to get more info during
>the issue?
>
>Any potential configuration to check?
>Plz suggest



Well, I've already pointed out you're missing 2 years worth of fixes, so
upgrading to current minor version is the first thing I'd do. (I doubt
people will be rushing to help you in their free time when you're
missing two years of fixes, possibly causing this issue.)

If the issue happens even after upgrading, we'll need to see more details
about an actual case - commands creating/modifying the duplicate rows,
or anything you can find. It's impossible to help you when we only know
there are duplicate values in a PK.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: range_agg

2019-07-04 Thread Alvaro Herrera
I noticed that this patch has a // comment about it segfaulting.  Did
you ever figure that out?  Is the resulting code the one you intend as
final?

Did you make any inroads regarding Jeff Davis' suggestion about
implementing "multiranges"?  I wonder if that's going to end up being a
Pandora's box.

Stylistically, the code does not match pgindent's choices very closely.
I can return a diff to a pgindented version of your v0002 for your
perusal, if it would be useful for you to learn its style.  (A committer
would eventually run pgindent himself[1], but it's good to have
submissions be at least close to the final form.)  BTW, I suggest "git
format-patch -vN" to prepare files for submission.


[1] No female committers yet ... is this 2019?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Periods

2019-07-04 Thread Vik Fearing
On Thursday, July 4, 2019 8:04 PM, Alvaro Herrera alvhe...@2ndquadrant.com 
wrote:

> Hello Vik,
> On 2018-Jun-05, Vik Fearing wrote:
>
>>> I understand that your patch is just to allow defining periods, but I
>>> thought I'd share my own hopes earlier rather than later, in case you
>>> are doing more work on this.
>>
>> Yes, I plan on doing much more work on this. My goal is to implement
>> (by myself or with help from other) the entire SQL:2016 spec on
>> periods and system versioned tables. This current patch is just
>> infrastructure.
>
> Have you had the chance to work on this?

Hi Alvaro,

I've been working on this as an extension instead.  It allows me to do some 
stuff in plpgsql which is much easier for me.

https://github.com/xocolatl/periods/

I would love to have all this in core (especially since MariaDB 10.4 just 
became the first open source database to get all of this functionality), but 
something is better than nothing.
--
Vik Fearing

question about SnapBuild

2019-07-04 Thread ifquant
I am somewhat puzzled by snapbuild.c; it seems some code can never be reached:

SnapBuildCommitTxn
{
    /* can not reach there? */
    if (builder->state == SNAPBUILD_START ||
        (builder->state == SNAPBUILD_BUILDING_SNAPSHOT &&
         TransactionIdPrecedes(xid, SnapBuildNextPhaseAt(builder))))
    ...
}

DecodeXactOp
{
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
        return;  /* returned */
}
ifquant
ifqu...@163.com

Re: Periods

2019-07-04 Thread Alvaro Herrera
Hello Vik,

On 2018-Jun-05, Vik Fearing wrote:

> > I understand that your patch is just to allow defining periods, but I
> > thought I'd share my own hopes earlier rather than later, in case you
> > are doing more work on this.
> 
> Yes, I plan on doing much more work on this.  My goal is to implement
> (by myself or with help from other) the entire SQL:2016 spec on
> periods and system versioned tables.  This current patch is just
> infrastructure.

Have you had the chance to work on this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-04 Thread James Coleman
On Thu, Jul 4, 2019 at 10:46 AM Tomas Vondra
 wrote:
>
> On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote:
> >On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote:
> >> >
> >> >Unrelated: if you or someone else you know that's more familiar with
> >> >the parallel code, I'd be interested in their looking at the patch at
> >> >some point, because I have a suspicion it might not be operating in
> >...
> >> So I've looked into that, and the reason seems fairly simple - when
> >> generating the Gather Merge paths, we only look at paths that are in
> >> partial_pathlist. See generate_gather_paths().
> >>
> >> And we only have sequential + index paths in partial_pathlist, not
> >> incremental sort paths.
> >>
> >> IMHO we can do two things:
> >>
> >> 1) modify generate_gather_paths to also consider incremental sort for
> >> each sorted path, similarly to what create_ordered_paths does
> >>
> >> 2) modify build_index_paths to also generate an incremental sort path
> >> for each index path
> >>
> >> IMHO (1) is the right choice here, because it automatically does the
> >> trick for all other types of ordered paths, not just index scans. So,
> >> something like the attached patch, which gives me plans like this:
> >...
> >> But I'm not going to claim those are total fixes, it's the minimum I
> >> needed to do to make this particular type of plan work.
> >
> >Thanks for looking into this!
> >
> >I intended to apply this to my most recent version of the patch (just
> >sent a few minutes ago), but when I apply it I noticed that the
> >partition_aggregate regression tests have several of these failures:
> >
> >ERROR:  could not find pathkey item to sort
> >
> >I haven't had time to look into the cause yet, so I decided to wait
> >until the next patch revision.
> >
>
> FWIW I don't claim the patch I shared is complete and/or 100% correct.
> It was more an illustration of the issue and the smallest patch to make
> a particular query work. The test failures are a consequence of that.
>
> I'll try looking into the failures over the next couple of days, but I
> can't promise anything.

Yep, I understand, I just wanted to note that it was still an
outstanding item and give a quick update on why so.

Anything you can look at is much appreciated.

James Coleman




Re: Optimize partial TOAST decompression

2019-07-04 Thread Tomas Vondra

Of course, I forgot to attach the files, so here they are.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


t.data.gz
Description: application/gzip
(gdb) bt
#0  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
#1  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
#2  0x004c3b08 in heap_tuple_untoast_attr_slice (attr=, 
sliceoffset=0, slicelength=-1) at tuptoaster.c:315
#3  0x0085c1e5 in pg_detoast_datum_slice (datum=, 
first=, count=) at fmgr.c:1767
#4  0x00833b7a in text_substring (str=133761519127512, start=0, 
length=, length_not_specified=) at varlena.c:956
#5  0x00600a5b in ExecInterpExpr (state=0x12549c8, econtext=0x1253728, 
isnull=) at execExprInterp.c:649
#6  0x0061511f in ExecEvalExprSwitchContext (isNull=0x7ffc45db755f, 
econtext=, state=) at 
../../../src/include/executor/executor.h:307
#7  advance_aggregates (aggstate=0x1253500, aggstate=0x1253500) at nodeAgg.c:679
#8  agg_retrieve_direct (aggstate=0x1253500) at nodeAgg.c:1847
#9  ExecAgg (pstate=0x1253500) at nodeAgg.c:1572
#10 0x0060371b in ExecProcNode (node=0x1253500) at 
../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=, dest=0x12595b8, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, planstate=0x1253500, 
estate=0x12532c0) at execMain.c:1648
#12 standard_ExecutorRun (queryDesc=0x11be580, direction=, 
count=0, execute_once=) at execMain.c:365
#13 0x00748b2b in PortalRunSelect (portal=portal@entry=0x12043a0, 
forward=forward@entry=true, count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x12595b8) at pquery.c:929
#14 0x00749f00 in PortalRun (portal=portal@entry=0x12043a0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, 
run_once=run_once@entry=true, dest=dest@entry=0x12595b8, 
altdest=altdest@entry=0x12595b8, completionTag=0x7ffc45db77e0 "") at 
pquery.c:770
#15 0x0074611f in exec_simple_query (query_string=0x119c820 "select 
sum(length(substr(a,0))) from t;") at postgres.c:1215
#16 0x0074783e in PostgresMain (argc=, 
argv=argv@entry=0x11c84d0, dbname=, username=) at 
postgres.c:4249
#17 0x006d75d9 in BackendRun (port=0x11c09e0, port=0x11c09e0) at 
postmaster.c:4431
#18 BackendStartup (port=0x11c09e0) at postmaster.c:4122
#19 ServerLoop () at postmaster.c:1704
#20 0x006d840c in PostmasterMain (argc=argc@entry=3, 
argv=argv@entry=0x11963e0) at postmaster.c:1377


Re: duplicate key entries for primary key -- need urgent help

2019-07-04 Thread Tomas Vondra

On Thu, Jul 04, 2019 at 05:34:21PM +, Kumar, Pawan (Nokia - IN/Bangalore) 
wrote:

>Thanks a lot Tomas for the reply.
>
>Which version are you running, exactly? Which minor version?
>[Pawan]: Its (PostgreSQL) 9.5.9
>

You're missing 2 years of bugfixes, some of which are addressing data
corruption issues and might have caused this.

>sai=> select version();
> version
>--
> PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623
> (Red Hat 4.8.5-11), 64-bit
>(1 row)
>
>Why do you think it's the issue you linked?
>
>[Pawan]: Because the thread which I shared also has problem statement like
>"Duplicate entries of Primary key" .
>If this is also known to this version, I will be appreciating a lot if we have
>some Workaround or config change.
>

Duplicate entries are clearly some sort of data corruption, but that
might have happened in various ways - it does not mean it's the same
issue. And yes, 9.5.9 has a fix for the issue in the thread you linked.

>In our production: See below entries, proc_id is primary key and we can see
>duplicate entries. How it is possible?
>
>sai=> select ctid,proc_id from etl_status where proc_id='2993229';
>   ctid   | proc_id
>--+-
> (381,20) | 2993229
> (388,28) | 2993229
>(2 rows)
>
>Any idea, how it happened?



No, that's impossible to say without you doing some more investigation.
We need to know when those rows were created, on which version that
happened (the system might have been updated and the corruption predates
might have happened on the previous version), and so on. For example, if
the system crashed or had any significant issues, that might be related
to data corruption issues.

We know nothing about your system, so you'll have to do a bit of
investigation, look for suspicious things, etc.

FWIW it might be a good idea to look for other cases of data corruption.
Both to know the extent of the problem, and to gain insight.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Optimize partial TOAST decompression

2019-07-04 Thread Tomas Vondra

On Thu, Jul 04, 2019 at 11:10:24AM +0200, Andrey Borodin wrote:



>On 3 Jul 2019, at 18:06, Binguo Bao  wrote:
>
>>Paul Ramsey  wrote on Tue, Jul 2, 2019 at 10:46 PM:
>>>This looks good to me. A little commentary around why
>>>pglz_maximum_compressed_size() returns a universally correct answer
>>>(there's no way the compressed size can ever be larger than this
>>>because...) would be nice for peasants like myself.
>>...
>>
>>Thanks for your comment. I've updated the patch.
>
>Thanks Binguo and Paul! From my POV the patch looks ready for committer, so I
>switched its state on the CF.



I've done a bit of testing and benchmaring on this patch today, and
there's a bug somewhere, making it look like there are corrupted data.

What I'm seeing is this:

CREATE TABLE t (a text);

-- attached is data for one row
COPY t FROM '/tmp/t.data';


SELECT length(substr(a,1000)) from t;
psql: ERROR:  compressed data is corrupted

SELECT length(substr(a,0,1000)) from t;
 length
--------
    999
(1 row)

SELECT length(substr(a,1000)) from t;
psql: ERROR:  invalid memory alloc request size 2018785106

That's quite bizarre behavior - it does work with a prefix, but not with
suffix. And the exact ERROR changes after the prefix query. (Of course,
on master it works in all cases.)

The backtrace (with the patch applied) looks like this:

#0  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2291
#1  toast_decompress_datum (attr=0x12572e0) at tuptoaster.c:2277
#2  0x004c3b08 in heap_tuple_untoast_attr_slice (attr=, 
sliceoffset=0, slicelength=-1) at tuptoaster.c:315
#3  0x0085c1e5 in pg_detoast_datum_slice (datum=, 
first=, count=) at fmgr.c:1767
#4  0x00833b7a in text_substring (str=133761519127512, start=0, 
length=, length_not_specified=) at varlena.c:956
...

I've only observed this with a very small number of rows (the  data is
generated randomly with different compressibility etc.), so I'm only
attaching one row that exhibits this issue.

My guess is toast_fetch_datum_slice() gets confused by the headers or
something, or something like that. FWIW the new code added to this
function does not adhere to our code style, and would deserve some
additional explanation of what it's doing/why. Same for the
heap_tuple_untoast_attr_slice, BTW.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-07-04 Thread Peter Geoghegan
On Thu, Jul 4, 2019 at 5:06 AM Anastasia Lubennikova
 wrote:
> i - number of distinct values in the index.
> So i=1 means that all rows have the same key,
> and i=1000 means that all keys are different.
>
> i / old size (MB) / new size (MB)
> 121588
> 100021590
> 1021571
> 1000214214
>
> For more, see the attached diagram with test results.

I tried this on my own "UK land registry" test data [1], which was
originally used for the v12 nbtree work. My test case has a low
cardinality, multi-column text index. I chose this test case because
it was convenient for me.

On v12/master, the index is 1100MB. Whereas with your patch, it ends
up being 196MB -- over 5.5x smaller!

I also tried it out with the "Mouse genome informatics" database [2],
which was already improved considerably by the v12 work on duplicates.
This is helped tremendously by your patch. It's not quite 5.5x across
the board, of course. There are 187 indexes (on 28 tables), and almost
all of the indexes are smaller. Actually, *most* of the indexes are
*much* smaller. Very often 50% smaller.

I don't have time to do an in-depth analysis of these results today,
but clearly the patch is very effective on real world data. I think
that we tend to underestimate just how common indexes with a huge
number of duplicates are.

[1] https://postgr.es/m/cah2-wzn_nayk4pr0hrwo0stwhmxjp5qyu+x8vppt030xpqr...@mail.gmail.com
[2] http://www.informatics.jax.org/software.shtml
--
Peter Geoghegan




Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-04 Thread Fabrízio de Royes Mello
On Thu, Jul 4, 2019 at 5:17 AM Michael Paquier  wrote:
>
> On Tue, Jul 02, 2019 at 11:31:49AM -0300, Fabrízio de Royes Mello wrote:
> > New version attached.
>
> This looks in pretty good shape to me, and no objections from me to
> get those functions as the min() flavor is useful for monitoring WAL
> retention for complex deployments.
>
> Do you have a particular use-case in mind for max() one?  I can think
> of at least one case: monitoring the flush LSNs of a set of standbys
> to find out how much has been replayed at most.
>

I use min/max to measure the amount of generated WAL (diff) during some
periods based on wal position stored in some monitoring system.

Regards,

--
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-04 Thread Fabrízio de Royes Mello
On Thu, Jul 4, 2019 at 10:57 AM Robert Haas  wrote:
>
> On Thu, Jul 4, 2019 at 4:17 AM Michael Paquier 
wrote:
> > Do you have a particular use-case in mind for max() one?  I can think
> > of at least one case: monitoring the flush LSNs of a set of standbys
> > to find out how much has been replayed at most.
>
> It would be pretty silly to have one and not the other, regardless of
> whether we can think of an immediate use case.
>

+1


--
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: duplicate key entries for primary key -- need urgent help

2019-07-04 Thread Tomas Vondra

On Thu, Jul 04, 2019 at 01:37:01PM +, Kumar, Pawan (Nokia - IN/Bangalore) 
wrote:

>Hi Guys,
>
>Can you please help here?
>
>Below reported issue in past about duplicate key entries for primary key.
>https://www.postgresql.org/message-id/534c8b33.9050...@pgexperts.com
>
>the solution was provided in 9.3 version of postgres but it seems issue is
>still there in 9.5 version which I am running currently.
>
>Can you please let me know if this is also known in 9.5? any fix or Workaround
>please?



Which version are you running, exactly? Which minor version?

Why do you think it's the issue you linked?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





More tzdb fun: POSIXRULES is being deprecated upstream

2019-07-04 Thread Tom Lane
Paul Eggert, in https://mm.icann.org/pipermail/tz/2019-June/028172.html:
> zic’s -p option was intended as a transition from historical
> System V code that treated TZ="XXXnYYY" as meaning US
> daylight-saving rules in a time zone n hours west of UT,
> with XXX abbreviating standard time and YYY abbreviating DST.
> zic -p allows the tzdata installer to specify (say)
> Europe/Brussels's rules instead of US rules.  This behavior
> is not well documented and often fails in practice; for example it
> does not work with current glibc for contemporary timestamps, and
> it does not work in tzdb itself for timestamps after 2037.
> So, document it as being obsolete, with the intent that it
> will be removed in a future version.  This change does not
> affect behavior of the default installation.

As he says, this doesn't work for post-2038 dates:

regression=# set timezone = 'FOO5BAR';
SET
regression=# select now();
              now              
-------------------------------
 2019-07-04 11:55:46.905382-04
(1 row)

regression=# select timeofday();
              timeofday              
-------------------------------------
 Thu Jul 04 11:56:14.102770 2019 BAR
(1 row)

regression=# select '2020-07-04'::timestamptz;
      timestamptz       
------------------------
 2020-07-04 00:00:00-04
(1 row)

regression=# select '2040-07-04'::timestamptz;
      timestamptz       
------------------------
 2040-07-04 00:00:00-05  <<-- should be -04
(1 row)

and this note makes it clear that the IANA crew aren't planning on fixing
that.  It does work if you write a full POSIX-style DST specification:

regression=# set timezone = 'FOO5BAR,M3.2.0,M11.1.0';
SET
regression=# select '2040-07-04'::timestamptz;
      timestamptz       
------------------------
 2040-07-04 00:00:00-04
(1 row)

so I think what Eggert has in mind here is that they'll remove the
TZDEFRULES-loading logic and always fall back to TZDEFRULESTRING when
presented with a POSIX-style zone spec that lacks explicit transition
date rules.

So, what if anything should we do about this?  We do document posixrules,
very explicitly, see datatype.sgml around line 2460:

When a daylight-savings zone abbreviation is present,
it is assumed to be used
according to the same daylight-savings transition rules used in the
IANA time zone database's posixrules entry.
In a standard PostgreSQL installation,
posixrules is the same as 
US/Eastern, so
that POSIX-style time zone specifications follow USA daylight-savings
rules.  If needed, you can adjust this behavior by replacing the
posixrules file.

One option is to do nothing until the IANA code actually changes,
but as 2038 gets closer, people are more likely to start noticing
that this "feature" doesn't work as one would expect.

We could get out front of the problem and remove the TZDEFRULES-loading
logic ourselves.  That would be a bit of a maintenance hazard, but perhaps
not too awful, because we already deviate from the IANA code in that area
(we have our own ideas about when/whether to try to load TZDEFRULES).

I don't think we'd want to change this behavior in the back branches,
but it might be OK to do it as a HEAD change.  I think I'd rather do
it like that than be forced into playing catchup when the IANA code
does change.

A more aggressive idea would be to stop supporting POSIX-style timezone
specs altogether, but I'm not sure I like that answer.  Even if we could
get away with it from a users'-eye standpoint, I think we have some
internal dependencies on being able to use such specifications.

regards, tom lane




Re: [PATCH] Implement uuid_version()

2019-07-04 Thread Alvaro Herrera
On 2019-Jul-04, Tom Lane wrote:

> A possible option 3 is to keep the function in pgcrypto but change
> its C code to call the core code.

This seems most reasonable, and is what José Luis proposed upthread.  We
don't have to bump the pgcrypto extension version, as nothing changes
for pgcrypto externally.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




duplicate key entries for primary key -- need urgent help

2019-07-04 Thread Kumar, Pawan (Nokia - IN/Bangalore)
Hi Guys,

Can you please help here?

Below reported issue in past about duplicate key entries for primary key.
https://www.postgresql.org/message-id/534c8b33.9050...@pgexperts.com

the solution was provided in 9.3 version of postgres but it seems issue is 
still there in 9.5 version which I am running currently.

Can you please let me know if this is also known in 9.5? any fix or Workaround 
please?

WBR,
-Pawan




Re: [PATCH] Implement uuid_version()

2019-07-04 Thread Tom Lane
Peter Eisentraut  writes:
> I think the alternatives are:

> 1. We keep the code in both places.  This is fine.  There is no problem
> with having the same C function or the same SQL function name in both
> places.

> 2. We remove the C function from pgcrypto and make an extension version
> bump.  This will create breakage for (some) current users of the
> function from pgcrypto.

> So option 2 would ironically punish the very users we are trying to
> help.  So I think just doing nothing is the best option.

Hm.  Option 1 means that it's a bit unclear which function you are
actually calling.  As long as the implementations behave identically,
that seems okay, but I wonder if that's a constraint we want for the
long term.

A possible option 3 is to keep the function in pgcrypto but change
its C code to call the core code.
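
Concretely, such a wrapper might look something like the sketch below;
gen_random_uuid() stands in for whatever the in-core function ends up being
called, so this is an illustration rather than committed code:

/*
 * Sketch of "option 3": keep pgcrypto's SQL function but make its C
 * implementation delegate to the core generator.
 */
#include "postgres.h"
#include "fmgr.h"

extern Datum gen_random_uuid(PG_FUNCTION_ARGS);	/* assumed core function */

PG_FUNCTION_INFO_V1(pg_random_uuid);

Datum
pg_random_uuid(PG_FUNCTION_ARGS)
{
	/* same SQL-visible behavior, single underlying implementation */
	return gen_random_uuid(fcinfo);
}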

regards, tom lane




Re: [PATCH] Implement uuid_version()

2019-07-04 Thread Peter Eisentraut
On 2019-07-02 17:09, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 2019-06-30 14:50, Fabien COELHO wrote:
>>> I'm wondering whether pg_random_uuid() should be taken out of pgcrypto if 
>>> it is available in core?
> 
>> That would probably require an extension version update dance in
>> pgcrypto.  I'm not sure if it's worth that.  Thoughts?
> 
> We have some previous experience with this type of thing when we migrated
> contrib/tsearch2 stuff into core.  I'm too caffeine-deprived to remember
> exactly what we did or how well it worked.  But it seems advisable to go
> study that history, because we could easily make things a mess for users
> if we fail to consider their upgrade experience.

I think in that case we wanted users of the extension to transparently
end up using the in-core code.  This is not the case here: Both the
extension and the proposed in-core code do the same thing and there is
very little code duplication, so having them coexist would be fine in
principle.

I think the alternatives are:

1. We keep the code in both places.  This is fine.  There is no problem
with having the same C function or the same SQL function name in both
places.

2. We remove the C function from pgcrypto and make an extension version
bump.  This will create breakage for (some) current users of the
function from pgcrypto.

So option 2 would ironically punish the very users we are trying to
help.  So I think just doing nothing is the best option.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: "long" type is not appropriate for counting tuples

2019-07-04 Thread Peter Eisentraut
On 2019-07-02 22:56, Tom Lane wrote:
> I took a look through these and see nothing objectionable.  There are
> probably more places that can be improved, but we need not insist on
> getting every such place in one go.
> 
> Per Robert's position that variables ought to have well-defined widths,
> there might be something to be said for not touching the variable
> declarations that you changed from int64 to long long, and instead
> casting them to long long in the sprintf calls.  But I'm not really
> convinced that that's better than what you've done.
> 
> Marked CF entry as ready-for-committer.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-04 Thread Tomas Vondra

On Thu, Jul 04, 2019 at 09:29:49AM -0400, James Coleman wrote:

On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra
 wrote:


On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote:
>
>Unrelated: if you or someone else you know that's more familiar with
>the parallel code, I'd be interested in their looking at the patch at
>some point, because I have a suspicion it might not be operating in

...

So I've looked into that, and the reason seems fairly simple - when
generating the Gather Merge paths, we only look at paths that are in
partial_pathlist. See generate_gather_paths().

And we only have sequential + index paths in partial_pathlist, not
incremental sort paths.

IMHO we can do two things:

1) modify generate_gather_paths to also consider incremental sort for
each sorted path, similarly to what create_ordered_paths does

2) modify build_index_paths to also generate an incremental sort path
for each index path

IMHO (1) is the right choice here, because it automatically does the
trick for all other types of ordered paths, not just index scans. So,
something like the attached patch, which gives me plans like this:

...

But I'm not going to claim those are total fixes, it's the minimum I
needed to do to make this particular type of plan work.


Thanks for looking into this!

I intended to apply this to my most recent version of the patch (just
sent a few minutes ago), but when I apply it I noticed that the
partition_aggregate regression tests have several of these failures:

ERROR:  could not find pathkey item to sort

I haven't had time to look into the cause yet, so I decided to wait
until the next patch revision.



FWIW I don't claim the patch I shared is complete and/or 100% correct.
It was more an illustration of the issue and the smallest patch to make
a particular query work. The test failures are a consequence of that.

I'll try looking into the failures over the next couple of days, but I
can't promise anything.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Memory-Bounded Hash Aggregation

2019-07-04 Thread Tomas Vondra

On Wed, Jul 03, 2019 at 07:03:06PM -0700, Jeff Davis wrote:

On Wed, 2019-07-03 at 02:17 +0200, Tomas Vondra wrote:

What does "partitioned hash strategy" do? It's probably explained in
one
of the historical discussions, but I'm not sure which one. I assume
it
simply hashes the group keys and uses that to partition the data, and
then
passing it to hash aggregate.


Yes. When spilling, it is cheap to partition on the hash value at the
same time, which dramatically reduces the need to spill multiple times.
Previous discussions:



Unfortunately the second link does not work :-(


It's supposed to be:


https://www.postgresql.org/message-id/CAGTBQpa__-NP7%3DkKwze_enkqw18vodRxKkOmNhxAPzqkruc-8g%40mail.gmail.com



I'm not going to block Approach 1, althought I'd really like to see
something that helps with array_agg.


I have a WIP patch that I just posted. It doesn't yet work with
ARRAY_AGG, but I think it can be made to work by evicting the entire
hash table, serializing the transition states, and then later combining
them.


Aren't all three approaches a way to "fix" hash aggregate? In any
case,
it's certainly reasonable to make incremental changes. The question
is
whether "approach 1" is sensible step towards some form of "approach
3"


Disk-based hashing certainly seems like a reasonable algorithm on paper
that has some potential advantages over sorting. It certainly seems
sensible to me that we explore the disk-based hashing strategy first,
and then we would at least know what we are missing (if anything) by
going with the hybrid approach later.
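
To make the partition-on-the-hash-value idea above a bit more concrete, here
is a toy, standalone sketch (not PostgreSQL code; all names are made up).
The point is only that routing each spilled tuple by its hash guarantees that
all tuples of one group land in the same spill partition, so each partition
can later be aggregated independently with a smaller hash table.

#include <stdio.h>
#include <stdint.h>

#define NPARTITIONS 4

/* stand-in for the real group-key hash function */
static uint32_t
hash_key(uint32_t key)
{
	return key * 2654435761u;	/* Knuth multiplicative hash */
}

int
main(void)
{
	uint32_t	keys[] = {10, 42, 42, 7, 99, 10, 3};
	int			nkeys = (int) (sizeof(keys) / sizeof(keys[0]));
	int			counts[NPARTITIONS] = {0};

	for (int i = 0; i < nkeys; i++)
	{
		uint32_t	h = hash_key(keys[i]);
		int			part = h % NPARTITIONS;	/* same key => same partition */

		counts[part]++;		/* stand-in for writing the tuple to spill file "part" */
	}

	for (int p = 0; p < NPARTITIONS; p++)
		printf("partition %d: %d spilled tuples\n", p, counts[p]);

	return 0;
}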

There's also a fair amount of design space to explore in the hybrid
strategy. That could take a while to converge, especially if we don't
have anything in place to compare against.



Makes sense. I haven't thought about how the hybrid approach would be
implemented very much, so I can't quite judge how complicated it would be
to extend "approach 1" later. But if you think it's a sensible first step,
I trust you. And I certainly agree we need something to compare the other
approaches against.



> * It means we have a hash table and sort running concurrently, each
>  using memory. Andres said this might not be a problem[3], but I'm
>  not convinced that the problem is zero. If you use small work_mem
>  for the write phase of sorting, you'll end up with a lot of runs
> to
>  merge later and that has some kind of cost.
>

Why would we need to do both concurrently? I thought we'd empty the
hash
table before doing the sort, no?


So you are saying we spill the tuples into a tuplestore, then feed the
tuplestore through a tuplesort? Seems inefficient, but I guess we can.



I think the question is whether we see this as "emergency fix" (for cases
that are misestimated and could/would fail with OOM at runtime), or as
something that is meant to make "hash agg" more widely applicable.

I personally see it as an emergency fix, in which cases it's perfectly
fine if it's not 100% efficient, assuming it kicks in only rarely.
Effectively, we're betting on hash agg, and from time to time we lose.

But even if we see it as a general optimization technique it does not have
to be perfectly efficient, as long as it's properly costed (so the planner
only uses it when appropriate).

If we have a better solution (in terms of efficiency, code complexity,
etc.) then sure - let's use that. But considering we've started this
discussion in ~2015 and we still don't have anything, I wouldn't hold my
breath. Let's do something good enough, and maybe improve it later.


Do we actually need to handle that case? How many such aggregates are
there? I think it's OK to just ignore that case (and keep doing what
we do
now), and require serial/deserial functions for anything better.


Punting on a few cases is fine with me, if the user has a way to fix
it.



+1 to doing that


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-04 Thread Robert Haas
On Thu, Jul 4, 2019 at 4:17 AM Michael Paquier  wrote:
> Do you have a particular use-case in mind for max() one?  I can think
> of at least one case: monitoring the flush LSNs of a set of standbys
> to find out how much has been replayed at most.

It would be pretty silly to have one and not the other, regardless of
whether we can think of an immediate use case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Binary support for pgoutput plugin

2019-07-04 Thread Dave Cramer
On Wed, 5 Jun 2019 at 18:50, Andres Freund  wrote:

> Hi,
>
> On 2019-06-05 18:47:57 -0400, Dave Cramer wrote:
> > So one of the things they would like added is to get not null information
> > in the schema record. This is so they can mark the field Optional in
> Java.
> > I presume this would also have some uses in other languages. As I
> > understand it this would require a protocol bump. If this were to be
> > accepted are there any outstanding asks that would useful to add if we
> were
> > going to bump the protocol?
>
> I'm pretty strongly opposed to this. What's the limiting factor when
> adding such information? I think clients that want something like this
> ought to query the database for catalog information when getting schema
> information.
>
>
Talking some more with the people who want to use this for Change Data
Capture, they pointed out that if the schema changes while they are offline,
there is no way to query the catalog information.

Dave


>


Re: SQL/JSON path issues/questions

2019-07-04 Thread Liudmila Mantrova


On 7/3/19 11:59 PM, Alexander Korotkov wrote:

Hi!

On Wed, Jul 3, 2019 at 5:27 PM Liudmila Mantrova
 wrote:

I have rechecked the standard and I agree that we should use "filter
expression" whenever possible.
"A filter expression must be enclosed in parentheses..." looks like an
oversight, so I fixed it. As for what's actually enclosed, I believe we
can still use the word "condition" here as it's easy to understand and
is already used in our docs, e.g. in description of the WHERE clause
that serves a similar purpose.
The new version of the patch fixes the terminology, tweaks the examples,
and provides some grammar and style fixes in the jsonpath-related chapters.


It looks good to me.  But this sentence looks a bit too complicated.

"It can be followed by one or more accessor operators to define the
JSON element on a lower nesting level by which to filter the result."

Could we phrase this as following?

"In order to filter the result by values lying on lower nesting level,
@ operator can be followed by one or more accessor operators."

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Thank you!

I think we can make this sentence even shorter, the fix is attached:

"To refer to a JSON element stored at a lower nesting level, add one or 
more accessor operators after @."
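
As a quick sanity check of the wording, here is what a filter expression looks
like when used from SQL, with data shaped like the docs' track/segments
example (jsonb_path_query itself is not touched by this documentation patch):

SELECT jsonb_path_query(
         '{"track": {"segments": [{"HR": 73}, {"HR": 135}]}}',
         '$.track.segments[*] ? (@.HR > 130)');

This returns only the segment object whose HR value satisfies the condition,
i.e. {"HR": 135}.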



--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a8581d..6d2aefb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11538,7 +11538,8 @@ table2-mapping
from the JSON data, similar to XPath expressions used
for SQL access to XML. In PostgreSQL,
path expressions are implemented as the jsonpath
-   data type, described in .
+   data type and can use any elements described in
+   .
   
 
   JSON query functions and operators
@@ -11585,7 +11586,7 @@ table2-mapping
   },
   { "location":   [ 47.706, 13.2635 ],
 "start time": "2018-10-14 10:39:21",
-"HR": 130
+"HR": 135
   } ]
   }
 }
@@ -11637,23 +11638,33 @@ table2-mapping
 
   
When defining the path, you can also use one or more
-   filter expressions, which work similar to
-   the WHERE clause in SQL. Each filter expression
-   can provide one or more filtering conditions that are applied
-   to the result of the path evaluation. Each filter expression must
-   be enclosed in parentheses and preceded by a question mark.
-   Filter expressions are evaluated from left to right and can be nested.
-   The @ variable denotes the current path evaluation
-   result to be filtered, and can be followed by one or more accessor
-   operators to define the JSON element by which to filter the result.
-   Functions and operators that can be used in the filtering condition
-   are listed in .
-   SQL/JSON defines three-valued logic, so the result of the filter
-   expression may be true, false,
+   filter expressions that work similar to the
+   WHERE clause in SQL. A filter expression begins with
+   a question mark and provides a condition in parentheses:
+
+
+? (condition)
+
+  
+
+  
+   Filter expressions must be specified right after the path evaluation step
+   to which they are applied. The result of this step is filtered to include
+   only those items that satisfy the provided condition. SQL/JSON defines
+   three-valued logic, so the condition can be true, false,
or unknown. The unknown value
-   plays the same role as SQL NULL. Further path
+   plays the same role as SQL NULL and can be tested
+   for with the is unknown predicate. Further path
evaluation steps use only those items for which filter expressions
-   return true.
+   return true.
+  
+
+  
+   Functions and operators that can be used in filter expressions are listed
+   in . The path
+   evaluation result to be filtered is denoted by the @
+   variable. To refer to a JSON element stored at a lower nesting level,
+   add one or more accessor operators after @.
   
 
   
@@ -11667,8 +11678,8 @@ table2-mapping
   
To get the start time of segments with such values instead, you have to
filter out irrelevant segments before returning the start time, so the
-   filter is applied to the previous step and the path in the filtering
-   condition is different:
+   filter expression is applied to the previous step, and the path used
+   in the condition is different:
 
'$.track.segments[*] ? (@.HR &gt; 130)."start time"'
 
@@ -11693,9 +11704,9 @@ table2-mapping
   
 
   
-   You can also nest filters within each other:
+   You can also nest filter expressions within each other:
 
-'$.track ? (@.segments[*] ? (@.HR &gt; 130)).segments.size()'
+'$.track ? (exists(@.segments[*] ? (@.HR &gt; 130))).segments.size()'
 
This expression returns the size of the track if it contains any
segments with high heart rate values, or an empty sequence otherwise.
@@ 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-04 Thread James Coleman
On Tue, Jun 25, 2019 at 7:22 PM Tomas Vondra
 wrote:
>
> On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote:
> >
> >Unrelated: if you or someone else you know that's more familiar with
> >the parallel code, I'd be interested in their looking at the patch at
> >some point, because I have a suspicion it might not be operating in
...
> So I've looked into that, and the reason seems fairly simple - when
> generating the Gather Merge paths, we only look at paths that are in
> partial_pathlist. See generate_gather_paths().
>
> And we only have sequential + index paths in partial_pathlist, not
> incremental sort paths.
>
> IMHO we can do two things:
>
> 1) modify generate_gather_paths to also consider incremental sort for
> each sorted path, similarly to what create_ordered_paths does
>
> 2) modify build_index_paths to also generate an incremental sort path
> for each index path
>
> IMHO (1) is the right choice here, because it automatically does the
> trick for all other types of ordered paths, not just index scans. So,
> something like the attached patch, which gives me plans like this:
...
> But I'm not going to claim those are total fixes, it's the minimum I
> needed to do to make this particular type of plan work.

Thanks for looking into this!

I intended to apply this to my most recent version of the patch (just
sent a few minutes ago), but when I apply it I noticed that the
partition_aggregate regression tests have several of these failures:

ERROR:  could not find pathkey item to sort

I haven't had time to look into the cause yet, so I decided to wait
until the next patch revision.

James Coleman




Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2019-07-04 Thread Anastasia Lubennikova

The new version of the patch is attached.
This version is even simpler than the previous one,
thanks to the recent btree design changes and all the feedback I received.
I consider it ready for review and testing.

[feature overview]
This patch implements the deduplication of btree non-pivot tuples on
leaf pages, in a manner similar to GIN index "posting lists".

A non-pivot posting tuple has the following format:
t_tid | t_info | key values | posting_list[]

where the t_tid and t_info fields are used to store meta info about the
tuple's posting list. The posting list is an array of ItemPointerData.
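
For readers not familiar with nbtree internals, the layout described above
can be visualized like this (an illustration only, not the patch's actual
declarations):

/*
 * Conceptual layout of a deduplicated ("posting") leaf tuple:
 *
 *   IndexTupleData header:
 *     ItemPointerData t_tid;    -- reused here for posting-list metadata
 *     unsigned short  t_info;   -- size and flag bits
 *   key values                  -- a single copy of the duplicated key
 *   ItemPointerData posting[];  -- heap TIDs of all rows with that key
 */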

Currently, compression is applied to all indexes except system indexes,
unique indexes, and indexes with included columns.

On insertion, compression is applied not to each tuple but to the page
before a split. If the target page is full, we try to compress it.

[benchmark results]
idx ON tbl(c1);
index contains 10000000 integer values

i - number of distinct values in the index.
So i=1 means that all rows have the same key,
and i=10000000 means that all keys are different.

i / old size (MB) / new size (MB)
1            215    88
1000         215    90
100000       215    71
10000000     214    214

For more, see the attached diagram with test results.

[future work]
Many things can be improved in this feature.
Personally, I'd prefer to keep this patch as small as possible
and work on other improvements after a basic part is committed.
Though, I understand that some of these can be considered essential
for this patch to be approved.

1. Implement a split of the posting tuples on a page split.
2. Implement microvacuum of posting tuples.
3. Add a flag into pg_index, which allows enabling/disabling compression
for a particular index.
4. Implement posting list compression.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 602f884..fce499b 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -20,6 +20,7 @@
 #include "access/tableam.h"
 #include "access/transam.h"
 #include "access/xloginsert.h"
+#include "catalog/catalog.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -56,6 +57,8 @@ static void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
 static bool _bt_pgaddtup(Page page, Size itemsize, IndexTuple itup,
 		 OffsetNumber itup_off);
 static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
+static bool insert_itupprev_to_page(Page page, BTCompressState *compressState);
+static void _bt_compress_one_page(Relation rel, Buffer buffer, Relation heapRel);
 
 /*
  *	_bt_doinsert() -- Handle insertion of a single index tuple in the tree.
@@ -759,6 +762,12 @@ _bt_findinsertloc(Relation rel,
 			_bt_vacuum_one_page(rel, insertstate->buf, heapRel);
 			insertstate->bounds_valid = false;
 		}
+
+		/*
+		 * If the target page is full, try to compress the page
+		 */
+		if (PageGetFreeSpace(page) < insertstate->itemsz)
+			_bt_compress_one_page(rel, insertstate->buf, heapRel);
 	}
 	else
 	{
@@ -806,6 +815,11 @@ _bt_findinsertloc(Relation rel,
 			}
 
 			/*
+			 * Before considering moving right, try to compress the page
+			 */
+			_bt_compress_one_page(rel, insertstate->buf, heapRel);
+
+			/*
 			 * Nope, so check conditions (b) and (c) enumerated above
 			 *
 			 * The earlier _bt_check_unique() call may well have established a
@@ -2286,3 +2300,232 @@ _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel)
 	 * the page.
 	 */
 }
+
+/*
+ * Add new item (compressed or not) to the page, while compressing it.
+ * If insertion failed, return false.
+ * Caller should consider this as compression failure and
+ * leave page uncompressed.
+ */
+static bool
+insert_itupprev_to_page(Page page, BTCompressState *compressState)
+{
+	IndexTuple to_insert;
+	OffsetNumber offnum =  PageGetMaxOffsetNumber(page);
+
+	if (compressState->ntuples == 0)
+		to_insert = compressState->itupprev;
+	else
+	{
+		IndexTuple postingtuple;
+		/* form a tuple with a posting list */
+		postingtuple = BTreeFormPostingTuple(compressState->itupprev,
+			compressState->ipd,
+			compressState->ntuples);
+		to_insert = postingtuple;
+		pfree(compressState->ipd);
+	}
+
+	/* Add the new item into the page */
+	offnum = OffsetNumberNext(offnum);
+
+	elog(DEBUG4, "insert_itupprev_to_page. compressState->ntuples %d IndexTupleSize %zu free %zu",
+			 compressState->ntuples, IndexTupleSize(to_insert), PageGetFreeSpace(page));
+
+	if (PageAddItem(page, (Item) to_insert, IndexTupleSize(to_insert),
+	offnum, false, false) == InvalidOffsetNumber)
+	{
+		elog(DEBUG4, "insert_itupprev_to_page. failed");
+		/*
+		 * this may happen if tuple is bigger than freespace
+		 * fallback to uncompressed page case
+		 */
+		if (compressState->ntuples > 0)
+			pfree(to_insert);
+		return false;
+	}

run os command in pg_regress?

2019-07-04 Thread Alex
In my case, I want to sleep for 3 seconds in xxx.sql when it is run by the
pg_regress program, but I don't want to run 'select pg_sleep(3)'. Is that
possible with pg_regress?

in psql, I can run \! sleep(3); exit;

but it looks like pg_regress doesn't support it.


Re: Minimal logical decoding on standbys

2019-07-04 Thread Amit Khandekar
On Thu, 4 Jul 2019 at 17:21, Amit Khandekar  wrote:
>
> On Thu, 4 Jul 2019 at 15:52, tushar  wrote:
> >
> > On 07/01/2019 11:04 AM, Amit Khandekar wrote:
> >
> > Also, in the updated patch (v11), I have added some scenarios that
> > verify that slot is dropped when either master wal_level is
> > insufficient, or when slot is conflicting. Also organized the test
> > file a bit.
> >
> > One scenario where replication slot removed even after fixing the problem 
> > (which Error message suggested to do)
>
> Which specific problem are you referring to ? Removing a conflicting
> slot, itself is the part of the fix for the conflicting slot problem.
>
> >
> > Please refer this below scenario
> >
> > Master cluster-
> > postgresql,conf file
> > wal_level=logical
> > hot_standby_feedback = on
> > port=5432
> >
> > Standby cluster-
> > postgresql,conf file
> > wal_level=logical
> > hot_standby_feedback = on
> > port=5433
> >
> > both Master/Slave cluster are up and running and are in SYNC with each other
> > Create a logical replication slot on SLAVE ( SELECT * from   
> > pg_create_logical_replication_slot('m', 'test_decoding'); )
> >
> > change wal_level='hot_standby' on Master postgresql.conf file / restart the 
> > server
> > Run get_changes function on Standby -
> > postgres=# select * from pg_logical_slot_get_changes('m',null,null);
> > ERROR:  logical decoding on standby requires wal_level >= logical on master
> >
> > Correct it on Master postgresql.conf file ,i.e set  wal_level='logical'  
> > again / restart the server
> > and again fire  get_changes function on Standby -
> > postgres=# select * from pg_logical_slot_get_changes('m',null,null);
> > ERROR:  replication slot "m" does not exist
> >
> > This looks little weird as slot got dropped/removed internally . i guess it 
> > should get invalid rather than removed automatically.
> > Lets user's  delete the slot themself rather than automatically removed  as 
> > a surprise.
>
> It was earlier discussed about what action should be taken when we
> find conflicting slots. Out of the options, one was to drop the slot,
> and we chose that because that was simple. See this :
> https://www.postgresql.org/message-id/flat/20181212204154.nsxf3gzqv3gesl32%40alap3.anarazel.de

Sorry, the above link is not the one I wanted to refer to. Correct one is this :

https://www.postgresql.org/message-id/20181214005521.jaty2d24lz4nroil%40alap3.anarazel.de

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: POC: Cleaning up orphaned files using undo logs

2019-07-04 Thread Amit Kapila
On Mon, Jul 1, 2019 at 1:24 PM Thomas Munro  wrote:
>
> Another small change/review: the function UndoLogGetNextInsertPtr()
> previously took a transaction ID, but I'm not sure if that made sense,
> I need to think about it some more.
>

The changes you have made related to UndoLogGetNextInsertPtr() don't
seem correct to me.

@@ -854,7 +854,9 @@ FindUndoEndLocationAndSize(UndoRecPtr start_urecptr,
  * has already started in this log then lets re-fetch the undo
  * record.
  */
- next_insert = UndoLogGetNextInsertPtr(slot->logno, uur->uur_xid);
+ next_insert = UndoLogGetNextInsertPtr(slot->logno);
+
+ /* TODO this can't happen */
  if (!UndoRecPtrIsValid(next_insert))

I think this is a possible case.  Say the discard worker tries to
register the rollback request from some log, and after it fetches the
undo record corresponding to the start location in this function,
another backend adds new transaction undo.  The same is mentioned in
the comments as well.  Can you explain what makes you think that this
can't happen?  If we don't want to pass the xid to
UndoLogGetNextInsertPtr, then I think we need to get the insert
location before fetching the record.  I will think more on it to see
if there is any other problem with the same.

2.
@@ -167,25 +205,14 @@ UndoDiscardOneLog(UndoLogSlot *slot,
TransactionId xmin, bool *hibernate)
+ if (!TransactionIdIsValid(wait_xid) && !pending_abort)
  {
  UndoRecPtr next_insert = InvalidUndoRecPtr;
- /*
- * If more undo has been inserted since we checked last, then
- * we can process that as well.
- */
- next_insert = UndoLogGetNextInsertPtr(logno, undoxid);
- if (!UndoRecPtrIsValid(next_insert))
- continue;
+ next_insert = UndoLogGetNextInsertPtr(logno);

This change is also not safe.  It can lead to discarding the undo of
some random transaction, because new undo records from some other
transaction might have been added since we last fetched the undo
record.  This can be fixed by just removing the call to
UndoLogGetNextInsertPtr.  I have done so in undoprocessing branch and
added the comment as well.

I think the common problem with the above changes is that they assume
that new undo can't be added to the undo logs while the discard worker
is processing them.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Minimal logical decoding on standbys

2019-07-04 Thread Amit Khandekar
On Thu, 4 Jul 2019 at 15:52, tushar  wrote:
>
> On 07/01/2019 11:04 AM, Amit Khandekar wrote:
>
> Also, in the updated patch (v11), I have added some scenarios that
> verify that slot is dropped when either master wal_level is
> insufficient, or when slot is conflicting. Also organized the test
> file a bit.
>
> One scenario where replication slot removed even after fixing the problem 
> (which Error message suggested to do)

Which specific problem are you referring to ? Removing a conflicting
slot, itself is the part of the fix for the conflicting slot problem.

>
> Please refer this below scenario
>
> Master cluster-
> postgresql,conf file
> wal_level=logical
> hot_standby_feedback = on
> port=5432
>
> Standby cluster-
> postgresql,conf file
> wal_level=logical
> hot_standby_feedback = on
> port=5433
>
> both Master/Slave cluster are up and running and are in SYNC with each other
> Create a logical replication slot on SLAVE ( SELECT * from   
> pg_create_logical_replication_slot('m', 'test_decoding'); )
>
> change wal_level='hot_standby' on Master postgresql.conf file / restart the 
> server
> Run get_changes function on Standby -
> postgres=# select * from pg_logical_slot_get_changes('m',null,null);
> ERROR:  logical decoding on standby requires wal_level >= logical on master
>
> Correct it on Master postgresql.conf file ,i.e set  wal_level='logical'  
> again / restart the server
> and again fire  get_changes function on Standby -
> postgres=# select * from pg_logical_slot_get_changes('m',null,null);
> ERROR:  replication slot "m" does not exist
>
> This looks little weird as slot got dropped/removed internally . i guess it 
> should get invalid rather than removed automatically.
> Lets user's  delete the slot themself rather than automatically removed  as a 
> surprise.

It was earlier discussed about what action should be taken when we
find conflicting slots. Out of the options, one was to drop the slot,
and we chose that because that was simple. See this :
https://www.postgresql.org/message-id/flat/20181212204154.nsxf3gzqv3gesl32%40alap3.anarazel.de

By the way, you are getting the "logical decoding on standby requires
wal_level >= logical on master" error while using the slot because we
reject the command even before checking the existence of the slot.
Actually, the slot has already been dropped due to the master's
wal_level. Then, when you correct the master's wal_level, the command
is not rejected; it proceeds to use the slot, and then it is found
that the slot does not exist.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: Feature: Add Greek language fulltext search

2019-07-04 Thread Peter Eisentraut
On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
> Last November snowball added support for Greek language [1]. Following
> the instructions [2], I wrote a patch that adds fulltext search for
> Greek in Postgres. The patch is attached. 

I have committed a full sync from the upstream snowball repository,
which pulled in the new greek stemmer.

Could you please clarify where you got the stopword list from?  The
README says those need to be downloaded separately, but I wasn't able to
find the download location.  It would be good to document this, for
example in the commit message.  I haven't committed the stopword list yet.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: [PATCH] Speedup truncates of relation forks

2019-07-04 Thread Jamison, Kirk
On Tuesday, July 2, 2019 4:59 PM (GMT+9), Masahiko Sawada wrote:
> Thank you for updating the patch. Here is the review comments for v2 patch.

Thank you so much for review!
I indicated the addressed parts below and attached the updated patch.

---
visibilitymap.c: visibilitymap_truncate()

> I don't think we should describe about the previous behavior here.
> Rather we need to describe what visibilitymap_mark_truncate does and what
> it returns to the caller.
>
> I'm not sure that visibilitymap_mark_truncate function name is appropriate
> here since it actually truncate map bits, not only marking. Perhaps we can
> still use visibilitymap_truncate or change to
> visibilitymap_truncate_prepare, or something? Anyway, this function
> truncates only tail bits in the last remaining map page and we can have a
> rule that the caller must call smgrtruncate() later to actually truncate
> pages.
> 
> The comment of second paragraph is now out of date since this function no
> longer sends smgr invalidation message.

(1) I updated the function name to "visibilitymap_truncate_prepare()" as suggested.
I think that name fits, unless other reviewers suggest a better one.
I also updated its description more accurately: describing current behavior,
caller must eventually call smgrtruncate() to actually truncate vm pages,
and removed the outdated description.


> Is it worth to leave the current visibilitymap_truncate function as a shortcut
> function, instead of replacing? That way we don't need to change
> pg_truncate_visibility_map function.

(2) Yeah, it's somewhat displeasing that I had to add lines to
pg_truncate_visibility_map.
Just to confirm: by "shortcut function", did you mean retaining
visibilitymap_truncate() and adding a separate visibilitymap_truncate_prepare()?
I'm not sure it's worth the additional lines of adding another function to
visibilitymap.c, which is why I just updated the function itself; that only
adds 10 lines to pg_truncate_visibility_map anyway.
If it's not wrong to do it this way, then I will retain this change.
OTOH, if pg_visibility.c *must* not be modified, then I'll follow your advice.



pg_visibility.c: pg_truncate_visibility_map()

> @@ -383,6 +383,10 @@ pg_truncate_visibility_map(PG_FUNCTION_ARGS)
>  {
> Oid relid = PG_GETARG_OID(0);
> Relationrel;
> +   ForkNumber  forks[MAX_FORKNUM];
> +   BlockNumber blocks[MAX_FORKNUM];
> +   BlockNumber newnblocks = InvalidBlockNumber;
> +   int nforks = 0;
> 
> Why do we need the array of forks and blocks here? I think it's enough to
> have one fork and one block number.

(3) Thanks for the catch. Updated.



freespace.c: MarkFreeSpaceMapTruncateRel()

> The same comments are true for MarkFreeSpaceMapTruncateRel.

> +   BlockNumber new_nfsmblocks = InvalidBlockNumber;/* FSM
> blocks */
> +   BlockNumber newnblocks = InvalidBlockNumber;/* VM
> blocks */
> +   int nforks = 0;
> 
> I think that we can have new_nfsmblocks and new_nvmblocks and wipe out the
> comments.

(4) I updated the description accordingly, describing only the current behavior.
The caller must eventually call smgrtruncate() to actually truncate fsm pages.
I also removed the outdated description and irrelevant comments.


--
storage.c: RelationTruncate()

> +* We might as well update the local smgr_fsm_nblocks and
> smgr_vm_nblocks
> +* setting. smgrtruncate sent an smgr cache inval message,
> which will cause
> +* other backends to invalidate their copy of smgr_fsm_nblocks and
> +* smgr_vm_nblocks, and this one too at the next command
> boundary. But this
> +* ensures it isn't outright wrong until then.
> +*/
> +   if (rel->rd_smgr)
> +   {
> +   rel->rd_smgr->smgr_fsm_nblocks = new_nfsmblocks;
> +   rel->rd_smgr->smgr_vm_nblocks = newnblocks;
> +   }
> 
> new_nfsmblocks and newnblocks could be InvalidBlockNumber when the forks are
> already enough small. So the above code sets incorrect values to
> smgr_{fsm,vm}_nblocks.
> 
> Also, I wonder if we can do the above code in smgrtruncate. Otherwise we 
> always
> need to set smgr_{fsm,vm}_nblocks after smgrtruncate, which is inconvenient.

(5)
Did you mean that in my patch there's a possibility that these values
are already InvalidBlockNumber even before I do the assignment?
If I understand correctly, the point of the above code is to make sure that
smgr_{fsm,vm}_nblocks are not InvalidBlockNumber until the next command
boundary, and that within that window these values remain valid. Is my
understanding correct?
Maybe following your advice and putting it inside the smgrtruncate loop
will make these values correct.
For example, as below?

void smgrtruncate
{
...
CacheInvalidateSmgr(reln->smgr_rnode);

/* Do 

Re: Index Skip Scan

2019-07-04 Thread Thomas Munro
On Wed, Jul 3, 2019 at 12:27 AM David Rowley
 wrote:
> On Tue, 2 Jul 2019 at 21:00, Thomas Munro  wrote:
> > 2.  SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if
> > you're allowed to go forwards.  Same for SELECT i, MAX(j) FROM t GROUP
> > BY i if you're allowed to go backwards.  Those queries are equivalent
> > to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC]
> > (though as Floris noted, the backwards version gives the wrong answers
> > with v20).  That does seem like a much more specific thing applicable
> > only to MIN and MAX, and I think preprocess_minmax_aggregates() could
> > be taught to handle that sort of query, building an index only scan
> > path with skip scan in build_minmax_path().
>
> For the MIN query you just need a path with Pathkeys: { i ASC, j ASC
> }, UniqueKeys: { i, j }, doing the MAX query you just need j DESC.

While updating the Loose Index Scan wiki page with links to other
products' terminology on this subject, I noticed that MySQL can
skip-scan MIN() and MAX() in the same query.  Hmm.  That seems quite
desirable.  I think it requires a new kind of skipping: I think you
have to be able to skip to the first AND last key that has each
distinct prefix, and then stick a regular agg on top to collapse them
into one row.  Such a path would not be so neatly describable by
UniqueKeys, or indeed by the amskip() interface in the current patch.
I mention all this stuff not because I want us to run before we can
walk, but because to be ready to commit the basic distinct skip scan
feature, I think we should know approximately how it'll handle the
future stuff we'll need.

-- 
Thomas Munro
https://enterprisedb.com




Re: Minimal logical decoding on standbys

2019-07-04 Thread tushar

On 07/01/2019 11:04 AM, Amit Khandekar wrote:

Also, in the updated patch (v11), I have added some scenarios that
verify that slot is dropped when either master wal_level is
insufficient, or when slot is conflicting. Also organized the test
file a bit.


One scenario where the replication slot is removed even after fixing the
problem (which the error message suggested to do)


Please refer to the scenario below:

Master cluster-
postgresql,conf file
wal_level=logical
hot_standby_feedback = on
port=5432

Standby cluster-
postgresql,conf file
wal_level=logical
hot_standby_feedback = on
port=5433

both Master/Slave cluster are up and running and are in SYNC with each other
Create a logical replication slot on SLAVE ( SELECT * from 
pg_create_logical_replication_slot('m', 'test_decoding'); )


change wal_level='hot_standby' on Master postgresql.conf file / restart 
the server

Run get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR:  logical decoding on standby requires wal_level >= logical on master

Correct it on Master postgresql.conf file ,i.e set wal_level='logical'  
again / restart the server

and again fire  get_changes function on Standby -
postgres=# select * from pg_logical_slot_get_changes('m',null,null);
ERROR:  replication slot "m" does not exist

This looks a little weird, as the slot got dropped/removed internally. I
guess it should become invalid rather than being removed automatically.
Let users delete the slot themselves rather than having it removed
automatically as a surprise.


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company



Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-07-04 Thread Julien Rouhaud
On Tue, May 28, 2019 at 6:57 AM Amit Langote
 wrote:
>
> On 2019/05/27 21:56, Tom Lane wrote:
> > Amit Langote  writes:
> >> On 2019/05/24 23:28, Tom Lane wrote:
> >>> So my thought, if we want to do something about this, is not "find
> >>> some things we can pfree at the end of planning" but "find a way
> >>> to use a separate context for each statement in the query string".
> >
> >> Maybe like the attached?  I'm not sure if we need to likewise be concerned
> >> about exec_sql_string() being handed multi-query strings.

The whole extension SQL script is passed to execute_sql_string(), so I
think it's a good thing to have a similar workaround there.
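
For context, the usual backend pattern for such a short-lived per-statement
context looks roughly like the sketch below (this is only an illustration of
the pattern, not the actual patch; anything that must outlive the statement,
such as the plan to be executed, would have to be copied out first):

#include "postgres.h"
#include "utils/memutils.h"

static void
process_one_statement(void)
{
	MemoryContext stmt_cxt;
	MemoryContext oldcxt;

	stmt_cxt = AllocSetContextCreate(CurrentMemoryContext,
									 "per-statement parse/plan context",
									 ALLOCSET_DEFAULT_SIZES);
	oldcxt = MemoryContextSwitchTo(stmt_cxt);

	/* ... parse analysis, rewriting and planning of one statement ... */

	MemoryContextSwitchTo(oldcxt);
	MemoryContextDelete(stmt_cxt);	/* releases everything allocated above */
}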

About the patch:

 -* Switch to appropriate context for constructing querytrees (again,
-* these must outlive the execution context).
+* Switch to appropriate context for constructing querytrees.
+* Memory allocated during this construction is released before
+* the generated plan is executed.

The comment should mention query and plan trees, everything else seems ok to me.




Re: Optimize partial TOAST decompression

2019-07-04 Thread Andrey Borodin


> On 3 Jul 2019, at 18:06, Binguo Bao  wrote:
> 
> Paul Ramsey  于2019年7月2日周二 下午10:46写道:
> This looks good to me. A little commentary around why
> pglz_maximum_compressed_size() returns a universally correct answer
> (there's no way the compressed size can ever be larger than this
> because...) would be nice for peasants like myself.
> ...
> 
> Thanks for your comment. I've updated the patch.


Thanks, Binguo and Paul! From my POV the patch looks ready for committer, so I
switched the state on the CF.

Best regards, Andrey Borodin.



Re: pg_waldump and PREPARE

2019-07-04 Thread Julien Rouhaud
On Thu, Jul 4, 2019 at 9:45 AM Michael Paquier  wrote:
>
> On Wed, Jul 03, 2019 at 08:23:44PM +0200, Julien Rouhaud wrote:
> > So the patch compiles and works as intended. I don't have much to say
> > about it, it all looks good to me, since the concerns about xactdesc.c
> > aren't worth the trouble.
> >
> > I'm not sure that I understand Michael's objection though, as
> > xl_xact_prepare is not a new definition and AFAICS it couldn't contain
> > the records anyway.  So I'll let him say if he has further objections
> > or if it's ready for committer!
>
> This patch provides parsing information only for the header of the 2PC
> record.  Wouldn't it be interesting to get more information from the
> various TwoPhaseRecordOnDisk's callbacks?  We could also print much
> more information in xact_desc_prepare().  Like the subxacts, the XID,
> the invalidation messages and the delete-on-abort/commit rels.

Most of those are already described in the COMMIT PREPARE message,
wouldn't that be redundant?  abortrels aren't displayed anywhere
though, so +1 for adding them.

I also see that the dbid isn't displayed in any of the 2PC message,
that'd be useful to have it directly instead of looking for it in
other messages for the same transaction.




Re: Introduce MIN/MAX aggregate functions to pg_lsn

2019-07-04 Thread Michael Paquier
On Tue, Jul 02, 2019 at 11:31:49AM -0300, Fabrízio de Royes Mello wrote:
> New version attached.

This looks in pretty good shape to me, and no objections from me to
get those functions as the min() flavor is useful for monitoring WAL
retention for complex deployments.

Do you have a particular use-case in mind for max() one?  I can think
of at least one case: monitoring the flush LSNs of a set of standbys
to find out how much has been replayed at most.
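
For example, once the aggregates exist, that kind of monitoring could be as
simple as the queries below (assuming the usual pg_lsn columns of the system
views):

-- lowest restart position across slots, relevant for WAL retention
SELECT min(restart_lsn) FROM pg_replication_slots;

-- most advanced flush position among connected standbys
SELECT max(flush_lsn) FROM pg_stat_replication;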
--
Michael




Re: extension patch of CREATE OR REPLACE TRIGGER

2019-07-04 Thread Michael Paquier
On Wed, Jul 03, 2019 at 04:37:00AM +, osumi.takami...@fujitsu.com wrote:
> I really appreciate your comments.
> Recently, I'm very busy because of my work.
> So, I'll fix this within a couple of weeks.

Please note that I have switched the patch as waiting on author.
--
Michael




Re: pg_waldump and PREPARE

2019-07-04 Thread Michael Paquier
On Wed, Jul 03, 2019 at 08:23:44PM +0200, Julien Rouhaud wrote:
> So the patch compiles and works as intended. I don't have much to say
> about it, it all looks good to me, since the concerns about xactdesc.c
> aren't worth the trouble.
> 
> I'm not sure that I understand Michael's objection though, as
> xl_xact_prepare is not a new definition and AFAICS it couldn't contain
> the records anyway.  So I'll let him say if he has further objections
> or if it's ready for committer!

This patch provides parsing information only for the header of the 2PC
record.  Wouldn't it be interesting to get more information from the
various TwoPhaseRecordOnDisk's callbacks?  We could also print much
more information in xact_desc_prepare().  Like the subxacts, the XID,
the invalidation messages and the delete-on-abort/commit rels.
--
Michael




Re: Refactoring base64 encoding and decoding into a safer interface

2019-07-04 Thread Michael Paquier
On Tue, Jul 02, 2019 at 09:56:03AM +0200, Daniel Gustafsson wrote:
> Looks good, passes tests, provides value to the code.  Bumping this to ready
> for committer as I no more comments to add.

Thanks.  I have spent more time testing the different error paths and
the new checks in base64.c, and committed the thing.
--
Michael




Re: Replacing the EDH SKIP primes

2019-07-04 Thread Daniel Gustafsson



> On 04 Jul 2019, at 02:58, Michael Paquier  wrote:
> 
>> On Wed, Jul 03, 2019 at 08:56:42PM +0200, Daniel Gustafsson wrote:
>> Agreed, I’ve updated the patch with a comment on this formulated such that it
>> should stand the test of time even as OpenSSL changes etc.
> 
> I'd like to think that we had rather mention the warning issue
> explicitely, so as people don't get surprised, like that for example:
> 
> *  This is the 2048-bit DH parameter from RFC 3526.  The generation of the
> *  prime is specified in RFC 2412, which also discusses the design choice
> *  of the generator.  Note that when loaded with OpenSSL this causes
> *  DH_check() to fail on with DH_NOT_SUITABLE_GENERATOR, where leaking
> *  a bit is preferred.
> 
> Now this makes an OpenSSL-specific issue pop up within a section of
> the code where we want to make things more generic with SSL, so your
> simpler version has good arguments as well.
> 
> I have just rechecked the shape of the key, and we have an exact
> match.

LGTM, thanks.

cheers ./daniel