Re: run pgindent on a regular basis / scripted manner

2020-08-15 Thread Tom Lane
Andres Freund  writes:
> One thing is that some here are actively against manually adding entries
> to typedefs.list.

I've been of the opinion that it's pointless to do so under the current
regime where (a) only a few people do that and (b) we only officially
re-indent once a year anyway.  When I want to manually run pgindent,
I always pull down a fresh typedefs.list from the buildfarm, which is
reasonably up-to-date regardless of what people added or didn't add,
and then add any new typedefs from my current patch to that out-of-tree
copy.
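
For concreteness, that workflow looks roughly like this in shell form
(the buildfarm serves the aggregated list from typedefs.pl, and
pgindent's --typedefs switch points it at an out-of-tree copy; the
patch-specific additions file and target file here are hypothetical):

    # Fetch the buildfarm's aggregated typedefs list, kept out of tree.
    curl -o /tmp/typedefs.list \
        https://buildfarm.postgresql.org/cgi-bin/typedefs.pl
    # Append any new typedefs introduced by the current patch.
    cat my-patch-typedefs.txt >> /tmp/typedefs.list
    # Indent just the files the patch touches against that list.
    src/tools/pgindent/pgindent --typedefs=/tmp/typedefs.list \
        src/backend/foo.c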

Now, if we switch to a regime where we're trying to keep the tree in
more nearly correctly-indented shape all the time, it would make sense
to revisit that.  I'm not saying that it's unreasonable to want to have
the in-tree typedefs.list track reality more closely --- only that doing
so in a half-baked way won't be very helpful.

>> I think as a start, we could just issue a guideline that all committed code
>> should follow pgindent.  That has never really been a guideline, so it's not
>> surprising that it's not followed.

> Without a properly indented baseline that's hard to do, because it'll
> cause damage all over. So I don't think we easily can start just there -
> we'd first need to re-indent everything.

Well, we can certainly do a tree-wide re-indent anytime we're ready.
I doubt it would be very painful right now, with so little new work
since the last run.

regards, tom lane




Re: use pg_get_functiondef() in pg_dump

2020-08-15 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I wrote:
> > I wouldn't say that it's *fundamentally* new, but nonetheless it disturbs
> > me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> > very different ways depending on the server version.  I'd rather see us
> > continuing to build the bulk of the command the same as before, and
> > introduce new behavior only for deparsing the function body.
> 
> BTW, a concrete argument for doing it that way is that if you make a
> backend function that does the whole CREATE-FUNCTION-building job in
> exactly the way pg_dump wants it, that function is nigh useless for
> any other client with slightly different requirements.  A trivial
> example here is that I don't think we want to become locked into
> the proposition that psql's \ef and \sf must print functions exactly
> the same way that pg_dump would.

The fact that psql's needs and pg_dump's are at least somewhat similar
really argues that we should have put this code into libpgcommon in the
first place rather than in the backend, and then had both psql and
pg_dump use it.

I'm sure there are a lot of folks who'd like to see more of the logic we
have in pg_dump for building objects from the catalog made available to
more tools through libpgcommon, psql being one of the absolute first
use-cases for exactly that (there's certainly no shortage of people
who've asked how they can get a CREATE TABLE statement for a table by
using psql...).
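
For illustration, the server already exposes this kind of catalog-to-DDL
deparsing for functions, but not for tables; the second comment below is
exactly the gap being described:

    -- pg_get_functiondef() rebuilds a CREATE OR REPLACE FUNCTION command
    -- from the catalogs; it has been in core since 8.4:
    SELECT pg_get_functiondef('pg_catalog.pg_sleep(float8)'::regprocedure);

    -- There is no analogous pg_get_tabledef(), which is why people fall
    -- back on pg_dump or hand-rolled catalog queries for CREATE TABLE.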

Thanks,

Stephen




Re: Is it worth accepting multiple CRLs?

2020-08-15 Thread Stephen Frost
Greetings,

* Kyotaro Horiguchi (horikyota@gmail.com) wrote:
> At Mon, 03 Aug 2020 16:20:40 +0900 (JST), Kyotaro Horiguchi wrote:
> > Thanks for the opinion. I'll continue working on this.
> 
> This is it, but.. 

Thanks!

> Looking closer I realized that certificates are verified in each
> backend, so the CRL cache doesn't work at all for the hashed directory
> method. Therefore, all CRL files relevant to a certificate being
> verified are loaded every time a backend starts.
> 
> The only advantage of this is that irrelevant CRLs are not loaded, in
> exchange for loading the relevant CRLs at every session start. Session
> startup gets slower when there are many delta CRLs from the same CA.
> 
> Seems far from promising.

I agree that it's not ideal, but I don't know that this is a reason to
not move forward with this feature..?

We could certainly have a later patch which improves this in some way
(though exactly how isn't clear...  if we move the CRL loading into
postmaster then we'd have to load *all* of them, and then we'd still
need to check if they've changed since we loaded them, and presumably
have some way to signal the postmaster to update its set from time to
time..), but that can be a future effort.
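
For context, the hashed directory method is OpenSSL's c_rehash layout; a
minimal sketch of setting one up follows, where the server parameter name
is only an assumption about what the patch exposes:

    # Build the hashed symlinks OpenSSL expects in a CRL directory
    # ("openssl rehash" is the modern form of the old c_rehash script).
    mkdir -p "$PGDATA/crl.d"
    cp root.crl intermediate.crl "$PGDATA/crl.d/"
    openssl rehash "$PGDATA/crl.d"

    # Hypothetical postgresql.conf setting, mirroring ssl_crl_file:
    #   ssl_crl_dir = 'crl.d'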

I took a quick look through the patch and it seemed pretty
straightforward to me and a good improvement.

Would love to hear other thoughts.  I hope you'll submit this for the
September CF and ping me when you do and I'll see if I can get it
committed.

Thanks!

Stephen




Re: run pgindent on a regular basis / scripted manner

2020-08-15 Thread Andres Freund
Hi,

On 2020-08-15 13:47:41 +0200, Peter Eisentraut wrote:
> On 2020-08-13 00:34, Andres Freund wrote:
> > I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
> > most of the hunks were entirely unrelated. Despite the development
> > window for 14 having only relatively recently opened. Based on my
> > experience it tends to get worse over time.
> 
> Do we have a sense of why poorly-indented code gets committed?  I think some
> of the indentation rules are hard to follow manually.  (pgperltidy is
> worse.)
> 
> Also, since pgindent gets run eventually anyway, it's not really that
> important to get the indentation right the first time.  I suspect the goal
> of most authors and committers is to write readable code rather than to
> divine the exact pgindent output.

One thing is that some here are actively against manually adding entries
to typedefs.list, which means that code gets oddly indented if you use
pgindent.  I personally try to make the predictable updates to
typedefs.list, which then at least allows halfway sensibly indenting my
own changes.


> I think as a start, we could just issue a guideline that all committed code
> should follow pgindent.  That has never really been a guideline, so it's not
> surprising that it's not followed.

Without a properly indented baseline that's hard to do, because it'll
cause damage all over. So I don't think we easily can start just there -
we'd first need to re-indent everything.

Greetings,

Andres Freund




Re: Improving connection scalability: GetSnapshotData()

2020-08-15 Thread Andres Freund
Hi,

On 2020-08-15 11:10:51 -0400, Tom Lane wrote:
> We have two essentially identical buildfarm failures since these patches
> went in:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14
>
> They're both in the same place in the freeze-the-dead isolation test:

> TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)", 
> File: "heapam.c", Line: 6051)
> 0x9613eb  at 
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x52d586  at 
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x53bc7e  at 
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x6949bb  at 
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x694532  at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x693d1c  at 
> /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> 0x8324b3
> ...
> 2020-08-14 22:16:41.783 CDT [78410:4] LOG:  server process (PID 80395) was 
> terminated by signal 6: Abort trap
> 2020-08-14 22:16:41.783 CDT [78410:5] DETAIL:  Failed process was running: 
> VACUUM FREEZE tab_freeze;
>
> peripatus has successes since this failure, so it's not fully reproducible
> on that machine.  I'm suspicious of a timing problem in computing vacuum's
> cutoff_xid.

Hm, maybe it's something around what I observed in
https://www.postgresql.org/message-id/20200723181018.neey2jd3u7rfrfrn%40alap3.anarazel.de

I.e. that somehow we end up with hot pruning and freezing coming to a
different determination, and trying to freeze a hot tuple.

I'll try to add a few additional asserts here, and burn some CPU cycles
trying to trigger the issue.

I gotta escape the heat in the house for a few hours though (no AC
here), so I'll not look at the results till later this afternoon, unless
it triggers soon.


> (I'm also wondering why the failing check is an Assert rather than a real
> test-and-elog.  Assert doesn't seem like an appropriate way to check for
> plausible data corruption cases.)

Robert, and to a lesser degree you, gave me quite a bit of grief over
converting nearby asserts to elogs. I agree it'd be better if it were
an elog, but ...

Greetings,

Andres Freund




Re: LSM tree for Postgres

2020-08-15 Thread AJG
Dmitry Dolgov wrote
>> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>>
>> I then thought about implementing the ideas of LSM using the standard
>> Postgres nbtree.
>>
>> We need two indexes: one small for fast inserts and another big
>> (main) index. The top index is small enough to fit in memory, so
>> inserts in this index are very fast.  Periodically we will merge data
>> from the top index into the base index and truncate the top index. To
>> prevent blocking of inserts in the table while we are merging indexes,
>> we can add one more index, which will be used during the merge.
>>
>> So the final architecture of Lsm3 is the following: two top indexes
>> used in a cyclic way, plus one main index. When a top index reaches
>> some threshold value we initiate a merge with the main index, done by
>> a bgworker, and switch to the other top index.  As the merging of
>> indexes is done in the background, it doesn't affect insert speed.
>> Unfortunately the Postgres index AM has no bulk insert operation, so
>> we have to perform normal inserts.  But the inserted data is already
>> sorted by key, which should improve access locality and partly solve
>> the random-reads problem for the base index.
>>
>> Certainly, to perform a search in Lsm3 we have to make lookups in all
>> three indexes and merge the search results.
> 
> Thanks for sharing this! In fact this reminds me more of partitioned
> b-trees [1] (and the older [2]) rather than LSM as such (although it
> could be that the former was influenced by the latter). What could be
> interesting is that quite often in these and many other whitepapers
> (e.g. [3]), to address the lookup overhead the design includes bloom
> filters in one way or another to avoid searching irrelevant parts of
> an index. Tomas mentioned them in this thread as well (in a different
> context); probably the design suggested here could also benefit from
> them?
> 
> [1]: Riegger Christian, Vincon Tobias, Petrov Ilia. Write-optimized
> indexing with partitioned b-trees. (2017). 296-300.
> 10.1145/3151759.3151814.
> [2]: Graefe Goetz. Write-Optimized B-Trees. (2004). 672-683.
> 10.1016/B978-012088469-8/50060-7.
> [3]: Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky,
> Lin Ma, and Rui Shen. Reducing the Storage Overhead of Main-Memory OLTP
> Databases with Hybrid Indexes. (2016). 1567–1581. 10.1145/2882903.2915222.


I found this 2019 paper recently; it might be worth a skim read for some
different ideas. Too technical for me :)
"Jungle: Towards Dynamically Adjustable Key-Value Store by Combining
LSM-Tree and Copy-On-Write B+-Tree"
https://www.usenix.org/system/files/hotstorage19-paper-ahn.pdf
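
For readers skimming the thread, here is a rough C-style sketch of the
cyclic two-top-index insert path Konstantin describes above; every name
in it is invented for illustration, and this is not the actual Lsm3 code:

    typedef struct Lsm3State
    {
        BTree  *top[2];            /* two small, memory-resident top indexes */
        BTree  *base;              /* the big main index */
        int     active;            /* top index currently taking inserts */
        size_t  threshold;         /* top-index size that triggers a merge */
        bool    merge_in_progress;
    } Lsm3State;

    static void
    lsm3_insert(Lsm3State *state, IndexTuple itup)
    {
        /* Fast path: insert into the small in-memory top index. */
        btree_insert(state->top[state->active], itup);

        if (!state->merge_in_progress &&
            btree_size(state->top[state->active]) >= state->threshold)
        {
            /*
             * Hand the full top index to a bgworker to merge into the
             * base index, and switch inserts to the other top index so
             * they are never blocked by the merge.
             */
            state->merge_in_progress = true;
            start_bgworker_merge(state->top[state->active], state->base);
            state->active = 1 - state->active;
        }
    }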







Re: Improving connection scalability: GetSnapshotData()

2020-08-15 Thread Tom Lane
We have two essentially identical buildfarm failures since these patches
went in:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14

They're both in the same place in the freeze-the-dead isolation test:

TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)", 
File: "heapam.c", Line: 6051)
0x9613eb  at 
/home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x52d586  at 
/home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x53bc7e  at 
/home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x6949bb  at 
/home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x694532  at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x693d1c  at 
/home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
0x8324b3
...
2020-08-14 22:16:41.783 CDT [78410:4] LOG:  server process (PID 80395) was 
terminated by signal 6: Abort trap
2020-08-14 22:16:41.783 CDT [78410:5] DETAIL:  Failed process was running: 
VACUUM FREEZE tab_freeze;

peripatus has successes since this failure, so it's not fully reproducible
on that machine.  I'm suspicious of a timing problem in computing vacuum's
cutoff_xid.

(I'm also wondering why the failing check is an Assert rather than a real
test-and-elog.  Assert doesn't seem like an appropriate way to check for
plausible data corruption cases.)
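
A test-and-elog version of the failing check would look something like
this (the error wording is invented; the current code at heapam.c:6051
has the Assert form):

    if (!TransactionIdPrecedes(members[i].xid, cutoff_xid))
        elog(ERROR, "multixact member %u does not precede cutoff %u",
             members[i].xid, cutoff_xid);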

regards, tom lane




Re: [BUG] Error in BRIN summarization

2020-08-15 Thread Tom Lane
hyrax's latest report suggests that this patch has issues under
CLOBBER_CACHE_ALWAYS:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-08-13%2005%3A09%3A58

Hard to tell whether there's an actual bug there or just test instability,
but either way it needs to be resolved.
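
For anyone trying to reproduce locally, hyrax's distinguishing trait is
the cache-clobber define; the standard recipe is:

    # Build with catalog-cache clobbering enabled (very slow), then
    # re-run the relevant tests:
    ./configure CPPFLAGS="-DCLOBBER_CACHE_ALWAYS" --enable-cassert
    make && make check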

regards, tom lane




Re: use pg_get_functiondef() in pg_dump

2020-08-15 Thread Tom Lane
I wrote:
> I wouldn't say that it's *fundamentally* new, but nonetheless it disturbs
> me that this proposal has pg_dump assembling CREATE FUNCTION commands in
> very different ways depending on the server version.  I'd rather see us
> continuing to build the bulk of the command the same as before, and
> introduce new behavior only for deparsing the function body.

BTW, a concrete argument for doing it that way is that if you make a
backend function that does the whole CREATE-FUNCTION-building job in
exactly the way pg_dump wants it, that function is nigh useless for
any other client with slightly different requirements.  A trivial
example here is that I don't think we want to become locked into
the proposition that psql's \ef and \sf must print functions exactly
the same way that pg_dump would.

regards, tom lane




Re: use pg_get_functiondef() in pg_dump

2020-08-15 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-08-12 21:54, Robert Haas wrote:
>> One problem with this, which I think Tom pointed out before, is that
>> it might make it hard to handle some forward-compatibility problems. In
>> other words, if something that the server is generating needs to be
>> modified for compatibility with a future release, it's not easy to do
>> that. Like if we needed to quote something we weren't previously
>> quoting, for example.

> We already use a lot of other pg_get_*def functions in pg_dump.  Does 
> this one introduce any fundamentally new problems?

I wouldn't say that it's *fundamentally* new, but nonetheless it disturbs
me that this proposal has pg_dump assembling CREATE FUNCTION commands in
very different ways depending on the server version.  I'd rather see us
continuing to build the bulk of the command the same as before, and
introduce new behavior only for deparsing the function body.

We've talked before about what a mess it is that some aspects of pg_dump's
output are built on the basis of what pg_dump sees in its stable snapshot
but others are built by ruleutils.c on the basis of up-to-the-minute
catalog contents.  While I don't insist that this patch fix that, I'm
worried that it may be making things worse, or at least getting in the
way of ever fixing that.

Perhaps these concerns are unfounded, but I'd like to see some arguments
why before we go down this path.

regards, tom lane




Re: run pgindent on a regular basis / scripted manner

2020-08-15 Thread Peter Eisentraut

On 2020-08-13 00:34, Andres Freund wrote:

I e.g. just re-indented patch 0001 of my GetSnapshotData() series and
most of the hunks were entirely unrelated. Despite the development
window for 14 having only relatively recently opened. Based on my
experience it tends to get worse over time.


Do we have a sense of why poorly-indented code gets committed?  I think 
some of the indentation rules are hard to follow manually.  (pgperltidy 
is worse.)


Also, since pgindent gets run eventually anyway, it's not really that 
important to get the indentation right the first time.  I suspect the 
goal of most authors and committers is to write readable code rather 
than to divine the exact pgindent output.


I think as a start, we could just issue a guideline that all committed
code should follow pgindent.  That has never really been a guideline, so 
it's not surprising that it's not followed.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-08-15 Thread Dilip Kumar
On Thu, Aug 13, 2020 at 6:47 PM Amit Kapila  wrote:
>
> On Thu, Aug 13, 2020 at 12:08 PM Amit Kapila  wrote:
> >
> > On Fri, Aug 7, 2020 at 2:04 PM Dilip Kumar  wrote:
> > >
> > > On Thu, Aug 6, 2020 at 2:43 PM Amit Kapila  
> > > wrote:
> > > >
> > ..
> > > > This patch's functionality can be independently verified by SQL APIs
> > >
> > > Your changes look fine to me.
> > >
> >
> > I have pushed that patch last week and attached are the remaining
> > patches. I have made a few changes in the next patch
> > 0001-Extend-the-BufFile-interface.patch and have some comments on it
> > which are as below:
> >
>
> Few more comments on the latest patches:
> v48-0002-Add-support-for-streaming-to-built-in-replicatio
> 1. It appears to me that we don't remove the temporary folders created
> by the apply worker. So, we have folders like
> pgsql_tmp15324.0.sharedfileset in base/pgsql_tmp directory even when
> the apply worker exits. I think we can remove these by calling
> PathNameDeleteTemporaryDir in SharedFileSetUnregister while removing
> the fileset from registered filesetlist.

I think we need to call SharedFileSetDeleteAll(input_fileset) from
SharedFileSetUnregister, so that all the directories created for this
fileset are removed.
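
Roughly like this; the list handling is simplified and the real function
in storage/file/sharedfileset.c differs in detail:

    void
    SharedFileSetUnregister(SharedFileSet *input_fileset)
    {
        ListCell   *l;

        foreach(l, filesetlist)
        {
            SharedFileSet *fileset = (SharedFileSet *) lfirst(l);

            if (fileset == input_fileset)
            {
                /* Also remove all files and directories of this fileset. */
                SharedFileSetDeleteAll(input_fileset);
                filesetlist = foreach_delete_current(filesetlist, l);
                break;
            }
        }
    }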

> 2.
> +typedef struct SubXactInfo
> +{
> + TransactionId xid; /* XID of the subxact */
> + int fileno; /* file number in the buffile */
> + off_t offset; /* offset in the file */
> +} SubXactInfo;
> +
> +static uint32 nsubxacts = 0;
> +static uint32 nsubxacts_max = 0;
> +static SubXactInfo *subxacts = NULL;
> +static TransactionId subxact_last = InvalidTransactionId;
>
> Would it be better if we moved all the subxact-related variables (like
> nsubxacts, nsubxacts_max and subxact_last) inside the SubXactInfo struct,
> as all the information is anyway related to sub-transactions?

I have moved them all to a structure.
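
Something along these lines; the wrapper name and exact layout are
assumptions, not necessarily what the attached patch does:

    typedef struct ApplySubXactData
    {
        uint32         nsubxacts;      /* number of sub-transactions */
        uint32         nsubxacts_max;  /* allocated length of subxacts */
        TransactionId  subxact_last;   /* xid of the last sub-transaction */
        SubXactInfo   *subxacts;       /* per-subxact file position info */
    } ApplySubXactData;

    static ApplySubXactData subxact_data = {0, 0, InvalidTransactionId, NULL};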

> 3.
> + /*
> + * If there is no subtransaction then nothing to do,  but if already have
> + * subxact file then delete that.
> + */
>
> extra space before 'but' in the above sentence is not required.

Fixed

> v48-0001-Extend-the-BufFile-interface
> 4.
> - * SharedFileSets can also be used by backends when the temporary files need
> - * to be opened/closed multiple times and the underlying files need to 
> survive
> + * SharedFileSets can be used by backends when the temporary files need to be
> + * opened/closed multiple times and the underlying files need to survive
>   * across transactions.
>   *
>
> No need of 'also' in the above sentence.

Fixed


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: use pg_get_functiondef() in pg_dump

2020-08-15 Thread Peter Eisentraut

On 2020-08-12 21:54, Robert Haas wrote:

One problem with this, which I think Tom pointed out before, is that
it might make it hard to handle some forward-compatibility problems. In
other words, if something that the server is generating needs to be
modified for compatibility with a future release, it's not easy to do
that. Like if we needed to quote something we weren't previously
quoting, for example.


We already use a lot of other pg_get_*def functions in pg_dump.  Does 
this one introduce any fundamentally new problems?


A hypothetical change where syntax that we accept now would no longer be 
accepted in a (near-)future version would create a lot of upsetness.  I 
don't think we'd do it.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: remove some ancient port hacks

2020-08-15 Thread Peter Eisentraut

On 2020-08-13 05:22, Noah Misch wrote:

On Wed, Aug 12, 2020 at 09:12:07AM +0200, Peter Eisentraut wrote:

There are two ancient hacks in the cygwin and solaris ports that appear to
have been solved more than 10 years ago, so I think we can remove them.  See
attached patches.


+1 for removing these.  >10y age is not sufficient justification by itself; if
systems that shipped with the defect were not yet EOL, that would tend to
justify waiting longer.  For these particular hacks, though, affected systems
are both old and EOL.


Done.

In this case, the bug was fixed in the stable release track of this OS, 
so the only way to still be affected would be if you had never installed 
any OS patches in 10 years, which is clearly unreasonable.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: remove some ancient port hacks

2020-08-15 Thread Peter Eisentraut

On 2020-08-12 10:18, Marco Atzeri wrote:

On 12.08.2020 09:12, Peter Eisentraut wrote:

There are two ancient hacks in the cygwin and solaris ports that appear
to have been solved more than 10 years ago, so I think we can remove
them.  See attached patches.



Hi Peter,
This is really archeology.  The check is for b20.1, which was released
in 1998.  No problem at all to remove it.


Committed.  Thanks for the feedback.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services