Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-25 Thread Dilip Kumar
On Sat, Jul 25, 2020 at 5:08 PM Amit Kapila wrote: > > On Fri, Jul 24, 2020 at 7:17 PM Dilip Kumar wrote: > > > > Your changes look fine to me. Additionally, I have changed a test > > case of getting the streaming changes in 0002. Instead of just > > showing the count, I am showing that the tra

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 5:31 PM Peter Geoghegan wrote: > I'm glad that this better principled approach is possible. It's hard > to judge how much of a problem this really is, though. We'll need to > think about this aspect some more. BTW, your HLL patch ameliorates the problem with my extreme "so

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 4:56 PM Jeff Davis wrote: > I wrote a quick patch to use HyperLogLog to estimate the number of > groups contained in a spill file. It seems to reduce the > overpartitioning effect, and is a more principled approach than what I > was doing before. This pretty much fixes the
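[Editor's note: the HyperLogLog idea mentioned in this message can be sketched as follows. This is a minimal illustrative implementation of the estimator itself, not the code from Jeff Davis's patch; the parameter names and register count are assumptions.]

```python
import hashlib
import math

def hll_estimate(items, b=10):
    """Minimal HyperLogLog sketch: estimate the number of distinct
    items (e.g. groups in a spill file) in a single pass, using
    2**b small registers instead of storing the groups themselves."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        # 64-bit hash of the item
        h = int.from_bytes(
            hashlib.blake2b(str(item).encode(), digest_size=8).digest(), "big")
        idx = h >> (64 - b)                 # first b bits pick a register
        rest = h & ((1 << (64 - b)) - 1)    # remaining bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - b) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    # harmonic mean of 2**register values, bias-corrected
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    # small-range correction: fall back to linear counting
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        raw = m * math.log(m / zeros)
    return raw
```

The point, for spilling, is that a few kilobytes of registers give a distinct-group estimate good to a few percent, which is enough to size the next round of partitions without overshooting.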

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra wrote: > I'm not sure what you mean by "reported memory usage doesn't reflect the > space used for transition state"? Surely it does include that, we've > built the memory accounting stuff pretty much exactly to do that. > > I think it's pretty clear wh

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Tomas Vondra
On Sat, Jul 25, 2020 at 10:07:37AM -0700, Peter Geoghegan wrote: On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote: "Peak Memory Usage: 1605334kB" Hash agg avoids spilling entirely (so the planner gets it right this time around). It even uses notably less memory. I guess that this is bec

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Sat, 2020-07-25 at 13:27 -0700, Peter Geoghegan wrote: > It's not clear to me that overpartitioning is a real problem in > > this > > case -- but I think the fact that it's causing confusion is enough > > reason to see if we can fix it. > > I'm not sure about that either. > > FWIW I notice tha

Re: hashagg slowdown due to spill changes

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 12:41 PM Peter Geoghegan wrote: > I have added a new open item for this separate > LookupTupleHashEntryHash()/lookup_hash_entry() pipeline-stall issue. Attached is a rebased version of Andres' now-bitrotted 2020-06-12 patch ("aggspeed.diff"). I find that Andres' original "SEL

Re: [PATCH] Performance Improvement For Copy From Binary Files

2020-07-25 Thread Tom Lane
Amit Langote writes: > [ v7-0001-Improve-performance-of-binary-COPY-FROM-with-buff.patch ] Pushed with cosmetic changes. I'd always supposed that stdio does enough internal buffering that short fread()s shouldn't be much worse than memcpy(). But I reproduced your result of ~30% speedup for data
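[Editor's note: the buffering pattern behind this COPY FROM speedup can be illustrated as follows. This is a generic Python sketch of the idea (refill a large buffer, then serve many short reads by slicing), not the committed C patch; the class name and buffer size are made up for illustration.]

```python
import io

class ChunkBufferedReader:
    """Serve many short reads from one large internal buffer, refilling
    in big chunks -- the same idea as buffering binary COPY FROM input
    rather than issuing one short fread() per field."""
    def __init__(self, raw, bufsize=65536):
        self.raw = raw
        self.bufsize = bufsize
        self.buf = b""
        self.pos = 0

    def read(self, n):
        out = bytearray()
        while n > 0:
            if self.pos == len(self.buf):
                # one large read replaces many small ones
                self.buf = self.raw.read(self.bufsize)
                self.pos = 0
                if not self.buf:
                    break
            take = min(n, len(self.buf) - self.pos)
            out += self.buf[self.pos:self.pos + take]  # cheap copy from buffer
            self.pos += take
            n -= take
        return bytes(out)
```

Even when the underlying stream is itself buffered, skipping the per-call overhead of many tiny reads is where a speedup of the reported magnitude plausibly comes from.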

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 1:10 PM Jeff Davis wrote: > On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote: > > What worries me a bit is the sharp discontinuities when spilling with > > significantly less work_mem than the "optimal" amount. For example, > > with Tomas' TPC-H query (against my sm

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote: > What worries me a bit is the sharp discontinuities when spilling with > significantly less work_mem than the "optimal" amount. For example, > with Tomas' TPC-H query (against my smaller TPC-H dataset), I find > that setting work_mem to 6MB

Re: hashagg slowdown due to spill changes

2020-07-25 Thread Peter Geoghegan
On Fri, Jul 24, 2020 at 4:51 PM Andres Freund wrote: > This is still not resolved. We're right now slower than 12. It's > effectively not possible to do performance comparisons right now. This > was nearly two months ago. I have added a new open item for this separate LookupTupleHashEntryHash()/l

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 10:23 AM Jeff Davis wrote: > There's also another effect at work that can cause the total number of > batches to be higher for larger work_mem values: when we do recurse, we > again need to estimate the number of partitions needed. Right now, we > overestimate the number of
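[Editor's note: the recursion/overestimation effect described here can be sketched with a toy partition-count chooser. This is a hypothetical simplification for illustration only, not PostgreSQL's actual formula; all names and the clamping bounds are invented.]

```python
def choose_num_partitions(est_groups, groups_per_partition,
                          min_parts=4, max_parts=256):
    """Pick a power-of-two spill partition count from an estimate of
    how many groups remain. If est_groups is an overestimate, the
    chosen count is too high -- and when spill files recurse, each
    level repeats the overestimate, multiplying total batches."""
    need = max(1, -(-est_groups // groups_per_partition))  # ceil division
    parts = min_parts
    while parts < need and parts < max_parts:
        parts *= 2
    return parts
```

With this shape of logic, doubling the group estimate at every recursive level can double the partition count at every level, which is one way batch counts blow up for larger work_mem values.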

Re: Difference for Binary format vs Text format for client-server communication

2020-07-25 Thread Peter Eisentraut
On 2020-07-16 18:52, Andy Fan wrote: The reason I ask this is because I have a task to make numeric output similar to Oracle. Oracle: SQL> select 2 / 1.0 from dual; 2/1.0 -- 2 PG: postgres=# select 2 / 1.0; ?column? 2. (1
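[Editor's note: the display difference discussed here is that PostgreSQL's numeric division keeps trailing fractional zeros while Oracle trims them. A client-side workaround can be sketched as below; this is an illustration of trimming the text output, not a description of how PostgreSQL's numeric_out works.]

```python
def oracle_style(numeric_text):
    """Trim trailing fractional zeros (and a bare trailing '.') from a
    numeric's text output, e.g. '2.0000000000000000' -> '2',
    mimicking Oracle's default display of 2 / 1.0."""
    if "." in numeric_text and "e" not in numeric_text.lower():
        numeric_text = numeric_text.rstrip("0").rstrip(".")
    return numeric_text
```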

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Thu, 2020-07-23 at 19:33 -0700, Peter Geoghegan wrote: > That does make it sound like the costs of the hash agg aren't being > represented. I suppose it isn't clear if this is a costing issue > because it isn't clear if the execution time performance itself is > pathological or is instead someth

Re: estimation problems for DISTINCT ON with FDW

2020-07-25 Thread Jeff Janes
On Fri, Jul 3, 2020 at 5:50 PM Tom Lane wrote: > > OK, I'll go ahead and push the patch I proposed yesterday. > Thank you. I tested 12_STABLE with my real queries on the real data set, and the "hard coded" estimate of 200 distinct rows (when use_remote_estimate is turned back on) is enough to ge

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Fri, 2020-07-24 at 10:40 +0200, Tomas Vondra wrote: > FWIW one more suspicious thing that I forgot to mention is the > behavior > of the "planned partitions" depending on work_mem, which looks like > this: > > 2MB  Planned Partitions: 64  HashAgg Batches: 4160 > 4MB

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote: > "Peak Memory Usage: 1605334kB" > > Hash agg avoids spilling entirely (so the planner gets it right this > time around). It even uses notably less memory. I guess that this is because the reported memory usage doesn't reflect the space used

Re: Mark unconditionally-safe implicit coercions as leakproof

2020-07-25 Thread Tom Lane
Robert Haas writes: > On Fri, Jul 24, 2020 at 12:17 PM Tom Lane wrote: >> I went through the system's built-in implicit coercions to see >> which ones are unconditionally successful. These could all be >> marked leakproof, as per attached patch. > IMHO, this is a nice improvement. Thanks; push

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Fri, Jul 24, 2020 at 12:55 PM Peter Geoghegan wrote: > Could that be caused by clustering in the data? > > If the input data is in totally random order then we have a good > chance of never having to spill skewed "common" values. That is, we're > bound to encounter common values before entering

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Fri, 2020-07-24 at 21:16 +0200, Tomas Vondra wrote: > Surely more recursive spilling should do more I/O, but the Disk Usage > reported by explain analyze does not show anything like ... I suspect that's because of disk reuse in logtape.c. Regards, Jeff Davis

Re: INSERT INTO SELECT, Why Parallelism is not selected?

2020-07-25 Thread Tom Lane
Amit Kapila writes: > On Fri, Jul 24, 2020 at 7:36 PM Tom Lane wrote: >> Yeah, the proposed comment changes don't actually add much. Also >> please try to avoid inserting non-ASCII into the source code; >> at least in my mail reader, that attachment has some. > I don't see any non-ASCII chara

Re: Add header support to text format and matching feature

2020-07-25 Thread Daniel Verite
Rémi Lapeyre wrote: > Here’s a new version that fix all the issues. Here's a review of v4. The patch improves COPY in two ways: - COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format (previously it was only for CSV) - COPY FROM also accepts "HEADER match" to tell that ther
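[Editor's note: the "HEADER match" behavior under review can be illustrated with a small sketch. This is the general idea in Python, not the patch's C implementation; the function name, error wording, and delimiter default are invented.]

```python
def check_header_match(header_line, expected_cols, delimiter="\t"):
    """Illustrative check in the spirit of COPY FROM ... HEADER match:
    the first input line's fields must equal the target table's
    column names, in number and in order."""
    fields = header_line.rstrip("\n").split(delimiter)
    if len(fields) != len(expected_cols):
        raise ValueError(
            f"wrong number of fields in header line: "
            f"got {len(fields)}, expected {len(expected_cols)}")
    for i, (got, want) in enumerate(zip(fields, expected_cols), start=1):
        if got != want:
            raise ValueError(
                f"column name mismatch in header line field {i}: "
                f"got {got!r}, expected {want!r}")
```

The value of the match mode is that a file whose columns are reordered or stale fails loudly at the header instead of silently loading data into the wrong columns.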

Re: proposal: possibility to read dumped table's name from file

2020-07-25 Thread vignesh C
On Tue, Jul 14, 2020 at 12:03 PM Pavel Stehule wrote: >> I meant can this: >> printf(_("  --filter=FILENAME    read object name filter >> expressions from file\n")); >> be changed to: >> printf(_("  --filter=FILENAME    dump objects and data based >> on the filter expressions from

Re: Improving connection scalability: GetSnapshotData()

2020-07-25 Thread Ranier Vilela
On Fri, Jul 24, 2020 at 9:00 PM, Andres Freund wrote: > On 2020-07-24 18:15:15 -0300, Ranier Vilela wrote: > > On Fri, Jul 24, 2020 at 2:16 PM, Andres Freund > > wrote: > > > > > On 2020-07-24 14:05:04 -0300, Ranier Vilela wrote: > > > > Latest Postgres > > > > Windows 64 bits >

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2020-07-25 Thread Amit Kapila
On Fri, Jul 24, 2020 at 7:17 PM Dilip Kumar wrote: > > Your changes look fine to me. Additionally, I have changed a test > case of getting the streaming changes in 0002. Instead of just > showing the count, I am showing that the transaction is actually > streaming. > If you want to show the cha

Re: display offset along with block number in vacuum errors

2020-07-25 Thread Michael Paquier
On Fri, Jul 24, 2020 at 11:18:43PM +0530, Mahendra Singh Thalor wrote: > In commit b61d161(Introduce vacuum errcontext to display additional > information), we added vacuum errcontext to display additional > information(block number) so that in case of vacuum error, we can identify > which block we

Re: handle a ECPG_bytea typo

2020-07-25 Thread Michael Paquier
On Sat, Jul 25, 2020 at 07:22:15AM +0530, vignesh C wrote: > I felt the changes look correct. The reason it might be working > earlier is because the structure members are the same for both the > data structures ECPGgeneric_bytea & ECPGgeneric_varchar. ECPGset_noind_null() and ECPGis_noind_null()
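[Editor's note: why the typo was harmless can be demonstrated with two structs that share the same layout, so reinterpreting one as the other reads identical values. The sketch below uses Python's ctypes purely for illustration; the field names and array size are assumptions, not the actual ECPG definitions.]

```python
import ctypes

class ECPGgeneric_varchar(ctypes.Structure):
    # layout: an int length followed by a char array
    _fields_ = [("len", ctypes.c_int), ("arr", ctypes.c_char * 8)]

class ECPGgeneric_bytea(ctypes.Structure):
    # deliberately identical member layout
    _fields_ = [("len", ctypes.c_int), ("arr", ctypes.c_char * 8)]

v = ECPGgeneric_varchar(4, b"abcd")
# Reinterpret the same memory through the "wrong" struct type, as the
# typo effectively did; identical layout means identical results.
b = ctypes.cast(ctypes.pointer(v), ctypes.POINTER(ECPGgeneric_bytea)).contents
```

Fixing the typo is still the right call: the code only worked by the accident of matching layouts, and would break silently the moment one struct gained or reordered a member.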