On Thu, Mar 8, 2018 at 11:45 AM, Tom Lane wrote:
> Prabhat Sahu writes:
>> On Wed, Mar 7, 2018 at 7:51 PM, Robert Haas wrote:
>>> That looks like the background worker got killed by the OOM killer. How
>>> much memory do you have in the machine where this occurred?
>
> I have run the testcase
Prabhat Sahu writes:
> On Wed, Mar 7, 2018 at 7:51 PM, Robert Haas wrote:
>> That looks like the background worker got killed by the OOM killer. How
>> much memory do you have in the machine where this occurred?
> I have run the testcase on my local machine with the below configuration:
> Environm
On Wed, Mar 7, 2018 at 7:51 PM, Robert Haas wrote:
> On Wed, Mar 7, 2018 at 8:59 AM, Prabhat Sahu <
> prabhat.s...@enterprisedb.com> wrote:
>>
>> 2018-03-07 19:24:44.263 IST [54400] LOG: background worker "parallel
>> worker" (PID 54482) was terminated by signal 9: Killed
>>
>
> That looks like the background worker got killed by the OOM killer.
On March 7, 2018 5:40:18 PM PST, Peter Geoghegan wrote:
>On Wed, Mar 7, 2018 at 5:16 PM, Tomas Vondra
> wrote:
>> FWIW that's usually written to the system log. Does dmesg say
>something
>> about the kill?
>
>While it would be nice to confirm that it was indeed the OOM killer,
>either way the crash happened because SIGKILL was sent to a parallel worker.
On Wed, Mar 7, 2018 at 5:16 PM, Tomas Vondra
wrote:
> FWIW that's usually written to the system log. Does dmesg say something
> about the kill?
While it would be nice to confirm that it was indeed the OOM killer,
either way the crash happened because SIGKILL was sent to a parallel
worker. There i
On 03/07/2018 03:21 PM, Robert Haas wrote:
> On Wed, Mar 7, 2018 at 8:59 AM, Prabhat Sahu wrote:
>
> 2018-03-07 19:24:44.263 IST [54400] LOG: background worker
> "parallel worker" (PID 54482) was terminated by signal 9: Killed
>
>
> That looks like the background worker got killed by the OOM killer.
On Wed, Mar 7, 2018 at 8:59 AM, Prabhat Sahu
wrote:
>
> 2018-03-07 19:24:44.263 IST [54400] LOG: background worker "parallel
> worker" (PID 54482) was terminated by signal 9: Killed
>
That looks like the background worker got killed by the OOM killer. How
much memory do you have in the machine where this occurred?
On Wed, Mar 7, 2018 at 7:16 PM, Robert Haas wrote:
> On Wed, Mar 7, 2018 at 8:13 AM, Prabhat Sahu <
> prabhat.s...@enterprisedb.com> wrote:
>
>> Hi all,
>>
>> While testing this feature I found a crash on PG head with parallel
>> create index using pgbench tables.
>>
>> -- GUCs under postgres.conf
On Wed, Mar 7, 2018 at 8:13 AM, Prabhat Sahu
wrote:
> Hi all,
>
> While testing this feature I found a crash on PG head with parallel create
> index using pgbench tables.
>
> -- GUCs under postgres.conf
> max_parallel_maintenance_workers = 16
> max_parallel_workers = 16
> max_parallel_workers_per_gather = 8
Hi all,
While testing this feature I found a crash on PG head with parallel create
index using pgbench tables.
-- GUCs under postgres.conf
max_parallel_maintenance_workers = 16
max_parallel_workers = 16
max_parallel_workers_per_gather = 8
maintenance_work_mem = 8GB
max_wal_size = 4GB
./pgbench -
On Tue, Feb 6, 2018 at 3:53 PM, Robert Haas wrote:
> On Tue, Feb 6, 2018 at 2:11 PM, Tom Lane wrote:
>> Robert Haas writes:
>>> Unfortunately valgrind does not work at all on my laptop -- the server
>>> appears to start, but as soon as you try to connect, the whole thing
>> dies with an error claiming that the startup process has failed.
On 02/06/2018 10:39 PM, Peter Geoghegan wrote:
> On Tue, Feb 6, 2018 at 1:30 PM, Tomas Vondra
> wrote:
>> I have little idea what -Og exactly means. It seems to be focused on
>> debugging experience, and so still does some of the optimizations.
>
> As I understand it, -Og allows any optimization that does not hamper
> walking through code with a debugger.
On Tue, Feb 6, 2018 at 1:30 PM, Tomas Vondra
wrote:
> I have little idea what -Og exactly means. It seems to be focused on
> debugging experience, and so still does some of the optimizations.
As I understand it, -Og allows any optimization that does not hamper
walking through code with a debugger
On 02/06/2018 10:14 PM, Peter Geoghegan wrote:
> On Tue, Feb 6, 2018 at 1:04 PM, Tomas Vondra
> wrote:
>> Did you do a test with "-O0"? In my experience that makes valgrind tests
>> much more reliable and repeatable. Some time ago we've seen cases that
>> were failing for me but not for others,
On Tue, Feb 6, 2018 at 1:04 PM, Tomas Vondra
wrote:
> Did you do a test with "-O0"? In my experience that makes valgrind tests
> much more reliable and repeatable. Some time ago we've seen cases that
> were failing for me but not for others, and I suspect it was due to me
> using "-O0".
FWIW, I u
On 02/06/2018 09:56 PM, Peter Geoghegan wrote:
> On Tue, Feb 6, 2018 at 12:53 PM, Robert Haas wrote:
>>> Do you want somebody who does have a working valgrind installation
>>> (ie me) to take responsibility for pushing this patch?
>>
>> I committed it before seeing this. It probably would've been
On Tue, Feb 6, 2018 at 12:53 PM, Robert Haas wrote:
>> Do you want somebody who does have a working valgrind installation
>> (ie me) to take responsibility for pushing this patch?
>
> I committed it before seeing this. It probably would've been better
> if you had done it, but I assume Peter test
On Tue, Feb 6, 2018 at 2:11 PM, Tom Lane wrote:
> Robert Haas writes:
>> Unfortunately valgrind does not work at all on my laptop -- the server
>> appears to start, but as soon as you try to connect, the whole thing
>> dies with an error claiming that the startup process has failed. So I
>> can't easily test this at the moment.
Robert Haas writes:
> Unfortunately valgrind does not work at all on my laptop -- the server
> appears to start, but as soon as you try to connect, the whole thing
> dies with an error claiming that the startup process has failed. So I
> can't easily test this at the moment. I'll try to get it w
On Mon, Feb 5, 2018 at 1:45 PM, Peter Geoghegan wrote:
>> So, I guess another option might be to call VALGRIND_MAKE_MEM_DEFINED
>> on the buffer. "We know what we're doing, trust us!"
>>
>> In some ways, that seems better than inserting a suppression, because
>> it only affects the memory in the
On Mon, Feb 5, 2018 at 1:39 PM, Tels wrote:
> Are the uninitialized bytes that are written out "whatever was in the
> memory previously" or just some "0x00 bytes from the allocation but not
> yet overwritten from the PG code"?
>
> Because the first sounds like it could be a security problem - if r
On Mon, Feb 5, 2018 at 1:27 PM, Robert Haas wrote:
> On Mon, Feb 5, 2018 at 1:03 PM, Peter Geoghegan wrote:
>> It certainly is common. In the case of logtape.c, we almost always
>> write out some garbage bytes, even with serial sorts. The only
>> difference here is the *sense* in which they're ga
On Mon, February 5, 2018 4:27 pm, Robert Haas wrote:
> On Mon, Feb 5, 2018 at 1:03 PM, Peter Geoghegan wrote:
>> It certainly is common. In the case of logtape.c, we almost always
>> write out some garbage bytes, even with serial sorts. The only
>> difference here is the *sense* in which they're
On Mon, Feb 5, 2018 at 1:03 PM, Peter Geoghegan wrote:
> It certainly is common. In the case of logtape.c, we almost always
> write out some garbage bytes, even with serial sorts. The only
> difference here is the *sense* in which they're garbage: they're
> uninitialized bytes, which Valgrind care
On Mon, Feb 5, 2018 at 9:43 AM, Robert Haas wrote:
> # LogicalTapeFreeze() may write out its first block when it is dirty but
> # not full, and then immediately read the first block back in from its
> # BufFile as a BLCKSZ-width block. This can only occur in rare cases
> # where next to no tuples
On Fri, Feb 2, 2018 at 10:26 PM, Peter Geoghegan wrote:
> My proposed commit message
> has a full explanation of the Valgrind issue, which I won't repeat
> here. Go read it before reading the rest of this e-mail.
I'm going to paste the first two sentences of your proposed commit
message in here f
On Fri, Feb 2, 2018 at 4:31 PM, Peter Geoghegan wrote:
> On Fri, Feb 2, 2018 at 1:58 PM, Andres Freund wrote:
>> Not saying you're wrong, but you should include a comment on why this is
>> a benign warning. Presumably it's some padding memory somewhere, but
>> it's not obvious from the above blea
On Fri, Feb 2, 2018 at 1:58 PM, Andres Freund wrote:
> Not saying you're wrong, but you should include a comment on why this is
> a benign warning. Presumably it's some padding memory somewhere, but
> it's not obvious from the above bleat.
Sure. This looks slightly more complicated than first anticipated.
On 2018-02-02 13:35:59 -0800, Peter Geoghegan wrote:
> On Fri, Feb 2, 2018 at 10:38 AM, Peter Geoghegan wrote:
> > On Fri, Feb 2, 2018 at 10:37 AM, Robert Haas wrote:
> >> If you could keep an eye on the buildfarm and investigate anything
> >> that breaks, I would appreciate it.
>
> > I can keep
On Fri, Feb 2, 2018 at 10:38 AM, Peter Geoghegan wrote:
> On Fri, Feb 2, 2018 at 10:37 AM, Robert Haas wrote:
>> If you could keep an eye on the buildfarm and investigate anything
>> that breaks, I would appreciate it.
> I can keep an eye on it throughout the day.
There is a benign Valgrind err
On Fri, Feb 2, 2018 at 3:35 PM, Peter Geoghegan wrote:
> On Fri, Feb 2, 2018 at 12:30 PM, Robert Haas wrote:
>> For the record, I typically construct the list of reviewers by reading
>> over the thread and adding all the people whose names I find there in
>> chronological order, excluding things
On Fri, Feb 2, 2018 at 12:30 PM, Robert Haas wrote:
> For the record, I typically construct the list of reviewers by reading
> over the thread and adding all the people whose names I find there in
> chronological order, excluding things that are clearly not review
> (like "Bumped to next CF.") and
On Fri, Feb 2, 2018 at 3:23 PM, Peter Geoghegan wrote:
> On Fri, Feb 2, 2018 at 10:38 AM, Peter Geoghegan wrote:
>> Thanks everyone
>
> I would like to acknowledge the assistance of Corey Huinker with early
> testing of the patch (this took place in 2016, and much of it was not
> on-list). Even t
On Fri, Feb 2, 2018 at 10:38 AM, Peter Geoghegan wrote:
> Thanks everyone
I would like to acknowledge the assistance of Corey Huinker with early
testing of the patch (this took place in 2016, and much of it was not
on-list). Even though he wasn't credited in the commit message, he
should appear i
On Fri, Feb 2, 2018 at 10:37 AM, Robert Haas wrote:
> And that patch you attached is also, now, committed.
>
> If you could keep an eye on the buildfarm and investigate anything
> that breaks, I would appreciate it.
Fantastic!
I can keep an eye on it throughout the day.
Thanks everyone
--
Peter Geoghegan
On Fri, Feb 2, 2018 at 11:16 AM, Peter Geoghegan wrote:
> Attached patch has these changes.
And that patch you attached is also, now, committed.
If you could keep an eye on the buildfarm and investigate anything
that breaks, I would appreciate it.
--
Robert Haas
EnterpriseDB: http://www.enterp
On Sat, Jan 27, 2018 at 12:20 AM, Amit Kapila wrote:
> I have posted the patch for the above API and posted it on a new
> thread [1]. Do let me know either here or on that thread if the patch
> suffices your need?
I've responded to you over on that thread. Thanks again for helping me.
I already
On Fri, Jan 26, 2018 at 12:36 PM, Amit Kapila wrote:
> On Fri, Jan 26, 2018 at 12:00 PM, Peter Geoghegan wrote:
>> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila
>> wrote:
At this point, my preferred solution is for someone to go implement
Amit's WaitForParallelWorkersToAttach() idea [
On Fri, Jan 26, 2018 at 6:40 PM, Thomas Munro
wrote:
> Thanks for looking into this. Yeah. I think you're right that it
> could add a bit of overhead in some cases (ie if you receive a lot of
> signals that AREN'T caused by fork failure, then you'll enter
> HandleParallelMessage() every time unn
On Fri, Jan 26, 2018 at 7:30 PM, Peter Geoghegan wrote:
> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila wrote:
>> However, now I see that you and Thomas are trying to find a different
>> way to overcome this problem differently, so not sure if I should go
>> ahead or not. I have seen that you to
On Fri, Jan 26, 2018 at 11:17 AM, Robert Haas wrote:
> Hmm, I like the idea of making it a #define instead of having it
> depend on parallel_leader_participation. Let's do that. If the
> consensus is later that it was the wrong decision, it'll be easy to
> change it back.
WFM.
--
Peter Geoghegan
On Fri, Jan 26, 2018 at 2:04 PM, Peter Geoghegan wrote:
> On Fri, Jan 26, 2018 at 10:33 AM, Robert Haas wrote:
>> I'm busy with other things, so no rush.
>
> Got it.
>
> There is one question that I should probably get clarity on ahead of
> the next revision, which is: Should I rip out the code t
On Fri, Jan 26, 2018 at 10:33 AM, Robert Haas wrote:
> I'm busy with other things, so no rush.
Got it.
There is one question that I should probably get clarity on ahead of
the next revision, which is: Should I rip out the code that disallows
a "degenerate parallel CREATE INDEX" when
parallel_lea
On Fri, Jan 26, 2018 at 1:17 PM, Peter Geoghegan wrote:
> On Fri, Jan 26, 2018 at 10:01 AM, Robert Haas wrote:
>> On Fri, Jan 26, 2018 at 1:30 AM, Peter Geoghegan wrote:
>>> I had imagined that WaitForParallelWorkersToAttach() would give me an
>>> error in the style of WaitForParallelWorkersToFi
On Fri, Jan 26, 2018 at 10:01 AM, Robert Haas wrote:
> On Fri, Jan 26, 2018 at 1:30 AM, Peter Geoghegan wrote:
>> I had imagined that WaitForParallelWorkersToAttach() would give me an
>> error in the style of WaitForParallelWorkersToFinish(), without
>> actually waiting for the parallel workers t
On Fri, Jan 26, 2018 at 1:30 AM, Peter Geoghegan wrote:
> I had imagined that WaitForParallelWorkersToAttach() would give me an
> error in the style of WaitForParallelWorkersToFinish(), without
> actually waiting for the parallel workers to finish.
+1. If we're going to go that route, and that s
On Fri, Jan 26, 2018 at 12:00 PM, Peter Geoghegan wrote:
> On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila wrote:
>>> At this point, my preferred solution is for someone to go implement
>>> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
>>> like the logical person for the job)
On Thu, Jan 25, 2018 at 10:00 PM, Amit Kapila wrote:
>> At this point, my preferred solution is for someone to go implement
>> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems
>> like the logical person for the job).
>>
>
> I can implement it and share a prototype patch with yo
On Fri, Jan 26, 2018 at 11:30 AM, Amit Kapila wrote:
> On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan wrote:
>> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila wrote:
>>> Right, but what if the worker dies due to something proc_exit(1) or
>>> something like that before calling BarrierArriveAndWai
On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan wrote:
> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila wrote:
>> Right, but what if the worker dies due to something proc_exit(1) or
>> something like that before calling BarrierArriveAndWait. I think this
>> is part of the problem we have solved i
On Wed, Jan 24, 2018 at 5:31 PM, Thomas Munro
wrote:
> Here's a version that works, and a minimal repro test module thing.
> Without 0003 applied, it hangs.
I can confirm that this version does in fact fix the problem with
parallel CREATE INDEX hanging in the event of (simulated) worker
fork() fa
On Thu, Jan 25, 2018 at 9:28 AM, Peter Geoghegan wrote:
> On Wed, Jan 24, 2018 at 12:13 PM, Thomas Munro
> wrote:
>> On Thu, Jan 25, 2018 at 8:54 AM, Peter Geoghegan wrote:
>>> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>>>
>>> 1. The problem of fork failure causing nbtsort.c
On Wed, Jan 24, 2018 at 12:13 PM, Thomas Munro
wrote:
> On Thu, Jan 25, 2018 at 8:54 AM, Peter Geoghegan wrote:
>> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>>
>> 1. The problem of fork failure causing nbtsort.c to wait forever is a
>> real problem. Sure enough, the coding pa
On Thu, Jan 25, 2018 at 8:54 AM, Peter Geoghegan wrote:
> I have used Thomas' chaos-monkey-fork-process.patch to verify:
>
> 1. The problem of fork failure causing nbtsort.c to wait forever is a
> real problem. Sure enough, the coding pattern within
> _bt_leader_heapscan() can cause us to wait for
On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila wrote:
> Right, but what if the worker dies due to something proc_exit(1) or
> something like that before calling BarrierArriveAndWait. I think this
> is part of the problem we have solved in
> WaitForParallelWorkersToFinish such that if the worker exi
On Wed, Jan 24, 2018 at 6:43 PM, Amit Kapila wrote:
> On Wed, Jan 24, 2018 at 10:36 AM, Thomas Munro
> wrote:
>> On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila wrote:
> I am going to repeat my previous suggest that we use a Barrier here.
> Given the discussion subsequent to my original pro
On Wed, Jan 24, 2018 at 10:36 AM, Thomas Munro
wrote:
> On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila wrote:
I am going to repeat my previous suggest that we use a Barrier here.
Given the discussion subsequent to my original proposal, this can be a
lot simpler than what I suggested
On Wed, Jan 24, 2018 at 5:59 PM, Amit Kapila wrote:
>>> I am going to repeat my previous suggest that we use a Barrier here.
>>> Given the discussion subsequent to my original proposal, this can be a
>>> lot simpler than what I suggested originally. Each worker does
>>> BarrierAttach() before beg
On Wed, Jan 24, 2018 at 12:20 AM, Peter Geoghegan wrote:
> On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas wrote:
>> As Amit says, what remains is the case where fork() fails or the
>> worker dies before it reaches the line in ParallelWorkerMain that
>> reads shm_mq_set_sender(mq, MyProc). In thos
On Fri, Jan 19, 2018 at 6:22 AM, Robert Haas wrote:
>> (3)
>> erm, maybe it's a problem that errors occurring in workers while the
>> leader is waiting at a barrier won't unblock the leader (we don't
>> detach from barriers on abort/exit) -- I'll look into this.
>
> I think if there's an ERROR, th
On Tue, Jan 23, 2018 at 2:11 PM, Peter Geoghegan wrote:
> Finally, it's still not clear to me why nodeGather.c's use of
> parallel_leader_participation=off doesn't suffer from similar problems
> [1].
Thomas and I just concluded that it does. See my email on the other
thread just now.
I thought
On Tue, Jan 23, 2018 at 10:50 AM, Peter Geoghegan wrote:
> On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas wrote:
>> I am going to repeat my previous suggest that we use a Barrier here.
>> Given the discussion subsequent to my original proposal, this can be a
>> lot simpler than what I suggested or
On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas wrote:
> As Amit says, what remains is the case where fork() fails or the
> worker dies before it reaches the line in ParallelWorkerMain that
> reads shm_mq_set_sender(mq, MyProc). In those cases, no error will be
> signaled until you call WaitForPara
On Mon, Jan 22, 2018 at 10:13 PM, Peter Geoghegan wrote:
> _bt_leader_heapscan() can detect when workers exit early, at least in
> the vast majority of cases. It can do this simply by processing
> interrupts and automatically propagating any error -- nothing special
> about that. It can also detec
On Tue, Jan 23, 2018 at 8:43 AM, Peter Geoghegan wrote:
> On Mon, Jan 22, 2018 at 6:45 PM, Amit Kapila wrote:
>>> FWIW, I don't think that that's really much of a difference.
>>>
>>> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
>>> similar to how _bt_end_parallel() calls
On Mon, Jan 22, 2018 at 6:45 PM, Amit Kapila wrote:
>> FWIW, I don't think that that's really much of a difference.
>>
>> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
>> similar to how _bt_end_parallel() calls
>> WaitForParallelWorkersToFinish() in the patch. The
>> _bt_le
On Tue, Jan 23, 2018 at 1:45 AM, Peter Geoghegan wrote:
> On Mon, Jan 22, 2018 at 3:52 AM, Amit Kapila wrote:
>> The difference is that nodeGather.c doesn't have any logic like the
>> one you have in _bt_leader_heapscan where the patch waits for each
>> worker to increment nparticipantsdone. For
On Thu, Jan 18, 2018 at 9:22 AM, Robert Haas wrote:
> But, hang on a minute. Why do the workers need to wait for the leader
> anyway? Can't they just exit once they're done sorting? I think the
> original reason why this ended up in the patch is that we needed the
> leader to assume ownership o
On Mon, Jan 22, 2018 at 3:52 AM, Amit Kapila wrote:
> The difference is that nodeGather.c doesn't have any logic like the
> one you have in _bt_leader_heapscan where the patch waits for each
> worker to increment nparticipantsdone. For Gather node, we do such a
> thing (wait for all workers to fi
On Mon, Jan 22, 2018 at 10:36 AM, Peter Geoghegan wrote:
> On Sun, Jan 21, 2018 at 6:34 PM, Amit Kapila wrote:
>>> Why is this okay for Gather nodes, though? nodeGather.c looks at
>>> pcxt->nworkers_launched during initialization, and appears to at least
>>> trust it to indicate that more than ze
On Sun, Jan 21, 2018 at 6:34 PM, Amit Kapila wrote:
>> Why is this okay for Gather nodes, though? nodeGather.c looks at
>> pcxt->nworkers_launched during initialization, and appears to at least
>> trust it to indicate that more than zero actually-launched workers
>> will also show up when "nworker
On Mon, Jan 22, 2018 at 12:50 AM, Peter Geoghegan wrote:
> On Sat, Jan 20, 2018 at 8:38 PM, Amit Kapila wrote:
>> It would have been better if there were some comments besides that
>> field, but I think it has been covered at another place in the code.
>> See comments in LaunchParallelWorkers().
On Sat, Jan 20, 2018 at 8:38 PM, Amit Kapila wrote:
> It would have been better if there were some comments besides that
> field, but I think it has been covered at another place in the code.
> See comments in LaunchParallelWorkers().
>
> /*
> * Start workers.
> *
> * The caller must be able to to
On Sun, Jan 21, 2018 at 1:39 AM, Peter Geoghegan wrote:
> On Fri, Jan 19, 2018 at 9:29 PM, Amit Kapila wrote:
>
> Actually, though it doesn't really look like it from the way things
> are structured within nbtsort.c, I don't need to wait for workers to
> start up (call the WaitForParallelWorkerTo
On Fri, Jan 19, 2018 at 9:29 PM, Amit Kapila wrote:
> I think I can see why this patch needs that. Is it mainly for the
> work you are doing in _bt_leader_heapscan where you are waiting for
> all the workers to be finished?
Yes, though it's also needed for the leader tuplesort. It needs to be
ab
On Sat, Jan 20, 2018 at 7:03 AM, Peter Geoghegan wrote:
> On Thu, Jan 18, 2018 at 5:53 PM, Peter Geoghegan wrote:
>
> Attached patch details:
>
> * The patch synchronizes processes used the approach just described.
> Note that this allowed me to remove several #include statements within
> tupleso
On Sat, Jan 20, 2018 at 10:20 AM, Peter Geoghegan wrote:
> On Fri, Jan 19, 2018 at 8:44 PM, Amit Kapila wrote:
>> Right, but I think using parallel_leader_participation, you can do it
>> reliably and probably write some regression tests which can complete
>> in a predictable time.
>
> Do what rel
On Fri, Jan 19, 2018 at 8:44 PM, Amit Kapila wrote:
> Right, but I think using parallel_leader_participation, you can do it
> reliably and probably write some regression tests which can complete
> in a predictable time.
Do what reliably? Guarantee that the leader will not participate as a
worker,
On Sat, Jan 20, 2018 at 8:33 AM, Peter Geoghegan wrote:
> On Fri, Jan 19, 2018 at 6:52 PM, Amit Kapila wrote:
>
>> BTW, is there any other way for "parallel create index" to force that
>> the work is done by workers? I am insisting on having something which
>> can test the code path in workers b
On Fri, Jan 19, 2018 at 6:52 PM, Amit Kapila wrote:
> Or reverse is also possible which means the workers won't get chance
> to run the plan in which case we can use parallel_leader_participation
> = off to test workers behavior. As said before, I see only that as
> the reason to keep parallel_le
On Sat, Jan 20, 2018 at 2:57 AM, Thomas Munro
wrote:
> On Sat, Jan 20, 2018 at 6:32 AM, Robert Haas wrote:
>> On Fri, Jan 19, 2018 at 12:16 PM, Peter Geoghegan wrote:
>>> Clarity on what I should do about parallel_leader_participation in the
>>> next revision would be useful at this point. You s
On Sat, Jan 20, 2018 at 6:32 AM, Robert Haas wrote:
> On Fri, Jan 19, 2018 at 12:16 PM, Peter Geoghegan wrote:
>> Clarity on what I should do about parallel_leader_participation in the
>> next revision would be useful at this point. You seem to either want
>> me to remove it from consideration en
On Fri, Jan 19, 2018 at 12:16 PM, Peter Geoghegan wrote:
> On Fri, Jan 19, 2018 at 4:52 AM, Robert Haas wrote:
>>> Other than that, looks good to me. It's a simple patch with a clear purpose.
>>
>> Committed.
>
> Cool.
>
> Clarity on what I should do about parallel_leader_participation in the
> n
On Fri, Jan 19, 2018 at 4:52 AM, Robert Haas wrote:
>> Other than that, looks good to me. It's a simple patch with a clear purpose.
>
> Committed.
Cool.
Clarity on what I should do about parallel_leader_participation in the
next revision would be useful at this point. You seem to either want
me
On Thu, Jan 18, 2018 at 2:05 PM, Peter Geoghegan wrote:
> Review of your patch:
>
> * SerializedReindexState could use some comments. At least a one liner
> stating its basic purpose.
Added a comment.
> * The "System index reindexing support" comment block could do with a
> passing acknowledgeme
On Thu, Jan 18, 2018 at 9:22 AM, Robert Haas wrote:
> I just went back to the thread on "parallel.c oblivion of
> worker-startup failures" and refreshed my memory about what's going on
> over there. What's going on over there is (1) currently,
> nworkers_launched can be over-reported in a scenari
On Thu, Jan 18, 2018 at 10:17 AM, Robert Haas wrote:
>> Should I go ahead and restore builds on catalogs, and remove those
>> comments, on the assumption that your patch will be committed before
>> mine? Obviously parallel index builds on catalogs don't matter. OTOH,
>> why not? Perhaps it's like
On Thu, Jan 18, 2018 at 10:27 AM, Robert Haas wrote:
> On Thu, Jan 18, 2018 at 1:14 PM, Peter Geoghegan wrote:
>> That seems pretty far fetched.
>
> I don't think it is, and there are plenty of other examples. All you
> need is a query plan that involves significant CPU work both below the
> Gat
On Thu, Jan 18, 2018 at 6:14 AM, Amit Kapila wrote:
> I see your point. OTOH, I think we should have something for testing
> purpose as that helps in catching the bugs and makes it easy to write
> tests that cover worker part of the code.
This is about the question of whether or not we want to a
On Thu, Jan 18, 2018 at 1:14 PM, Peter Geoghegan wrote:
> That seems pretty far fetched.
I don't think it is, and there are plenty of other examples. All you
need is a query plan that involves significant CPU work both below the
Gather node and above the Gather node. It's not difficult to find
On Wed, Jan 17, 2018 at 10:22 PM, Peter Geoghegan wrote:
> If you think it's worth the cycles, then I have no objection. I will
> point out that this means that everything that I say about
> ReindexIsProcessingIndex() no longer applies, because the relevant
> state will now be propagated. It doesn
On Thu, Jan 18, 2018 at 6:21 AM, Robert Haas wrote:
> Amit's reply to this part drew my attention to it. I think this is
> entirely false. Consider an aggregate that doesn't support partial
> aggregation, and a plan that looks like this:
>
> Aggregate
> -> Gather
> -> Parallel Seq Scan
> F
On Thu, Jan 18, 2018 at 5:49 AM, Thomas Munro
wrote:
> I'm mostly away from my computer this week -- sorry about that,
Yeah, seriously. Since when it is OK for hackers to ever be away from
their computers? :-)
> The idea I mentioned would only work if nworkers_launched is never
> over-reported
On Wed, Jan 17, 2018 at 10:22 PM, Peter Geoghegan wrote:
> As I said in a prior e-mail, even parallel query's use of
> parallel_leader_participation is consistent with what I propose here,
> practically speaking, because a partial path without leader
> participation will always lose to a serial se
On Thu, Jan 18, 2018 at 8:52 AM, Peter Geoghegan wrote:
> On Wed, Jan 17, 2018 at 10:40 AM, Robert Haas wrote:
>
>>> (It might make sense to allow this if parallel_leader_participation
>>> was *purely* a testing GUC, only for use by by backend hackers, but
>>> AFAICT it isn't.)
>>
>> As applied t
On Thu, Jan 18, 2018 at 4:19 PM, Thomas Munro
wrote:
> Hi,
>
> I'm mostly away from my computer this week -- sorry about that, but
> here are a couple of quick answers to questions directed at me:
>
> On Thu, Jan 18, 2018 at 4:22 PM, Peter Geoghegan wrote:
>> On Wed, Jan 17, 2018 at 10:40 AM, Rob
Hi,
I'm mostly away from my computer this week -- sorry about that, but
here are a couple of quick answers to questions directed at me:
On Thu, Jan 18, 2018 at 4:22 PM, Peter Geoghegan wrote:
> On Wed, Jan 17, 2018 at 10:40 AM, Robert Haas wrote:
>>> While it certainly did occur to me that that
On Wed, Jan 17, 2018 at 6:20 PM, Robert Haas wrote:
> I had forgotten about the previous discussion. The sketch in my
> previous email supposed that we would use dynamic barriers since the
> whole point, after all, is to handle the fact that we don't know how
> many participants will really show
On Wed, Jan 17, 2018 at 10:40 AM, Robert Haas wrote:
>> While it certainly did occur to me that that was kind of weird, and I
>> struggled with it on my own for a little while, I ultimately agreed
>> with Thomas that it added something to have ltsConcatWorkerTapes()
>> call some buffile function i
On Wed, Jan 17, 2018 at 7:00 PM, Peter Geoghegan wrote:
> There seems to be some yak shaving involved in getting the barrier
> abstraction to do exactly what is required, as Thomas went into at the
> time. How should that prerequisite work be structured? For example,
> should a patch be spun off f