Agree, it isn't productive this way.
I can't seem to find it, but was there a DISCUSS thread for this branch-merge?
I usually recommend addressing issues on a DISCUSS thread instead of fighting
things over a VOTE.
+Vinod
> On Dec 13, 2018, at 10:09 AM, Konstantin Shvachko
> wrote:
>
> This vote failed due to Daryn Sharp's veto.
> The concern is being addressed by HDFS-13873. I will start a new vote once
> this is committed.
>
> Note for Daryn. Your non-responsive handling of the veto sets a bad
> precedent and is a bad example of communication on the lists from a
> respected member of this community. Please check your availability for
> followup discussions if you choose to get involved with important decisions.
>
> On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko
> wrote:
>
>> Hi Daryn,
>>
>> Wanted to back up Chen's earlier response to your concerns about rotating
>> calls in the call queue.
>> Our design
>> 1. directly targets the livelock problem by rejecting calls on the
>> Observer that are not likely to be responded to in a timely manner: HDFS-13873.
>> 2. The call queue rotation is only done on Observers, and never on the
>> active NN, so it stays free of attacks like you suggest.
>>
>> If this is a satisfactory mitigation for the problem, could you please
>> reconsider your -1, so that people can continue voting on this thread?
>>
>> Thanks,
>> --Konst
>>
>> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp wrote:
>>
>>> -1 pending additional info. After a cursory scan, I have serious
>>> concerns regarding the design. This seems like a feature that should have
>>> been purely implemented in hdfs w/o touching the common IPC layer.
>>>
>>> The biggest issue is the alignment context. Its purpose appears to be
>>> for allowing handlers to reinsert calls back into the call queue. That's
>>> completely unacceptable. A buggy or malicious client can easily cause
>>> livelock in the IPC layer with handlers only looping on calls that never
>>> satisfy the condition. Why is this not implemented via RetriableExceptions?
>>>
>>> On Thu, Dec 6, 2018 at 1:24 AM Yongjun Zhang
>>> wrote:
>>>
Great work guys.
Wonder if we can elaborate on the impact of not having #2 fixed, and why #2
is not needed for the feature to be complete?
2. Need to fix automatic failover with ZKFC. Currently it does not know
about ObserverNodes and tries to convert them to SBNs.
Thanks.
--Yongjun
On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko
wrote:
> Hi Hadoop developers,
>
> I would like to propose to merge to trunk the feature branch HDFS-12943 for
> Consistent Reads from Standby Node. The feature is intended to scale read
> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> NameNode. We should be able to accommodate higher overall RPC workloads (up
> to 4x by some estimates) by adding multiple ObserverNodes.
>
> The main functionality has been implemented; see the sub-tasks of HDFS-12943.
> We followed up with the test plan. Testing was done on two independent
> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> We ran standard HDFS commands, MR jobs, admin commands including manual
> failover.
> We know of one cluster running this feature in production.
>
> There are a few outstanding issues:
> 1. Need to provide proper documentation - a user guide for the new feature
> 2. Need to fix automatic failover with ZKFC. Currently it does not know
> about ObserverNodes and tries to convert them to SBNs.
> 3. Scale testing and performance fine-tuning
> 4. As testing progresses, we continue fixing non-critical bugs like
> HDFS-14116.
>
> I attached a unified patch to the umbrella jira for the review and Jenkins
> build.
> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>
> Thanks,
> --Konstantin
>
>>>
>>>
>>> --
>>>
>>> Daryn
>>>
>>