Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-23 Thread Tsz Wo (Nicholas), Sze

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-22 Thread Eric Yang

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-22 Thread Xiao Chen

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-21 Thread Akira Ajisaka

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-19 Thread Chris Douglas

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-19 Thread Akira Ajisaka

From: larry mccay <lmc...@apache.org>
Date: Wednesday, January 10, 2018 at 10:53 AM
To: Daryn Sharp <da...@oath.com>
Cc: "Aaron T. Myers" <a...@apache.org>, Eric Yang <ey...@hortonworks.com>,
Chris Douglas <cdoug...@apache.org>, Hadoop Common <
common-dev@hadoop.apache.org>
Subject: Re: When are incompatible changes acceptable (HDFS-12990)

On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:

I fully agree the port changes should be reverted.  Although
"incompatible", the potential impact to existing 2.x deploys is huge.  I'd
rather inconvenience 3.0 deploys that comprise <1% of customers.  An
incompatible change to revert an incompatible change is called
compatibility.

+1




Most importantly, consider that there is no good upgrade path for existing
deploys, esp. large and/or multi-cluster environments.  It’s only feasible
for first-time deploys or simple single-cluster upgrades willing to take
downtime.  Let's consider a few reasons why:



1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
bundles the configs, there's no way to transparently coordinate the switch
to the new bundle with the port changed.  Job submissions will fail.



2. Users generally do not add the rpc port number to uris so unless their
configs are updated they will contact the wrong port.  Seamlessly
coordinating the conf change without massive failures is impossible.



3. Even if client confs are updated, they will break in a multi-cluster
env with NNs using different ports.  Users/services will be forced to add
the port.  The cited hive "issue" is not a bug since it's the only way to
work in a multi-port env.



4. Coordinating the port add/change of uris in systems everywhere (you
know something will be missed), updating of confs, restarting all services,
and requiring customers to redeploy their workflows in sync with the NN
upgrade will cause mass disruption and downtime that will be unacceptable
for production environments.



This is a solution to a non-existent problem.  Ports can be bound by
multiple processes but only 1 can listen.  Maybe multiple listeners is an
issue for compute nodes, but not for responsibly managed service nodes.  I.e., who
runs arbitrary services on the NNs that bind to random ports?  Besides, the
default port is and was ephemeral so it solved nothing.
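Daryn's point that many processes can try a port but only one listener wins is exactly the "port already in use" failure discussed earlier in the thread. A minimal sketch of that underlying socket behavior (plain Python sockets, not Hadoop code; host and port here are illustrative):

```python
import errno
import socket

def try_listen(host: str, port: int) -> str:
    """Try to bind and listen on host:port; report EADDRINUSE distinctly."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return "listening"          # caller owns the open socket
    except OSError as e:
        s.close()
        if e.errno == errno.EADDRINUSE:
            return "port already in use"
        raise

# First listener wins: grab an OS-assigned free port, then try it again.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))        # port 0 = let the OS pick a free port
first.listen(1)
port = first.getsockname()[1]
print(try_listen("127.0.0.1", port))  # port already in use
```

A second bind to the same address fails regardless of which default the distribution ships, which is why moving the default only relocates the collision rather than eliminating it.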



This either standardizes ports to a particular customer's ports or is a
poorly thought out whim.  In either case, the needs of the many outweigh
the needs of the few/none (3.0 users).  The only logical conclusion is
revert.  If a particular site wants to change default ports and deal with
the massive fallout, they can explicitly change the ports themselves.



Daryn

On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers <a...@apache.org> wrote:
On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:


While I agree the original port change was unnecessary, I don’t think
the Hadoop NN port change is a bad thing.

I worked for a Hadoop distro whose NN RPC port defaulted to 9000.
When we migrated from BigInsights to IOP, and now to HDP, we had to move
customer Hive metadata to the new NN RPC port.  It only took one developer
(myself) to write the tool for the migration.  The incurred workload is
not as bad as most people anticipated, because Hadoop depends on a
configuration file for referencing the namenode.  Most of the code can work
transparently.  It helped to harden the downstream testing tools to be
more robust.



While there are of course ways to deal with this, the question really
should be whether or not it's a desirable thing to do to our users.




We will never know how many people are actively working on Hadoop 3.0.0.
Perhaps a couple hundred developers, or thousands.



You're right that we can't know for sure, but I strongly suspect that this
is a substantial overestimate. Given how conservative Hadoop operators tend
to be, I view it as exceptionally unlikely that many deployments have been
created on or upgraded to Hadoop 3.0.0 since it was released less than a
month ago.

Further, I hope you'll agree that the number of
users/developers/deployments/applications which are currently on Hadoop 2.x
is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so
quickly. When all of those users upgrade to any 3.x version, they will
encounter this needless incompatible change and be forced to work around
it.



I think the switch back may have saved a few developers some work, but
more people could be impacted by an unexpected minor-release change in
the future.  I recommend keeping the current values to avoid rule bending
and future frustrations.



That we allow this incompatible change now does not mean that we are
categorically allowing more incompatible changes in the future. My point is
that we should in all instances evaluate the merit of any incompatible
change on a case-by-case basis.

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-17 Thread Tsz Wo (Nicholas), Sze
(Re-sent. Just found that my previous email seems not to have been delivered to common-dev.)
> > The question is: how are we going to fix it?
>
> What do you propose? -C
First of all, let's state clearly what the problem is.  Please help me out if I
have missed anything.
The problem reported by HDFS-12990 is that HDFS-9427 changed the NN default RPC
port from 8020 to 9820.  HDFS-12990 claimed that “the NN RPC port change is
painful for downstream on migrating to Hadoop 3.”
Note 1: This isn't a problem for an HA cluster.
Note 2: The port is configurable.  Users can set it to any value.
Note 3: HDFS-9427 has also changed many other HTTP/RPC ports, as shown below:

Namenode ports: 50470 --> 9871, 50070 --> 9870, 8020 --> 9820
Secondary NN ports: 50091 --> 9869, 50090 --> 9868
Datanode ports: 50020 --> 9867, 50010 --> 9866, 50475 --> 9865, 50075 --> 9864
The other port changes probably also affect downstream projects and give them a
“painful” experience.  For example, the NN UI and WebHDFS now use different ports.
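To make the migration burden concrete, the old-to-new default port mapping above can be captured in a small helper that rewrites hardcoded URIs. This is an illustrative sketch, not a tool from the thread; the hostname is made up, and the mapping is exactly the one listed in the note above:

```python
from urllib.parse import urlparse, urlunparse

# Old -> new default ports from HDFS-9427, as listed in this thread.
HDFS_9427_PORT_MAP = {
    8020: 9820,   # NN RPC
    50070: 9870,  # NN HTTP
    50470: 9871,  # NN HTTPS
    50090: 9868,  # Secondary NN HTTP
    50091: 9869,  # Secondary NN HTTPS
    50010: 9866,  # DN data transfer
    50020: 9867,  # DN IPC
    50075: 9864,  # DN HTTP
    50475: 9865,  # DN HTTPS
}

def migrate_uri(uri: str) -> str:
    """Rewrite a URI's port if it is one of the pre-HDFS-9427 defaults."""
    parts = urlparse(uri)
    if parts.port in HDFS_9427_PORT_MAP:
        netloc = f"{parts.hostname}:{HDFS_9427_PORT_MAP[parts.port]}"
        parts = parts._replace(netloc=netloc)
    return urlunparse(parts)

print(migrate_uri("hdfs://nn.example.com:8020/user/alice"))
# hdfs://nn.example.com:9820/user/alice
```

A rewrite like this only helps where URIs are visible in configs and scripts; ports baked into compiled downstream code still have to be found and fixed by hand, which is the "painful" part the thread is debating.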
The problem is one of convenience, not anything as serious as a security bug.
There are a few possible solutions:

1) Consider that the port changes are not limited to the NN RPC port, and that
default port values should not be hardcoded.  Also, downstream projects
probably need to fix other hardcoded ports (e.g. WebHDFS) anyway.  Let’s just
keep all the port changes and document them clearly (we may throw an exception
if some applications try to connect to the old ports).  In this way, 3.0.1 is
compatible with 3.0.0.
2) Further change the NN RPC server so that the NN listens on both 8020 and
9820 by default.  This is a new feature: the NN listens on two ports
simultaneously.  The feature has other benefits, e.g. one of the ports could be
reserved for high-priority applications so that they get better response time.
It is compatible with both 2.x and 3.0.0.  Of course, users could choose to set
it back to one of the ports in the conf.
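As a concrete illustration of "set it back in the conf": the NN RPC endpoint is driven by the standard `fs.defaultFS` and `dfs.namenode.rpc-address` properties, so a site can pin the port regardless of the release default. The hostname below is made up:

```xml
<!-- core-site.xml: clients resolve the NN through fs.defaultFS -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nn1.example.com:8020</value>
</property>

<!-- hdfs-site.xml: pin the NN RPC endpoint explicitly on the server side -->
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>nn1.example.com:8020</value>
</property>
```

Pinning the port explicitly also sidesteps the whole default-port debate for a given site, at the cost of keeping every client conf in sync.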
3) Revert the NN RPC port back to 8020.  We need to ask where the revert
should happen.
3.1) Revert it in 3.0.1 as proposed by HDFS-12990.  However, this is an
incompatible change between dot releases 3.0.0 and 3.0.1, and it violates our
policy.  Being compatible is very important.  Users expect 3.0.0 and 3.0.1 to
be compatible.  How could we explain that 3.0.0 and 3.0.1 are incompatible for
the sake of convenience?
3.2) Revert it in 4.0.0.  There is no compatibility issue, since 3.0.0 and
4.0.0 are allowed to have incompatible changes according to our policy.
Since compatibility is more important than convenience, Solution #3.1 is 
impermissible.  For the remaining solutions, both #1 and #2 are fine to me.
Thanks.
Tsz-Wo


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-11 Thread Chris Douglas
On Thu, Jan 11, 2018 at 6:34 PM Tsz Wo Sze <szets...@yahoo.com> wrote:

> The question is: how are we going to fix it?
>

What do you propose? -C

> No incompatible changes are allowed between 3.0.0 and 3.0.1. Dot releases
> only allow bug fixes.
>
> We may not like the statement above but it is our compatibility policy.
> We should either follow the policy or revise it.
>
> Some more questions:
>
> - What if someone is already using 3.0.0 and has changed all the
> scripts to 9820?  Just let them fail?
> - Compared to 2.x, 3.0.0 has many incompatible changes. Are we going
> to have other incompatible changes in future minor and dot releases?
> What are the criteria to decide which incompatible changes are allowed?
> - I hate that we have prematurely released 3.0.0 and made 3.0.1
> incompatible with 3.0.0. If the "bug" is that serious, why not fix it in
> 4.0.0 and declare 3.x as dead?
> - It seems obvious that no one has seriously tested this, so the
> problem was not uncovered until now. Are there bugs in our current release
> procedure?
>
>
> Thanks
> Tsz-Wo
>
>
>
> On Thursday, January 11, 2018, 11:36:33 AM GMT+8, Chris Douglas <
> cdoug...@apache.org> wrote:
>
>
> Isn't this limited to reverting the 8020 -> 9820 change? -C
>
> On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <ey...@hortonworks.com> wrote:
>
> > The fix in HDFS-9427 can potentially bring in new customers because there
> > is less chance of a newcomer encountering the “port already in use”
> > problem.  If we make the change according to HDFS-12990, this incompatible
> > change does not make the earlier incompatible change compatible.  The
> > other ports are not reverted by HDFS-12990.  Users will still encounter
> > the bad taste in the mouth that HDFS-9427 attempted to remove.  Please do
> > consider the negative side effects both of reverting and of an
> > incompatible minor release change.  Thanks
> >
> > Regards,
> > Eric
> >
> > From: larry mccay <lmc...@apache.org>
> > Date: Wednesday, January 10, 2018 at 10:53 AM
> > To: Daryn Sharp <da...@oath.com>
> > Cc: "Aaron T. Myers" <a...@apache.org>, Eric Yang <ey...@hortonworks.com
> >,
> > Chris Douglas <cdoug...@apache.org>, Hadoop Common <
> > common-dev@hadoop.apache.org>
> > Subject: Re: When are incompatible changes acceptable (HDFS-12990)
> >
> > On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com > da...@oath.com>> wrote:
> >
> > I fully agree the port changes should be reverted.  Although
> > "incompatible", the potential impact to existing 2.x deploys is huge.
> I'd
> > rather inconvenience 3.0 deploys that compromise <1% customers.  An
> > incompatible change to revert an incompatible change is called
> > compatibility.
> >
> > +1
> >
> >
> >
> >
> > Most importantly, consider that there is no good upgrade path existing
> > deploys, esp. large and/or multi-cluster environments.  It’s only
> feasible
> > for first-time deploys or simple single-cluster upgrades willing to take
> > downtime.  Let's consider a few reasons why:
> >
> >
> >
> > 1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
> > bundles the configs, there's no way to transparently coordinate the
> switch
> > to the new bundle with the port changed.  Job submissions will fail.
> >
> >
> >
> > 2. Users generally do not add the rpc port number to uris so unless their
> > configs are updated they will contact the wrong port.  Seamlessly
> > coordinating the conf change without massive failures is impossible.
> >
> >
> >
> > 3. Even if client confs are updated, they will break in a multi-cluster
> > env with NNs using different ports.  Users/services will be forced to add
> > the port.  The cited hive "issue" is not a bug since it's the only way to
> > work in a multi-port env.
> >
> >
> >
> > 4. Coordinating the port add/change of uris is systems everywhere (you
> > know something will be missed), updating of confs, restarting all
> services,
> > requiring customers to redeploy their workflows in sync with the NN
> > upgrade, will cause mass disruption and downtime that will be
> unacceptable
> > for production environments.
> >
> >
> >
> > This is a solution to a non-existent problem.  Ports can be bound by
> > multiple processes but only 1 can listen.  Maybe multiple listeners is an
> > issue for compute nodes but not responsibly m

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-11 Thread Aaron T. Myers
Yes indeed, that's the proposal being discussed on HDFS-12990 - just to
revert the default NN RPC port change, and none of the other port changes.
The other default port changes actually do have some technical benefit, and
I believe are far less likely to be embedded in databases, scripts, tests,
etc. in real deployments.

Best,
Aaron
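
Aaron's worry above — clients that omit the port in their URIs and so silently depend on whatever the software's built-in default is — can be illustrated with a small sketch. This is not Hadoop's actual resolution code; the function name and the default-port constant are illustrative only:

```python
from urllib.parse import urlparse

# Illustrative constant: 9820 is the 3.0.0 default NN RPC port under
# discussion; it was 8020 in 2.x.
DEFAULT_NN_RPC_PORT = 9820

def resolve_nn_endpoint(uri, default_port=DEFAULT_NN_RPC_PORT):
    """Return (host, port) for an hdfs:// URI, falling back to the
    built-in default when the URI omits the port."""
    parsed = urlparse(uri)
    port = parsed.port if parsed.port is not None else default_port
    return parsed.hostname, port

# A URI without an explicit port silently tracks the software default, so
# changing the default re-routes this client even though its own config
# never changed:
print(resolve_nn_endpoint("hdfs://nn1.example.com/data"))       # ('nn1.example.com', 9820)
print(resolve_nn_endpoint("hdfs://nn1.example.com:8020/data"))  # ('nn1.example.com', 8020)
```

Only clients that pin the port explicitly, as in the second URI, are unaffected by a default change.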


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-11 Thread larry mccay
No, the proposal was to only fix the NN port change - as I understood it.


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-11 Thread Eric Yang
If I am reading this correctly, Daryn and Larry are in favor of a complete
revert instead of namenode only.  Please chime in if I am wrong.  This is the
reason I am trying to explore each perspective: to understand the cost of each
option.  It appears that opinions are fragmented, and only one choice will
serve the needs of the majority of the community.  It would be good for a PMC
member to call a vote at a reasonable pace, to address this issue and reduce
the pain for both sides.

Regards,
Eric


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread Chris Douglas
Isn't this limited to reverting the 8020 -> 9820 change? -C


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread Eric Yang
The fix in HDFS-9427 can potentially bring in new customers, because there is
less chance of a newcomer encountering the “port already in use” problem.  If
we make the change proposed in HDFS-12990, this incompatible change does not
make the earlier incompatible change compatible: the other ports are not
reverted by HDFS-12990, so users will still encounter the bad taste that
HDFS-9427 attempted to remove.  Please do consider the negative side effects
of reverting as well as those of an incompatible minor-release change.  Thanks

Regards,
Eric
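
The “port already in use” failure mode Eric mentions is easy to reproduce with plain sockets. A minimal sketch (illustrative only, not Hadoop code):

```python
import errno
import socket

def try_listen(host, port):
    """Attempt to bind and listen; return (socket, None) or (None, error)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return s, None
    except OSError as e:
        s.close()
        return None, e

# Let the OS pick a free port for the first listener...
first, err = try_listen("127.0.0.1", 0)
port = first.getsockname()[1]

# ...then a second daemon (simulated here by a second socket) trying the
# same port fails with EADDRINUSE, i.e. "Address already in use".
second, err2 = try_listen("127.0.0.1", port)
print(second is None, err2.errno == errno.EADDRINUSE)  # True True
first.close()
```

Whatever default is chosen, a colliding daemon on the same host will hit this error; the question in the thread is which default minimizes that chance without breaking existing 2.x deployments.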


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread larry mccay
On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp  wrote:

> I fully agree the port changes should be reverted.  Although
> "incompatible", the potential impact to existing 2.x deploys is huge.  I'd
> rather inconvenience 3.0 deploys, which comprise <1% of customers.  An
> incompatible change to revert an incompatible change is called
> compatibility.
>

+1


>
> Most importantly, consider that there is no good upgrade path for existing
> deploys, esp. large and/or multi-cluster environments.  It’s only feasible
> for first-time deploys or simple single-cluster upgrades willing to take
> downtime.  Let's consider a few reasons why:
>
>
> 1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
> bundles the configs, there's no way to transparently coordinate the switch
> to the new bundle with the port changed.  Job submissions will fail.
>
>
> 2. Users generally do not add the rpc port number to uris so unless their
> configs are updated they will contact the wrong port.  Seamlessly
> coordinating the conf change without massive failures is impossible.
>
>
> 3. Even if client confs are updated, they will break in a multi-cluster
> env with NNs using different ports.  Users/services will be forced to add
> the port.  The cited hive "issue" is not a bug since it's the only way to
> work in a multi-port env.
>
>
> 4. Coordinating the port add/change of uris in systems everywhere (you
> know something will be missed), updating of confs, restarting all services,
> and requiring customers to redeploy their workflows in sync with the NN
> upgrade will cause mass disruption and downtime that will be unacceptable
> for production environments.
>
>
> This is a solution to a non-existent problem.  Ports can be bound by
> multiple processes but only 1 can listen.  Maybe multiple listeners is an
> issue for compute nodes, but not for responsibly managed service nodes.
> I.e., who runs arbitrary services on the NNs that bind to random ports?
> Besides, the default port is and was ephemeral so it solved nothing.
>
>
> This either standardizes ports to a particular customer's ports or is a
> poorly thought out whim.  In either case, the needs of the many outweigh
> the needs of the few/none (3.0 users).  The only logical conclusion is
> revert.  If a particular site wants to change default ports and deal with
> the massive fallout, they can explicitly change the ports themselves.
>
>
> Daryn
>
> On Tue, Jan 9, 2018 at 11:22 PM, Aaron T. Myers  wrote:
>
>> On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang  wrote:
>>
>> > While I agree the original port change was unnecessary, I don’t think
>> > Hadoop NN port change is a bad thing.
>> >
>> > I worked for a Hadoop distro whose NN RPC port defaulted to 9000.
>> > When we migrated from BigInsights to IOP, and now to HDP, we had to move
>> > customer Hive metadata to the new NN RPC port.  It only took one
>> > developer (myself) to write the tool for the migration.  The incurred
>> > workload is not as bad as most people anticipated, because Hadoop
>> > depends on a configuration file for referencing the namenode, so most of
>> > the code works transparently.  It also helped to harden the downstream
>> > testing tools to be more robust.
>> >
>>
>> While there are of course ways to deal with this, the question really
>> should be whether or not it's a desirable thing to do to our users.
>>
>>
>> >
>> > We will never know how many people are actively working on Hadoop 3.0.0.
>> > Perhaps a couple hundred developers, or thousands.
>>
>>
>> You're right that we can't know for sure, but I strongly suspect that this
>> is a substantial overestimate. Given how conservative Hadoop operators
>> tend
>> to be, I view it as exceptionally unlikely that many deployments have been
>> created on or upgraded to Hadoop 3.0.0 since it was released less than a
>> month ago.
>>
>> Further, I hope you'll agree that the number of
>> users/developers/deployments/applications which are currently on Hadoop
>> 2.x
>> is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so
>> quickly. When all of those users upgrade to any 3.x version, they will
>> encounter this needless incompatible change and be forced to work around
>> it.
>>
>>
>> > I think the switch back may have saved a few developers some work, but
>> > more people could be impacted by an unexpected minor-release change in
>> > the future.  I recommend keeping the current values to avoid rule
>> > bending and future frustrations.
>> >
>>
>> That we allow this incompatible change now does not mean that we are
>> categorically allowing more incompatible changes in the future. My point
>> is
>> that we should in all instances evaluate the merit of any incompatible
>> change on a case-by-case basis. This is not an exceptional circumstance -
>> we've made incompatible changes in the past when appropriate, e.g.
>> breaking
>> some clients to address a security issue. I and others believe that in
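
Daryn's aside earlier in the thread — that "the default port is and was ephemeral so it solved nothing" — can be sanity-checked against an ephemeral port range. A hedged sketch, assuming the common Linux default range of 32768–60999 (the live values, where available, are in /proc/sys/net/ipv4/ip_local_port_range):

```python
# Assumed common Linux default ephemeral range; actual systems may differ.
ASSUMED_EPHEMERAL_RANGE = (32768, 60999)

def in_ephemeral_range(port, rng=ASSUMED_EPHEMERAL_RANGE):
    """Return True if the port falls inside the assumed ephemeral range."""
    lo, hi = rng
    return lo <= port <= hi

# Under this range, neither the old (8020) nor the new (9820) NN RPC default
# is ephemeral, while some 2.x defaults that HDFS-9427 did move, such as the
# 50070 NN web UI port, were.
for p in (8020, 9820, 50070):
    print(p, in_ephemeral_range(p))
```

Under this assumption, moving ports like 50070 out of the ephemeral range had a technical rationale, while the 8020 → 9820 RPC change did not, which is the distinction HDFS-12990 draws.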

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread Eric Yang
See comments inline.

Regards,
Eric

From: <a...@cloudera.com> on behalf of "Aaron T. Myers" <a...@apache.org>
Date: Wednesday, January 10, 2018 at 9:21 AM
To: Eric Yang <ey...@hortonworks.com>
Cc: Chris Douglas <cdoug...@apache.org>, larry mccay <lmc...@apache.org>, 
Hadoop Common <common-dev@hadoop.apache.org>
Subject: Re: When are incompatible changes acceptable (HDFS-12990)

Hey Eric,

Comments inline.

On Wed, Jan 10, 2018 at 9:06 AM, Eric Yang <ey...@hortonworks.com> wrote:
Hi Aaron,

Correct me if I am wrong: the port change is only required when creating a new
cluster, due to the default value.  An existing cluster does not need to make
the switch if its Hadoop configuration contains a user-defined port number.

Certainly true that a port change isn't required, and if it's already properly 
being set everywhere throughout a deployment (i.e. all clients, client 
applications, scripts, etc.) it won't be an issue. I'm most worried about 
*client* configs throughout a large deployment, which would be difficult 
(impossible?) to coordinate an update to. Entirely possible, if not likely, 
that many clients are inadvertently relying on the default port, so when they 
start using the updated software they'll break because of the default port 
change.

Ambari and Cloudera Manager already handle user-defined ports correctly.  Some 
QA tools may need to change, but it is a good exercise to run on a non-standard 
port.

Sites which are using Ambari or Cloudera Manager are more likely to work, but 
again, I worry about client configs and other places that might have hard-coded 
the port number, e.g. in Hive or in scripts.

I will also say that Hadoop users which are *not* using Ambari or CM should be 
considered as well. Sites like this are perhaps the most likely to break 
because of this change.

Agree.
I gave my vote to keep the setting, and fully respect the community’s decision 
in this matter.

Thanks, Eric. I understand your argument to be that changing this default port 
might not be so bad, but it also sounds like you wouldn't object if others 
conclude that it's best to change it back. Is that right?

The decision is in the hands of the Apache Hadoop community.  This is not a 
decision that can be made by one individual or one company.  Let’s start a 
voting thread to make sure the decision is made properly by the Hadoop 
community.

Best,
Aaron



Regards,
Eric

From: <a...@cloudera.com> on behalf of "Aaron T. Myers" <a...@apache.org>
Date: Tuesday, January 9, 2018 at 9:22 PM
To: Eric Yang <ey...@hortonworks.com>
Cc: Chris Douglas <cdoug...@apache.org>, larry mccay <lmc...@apache.org>,
Hadoop Common <common-dev@hadoop.apache.org>
Subject: Re: When are incompatible changes acceptable (HDFS-12990)

On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
While I agree the original port change was unnecessary, I don’t think the 
Hadoop NN port change is a bad thing.

I worked for a Hadoop distro whose NN RPC port defaulted to 9000.  When we 
migrated from BigInsights to IOP, and now to HDP, we had to move customer Hive 
metadata to the new NN RPC port.  It only took one developer (myself) to write 
the tool for the migration.  The resulting workload is not as bad as most 
people anticipated, because Hadoop depends on a configuration file for 
referencing the namenode.  Most of the code works transparently.  It helped 
harden the downstream testing tools to be more robust.

While there are of course ways to deal with this, the question really should be 
whether or not it's a desirable thing to do to our users.


We will never know how many people are actively working on Hadoop 3.0.0.  
Perhaps a couple hundred developers, or thousands.

You're right that we can't know for sure, but I strongly suspect that this is a 
substantial overestimate. Given how conservative Hadoop operators tend to be, I 
view it as exceptionally unlikely that many deployments have been created on or 
upgraded to Hadoop 3.0.0 since it was released less than a month ago.

Further, I hope you'll agree that the number of 
users/developers/deployments/applications which are currently on Hadoop 2.x is 
*vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so quickly. 
When all of those users upgrade to any 3.x version, they will encounter this 
needless incompatible change and be forced to work around it.

I think the switch back may have saved a few developers some work, but more 
people could be impacted by an unexpected change in a minor release in the 
future.  I recommend keeping the current values to avoid rule bending and 
future frustrations.

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread Aaron T. Myers
Hey Eric,

Comments inline.

On Wed, Jan 10, 2018 at 9:06 AM, Eric Yang <ey...@hortonworks.com> wrote:

> Hi Aaron,
>
>
>
> Correct me if I am wrong, but the port change is only required when
> creating a new cluster, due to the default value.  An existing cluster
> does not need to make the switch if its Hadoop configuration contains a
> user-defined port number.
>

Certainly true that a port change isn't required, and if it's already
properly being set everywhere throughout a deployment (i.e. all clients,
client applications, scripts, etc.) it won't be an issue. I'm most worried
about *client* configs throughout a large deployment, which would be
difficult (impossible?) to coordinate an update to. Entirely possible, if
not likely, that many clients are inadvertently relying on the default
port, so when they start using the updated software they'll break because
of the default port change.


> Ambari and Cloudera Manager already handle user-defined ports correctly.
> Some QA tools may need to change, but it is a good exercise to run on a
> non-standard port.
>

Sites which are using Ambari or Cloudera Manager are more likely to work,
but again, I worry about client configs and other places that might have
hard-coded the port number, e.g. in Hive or in scripts.

I will also say that Hadoop users which are *not* using Ambari or CM should
be considered as well. Sites like this are perhaps the most likely to break
because of this change.


> I gave my vote to keep the setting, and fully respect the community’s
> decision in this matter.
>

Thanks, Eric. I understand your argument to be that changing this default
port might not be so bad, but it also sounds like you wouldn't object if
others conclude that it's best to change it back. Is that right?

Best,
Aaron



>
>
> Regards,
>
> Eric
>
>
>
> *From: *<a...@cloudera.com> on behalf of "Aaron T. Myers" <a...@apache.org>
> *Date: *Tuesday, January 9, 2018 at 9:22 PM
> *To: *Eric Yang <ey...@hortonworks.com>
> *Cc: *Chris Douglas <cdoug...@apache.org>, larry mccay <lmc...@apache.org>,
> Hadoop Common <common-dev@hadoop.apache.org>
> *Subject: *Re: When are incompatible changes acceptable (HDFS-12990)
>
>
>
> On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
>
> While I agree the original port change was unnecessary, I don’t think the
> Hadoop NN port change is a bad thing.
>
> I worked for a Hadoop distro whose NN RPC port defaulted to 9000.  When we
> migrated from BigInsights to IOP, and now to HDP, we had to move customer
> Hive metadata to the new NN RPC port.  It only took one developer (myself)
> to write the tool for the migration.  The resulting workload is not as bad
> as most people anticipated, because Hadoop depends on a configuration file
> for referencing the namenode.  Most of the code works transparently.  It
> helped harden the downstream testing tools to be more robust.
>
>
>
> While there are of course ways to deal with this, the question really
> should be whether or not it's a desirable thing to do to our users.
>
>
>
>
> We will never know how many people are actively working on Hadoop 3.0.0.
> Perhaps a couple hundred developers, or thousands.
>
>
>
> You're right that we can't know for sure, but I strongly suspect that this
> is a substantial overestimate. Given how conservative Hadoop operators tend
> to be, I view it as exceptionally unlikely that many deployments have been
> created on or upgraded to Hadoop 3.0.0 since it was released less than a
> month ago.
>
>
>
> Further, I hope you'll agree that the number of
> users/developers/deployments/applications which are currently on Hadoop
> 2.x is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0
> so quickly. When all of those users upgrade to any 3.x version, they will
> encounter this needless incompatible change and be forced to work around it.
>
>
>
> I think the switch back may have saved a few developers some work, but
> more people could be impacted by an unexpected change in a minor release
> in the future.  I recommend keeping the current values to avoid rule
> bending and future frustrations.
>
>
>
> That we allow this incompatible change now does not mean that we are
> categorically allowing more incompatible changes in the future. My point is
> that we should in all instances evaluate the merit of any incompatible
> change on a case-by-case basis. This is not an exceptional circumstance -
> we've made incompatible changes in the past when appropriate, e.g. breaking
> some clients to address a security issue. I and others believe that in this
> case the benefits greatly outweigh the downsides of changing this back to
> what it has always been.

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread Eric Yang
Hi Aaron,

Correct me if I am wrong, but the port change is only required when creating a 
new cluster, due to the default value.  An existing cluster does not need to 
make the switch if its Hadoop configuration contains a user-defined port 
number. Ambari and Cloudera Manager already handle user-defined ports 
correctly.  Some QA tools may need to change, but it is a good exercise to run 
on a non-standard port.  I gave my vote to keep the setting, and fully respect 
the community’s decision in this matter.

Regards,
Eric

From: <a...@cloudera.com> on behalf of "Aaron T. Myers" <a...@apache.org>
Date: Tuesday, January 9, 2018 at 9:22 PM
To: Eric Yang <ey...@hortonworks.com>
Cc: Chris Douglas <cdoug...@apache.org>, larry mccay <lmc...@apache.org>, 
Hadoop Common <common-dev@hadoop.apache.org>
Subject: Re: When are incompatible changes acceptable (HDFS-12990)

On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang <ey...@hortonworks.com> wrote:
While I agree the original port change was unnecessary, I don’t think the 
Hadoop NN port change is a bad thing.

I worked for a Hadoop distro whose NN RPC port defaulted to 9000.  When we 
migrated from BigInsights to IOP, and now to HDP, we had to move customer Hive 
metadata to the new NN RPC port.  It only took one developer (myself) to write 
the tool for the migration.  The resulting workload is not as bad as most 
people anticipated, because Hadoop depends on a configuration file for 
referencing the namenode.  Most of the code works transparently.  It helped 
harden the downstream testing tools to be more robust.

While there are of course ways to deal with this, the question really should be 
whether or not it's a desirable thing to do to our users.


We will never know how many people are actively working on Hadoop 3.0.0.  
Perhaps a couple hundred developers, or thousands.

You're right that we can't know for sure, but I strongly suspect that this is a 
substantial overestimate. Given how conservative Hadoop operators tend to be, I 
view it as exceptionally unlikely that many deployments have been created on or 
upgraded to Hadoop 3.0.0 since it was released less than a month ago.

Further, I hope you'll agree that the number of 
users/developers/deployments/applications which are currently on Hadoop 2.x is 
*vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so quickly. 
When all of those users upgrade to any 3.x version, they will encounter this 
needless incompatible change and be forced to work around it.

I think the switch back may have saved a few developers some work, but more 
people could be impacted by an unexpected change in a minor release in the 
future.  I recommend keeping the current values to avoid rule bending and 
future frustrations.

That we allow this incompatible change now does not mean that we are 
categorically allowing more incompatible changes in the future. My point is 
that we should in all instances evaluate the merit of any incompatible change 
on a case-by-case basis. This is not an exceptional circumstance - we've made 
incompatible changes in the past when appropriate, e.g. breaking some clients 
to address a security issue. I and others believe that in this case the 
benefits greatly outweigh the downsides of changing this back to what it has 
always been.

Best,
Aaron


Regards,
Eric

On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:

Particularly since 9820 isn't in the contiguous range of ports in
HDFS-9427, is there any value in this change?

Let's change it back to prevent the disruption to users, but
downstream projects should treat this as a bug in their tests. Please
open JIRAs in affected projects. -C


On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org> wrote:
> On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org> wrote:
>
>> Thanks a lot for the response, Larry. Comments inline.
>>
>> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org> wrote:
>>
>>> Question...
>>>
>>> Can this be addressed in some way during or before upgrade that allows it
>>> to only affect new installs?
>>> Even a config based workaround prior to upgrade might make this a change
>>> less disruptive.
>>>
>>> If part of the upgrade process includes a step (maybe even a script) to
>>> set the NN RPC port explicitly beforehand then it would allow existing
>>> deployments and related clients to remain whole - otherwise it will uptake
>>> the new default port.
>>>
>>
>> Perhaps something like this could be done, but I think there are downsides
>> to anything like this.

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-09 Thread Aaron T. Myers
On Tue, Jan 9, 2018 at 3:15 PM, Eric Yang  wrote:

> While I agree the original port change was unnecessary, I don’t think the
> Hadoop NN port change is a bad thing.
>
> I worked for a Hadoop distro whose NN RPC port defaulted to 9000.  When we
> migrated from BigInsights to IOP, and now to HDP, we had to move customer
> Hive metadata to the new NN RPC port.  It only took one developer (myself)
> to write the tool for the migration.  The resulting workload is not as bad
> as most people anticipated, because Hadoop depends on a configuration file
> for referencing the namenode.  Most of the code works transparently.  It
> helped harden the downstream testing tools to be more robust.
>

While there are of course ways to deal with this, the question really
should be whether or not it's a desirable thing to do to our users.


>
> We will never know how many people are actively working on Hadoop 3.0.0.
> Perhaps a couple hundred developers, or thousands.


You're right that we can't know for sure, but I strongly suspect that this
is a substantial overestimate. Given how conservative Hadoop operators tend
to be, I view it as exceptionally unlikely that many deployments have been
created on or upgraded to Hadoop 3.0.0 since it was released less than a
month ago.

Further, I hope you'll agree that the number of
users/developers/deployments/applications which are currently on Hadoop 2.x
is *vastly* greater than anyone who might have jumped on Hadoop 3.0.0 so
quickly. When all of those users upgrade to any 3.x version, they will
encounter this needless incompatible change and be forced to work around it.


> I think the switch back may have saved a few developers some work, but
> more people could be impacted by an unexpected change in a minor release
> in the future.  I recommend keeping the current values to avoid rule
> bending and future frustrations.
>

That we allow this incompatible change now does not mean that we are
categorically allowing more incompatible changes in the future. My point is
that we should in all instances evaluate the merit of any incompatible
change on a case-by-case basis. This is not an exceptional circumstance -
we've made incompatible changes in the past when appropriate, e.g. breaking
some clients to address a security issue. I and others believe that in this
case the benefits greatly outweigh the downsides of changing this back to
what it has always been.

Best,
Aaron


>
> Regards,
> Eric
>
> On 1/9/18, 11:21 AM, "Chris Douglas"  wrote:
>
> Particularly since 9820 isn't in the contiguous range of ports in
> HDFS-9427, is there any value in this change?
>
> Let's change it back to prevent the disruption to users, but
> downstream projects should treat this as a bug in their tests. Please
> open JIRAs in affected projects. -C
>
>
> On Tue, Jan 9, 2018 at 5:18 AM, larry mccay  wrote:
> > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers 
> wrote:
> >
> >> Thanks a lot for the response, Larry. Comments inline.
> >>
> >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay 
> wrote:
> >>
> >>> Question...
> >>>
> >>> Can this be addressed in some way during or before upgrade that
> allows it
> >>> to only affect new installs?
> >>> Even a config based workaround prior to upgrade might make this a
> change
> >>> less disruptive.
> >>>
> >>> If part of the upgrade process includes a step (maybe even a
> script) to
> >>> set the NN RPC port explicitly beforehand then it would allow
> existing
> >>> deployments and related clients to remain whole - otherwise it
> will uptake
> >>> the new default port.
> >>>
> >>
> >> Perhaps something like this could be done, but I think there are
> downsides
> >> to anything like this. For example, I'm sure there are plenty of
> >> applications written on top of Hadoop that have tests which
> hard-code the
> >> port number. Nothing we do in a setup script will help here. If we
> don't
> >> change the default port back to what it was, these tests will
> likely all
> >> have to be updated.
> >>
> >>
> >
> > I may not have made my point clear enough.
> > What I meant to say is to fix the default port but direct folks to
> > explicitly set the port they are using in a deployment (the current
> > default) so that it doesn't change out from under them - unless they
> are
> > fine with it changing.
> >
> >
> >>
> >>> Meta note: we shouldn't be so pedantic about policy that we can't
> back
> >>> out something that is considered a bug or even mistake.
> >>>
> >>
> >> This is my bigger point. Rigidly adhering to the compat guidelines
> in this
> >> instance helps almost no one, while hurting many folks.
> >>
> >> We basically made a mistake when we decided to change the default NN
> >> port with little upside, even between major versions. We discovered
> >> this very quickly, and we have an opportunity to fix it now and in so
> >> doing likely disrupt very, very few users and downstream applications.
> >> If we don't change it, we'll be causing difficulty for our users,
> >> downstream developers, and ourselves, potentially for years.

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-09 Thread Eric Yang
While I agree the original port change was unnecessary, I don’t think the 
Hadoop NN port change is a bad thing.

I worked for a Hadoop distro whose NN RPC port defaulted to 9000.  When we 
migrated from BigInsights to IOP, and now to HDP, we had to move customer Hive 
metadata to the new NN RPC port.  It only took one developer (myself) to write 
the tool for the migration.  The resulting workload is not as bad as most 
people anticipated, because Hadoop depends on a configuration file for 
referencing the namenode.  Most of the code works transparently.  It helped 
harden the downstream testing tools to be more robust.

We will never know how many people are actively working on Hadoop 3.0.0.  
Perhaps a couple hundred developers, or thousands.  I think the switch back 
may have saved a few developers some work, but more people could be impacted 
by an unexpected change in a minor release in the future.  I recommend keeping 
the current values to avoid rule bending and future frustrations.

Regards,
Eric

On 1/9/18, 11:21 AM, "Chris Douglas"  wrote:

Particularly since 9820 isn't in the contiguous range of ports in
HDFS-9427, is there any value in this change?

Let's change it back to prevent the disruption to users, but
downstream projects should treat this as a bug in their tests. Please
open JIRAs in affected projects. -C


On Tue, Jan 9, 2018 at 5:18 AM, larry mccay  wrote:
> On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers  wrote:
>
>> Thanks a lot for the response, Larry. Comments inline.
>>
>> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay  wrote:
>>
>>> Question...
>>>
>>> Can this be addressed in some way during or before upgrade that allows it
>>> to only affect new installs?
>>> Even a config based workaround prior to upgrade might make this a change
>>> less disruptive.
>>>
>>> If part of the upgrade process includes a step (maybe even a script) to
>>> set the NN RPC port explicitly beforehand then it would allow existing
>>> deployments and related clients to remain whole - otherwise it will uptake
>>> the new default port.
>>>
>>
>> Perhaps something like this could be done, but I think there are downsides
>> to anything like this. For example, I'm sure there are plenty of
>> applications written on top of Hadoop that have tests which hard-code the
>> port number. Nothing we do in a setup script will help here. If we don't
>> change the default port back to what it was, these tests will likely all
>> have to be updated.
>>
>>
>
> I may not have made my point clear enough.
> What I meant to say is to fix the default port but direct folks to
> explicitly set the port they are using in a deployment (the current
> default) so that it doesn't change out from under them - unless they are
> fine with it changing.
>
>
>>
>>> Meta note: we shouldn't be so pedantic about policy that we can't back
>>> out something that is considered a bug or even mistake.
>>>
>>
>> This is my bigger point. Rigidly adhering to the compat guidelines in this
>> instance helps almost no one, while hurting many folks.
>>
>> We basically made a mistake when we decided to change the default NN port
>> with little upside, even between major versions. We discovered this very
>> quickly, and we have an opportunity to fix it now and in so doing likely
>> disrupt very, very few users and downstream applications. If we don't
>> change it, we'll be causing difficulty for our users, downstream
>> developers, and ourselves, potentially for years.
>>
>
> Agreed.
>
>
>>
>> Best,
>> Aaron
>>

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org





Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-09 Thread Chris Douglas
Particularly since 9820 isn't in the contiguous range of ports in
HDFS-9427, is there any value in this change?

Let's change it back to prevent the disruption to users, but
downstream projects should treat this as a bug in their tests. Please
open JIRAs in affected projects. -C


On Tue, Jan 9, 2018 at 5:18 AM, larry mccay  wrote:
> On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers  wrote:
>
>> Thanks a lot for the response, Larry. Comments inline.
>>
>> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay  wrote:
>>
>>> Question...
>>>
>>> Can this be addressed in some way during or before upgrade that allows it
>>> to only affect new installs?
>>> Even a config based workaround prior to upgrade might make this a change
>>> less disruptive.
>>>
>>> If part of the upgrade process includes a step (maybe even a script) to
>>> set the NN RPC port explicitly beforehand then it would allow existing
>>> deployments and related clients to remain whole - otherwise it will uptake
>>> the new default port.
>>>
>>
>> Perhaps something like this could be done, but I think there are downsides
>> to anything like this. For example, I'm sure there are plenty of
>> applications written on top of Hadoop that have tests which hard-code the
>> port number. Nothing we do in a setup script will help here. If we don't
>> change the default port back to what it was, these tests will likely all
>> have to be updated.
>>
>>
>
> I may not have made my point clear enough.
> What I meant to say is to fix the default port but direct folks to
> explicitly set the port they are using in a deployment (the current
> default) so that it doesn't change out from under them - unless they are
> fine with it changing.
>
>
>>
>>> Meta note: we shouldn't be so pedantic about policy that we can't back
>>> out something that is considered a bug or even mistake.
>>>
>>
>> This is my bigger point. Rigidly adhering to the compat guidelines in this
>> instance helps almost no one, while hurting many folks.
>>
>> We basically made a mistake when we decided to change the default NN port
>> with little upside, even between major versions. We discovered this very
>> quickly, and we have an opportunity to fix it now and in so doing likely
>> disrupt very, very few users and downstream applications. If we don't
>> change it, we'll be causing difficulty for our users, downstream
>> developers, and ourselves, potentially for years.
>>
>
> Agreed.
>
>
>>
>> Best,
>> Aaron
>>

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-09 Thread larry mccay
On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers  wrote:

> Thanks a lot for the response, Larry. Comments inline.
>
> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay  wrote:
>
>> Question...
>>
>> Can this be addressed in some way during or before upgrade that allows it
>> to only affect new installs?
>> Even a config based workaround prior to upgrade might make this a change
>> less disruptive.
>>
>> If part of the upgrade process includes a step (maybe even a script) to
>> set the NN RPC port explicitly beforehand then it would allow existing
>> deployments and related clients to remain whole - otherwise it will uptake
>> the new default port.
>>
>
> Perhaps something like this could be done, but I think there are downsides
> to anything like this. For example, I'm sure there are plenty of
> applications written on top of Hadoop that have tests which hard-code the
> port number. Nothing we do in a setup script will help here. If we don't
> change the default port back to what it was, these tests will likely all
> have to be updated.
>
>

I may not have made my point clear enough.
What I meant to say is to fix the default port but direct folks to
explicitly set the port they are using in a deployment (the current
default) so that it doesn't change out from under them - unless they are
fine with it changing.
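The suggestion above — have each deployment explicitly set the port it is
currently using, so a change to the compiled-in default cannot move it out
from under them — amounts to a one-property configuration change. A minimal
sketch follows; the hostname is illustrative, and 8020 is the NN RPC port
default in all pre-3.0.0 releases:

```xml
<!-- core-site.xml: pin the NameNode RPC endpoint explicitly rather than
     relying on the built-in default, so that a change to that default in a
     future release cannot silently move the endpoint. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```

Clients that derive paths from fs.defaultFS would then keep resolving the
same endpoint regardless of which release's default is in effect; only
callers that hard-code full hdfs://host:port URIs (e.g. in Hive metastore
entries or test fixtures) would remain exposed.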


>
>> Meta note: we shouldn't be so pedantic about policy that we can't back
>> out something that is considered a bug or even mistake.
>>
>
> This is my bigger point. Rigidly adhering to the compat guidelines in this
> instance helps almost no one, while hurting many folks.
>
> We basically made a mistake when we decided to change the default NN port
> with little upside, even between major versions. We discovered this very
> quickly, and we have an opportunity to fix it now and in so doing likely
> disrupt very, very few users and downstream applications. If we don't
> change it, we'll be causing difficulty for our users, downstream
> developers, and ourselves, potentially for years.
>

Agreed.


>
> Best,
> Aaron
>


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-08 Thread Daniel Templeton
The intent of the compat guidelines is to prevent developers from making 
incompatible "improvements" at inconvenient times.  The guidelines offer 
some wiggle room for the cases where something truly broken is being 
fixed, especially for the sake of compatibility.  I would say that in 
this case, making the change is the right thing to do, but there needs 
to be a plan for how to deal with the fact that the 3.0.0 release will 
forever be an odd outlier with respect to NN ports.


Daniel

On 1/8/18 8:28 PM, Aaron T. Myers wrote:

Thanks a lot for the response, Larry. Comments inline.

On Mon, Jan 8, 2018 at 6:44 PM, larry mccay  wrote:


Question...

Can this be addressed in some way during or before upgrade that allows it
to only affect new installs?
Even a config based workaround prior to upgrade might make this a change
less disruptive.

If part of the upgrade process includes a step (maybe even a script) to
set the NN RPC port explicitly beforehand then it would allow existing
deployments and related clients to remain whole - otherwise it will uptake
the new default port.


Perhaps something like this could be done, but I think there are downsides
to anything like this. For example, I'm sure there are plenty of
applications written on top of Hadoop that have tests which hard-code the
port number. Nothing we do in a setup script will help here. If we don't
change the default port back to what it was, these tests will likely all
have to be updated.



Meta note: we shouldn't be so pedantic about policy that we can't back out
something that is considered a bug or even mistake.


This is my bigger point. Rigidly adhering to the compat guidelines in this
instance helps almost no one, while hurting many folks.

We basically made a mistake when we decided to change the default NN port
with little upside, even between major versions. We discovered this very
quickly, and we have an opportunity to fix it now and in so doing likely
disrupt very, very few users and downstream applications. If we don't
change it, we'll be causing difficulty for our users, downstream
developers, and ourselves, potentially for years.

Best,
Aaron




-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-08 Thread Aaron T. Myers
Thanks a lot for the response, Larry. Comments inline.

On Mon, Jan 8, 2018 at 6:44 PM, larry mccay  wrote:

> Question...
>
> Can this be addressed in some way during or before upgrade that allows it
> to only affect new installs?
> Even a config based workaround prior to upgrade might make this a change
> less disruptive.
>
> If part of the upgrade process includes a step (maybe even a script) to
> set the NN RPC port explicitly beforehand then it would allow existing
> deployments and related clients to remain whole - otherwise it will uptake
> the new default port.
>

Perhaps something like this could be done, but I think there are downsides
to anything like this. For example, I'm sure there are plenty of
applications written on top of Hadoop that have tests which hard-code the
port number. Nothing we do in a setup script will help here. If we don't
change the default port back to what it was, these tests will likely all
have to be updated.


>
> Meta note: we shouldn't be so pedantic about policy that we can't back out
> something that is considered a bug or even mistake.
>

This is my bigger point. Rigidly adhering to the compat guidelines in this
instance helps almost no one, while hurting many folks.

We basically made a mistake when we decided to change the default NN port
with little upside, even between major versions. We discovered this very
quickly, and we have an opportunity to fix it now and in so doing likely
disrupt very, very few users and downstream applications. If we don't
change it, we'll be causing difficulty for our users, downstream
developers, and ourselves, potentially for years.

Best,
Aaron


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-08 Thread larry mccay
Question...

Can this be addressed in some way during or before upgrade that allows it
to only affect new installs?
Even a config based workaround prior to upgrade might make this a change
less disruptive.

If part of the upgrade process includes a step (maybe even a script) to set
the NN RPC port explicitly beforehand then it would allow existing
deployments and related clients to remain whole - otherwise it will uptake
the new default port.

Meta note: we shouldn't be so pedantic about policy that we can't back out
something that is considered a bug or even a mistake.

On Mon, Jan 8, 2018 at 9:17 PM, Aaron T. Myers  wrote:

> Hello all,
>
> Over in HDFS-12990 [1],
> we're having some discussion about whether or not it's ever acceptable to
> make an incompatible change in a minor or dot release. In general this is
> of course undesirable and should be avoided in almost all cases. However, I
> believe that each instance of someone desiring to make an incompatible
> change should be evaluated on a case-by-case basis to consider the costs
> and benefits of making that change. For example, I believe that we've
> historically made incompatible changes in minor or dot releases which would
> break older clients for security reasons.
>
> In this particular case linked above, I believe that, given that Hadoop
> 3.0.0 was just released and thus very few folks are likely to have
> deployed it, it would benefit a large number of existing deployments and
> downstream applications to change the default NN RPC port number back to
> what it was in all previously-released versions of Apache Hadoop. I'd like
> to make this change in 3.0.1, and there is no question that doing so
> should be considered an incompatible change between 3.0.0 and 3.0.1.
> However, I believe this incompatible change is warranted given the
> circumstances.
>
> Would like to hear others' thoughts on this.
>
> Thanks,
> Aaron
>
> [1] For some background, it used to be the case that many of Hadoop's
> default service ports were in the ephemeral range. This could potentially
> cause a service to fail to start up on a given host if some other process
> had happened to have already bound to said port. As part of the effort to
> move these default ports out of the ephemeral range, we also changed the
> default NN RPC port from 8020 to 9820. Even though 8020
> wasn't in the ephemeral range, we moved it to 9820 to be close to the new
> range of the rest of the ports. At the time this change was made, though, I
> and others didn't realize the substantial downsides that doing so would
> introduce. For example, the Hive metastore stores full HDFS paths,
> including the port, in its database, which can make upgrades a substantial
> headache.
>


When are incompatible changes acceptable (HDFS-12990)

2018-01-08 Thread Aaron T. Myers
Hello all,

Over in HDFS-12990 [1],
we're having some discussion about whether or not it's ever acceptable to
make an incompatible change in a minor or dot release. In general this is
of course undesirable and should be avoided in almost all cases. However, I
believe that each instance of someone desiring to make an incompatible
change should be evaluated on a case-by-case basis to consider the costs
and benefits of making that change. For example, I believe that we've
historically made incompatible changes in minor or dot releases which would
break older clients for security reasons.

In this particular case linked above, I believe that, given that Hadoop
3.0.0 was just released and thus very few folks are likely to have
deployed it, it would benefit a large number of existing deployments and
downstream applications to change the default NN RPC port number back to
what it was in all previously-released versions of Apache Hadoop. I'd like
to make this change in 3.0.1, and there is no question that doing so
should be considered an incompatible change between 3.0.0 and 3.0.1.
However, I believe this incompatible change is warranted given the
circumstances.

Would like to hear others' thoughts on this.

Thanks,
Aaron

[1] For some background, it used to be the case that many of Hadoop's
default service ports were in the ephemeral range. This could potentially
cause a service to fail to start up on a given host if some other process
had happened to have already bound to said port. As part of the effort to
move these default ports out of the ephemeral range, we also changed the
default NN RPC port from 8020 to 9820. Even though 8020
wasn't in the ephemeral range, we moved it to 9820 to be close to the new
range of the rest of the ports. At the time this change was made, though, I
and others didn't realize the substantial downsides that doing so would
introduce. For example, the Hive metastore stores full HDFS paths,
including the port, in its database, which can make upgrades a substantial
headache.
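To illustrate the Hive metastore concern: because stored table locations
embed the NameNode host and port, changing the default port forces a rewrite
of every stored location. The sketch below is hypothetical (the URIs and the
helper function are illustrative, not Hive's actual schema or tooling):

```python
# Hypothetical sketch of the upgrade headache: every stored hdfs:// location
# carrying the old port must be rewritten when the default NN RPC port moves.
from urllib.parse import urlparse, urlunparse

def rewrite_port(location: str, old_port: int, new_port: int) -> str:
    """Rewrite the port in an hdfs:// URI; leave other locations untouched."""
    parsed = urlparse(location)
    if parsed.scheme != "hdfs" or parsed.port != old_port:
        return location
    # ParseResult is a namedtuple, so _replace builds a modified copy.
    return urlunparse(parsed._replace(netloc=f"{parsed.hostname}:{new_port}"))

# Locations as a metastore might store them (made-up paths for illustration).
locations = [
    "hdfs://nn.example.com:9820/warehouse/db.db/table1",
    "hdfs://nn.example.com:8020/warehouse/db.db/table2",
]
fixed = [rewrite_port(loc, 9820, 8020) for loc in locations]
print(fixed[0])  # hdfs://nn.example.com:8020/warehouse/db.db/table1
```

Running such a rewrite over every stored path in a production metastore is
exactly the kind of operational cost the thread argues against incurring.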