Re: HADOOP-14163 proposal for new hadoop.apache.org

2018-08-31 Thread larry mccay
+1 from me

On Fri, Aug 31, 2018, 5:30 AM Steve Loughran  wrote:

>
>
> > On 31 Aug 2018, at 09:07, Elek, Marton  wrote:
> >
> > Bumping this thread one last time.
> >
> > I have the following proposal:
> >
> > 1. I will request a new git repository hadoop-site.git and import the
> new site there (which has exactly the same content as the existing site).
> >
> > 2. I will ask infra to use the new repository as the source of
> hadoop.apache.org
> >
> > 3. I will manually sync all of the changes over the next two months back
> to the svn site from git (release announcements, new committers)
> >
> > IN CASE OF ANY PROBLEM we can switch back to the svn without any problem.
> >
> > If no-one objects within three days, I'll assume lazy consensus and
> start with this plan. Please comment if you have objections.
> >
> > Again: it allows immediate fallback at any time, as the svn repo will be
> kept as-is (+ I will keep it up-to-date for the next 2 months)
> >
> > Thanks,
> > Marton
>
> sounds good to me
>
> +1
>
>
>


Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-31 Thread larry mccay
3.4 Cryptography is hard. There are more obscure pitfalls in crypto than in
any other area of computer science. Standard cryptographic libraries should
always be used. Does this work attempt to create an encryption scheme or
protocol? Does it have a "novel" or "unique" use of normal crypto?  There
be dragons. Even normal-looking use of cryptography must be carefully
reviewed.
3.5 If you need random bits for a security purpose, such as for a session
token or a cryptographic key, you need a cryptographically approved place
to acquire said bits. Use the SecureRandom class. [DEFAULT]
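
For illustration, a minimal sketch of 3.5 (the token name and length here
are arbitrary, not taken from any particular Hadoop component):

    import java.security.SecureRandom;
    import java.util.Base64;

    public class TokenGenerator {
        // SecureRandom draws from a CSPRNG; java.util.Random is predictable
        // and must never be used for keys, session tokens, or nonces.
        private static final SecureRandom RNG = new SecureRandom();

        public static String newSessionToken() {
            byte[] bytes = new byte[32]; // 256 bits of entropy
            RNG.nextBytes(bytes);
            return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        }
    }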

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out commands, etc.?
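
For reference, 4.2 usually reduces to a pattern like the sketch below (the
property name is illustrative only):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;

    public class SecretLookup {
        // getPassword() consults any keystores configured via
        // hadoop.security.credential.provider.path before falling back
        // to a clear-text value in the config file itself.
        static char[] keystorePassword(Configuration conf) throws IOException {
            return conf.getPassword("my.component.ssl.keystore.password");
        }
    }

The secret can then be provisioned out-of-band with something like
"hadoop credential create my.component.ssl.keystore.password -provider
jceks://file/..." instead of appearing in any *-site.xml.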

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single points of failure?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't however want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?

*7. Log Messages*

Do not write secrets or data into log files. This sounds obvious, but
mistakes happen.

7.1 Do not log passwords, keys, security-related tokens, or any sensitive
configuration item.
7.2 Do not log any user-supplied data, ever. Not even snippets of user
data, such as “I had an error parsing this line of text: <line>” where the
<line>'s are user data. You never know, it might contain secrets like credit
card numbers.
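
A small sketch of the discipline in 7.1/7.2 - log positions and exception
types, never the user-supplied bytes (class and message here are made up):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ParserAudit {
        private static final Logger LOG = LoggerFactory.getLogger(ParserAudit.class);

        void reportParseError(String userLine, int lineNo, Exception e) {
            // Log where it failed and why -- not the line itself, which
            // may contain secrets such as credit card numbers.
            LOG.error("Failed to parse input at line {} ({} bytes): {}",
                    lineNo, userLine.length(), e.getClass().getSimpleName());
        }
    }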

*8. Secure By Default*

Strive to be secure by default. This means that products should ship in a
secure state, and only be put into an insecure state by human tuning.
Exhibit A here is the MongoDB ransomware fiasco, where the
insecure-by-default MongoDB installation resulted in completely open
instances of MongoDB on the open internet.  Attackers removed or encrypted
the data and left ransom notes behind. We don't want that sort of notoriety
for Hadoop. Granted, it's not always possible to turn on all security
features: for example you have to have a KDC set up in order to enable
Kerberos.

8.1 Are there settings or configurations that can be shipped in a
default-secure state?
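
One concrete pattern (echoed in the examples below) is to listen only on
loopback unless an admin explicitly opts in; a hypothetical sketch:

    import java.io.IOException;
    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    public class SecureByDefaultListener {
        // Hypothetical flag; the shipping default is loopback-only, so
        // exposing the service on all interfaces is an explicit action.
        static ServerSocket open(int port, boolean listenExternally)
                throws IOException {
            InetAddress addr = listenExternally
                    ? null                              // all interfaces, opt-in
                    : InetAddress.getLoopbackAddress(); // default: 127.0.0.1
            ServerSocket ss = new ServerSocket();
            ss.bind(new InetSocketAddress(addr, port));
            return ss;
        }
    }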


On Tue, Oct 31, 2017 at 10:36 AM, larry mccay <lmc...@apache.org> wrote:

> Thanks for the examples, Mike.
>
> I think some of those should actually just be added to the checklist in
> other places as they are best practices.
> Which raises an interesting point: some of those items can be enabled
> by default, and maybe indicating so throughout the list makes sense.
>
> Then we can ask for a description of any other Secure by Default
> considerations at the end.
>
> I will work on a new revision this morning.
>
>
> On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <myo...@cloudera.com>
> wrote:
>
>> #8 is a great topic - given that Hadoop is insecure by default.
>>> Actual movement to Secure by Default would be a challenge both
>>> technically (given the need for kerberos) and discussion-wise.
>>> Asking whether you have considered any settings or configurations that
>>> can be secure by default is an interesting idea.
>>>
>>> Can you provide an example though?
>>>
>>
>> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
>> etc.  But here are some ideas:
>>
>> - Default to only listen for network traffic on the loopback interface.
>> The admin would have to take specific action to listen on a non-loopback
>> address. Hence secure by default. I've known web servers that ship like
>> this. The counter argument to this is that this is a "useless by default"
>> setting for a distributed system... which does have some validity.
>> - A more constrained version of the above is to not bind to any network
>> interface that has an internet-routable IP address. (That is, not in the
>> private ranges 192.168.x.x, 172.16.x.x, and 10.x; see
>> https://en.wikipedia.org/wiki/Private_network.)  The idea is that we
>> wouldn't want to risk traffic
>> that's obviously headed towards the open internet.  Sure this isn't
>> perfect, but it would catch some cases. The admin could provide a specific
>> flag to override.  (I got this one from discussion with the Kudu folks.)
>> - The examples don't have to be big. Another example would be... if using
>> TLS, and if the certificate authority used to sign the certificate is in
>> the default certificate store, turn on HSTS automatically.
>> - Always turn off TLSv1 and TLSv1.1
>> - Forbid single-DES and RC4 encryption algorithms
>>
>> You get the idea.
>> -Mike
>>
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <myo...@cloudera.com>
>>> wrote:
>>>
>>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mcc

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-25 Thread larry mccay
Terrific additions, Mike!
I will spin a new revision and incorporate your additions.

#8 is a great topic - given that Hadoop is insecure by default.
Actual movement to Secure by Default would be a challenge both technically
(given the need for kerberos) and discussion-wise.
Asking whether you have considered any settings or configurations that can
be secure by default is an interesting idea.

Can you provide an example though?


On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <myo...@cloudera.com> wrote:

> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lmc...@apache.org> wrote:
>
>> New Revision...
>>
>
> These lists are wonderful. I appreciate the split between the Tech Preview
> and the GA Readiness lists, with the emphasis on the former being "don't
> enable by default" or at least "don't enable if security is on".  I don't
> have any comments on that part.
>
> Additions inline below. If some of the additions are items covered by
> existing frameworks that any code would use, please forgive my ignorance.
> Also, my points aren't as succinct as yours. Feel free to reword.
>
> *GA Readiness Security Audit*
>> At this point, we are merging full or partial security model
>> implementations.
>> Let's inventory what is covered by the model at this point and whether
>> there are future merges required for full coverage.
>>
>> *1. UIs*
>>
>> 1.1. What sort of validation is being done on any accepted user input?
>> (pointers to code would be appreciated)
>> 1.2. What explicit protections have been built in for (pointers to code
>> would be appreciated):
>>   1.2.1. cross site scripting
>>   1.2.2. cross site request forgery
>>   1.2.3. click jacking (X-Frame-Options)
>>
>
> 1.2.4 If using cookies, is the secure flag for cookies
> <https://www.owasp.org/index.php/SecureFlag> turned on?
>
>
>> 1.3. What sort of authentication is required for access to the UIs?
>>   1.3.1. Kerberos
>> 1.3.1.1. has TGT renewal been accounted for
>> 1.3.1.2. SPNEGO support?
>> 1.3.1.3. Delegation token?
>>   1.3.2. Proxy User ACL?
>> 1.4. What authorization is available for determining who can access what
>> capabilities of the UIs for viewing or modifying data and/or related
>> processes?
>> 1.5. Is there any input that will ultimately be persisted in
>> configuration for executing shell commands or processes?
>> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
>> 1.7. Is there TLS/SSL support?
>>
>
> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
> 1.7.2 Is it possible to configure support for HTTP Strict Transport
> Security
> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
> (HSTS)?
> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP address
> Z", etc)
>
>
>> *2. REST APIs*
>>
>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>> impersonation capabilities?
>> 2.2. What explicit protections have been built in for:
>>   2.2.1. cross site scripting (XSS)
>>   2.2.2. cross site request forgery (CSRF)
>>   2.2.3. XML External Entity (XXE)
>> 2.3. What is being used for authentication - Hadoop Auth Module?
>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>> endpoints) or are they part of existing processes?
>> 2.5. Is there TLS/SSL support?
>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>> APIs?
>> 2.7. What authorization enforcement points are there within the REST APIs?
>>
>
> The TLS and audit comments above apply here, too.
>
>
>> *3. Encryption*
>>
>> 3.1. Is there any support for encryption of persisted data?
>> 3.2. If so, is KMS and the hadoop key command used for key management?
>> 3.3. KMS interaction with Proxy Users?
>>
>
> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than in
> any other area of computer science. Standard cryptographic libraries should
> always be used. Does this work attempt to create an encryption scheme or
> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
> be dragons. Even normal-looking use of cryptography must be carefully
> reviewed.
> 3.5 If you need random bits for a security purpose, such as for a session
> token or a cryptographic key, you need a cryptographically approved place
> to acquire said bits. Use the SecureRandom class.
>
> *4. Configuration*
>>
>> 4.1. Are there any passwords or secrets being added to configuration?
>> 4.2. If so, are they

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-21 Thread larry mccay
New Revision...

This revision acknowledges the reality that we often have multiple phases
in a feature's lifecycle and that we need to account for each phase.
It has also been made more generic.
I have created a Tech Preview Security Audit list and a GA Readiness
Security Audit list.
I've also included suggested items into the GA Readiness list.

It has also been suggested that we publish the information as part of docs
so that the state of such features can be easily determined from these
pages. We can discuss this aspect as well.

Thoughts?

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there needs to be a baseline of assurances that they do not introduce new
attack vectors in deployments that are from actual releases or even just
built from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?
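
In practice, 1.2/2.2/2.4 tend to reduce to a guard like the following
(the property name is hypothetical - a tech-preview feature should default
to off so a stock deployment never exposes its new endpoints):

    import org.apache.hadoop.conf.Configuration;

    public class FeatureGate {
        static final String ENABLED_KEY = "hadoop.myfeature.enabled"; // hypothetical

        static boolean isEnabled(Configuration conf) {
            return conf.getBoolean(ENABLED_KEY, false); // off unless opted in
        }
    }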

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


--


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what is covered by the model at this point and whether
there are future merges required for full coverage.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
  1.3.1. Kerberos
1.3.1.1. has TGT renewal been accounted for
1.3.1.2. SPNEGO support?
1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for viewing or modifying data and/or related
processes?
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support?
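
On 1.7, restricting protocols and cipher suites on a JSSE engine looks
roughly like the sketch below; the exact allow-lists are a policy call
(e.g. dropping TLSv1/TLSv1.1 and RC4, as suggested elsewhere in this
thread):

    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLEngine;

    public class TlsHardening {
        static SSLEngine hardenedEngine() throws Exception {
            SSLEngine engine = SSLContext.getDefault().createSSLEngine();
            // Standard JSSE identifiers; allow only modern protocols/suites.
            engine.setEnabledProtocols(new String[] {"TLSv1.2"});
            engine.setEnabledCipherSuites(new String[] {
                "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
                "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
            });
            return engine;
        }
    }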

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?
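
On 2.1, the doas/trusted-proxy check typically follows this shape (a
sketch using Hadoop's ProxyUsers; error handling elided):

    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.security.authorize.AuthorizationException;
    import org.apache.hadoop.security.authorize.ProxyUsers;

    public class DoAsCheck {
        // The authenticated caller (e.g. a gateway like Knox) asks to act
        // as doAsUser; ProxyUsers enforces the configured
        // hadoop.proxyuser.<caller>.{hosts,groups} rules.
        static UserGroupInformation impersonate(UserGroupInformation realUser,
                String doAsUser, String remoteAddr) throws AuthorizationException {
            UserGroupInformation proxy =
                    UserGroupInformation.createProxyUser(doAsUser, realUser);
            ProxyUsers.authorize(proxy, remoteAddr);
            return proxy;
        }
    }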

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?
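
For 3.2, the expectation is the standard key-management path rather than
a feature-private one; roughly (key name illustrative):

    hadoop key create mykey -size 256   # key material lives in the KMS
    hadoop key list -metadata           # confirm it is centrally managed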

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out commands, etc.?

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single points of failure?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't however want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?
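
A mechanical way to answer 6.1, assuming the OWASP dependency-check Maven
plugin is acceptable tooling for the audit:

    mvn org.owasp:dependency-check-maven:check

This scans the dependency tree against known-CVE databases and fails
loudly; suppression entries can then document the "fixed but not yet
released" cases called out above.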




On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lmc...@apache.org> wrote:

> Hi Marton -
>
> I don't think there is any denying that it would be great to have such
> documentation for all of those reasons.
> If it is a natural extension of getting the checklist information as an
> assertion of security state when merging then we can certainly include it.
>
> I think that backfilling all such information across the project is a
> different topic altogether and wouldn't want to expand the scope of this
> discussion in that direction.
>
> Thanks for the great thoughts on this!
>
> thanks,
>
> --larry
>
>
>
>
>
> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <h...@anzix.net> wrote:

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-21 Thread larry mccay
Hi Marton -

I don't think there is any denying that it would be great to have such
documentation for all of those reasons.
If it is a natural extension of getting the checklist information as an
assertion of security state when merging then we can certainly include it.

I think that backfilling all such information across the project is a
different topic altogether and wouldn't want to expand the scope of this
discussion in that direction.

Thanks for the great thoughts on this!

thanks,

--larry





On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <h...@anzix.net> wrote:

>
>
> On 10/21/2017 02:41 AM, larry mccay wrote:
>
>>
>> "We might want to start a security section for Hadoop wiki for each of the
>>> services and components.
>>> This helps to track what has been completed."
>>>
>>
>> Do you mean to keep the audit checklist for each service and component
>> there?
>> Interesting idea, I wonder what sort of maintenance that implies and
>> whether we want to take on that burden even though it would be great
>> information to have for future reviewers.
>>
>
> I think we should care about the maintenance of the documentation anyway.
> We also need to maintain all the other documentation. I think it could even
> be part of the generated docs and not the wiki.
>
> I also suggest filling out this list for the current trunk/3.0 as a first
> step.
>
> 1. It would be very useful documentation for the end-users (some answers
> could link to the existing documentation, but I am not sure if all the
> answers are in the current documentation.)
>
> 2. It would be a good example of how the questions could be answered.
>
> 3. It would help to check if something is missing from the list.
>
> 4. There are feature branches where some of the components are not touched.
> For example, no web UI or no REST service. A prefilled list could help to
> check that the branch doesn't break any old security functionality on trunk.
>
> 5. It helps to document the security features in one place. If we have a
> list for the existing functionality in the same format, it would be easy to
> merge the new documentation of the new features as they will be reported in
> the same form. (So it won't be so hard to maintain the list...).
>
> Marton
>


Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-20 Thread larry mccay
Hi Eric -

Thanks for the additional item suggestions!

"We might want to start a security section for Hadoop wiki for each of the
services and components.
This helps to track what has been completed."

Do you mean to keep the audit checklist for each service and component
there?
Interesting idea, I wonder what sort of maintenance that implies and
whether we want to take on that burden even though it would be great
information to have for future reviewers.

"How do we want to enforce security completeness?  Most features will not
meet all security requirements on merge day."

This is a really important question and point.
Maybe we should have started with goals and intents before the actual list.

My high level goals:

1. To have a holistic idea of what a given feature (or merge) is bringing
to the table in terms of attack surface
2. To understand the level of security that is intended for the feature in
its end state (GA)
3. To fully understand the stated level of security that is in place at the
time of each merge
4. To ensure that a merge meets some minimal bar for not adding security
vulnerabilities to deployments of a release or even builds from trunk. Not
the least of which is whether it is enabled by default and what it means to
disable it.
5. To be as unobtrusive to the branch committers as possible while still
communicating what we need for security review.
6. To have a reasonable checklist of security concerns that may or may not
apply to each merge but should be at least thought about in the final
security model design for the particular feature.

I think that feature merges often span multiple branch merges, with
security, like other aspects of the feature, coming in phases.
This intent should maybe be part of the checklist itself so that we can
assess the audit with the level of scrutiny appropriate for the current
merge.

I will work on another revision of the list and incorporate your
suggestions as well.

thanks!

--larry

On Fri, Oct 20, 2017 at 7:42 PM, Eric Yang <ey...@hortonworks.com> wrote:

> The checklist looks good.  Some more items to add:
>
> Kerberos
>   TGT renewal
>   SPNEGO support
>   Delegation token
> Proxy User ACL
>
> CVE tracking list
>
> We might want to start a security section for Hadoop wiki for each of the
> services and components.
> This helps to track what has been completed.
>
> How do we want to enforce security completeness?  Most features will not
> meet all security requirements on merge day.
>
> Regards,
> Eric
>
> On 10/20/17, 12:41 PM, "larry mccay" <lmc...@apache.org> wrote:
>
> Adding security@hadoop list as well...
>
> On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lmc...@apache.org>
> wrote:
>
> > All -
> >
> > Given the maturity of Hadoop at this point, I would like to propose
> that
> > we start doing explicit security audits of features at merge time.
> >
> > There are a few reasons that I think this is a good place/time to do
> the
> > review:
> >
> > 1. It represents a specific snapshot of where the feature stands as a
> > whole. This means that we can more easily identify the attack
> surface of a
> > given feature.
> > 2. We can identify any security gaps that need to be fixed before a
> > release that carries the feature can be considered ready.
> > 3. We - in extreme cases - can block a feature from merging until
> some
> > baseline of security coverage is achieved.
> > 4. The folks that are interested and able to review security aspects
> can't
> > scale for every iteration over every JIRA but can review the
> checklist and
> > follow pointers for specific areas of interest.
> >
> > I have provided an impromptu security audit checklist on the DISCUSS
> > thread for merging Ozone - HDFS-7240 into trunk.
> >
> > I don't want to pick on it particularly but I think it is a good way
> to
> > bootstrap this audit process and figure out how to incorporate it
> without
> > being too intrusive.
> >
> > The questions that I provided below are a mix of general questions
> that
> > could be on a standard checklist that you provide along with the
> merge
> > thread and some that are specific to what I read about ozone in the
> > excellent docs provided. So, we should consider some subset of the
> > following as a proposal for a general checklist.
> >
> > Perhaps, a shared document can be created to iterate over the list
> to fine
> > tune it?
> >
> > Any thoughts on this, any additional datapoints to collect, etc?
> >
> >

Re: Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-10-20 Thread larry mccay
All -

I broke this list of questions out into a separate DISCUSS thread where we
can iterate over how a security audit process at merge time might look and
whether it is even something that we want to take on.

I will try and continue discussion on that thread and drive that to some
conclusion before bringing it into any particular merge discussion.

thanks,

--larry

On Fri, Oct 20, 2017 at 12:37 PM, larry mccay <lmc...@apache.org> wrote:

> I previously sent this same email from my work email and it doesn't seem
> to have gone through - resending from apache account (apologizing up front
> for the length)
>
> For such sizable merges in Hadoop, I would like to start doing security
> audits in order to have an initial idea of the attack surface, the
> protections available for known threats, what sort of configuration is
> being used to launch processes, etc.
>
> I dug into the architecture documents while in the middle of this list -
> nice docs!
> I do intend to try and make a generic checklist like this for such
> security audits in the future, so a lot of this comes from that effort, but
> I tried to also direct specific questions from those docs as well.
>
> 1. UIs
> I see there are at least two UIs - Storage Container Manager and Key Space
> Manager. There are a number of typical vulnerabilities that we find in UIs
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data or affecting
> object stores and related processes?
> 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> headers?
> 1.6. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.8. Is there TLS/SSL support?
>
> 2. REST APIs
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing HDFS processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
> authorization is required here - is there a restrictive ACL set on creation?
> 2.8. Bucket Level API allows for deleting a bucket - I assume this is
> dependent on ACLs based access control?
> 2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
> paging available?
> 2.10. Storage Level APIs indicate “Signed with User Authorization” - what
> does this refer to exactly?
> 2.11. Object Level APIs indicate that there is no ACL support and only
> bucket owners can read and write - but there are ACL APIs on the Bucket
> Level; are they meaningless for now?
> 2.12. How does a REST client know which Ozone Handler to connect to or am
> I missing some well known NN type endpoint in the architecture doc
> somewhere?
>
> 3. Encryption
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
>
> 4. Configuration
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning in credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out any commands, etc?
>
> 5. HA
>
> 5.1. Are there provisions for HA?
> 5.2. Are we leveraging the existing HA capabilities in HDFS?
> 5.3. Is Storage Container Manager a SPOF?
> 5.4. I see HA listed in future work in the architecture doc - is this
> still an open issue?
>
> On Fri, Oct 20, 2017 at 11:19 AM, Anu Engineer <aengin...@hortonworks.com>
> wrote:
>
>> Hi Steve,
>>
>> In addition to everything Weiwei mentioned (chapter 3 of user guide), if
>> you really want to drill down to REST protocol you might want to apply this
>> patch and build ozone.
>>
>> https://issues.a

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-20 Thread larry mccay
Adding security@hadoop list as well...

On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lmc...@apache.org> wrote:

> All -
>
> Given the maturity of Hadoop at this point, I would like to propose that
> we start doing explicit security audits of features at merge time.
>
> There are a few reasons that I think this is a good place/time to do the
> review:
>
> 1. It represents a specific snapshot of where the feature stands as a
> whole. This means that we can more easily identify the attack surface of a
> given feature.
> 2. We can identify any security gaps that need to be fixed before a
> release that carries the feature can be considered ready.
> 3. We - in extreme cases - can block a feature from merging until some
> baseline of security coverage is achieved.
> 4. The folks that are interested and able to review security aspects can't
> scale for every iteration over every JIRA but can review the checklist and
> follow pointers for specific areas of interest.
>
> I have provided an impromptu security audit checklist on the DISCUSS
> thread for merging Ozone - HDFS-7240 into trunk.
>
> I don't want to pick on it particularly but I think it is a good way to
> bootstrap this audit process and figure out how to incorporate it without
> being too intrusive.
>
> The questions that I provided below are a mix of general questions that
> could be on a standard checklist that you provide along with the merge
> thread and some that are specific to what I read about ozone in the
> excellent docs provided. So, we should consider some subset of the
> following as a proposal for a general checklist.
>
> Perhaps, a shared document can be created to iterate over the list to fine
> tune it?
>
> Any thoughts on this, any additional datapoints to collect, etc?
>
> thanks!
>
> --larry
>
> 1. UIs
> I see there are at least two UIs - Storage Container Manager and Key Space
> Manager. There are a number of typical vulnerabilities that we find in UIs
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data or affecting
> object stores and related processes?
> 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> headers?
> 1.6. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.8. Is there TLS/SSL support?
>
> 2. REST APIs
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing HDFS processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
> authorization is required here - is there a restrictive ACL set on creation?
> 2.8. Bucket Level API allows for deleting a bucket - I assume this is
> dependent on ACLs based access control?
> 2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
> paging available?
> 2.10. Storage Level APIs indicate “Signed with User Authorization” - what
> does this refer to exactly?
> 2.11. Object Level APIs indicate that there is no ACL support and only
> bucket owners can read and write - but there are ACL APIs on the Bucket
> Level; are they meaningless for now?
> 2.12. How does a REST client know which Ozone Handler to connect to or am
> I missing some well known NN type endpoint in the architecture doc
> somewhere?
>
> 3. Encryption
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
>
> 4. Configuration
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning in credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out any commands, etc?
>
> 5. HA
>
> 5.1. Are there provisions for HA?
> 5.2. Are we leveraging the existing HA capabilities in HDFS?
> 5.3. Is Storage Container Manager a SPOF?
> 5.4. I see HA listed in future work in the architecture doc - is this
> still an open issue?
>


[DISCUSS] Feature Branch Merge and Security Audits

2017-10-20 Thread larry mccay
All -

Given the maturity of Hadoop at this point, I would like to propose that we
start doing explicit security audits of features at merge time.

There are a few reasons that I think this is a good place/time to do the
review:

1. It represents a specific snapshot of where the feature stands as a
whole. This means that we can more easily identify the attack surface of a
given feature.
2. We can identify any security gaps that need to be fixed before a release
that carries the feature can be considered ready.
3. We - in extreme cases - can block a feature from merging until some
baseline of security coverage is achieved.
4. The folks that are interested and able to review security aspects can't
scale for every iteration over every JIRA but can review the checklist and
follow pointers for specific areas of interest.

I have provided an impromptu security audit checklist on the DISCUSS thread
for merging Ozone - HDFS-7240 into trunk.

I don't want to pick on it particularly but I think it is a good way to
bootstrap this audit process and figure out how to incorporate it without
being too intrusive.

The questions that I provided below are a mix of general questions that
could be on a standard checklist that you provide along with the merge
thread and some that are specific to what I read about ozone in the
excellent docs provided. So, we should consider some subset of the
following as a proposal for a general checklist.

Perhaps, a shared document can be created to iterate over the list to fine
tune it?

Any thoughts on this, any additional datapoints to collect, etc?

thanks!

--larry

1. UIs
I see there are at least two UIs - Storage Container Manager and Key Space
Manager. There are a number of typical vulnerabilities that we find in UIs

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data or affecting
object stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
headers?
1.6. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?
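
The protections in 1.2.2/1.2.3 (and the cookie/HSTS points raised
elsewhere in this thread) usually take the shape of a servlet filter. A
generic sketch - not Hadoop's actual filter classes, though hadoop-common
ships a RestCsrfPreventionFilter along these lines:

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SecurityHeadersFilter implements Filter {
        @Override
        public void doFilter(ServletRequest req, ServletResponse res,
                FilterChain chain) throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            // Clickjacking: refuse to be framed by other origins.
            response.setHeader("X-Frame-Options", "SAMEORIGIN");
            // CSRF: state-changing verbs must carry a custom header that
            // a cross-site <form> or <img> request cannot set. The header
            // name here is an arbitrary choice for illustration.
            String m = request.getMethod();
            boolean mutating = !("GET".equals(m) || "HEAD".equals(m)
                    || "OPTIONS".equals(m));
            if (mutating && request.getHeader("X-Requested-By") == null) {
                response.sendError(HttpServletResponse.SC_BAD_REQUEST,
                        "Missing CSRF protection header");
                return;
            }
            chain.doFilter(req, res);
        }
        @Override public void init(FilterConfig c) {}
        @Override public void destroy() {}
    }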

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. Bucket Level API allows for setting of ACLs on a bucket - what
authorization is required here - is there a restrictive ACL set on creation?
2.8. Bucket Level API allows for deleting a bucket - I assume this is
dependent on ACLs based access control?
2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
paging available?
2.10. Storage Level APIs indicate “Signed with User Authorization” - what
does this refer to exactly?
2.11. Object Level APIs indicate that there is no ACL support and only
bucket owners can read and write - but there are ACL APIs on the Bucket
Level; are they meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to or am I
missing some well known NN type endpoint in the architecture doc somewhere?

3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning in credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out any commands, etc?

5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed in future work in the architecture doc - is this still
an open issue?


Re: Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-10-20 Thread larry mccay
I previously sent this same email from my work email and it doesn't seem to
have gone through - resending from apache account (apologizing up front for
the length)

For such sizable merges in Hadoop, I would like to start doing security
audits in order to have an initial idea of the attack surface, the
protections available for known threats, what sort of configuration is
being used to launch processes, etc.

I dug into the architecture documents while in the middle of this list -
nice docs!
I do intend to try and make a generic checklist like this for such
security audits in the future, so a lot of this comes from that effort, but
I tried to also direct specific questions from those docs as well.

1. UIs
I see there are at least two UIs - Storage Container Manager and Key Space
Manager. There are a number of typical vulnerabilities that we find in UIs

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data or affecting
object stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
headers?
1.6. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. Bucket Level API allows for setting of ACLs on a bucket - what
authorization is required here - is there a restrictive ACL set on creation?
2.8. Bucket Level API allows for deleting a bucket - I assume this is
dependent on ACLs based access control?
2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
paging available?
2.10. Storage Level APIs indicate “Signed with User Authorization” - what
does this refer to exactly?
2.11. Object Level APIs indicate that there is no ACL support and only
bucket owners can read and write - but there are ACL APIs on the Bucket
Level; are they meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to or am I
missing some well known NN type endpoint in the architecture doc somewhere?

3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning in credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out any commands, etc?

5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed in future work in the architecture doc - is this still
an open issue?

On Fri, Oct 20, 2017 at 11:19 AM, Anu Engineer 
wrote:

> Hi Steve,
>
> In addition to everything Weiwei mentioned (chapter 3 of user guide), if
> you really want to drill down to REST protocol you might want to apply this
> patch and build ozone.
>
> https://issues.apache.org/jira/browse/HDFS-12690
>
> This will generate an Open API (https://www.openapis.org ,
> http://swagger.io) based specification which can be accessed from KSM UI
> or just as a json file.
> Unfortunately, this patch is still at code review stage, so you will have
> to apply the patch and build it yourself.
>
> Thanks
> Anu
>
>
> On 10/20/17, 6:09 AM, "Yang Weiwei"  wrote:
>
> Hi Steve
>
>
> The code is available in HDFS-7240 feature branch, public git repo
> here.
>
> I am not sure if there is a "public" API for object stores, but the
> design doc (ozone_user_v0.pdf) uses the most common syntax so I believe it
> should be compliant. You can find the REST API doc here:
> https://github.com/apache/hadoop/blob/HDFS-7240/
> 

Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-10-20 Thread Larry McCay
For such sizable merges in Hadoop, I would like to start doing security audits 
in order to have an initial idea of the attack surface, the protections 
available for known threats, what sort of configuration is being used to launch 
processes, etc.

I dug into the architecture documents while in the middle of this list - nice 
docs!
I do intend to try and make a generic checklist like this for such security 
audits in the future, so a lot of this comes from that effort, but I tried to 
also direct specific questions from those docs as well.

1. UIs
I see there are at least two UIs - Storage Container Manager and Key Space 
Manager. There are a number of typical vulnerabilities that we find in UIs

1.1. What sort of validation is being done on any accepted user input? 
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code would 
be appreciated):
   1.2.1. cross site scripting
   1.2.2. cross site request forgery 
   1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access what 
capabilities of the UIs for either viewing, modifying data or affecting object 
stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded headers?
1.6. Is there any input that will ultimately be persisted in configuration for 
executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas impersonation 
capabilities?
2.2. What explicit protections have been built in for:
   2.2.1. cross site scripting (XSS)
   2.2.2. cross site request forgery (CSRF)
   2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST 
endpoints) or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. Bucket Level API allows for setting of ACLs on a bucket - what 
authorization is required here - is there a restrictive ACL set on creation?
2.8. Bucket Level API allows for deleting a bucket - I assume this is dependent 
on ACLs based access control?
2.9. Bucket Level API to list bucket returns up to 1000 keys - is there paging 
available?
2.10. Storage Level APIs indicate “Signed with User Authorization” - what does 
this refer to exactly?
2.11. Object Level APIs indicate that there is no ACL support and only bucket 
owners can read and write - but there are ACL APIs on the Bucket Level; are they 
meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to or am I 
missing some well known NN type endpoint in the architecture doc somewhere?

3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for 
provisioning in credential providers?
4.3. Are there any settings that are used to launch docker containers or shell 
out any commands, etc?

5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed in future work in the architecture doc - is this still an 
open issue?

> On Oct 20, 2017, at 7:49 AM, Steve Loughran  wrote:
> 
> 
> Wow, big piece of work
> 
> 1. Where is a PR/branch on github with rendered docs for us to look at?
> 2. Have you made any public API changes related to object stores? That's 
> probably something I'll have opinions on more than implementation details.
> 
> thanks
> 
>> On 19 Oct 2017, at 02:54, Yang Weiwei  wrote:
>> 
>> Hello everyone,
>> 
>> 
>> I would like to start this thread to discuss merging Ozone (HDFS-7240) to 
>> trunk. This feature implements an object store which can co-exist with HDFS. 
>> Ozone is disabled by default. We have tested Ozone with cluster sizes 
>> varying from 1 to 100 data nodes.
>> 
>> 
>> 
>> The merge payload includes the following:
>> 
>> 1.  All services, management scripts
>> 2.  Object store APIs, exposed via both REST and RPC
>> 3.  Master service UIs, command line interfaces
>> 4.  Pluggable pipeline Integration
>> 5.  Ozone File System (Hadoop compatible file system implementation, passes 
>> all FileSystem contract tests)
>> 6.  Corona - a load generator for Ozone.
>> 7.  Essential documentation added to Hadoop site.
>> 8.  Version specific Ozone Documentation, accessible via service UI.
>> 9.  Docker support for ozone, which enables faster development cycles.
>> 
>> 
>> To build Ozone and run ozone 

Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread larry mccay
Hi Jonathan -

Thank you for bringing this up for discussion!

I would personally like to see a specific security review of features like
this - especially ones that allow for remote access to configuration.
I'll take a look at the JIRA and see whether I can come up with any
concerns or questions and I would urge others to give it a pass from a
security perspective as well.

In addition, here are a couple of questions off the top of my head:

Is this feature extending the existing YARN RM REST API?
When it isn't enabled what is the API behavior?
Does it implement the trusted proxy pattern for proxies to be able to
impersonate users and, most importantly, to dictate which proxies would be
allowed to impersonate an admin for this API - which I assume will be
required?

--larry

On Fri, Sep 29, 2017 at 2:44 PM, Andrew Wang 
wrote:

> Hi Jonathan,
>
> I'm okay with putting this into branch-3.0 for GA if it can be merged
> within the next two weeks. Even though beta1 has slipped by a month, I want
> to stick to the targeted GA data of Nov 1st as much as possible. Of course,
> let's not sacrifice quality or stability for speed; if something's not
> ready, let's defer it to 3.1.0.
>
> Subru, have you been able to review this feature from the 2.9.0
> perspective? It'd add confidence if you think it's immediately ready for
> merging to branch-2 for 2.9.0.
>
> Thanks,
> Andrew
>
> On Thu, Sep 28, 2017 at 11:32 AM, Jonathan Hung 
> wrote:
>
> > Hi everyone,
> >
> > Starting this thread to discuss merging API-based scheduler configuration
> > to trunk/branch-2. The feature adds the framework for allowing users to
> > modify scheduler configuration via REST or CLI using a configurable
> backend
> > (leveldb/zk are currently supported), and adds capacity scheduler support
> > for this. The umbrella JIRA is YARN-5734. All the required work for this
> > feature is done and committed to branch YARN-5734, and a full diff has
> been
> > generated at YARN-7241.
> >
> > Regarding compatibility, this feature is configurable and turned off by
> > default.
> >
> > The feature has been tested locally on a couple RMs (since it is an RM
> > only change), with queue addition/removal/updates tested on single RM
> > (leveldb) and two RMs (zk). Also we verified the original configuration
> > update mechanism (via refreshQueues) is unaffected when the feature is
> > off/not configured.
> >
> > Our original plan was to merge this to trunk (which is what the YARN-7241
> > diff is based on), and port to branch-2 before the 2.9 release. @Andrew,
> > what are your thoughts on also merging this to branch-3.0?
> >
> > Thanks!
> >
> > Jonathan Hung
> >
>


Re: [DISCUSS] Looking to Apache Hadoop 3.1 release

2017-09-06 Thread larry mccay
Hi Wangda -

Thank you for starting this conversation!

+1000 for a faster release cadence.
Quicker releases make turning around security fixes so much easier.

When we consider alpha features, let’s please ensure that they are not
delivered in a state that has known security issues and also make sure that
they are disabled by default. IMO - it is not a feature - alpha or
otherwise - unless it has some reasonable assurance of being secure. Please
don't see this as calling out any particular feature. I just think we need
to be very explicit about security expectations. Maybe this is already well
understood.

Thank you for this proposed plan and for volunteering!

—larry

On Wed, Sep 6, 2017 at 7:22 PM, Anu Engineer 
wrote:

> Hi Wangda,
>
> We are planning to start the Ozone merge discussion by the end of this
> month. I am hopeful that it will be merged pretty soon after that.
> Please add Ozone to the list of features that are being tracked for Apache
> Hadoop 3.1.
>
> We would love to release Ozone as an alpha feature in Hadoop 3.1.
>
> Thanks
> Anu
>
>
> On 9/6/17, 2:26 PM, "Arun Suresh"  wrote:
>
> >Thanks for starting this Wangda.
> >
> >I would also like to add:
> >- YARN-5972: Support Pausing/Freezing of opportunistic containers
> >
> >Cheers
> >-Arun
> >
> >On Wed, Sep 6, 2017 at 1:49 PM, Steve Loughran 
> >wrote:
> >
> >>
> >> > On 6 Sep 2017, at 19:13, Wangda Tan  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > As we discussed on [1], there were proposals from Steve / Vinod etc to
> >> have
> >> > a faster cadence of releases and to start thinking of a Hadoop 3.1
> >> release
> >> > earlier than March 2018 as is currently proposed.
> >> >
> >> > I think this is a good idea. I'd like to start the process sooner, and
> >> > establish timeline etc so that we can be ready when 3.0.0 GA is out.
> With
> >> > this we can also establish faster cadence for future Hadoop 3.x
> releases.
> >> >
> >> > To this end, I propose to target Hadoop 3.1.0 for a release by mid Jan
> >> > 2018. (About 4.5 months from now and 2.5 months after 3.0-GA, instead
> of
> >> > 6.5 months from now).
> >> >
> >> > I'd also want to take this opportunity to come up with a more
> elaborate
> >> > release plan to avoid some of the confusion we had with 3.0 beta.
> General
> >> > proposal for the timeline (per this other proposal [2])
> >> > - Feature freeze date - all features should be merged by Dec 15, 2017.
> >> > - Code freeze date - blockers/critical only, no more improvements and
> non
> >> > blocker/critical bug-fixes: Jan 1, 2018.
> >> > - Release date: Jan 15, 2018
> >> >
> >> > Following is a list of features on my radar which could be candidates
> >> for a
> >> > 3.1 release:
> >> > - YARN-5734, Dynamic scheduler queue configuration. (Owner: Jonathan
> >> Hung)
> >> > - YARN-5881, Add absolute resource configuration to CapacityScheduler.
> >> > (Owner: Sunil)
> >> > - YARN-5673, Container-executor rewrite for better security,
> >> extensibility
> >> > and portability. (Owner Varun Vasudev)
> >> > - YARN-6223, GPU isolation. (Owner: Wangda)
> >> >
> >> > And from email [3] mentioned by Andrew, there’re several other HDFS
> >> > features want to be released with 3.1 as well, assuming they fit the
> >> > timelines:
> >> > - Storage Policy Satisfier
> >> > - HDFS tiered storage
> >> >
> >> > Please let me know if I missed any features targeted to 3.1 per this
> >> > timeline.
> >>
> >>
> >> HADOOP-13786 : S3Guard committer, which also adds resilience to failures
> >> talking to S3 (we barely have any today),
> >>
> >> >
> >> > And I want to volunteer myself as release manager of 3.1.0 release.
> >> Please
> >> > let me know if you have any suggestions/concerns.
> >>
> >> well volunteered :)
> >>
> >> >
> >> > Thanks,
> >> > Wangda Tan
> >> >
> >> > [1] http://markmail.org/message/hwar5f5ap654ck5o?q=
> >> > Branch+merges+and+3%2E0%2E0-beta1+scope
> >> > [2] http://markmail.org/message/hwar5f5ap654ck5o?q=Branch+
> >> > merges+and+3%2E0%2E0-beta1+scope#query:Branch%20merges%
> >> > 20and%203.0.0-beta1%20scope+page:1+mid:2hqqkhl2dymcikf5+state:results
> >> > [3] http://markmail.org/message/h35obzqrh3ag6dgn?q=Branch+merge
> >> > s+and+3%2E0%2E0-beta1+scope
>


Re: Apache Hadoop 2.8.2 Release Plan

2017-09-01 Thread larry mccay
If we do "fix" this in 2.8.2 we should seriously consider not doing so in
3.0.
This is a very poor practice.

I can see an argument for backward compatibility in 2.8.x line though.

On Fri, Sep 1, 2017 at 1:41 PM, Steve Loughran 
wrote:

> One thing we need to consider is
>
> HADOOP-14439: regression: secret stripping from S3x URIs breaks some
> downstream code
>
> Hadoop 2.8 has a best-effort attempt to strip out secrets from the
> toString() value of an s3a or s3n path where someone has embedded them in
> the URI; this has caused problems in some uses, specifically: when people
> use secrets this way (bad) and assume that you can round trip paths to
> string and back
>
> Should we fix this? If so, Hadoop 2.8.2 is the time to do it
>
>
> > On 1 Sep 2017, at 11:14, Junping Du  wrote:
> >
> > HADOOP-14814 got committed and HADOOP-9747 got pushed out to 2.8.3, so we
> are clean on blocker/critical issues now.
> > I finished going through the JACC report and no more incompatible
> public API changes were found between 2.8.2 and 2.7.4. Also I checked the
> commit history and fixed 10+ commits which were missing from branch-2.8.2
> for some reason. So, the current branch-2.8.2 should be good to go for the
> RC stage, and I will kick off our first RC tomorrow.
> > In the meanwhile, please don't land any commits on branch-2.8.2 from now
> on. If an issue really is a blocker, please ping me on the JIRA
> before doing any commits. branch-2.8 is still open for landing. Thanks for
> your cooperation!
> >
> >
> > Thanks,
> >
> > Junping
> >
> > 
> > From: Junping Du 
> > Sent: Wednesday, August 30, 2017 12:35 AM
> > To: Brahma Reddy Battula; common-...@hadoop.apache.org;
> hdfs-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> > Subject: Re: Apache Hadoop 2.8.2 Release Plan
> >
> > Thanks Brahma for commenting on this thread. To be clear, I always update
> the branch version just before kicking off an RC.
> >
> > For the 2.8.2 release, I don't plan to involve Bigtop or other
> third-party test tools. As always, we will rely on test/verify efforts from
> the community, especially from large deployed production clusters - as far
> as I know, several companies, like Yahoo!, Alibaba, etc., have already
> deployed the 2.8 release on large production clusters for months, which
> gives me more confidence in 2.8.2.
> >
> >
> > Here is more update on 2.8.2 release:
> >
> > Blocker issues:
> >
>-  A new blocker, YARN-7076, got reported and fixed by Jian He over
> last weekend.
> >
>-  Another new blocker - HADOOP-14814 - got identified from my latest
> jdiff run against 2.7.4. The simple fix for an incompatible API change
> should get committed soon.
> >
> >
> > Critical issues:
> >
>-  YARN-7083 already got committed. Thanks Jason for reporting the
> issue and delivering the fix.
> >
>-  YARN-6091 got pushed out from 2.8.2 as the issue is not a regression
> and has been pending for a while.
> >
>-  Daryn has been actively working on HADOOP-9747 for a while, and the
> patch is getting close to being committed. However, according to Daryn, the
> patch seems to cause some regression in some corner cases in secured
> environments (Storm auto tgt, etc.). It may need some additional
> watch/review on this JIRA's fixes.
> >
> >
> >
> > My JACC report comparing 2.8.2 and 2.7.4 will be finished
> tomorrow. If everything goes smoothly, I am planning to kick off RC0
> around the holiday (this weekend).
> >
> >
> >
> > Thanks,
> >
> >
> >
> > ​Junping
> >
> >
> >
> > 
> > From: Brahma Reddy Battula 
> > Sent: Tuesday, August 29, 2017 8:42 AM
> > To: Junping Du; common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> > Subject: Re: Apache Hadoop 2.8.2 Release Plan
> >
> >
> > Hi All
> >
> > Update on the 2.8.2 release status:
> > We are down to 3 critical issues (YARN-6091, YARN-7083, HADOOP-9747); all
> are patch-available and close to commit.
> > Junping is closely tracking this.
> >
> > Todo:
> >
> > 1) Update pom.xml? It is currently still at 2.8.3:
> > https://github.com/apache/hadoop/blob/branch-2.8.2/pom.xml#L21
> > 2) The wiki is outdated - should we update it?
> > 3) As this is going to be a stable release, are we planning to enable
> Bigtop for 2.8.2 testing, or Dynamometer testing (anybody from LinkedIn can
> help)?
> >
> > @Junping Du, please correct me if I am wrong.
> >
> >
> > --Brahma Reddy Battula
> > 
> > From: Junping Du 
> > Sent: Monday, August 7, 2017 2:44 PM
> > To: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org
> > Subject: Re: Apache Hadoop 2.8.2 

Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-22 Thread larry mccay
+1 (non-binding)

- verified signatures
- built from source and ran tests
- deployed pseudo cluster
- ran basic tests for hdfs, wordcount, credential provider API and related
commands
- tested webhdfs with knox


On Wed, Mar 22, 2017 at 7:21 AM, Ravi Prakash  wrote:

> Thanks for all the effort Junping!
>
> +1 (binding)
> + Verified signature and MD5, SHA1, SHA256 checksum of tarball
> + Verified SHA ID in git corresponds to RC3 tag
> + Verified wordcount for one small text file produces same output as
> hadoop-2.7.3.
> + HDFS Namenode UI looks good.
>
> I agree none of the issues reported so far are blockers. Looking forward to
> another great release.
>
> Thanks
> Ravi
>
> On Tue, Mar 21, 2017 at 8:10 PM, Junping Du  wrote:
>
> > Thanks all for responding with verification work and votes!
> >
> >
> > Sounds like we are hitting several issues here, although none seems to be
> > a blocker so far. Given the large commit set - 2000+ commits first landing
> in
> > a branch-2 release - we should perhaps follow the 2.7.0 practice of
> > declaring this release not for production clusters, as Vinod suggested in
> > a previous email. We should quickly come up with a 2.8.1 release in the
> next 1 or 2
> > months for production deployment.
> >
> >
> > We will close the vote in the next 24 hours. For people who haven't voted,
> > please keep up the verification work and report any issues if found - I
> will
> > check whether another round of RC is needed based on your findings. Thanks!
> >
> >
> > Thanks,
> >
> >
> > Junping
> >
> >
> > 
> > From: Kuhu Shukla 
> > Sent: Tuesday, March 21, 2017 3:17 PM
> > Cc: Junping Du; common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
> ;
> > yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
> > Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)
> >
> >
> > +1 (non-binding)
> >
> > - Verified signatures.
> > - Downloaded and built from source tar.gz.
> > - Deployed a pseudo-distributed cluster on Mac Sierra.
> > - Ran example Sleep job successfully.
> > - Deployed latest Apache Tez 0.9 and ran sample Tez orderedwordcount
> > successfully.
> >
> > Thank you Junping and everyone else who worked on getting this release
> out.
> >
> > Warm Regards,
> > Kuhu
> > On Tuesday, March 21, 2017, 3:42:46 PM CDT, Eric Badger
> >  wrote:
> > +1 (non-binding)
> >
> > - Verified checksums and signatures of all files
> > - Built from source on MacOS Sierra via JDK 1.8.0 u65
> > - Deployed single-node cluster
> > - Successfully ran a few sample jobs
> >
> > Thanks,
> >
> > Eric
> >
> > On Tuesday, March 21, 2017 2:56 PM, John Zhuge 
> > wrote:
> >
> >
> >
> > +1. Thanks for the great effort, Junping!
> >
> >
> >   - Verified checksums and signatures of the tarballs
> >   - Built source code with Java 1.8.0_66-b17 on Mac OS X 10.12.3
> >   - Built source and native code with Java 1.8.0_111 on Centos 7.2.1511
> >   - Cloud connectors:
> >   - s3a: integration tests, basic fs commands
> >   - adl: live unit tests, basic fs commands. See notes below.
> >   - Deployed a pseudo cluster, passed the following sanity tests in
> >   both insecure and SSL mode:
> >   - HDFS: basic dfs, distcp, ACL commands
> >   - KMS and HttpFS: basic tests
> >   - MapReduce wordcount
> >   - balancer start/stop
> >
> >
> > Needs the following JIRAs to pass all ADL tests:
> >
> >   - HADOOP-14205. No FileSystem for scheme: adl. Contributed by John
> Zhuge.
> >   - HDFS-11132. Allow AccessControlException in contract tests when
> >   getFileStatus on subdirectory of existing files. Contributed by
> > Vishwajeet
> >   Dusane
> >   - HADOOP-13928. TestAdlFileContextMainOperationsLive.testGetFileContext1
> >   runtime error. (John Zhuge via lei)
> >
> >
> > On Mon, Mar 20, 2017 at 10:31 AM, John Zhuge 
> wrote:
> >
> > > Yes, it only affects ADL. There is a workaround of adding these 2
> > > properties to core-site.xml:
> > >
> > >  <property>
> > >    <name>fs.adl.impl</name>
> > >    <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
> > >  </property>
> > >
> > >  <property>
> > >    <name>fs.AbstractFileSystem.adl.impl</name>
> > >    <value>org.apache.hadoop.fs.adl.Adl</value>
> > >  </property>
> > >
> > > I have the initial patch ready but hitting these live unit test
> failures:
> > >
> > > Failed tests:
> > >
> > > TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.
> > > testListStatus:257
> > > expected:<1> but was:<10>
> > >
> > > Tests in error:
> > >
> > > TestAdlFileContextMainOperationsLive>FileContextMainOperationsBaseTest.
> > > testMkdirsFailsForSubdirectoryOfExistingFile:254
> > > » AccessControl
> > >
> > > TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.
> > > testMkdirsFailsForSubdirectoryOfExistingFile:190
> > > » AccessControl
> > >
> > >
> > > Stay tuned...
> > >
> > > John Zhuge
> > > Software Engineer, Cloudera
> > >
> > > On Mon, Mar 20, 2017 at 10:02 AM, 

Re: [VOTE] Release Apache Hadoop 2.6.5 (RC1)

2016-10-07 Thread larry mccay
+1 (non-binding)


* Downloaded and verified signatures

* Built from source

* Deployed a standalone cluster

* Tested HDFS commands and job submit

* Tested webhdfs through Apache Knox

On Fri, Oct 7, 2016 at 10:35 PM, Karthik Kambatla 
wrote:

> Thanks for putting the RC together, Sangjin.
>
> +1 (binding)
>
> Built from source, deployed pseudo distributed cluster and ran some example
> MR jobs.
>
> On Fri, Oct 7, 2016 at 6:01 PM, Yongjun Zhang  wrote:
>
> > Hi Sangjin,
> >
> > Thanks a lot for your work here.
> >
> > My +1 (binding).
> >
> > - Downloaded both binary and src tarballs
> > - Verified md5 checksum and signature for both
> > - Build from source tarball
> > - Deployed 2 pseudo clusters, one with the released tarball and the other
> > with what I built from source, and did the following on both:
> > - Run basic HDFS operations, and distcp jobs
> > - Run pi job
> > - Examined HDFS webui, YARN webui.
> >
> > Best,
> >
> > --Yongjun
> >
> > > > > * verified basic HDFS operations and Pi job.
> > > > > * Did a sanity check for RM and NM UI.
> >
> >
> > On Fri, Oct 7, 2016 at 5:08 PM, Sangjin Lee  wrote:
> >
> > > I'm casting my vote: +1 (binding)
> > >
> > > Regards,
> > > Sangjin
> > >
> > > On Fri, Oct 7, 2016 at 3:12 PM, Andrew Wang 
> > > wrote:
> > >
> > > > Thanks to Chris and Sangjin for working on this release.
> > > >
> > > > +1 binding
> > > >
> > > > * Verified signatures
> > > > * Built from source tarball
> > > > * Started HDFS and did some basic ops
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > On Fri, Oct 7, 2016 at 2:50 PM, Wangda Tan 
> > wrote:
> > > >
> > > > > Thanks Sangjin for cutting this release!
> > > > >
> > > > > +1 (Binding)
> > > > >
> > > > > - Downloaded binary tar ball and setup a single node cluster.
> > > > > - Submit a few applications and which can successfully run.
> > > > >
> > > > > Thanks,
> > > > > Wangda
> > > > >
> > > > >
> > > > > On Fri, Oct 7, 2016 at 10:33 AM, Zhihai Xu  >
> > > > wrote:
> > > > >
> > > > > > Thanks Sangjin for creating release 2.6.5 RC1.
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > * Downloaded and built from source
> > > > > > * Verified md5 checksums and signature
> > > > > > * Deployed a pseudo cluster
> > > > > > * verified basic HDFS operations and Pi job.
> > > > > > * Did a sanity check for RM and NM UI.
> > > > > >
> > > > > > Thanks
> > > > > > zhihai
> > > > > >
> > > > > > On Fri, Oct 7, 2016 at 8:16 AM, Sangjin Lee 
> > > wrote:
> > > > > >
> > > > > > > Thanks Masatake!
> > > > > > >
> > > > > > > Today's the last day for this vote, and I'd like to ask you to
> > try
> > > > out
> > > > > > the
> > > > > > > RC and vote on this today. So far there has been no binding
> vote.
> > > > > Thanks
> > > > > > > again.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Sangjin
> > > > > > >
> > > > > > > On Fri, Oct 7, 2016 at 6:45 AM, Masatake Iwasaki <
> > > > > > > iwasak...@oss.nttdata.co.jp> wrote:
> > > > > > >
> > > > > > > > +1(non-binding)
> > > > > > > >
> > > > > > > > * verified signature and md5.
> > > > > > > > * built with -Pnative on CentOS6 and OpenJDK7.
> > > > > > > > * built documentation and skimmed the contents.
> > > > > > > > * built rpms by bigtop and ran smoke-tests of hdfs, yarn and
> > > > > mapreduce
> > > > > > on
> > > > > > > > 3-node cluster.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Masatake Iwasaki
> > > > > > > >
> > > > > > > > On 10/3/16 09:12, Sangjin Lee wrote:
> > > > > > > >
> > > > > > > >> Hi folks,
> > > > > > > >>
> > > > > > > >> I have pushed a new release candidate (R1) for the Apache
> > Hadoop
> > > > > 2.6.5
> > > > > > > >> release (the next maintenance release in the 2.6.x release
> > > line).
> > > > > RC1
> > > > > > > >> contains fixes to CHANGES.txt, and is otherwise identical to
> > > RC0.
> > > > > > > >>
> > > > > > > >> Below are the details of this release candidate:
> > > > > > > >>
> > > > > > > >> The RC is available for validation at:
> > > > > > > >> http://home.apache.org/~sjlee/hadoop-2.6.5-RC1/.
> > > > > > > >>
> > > > > > > >> The RC tag in git is release-2.6.5-RC1 and its git commit is
> > > > > > > >> e8c9fe0b4c252caf2ebf1464220599650f119997.
> > > > > > > >>
> > > > > > > >> The maven artifacts are staged via repository.apache.org
> at:
> > > > > > > >> https://repository.apache.org/content/repositories/
> > > > > > > orgapachehadoop-1050/.
> > > > > > > >>
> > > > > > > >> You can find my public key at
> > > > > > > >> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS.
> > > > > > > >>
> > > > > > > >> Please try the release and vote. The vote will run for the
> > > usual 5
> > > > > > > days. I
> > > > > > > >> would greatly appreciate your timely vote. Thanks!
> > > > > > > >>
> > > > > > > >> Regards,
> > > > > > > >> Sangjin
> > > > > > > 

Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread larry mccay
I believe it was described as follows: some previous audit entries have been
superseded by new ones, and the order may no longer be the same for
other entries.

For what it’s worth, I agree with the assertion that this is a backward
incompatible output - especially for audit logs.

On Thu, Aug 18, 2016 at 11:32 AM, Steve Loughran 
wrote:

>
> > On 18 Aug 2016, at 14:57, Junping Du  wrote:
> >
> > I think Allen's previous comments are very misleading.
> > In my understanding, only incompatible API changes (RPC, CLIs, WebService,
> etc.) shouldn't land on branch-2, while other incompatible behaviors (logs,
> audit-log, daemon restart, etc.) should be treated more flexibly.
> Otherwise, how could 52 issues (https://s.apache.org/xJk5) marked with
> incompatible-changes have landed on branch-2 after the 2.2.0 release? Most
> of them are already released.
> >
> > Thanks,
> >
> > Junping
>
>
> Don't get AW started on compatibility; it'll only upset him.
>
> One thing he does care about is the ability of programs to consume the
> output of commands and logs - and for that, even the output of commands and
> logs needs to remain parseable:
>
> https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/
> Compatibility.html#Command_Line_Interface_CLI
>
> " Changing the path of a command, removing or renaming command line
> options, the order of arguments, or the command return code and output
> break compatibility and may adversely affect users."
>
> I believe Allen is particularly concerned that a minor point release is
> going in as incompatible, on the basis that the audit log output will change
> - that's the log that is explicitly designed for machine processing, hooking
> up to Flume & Kafka, etc. As an example, Spotify spoke at a Hadoop Summit
> conference about how they used it to identify files which hadn't been used
> for a long time, inferring an atime attribute from the access history.
>
> What has changed in the output?
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>
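
For readers who have not consumed the audit log programmatically: it is a
line-oriented stream whose payload is tab-separated key=value fields, and
downstream consumers parse those fields either positionally or by key - so
reordered fields break the former, while renamed or superseded entries break
both. A minimal, self-contained Java sketch of such a consumer (the sample
line is illustrative, not taken from either release):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class AuditLineConsumer {
      public static void main(String[] args) {
        // Illustrative HDFS audit line; real ones come from FSNamesystem.audit.
        String line = "2016-08-18 11:32:00,123 INFO FSNamesystem.audit: "
            + "allowed=true\tugi=alice (auth:KERBEROS)\tip=/10.0.0.1\t"
            + "cmd=open\tsrc=/data/events\tdst=null\tperm=null";

        // Consumers typically split the payload on tabs and index by key, so
        // a renamed key or a restructured entry silently breaks them.
        String payload =
            line.substring(line.indexOf("audit: ") + "audit: ".length());
        Map<String, String> fields = new LinkedHashMap<>();
        for (String kv : payload.split("\t")) {
          int eq = kv.indexOf('=');
          fields.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
        System.out.println(fields.get("cmd") + " " + fields.get("src"));
      }
    }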


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-25 Thread larry mccay
Oops - make that:

+1 (non-binding)

On Sun, Jul 24, 2016 at 4:07 PM, larry mccay <lmc...@apache.org> wrote:

> +1 binding
>

> * downloaded and built from source
> * checked LICENSE and NOTICE files
> * verified signatures
> * ran standalone tests
> * installed pseudo-distributed instance on my mac
> * ran through HDFS and mapreduce tests
> * tested credential command
> * tested webhdfs access through Apache Knox
>
>
> On Fri, Jul 22, 2016 at 10:15 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org> wrote:
>
>> Hi all,
>>
>> I've created a release candidate RC0 for Apache Hadoop 2.7.3.
>>
>> As discussed before, this is the next maintenance release to follow up
>> 2.7.2.
>>
>> The RC is available for validation at:
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
>>
>> The RC tag in git is: release-2.7.3-RC0
>>
>> The maven artifacts are available via repository.apache.org <
>> http://repository.apache.org/> at
>> https://repository.apache.org/content/repositories/orgapachehadoop-1040/
>> <https://repository.apache.org/content/repositories/orgapachehadoop-1040/
>> >
>>
>> The release-notes are inside the tar-balls at location
>> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
>> hosted this at
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
>> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html>
>> for your quick perusal.
>>
>> As you may have noted, a very long fix-cycle for the License & Notice
>> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
>> to slip by quite a bit. This release's related discussion thread is linked
>> below: [1].
>>
>> Please try the release and vote; the vote will run for the usual 5 days.
>>
>> Thanks,
>> Vinod
>>
>> [1]: 2.7.3 release plan:
>> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html <
>> http://markmail.org/thread/6yv2fyrs4jlepmmr>
>
>
>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-24 Thread larry mccay
+1 binding

* downloaded and built from source
* checked LICENSE and NOTICE files
* verified signatures
* ran standalone tests
* installed pseudo-distributed instance on my mac
* ran through HDFS and mapreduce tests
* tested credential command
* tested webhdfs access through Apache Knox


On Fri, Jul 22, 2016 at 10:15 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> Hi all,
>
> I've created a release candidate RC0 for Apache Hadoop 2.7.3.
>
> As discussed before, this is the next maintenance release to follow up
> 2.7.2.
>
> The RC is available for validation at:
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
>
> The RC tag in git is: release-2.7.3-RC0
>
> The maven artifacts are available via repository.apache.org <
> http://repository.apache.org/> at
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/ <
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/>
>
> The release-notes are inside the tar-balls at location
> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> hosted this at
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html> for
> your quick perusal.
>
> As you may have noted, a very long fix-cycle for the License & Notice
> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
> to slip by quite a bit. This release's related discussion thread is linked
> below: [1].
>
> Please try the release and vote; the vote will run for the usual 5 days.
>
> Thanks,
> Vinod
>
> [1]: 2.7.3 release plan:
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html <
> http://markmail.org/thread/6yv2fyrs4jlepmmr>


Re: Why there are so many revert operations on trunk?

2016-06-07 Thread larry mccay
A -1 need not be taken as a derogatory statement; being a number should
actually make it less emotional.
It is dangerous for a community to become oversensitive to it.

I generally see language such as "I am -1 on this until this particular
thing is fixed" or that it violates some common pattern or precedent set
in the project. This is perfectly reasonable language, and there is no
reason to make the reviewer provide an alternative.

So, I am giving my -1 to any proposal for rule changes on -1 votes. :)


On Tue, Jun 7, 2016 at 1:15 PM, Ravi Prakash  wrote:

> +1 on being more respectful. We seem to be having a lot of distasteful
> discussions recently. If we fight each other, we are only helping our
> competitors win (and trust me, it's out there).
>
> I would also respectfully request people not to throw -1s around. I have
> faced this a few times and it's really frustrating. Everyone has opinions,
> and sometimes different people can't fathom why someone else thinks the
> way they do. I am pretty sure none of us is acting with malicious intent,
> so perhaps a little more tolerance, faith and trust will help all of us
> improve Hadoop and the ecosystem much faster. That's not to say that
> -1s are never warranted, but we should treat them as an extreme
> measure. Unfortunately there is very little disincentive right now to vote
> -1. Maybe we should modify the rules: if you vote -1, you have to
> come up with an alternative implementation? (Perhaps limit the amount of
> time you have to the amount already spent producing the patch that you
> are against?)
>
> Just my 2 cents
> Ravi
>
>
> On Tue, Jun 7, 2016 at 7:54 AM, Junping Du  wrote:
>
> > - We need to at the least force a reset of expectations w.r.t how trunk
> > and small / medium / incompatible changes there are treated. We should
> hold
> > off making a release off trunk before this gets fully discussed in the
> > community and we all reach a consensus.
> >
> > +1. We should hold off any release work off trunk until we reach a
> > consensus. Otherwise more and more development work/features could be
> affected,
> > just as Larry mentioned.
> >
> >
> > - Reverts (or revert and move to a feature-branch) shouldn’t have been
> > unequivocally done without dropping a note / informing everyone /
> building
> > consensus.
> >
> > Agree. To revert commits from other committers, I think we need to: 1)
> > provide a technical reason/evidence that is solid as a rock, e.g. breaking
> > functionality, tests, or API compatibility, or significantly violating code
> > conventions; 2) build consensus with the related contributors/committers
> > based on those technical reasons/evidence. Unfortunately, I didn't see
> us
> > do either thing in this case.
> >
> >
> > - Freaking out on -1’s and reverts - we as a community need to be less
> > stigmatic about -1s / reverts.
> >
> > +1. As a community, I believe we all prefer to work in a friendlier
> > environment. In many cases, a -1 without a solid reason will frustrate
> > people who are contributing. I think we should restrain our -1s unless
> > really necessary.
> >
> >
> >
> > Thanks,
> >
> >
> > Junping
> >
> >
> > 
> > From: Vinod Kumar Vavilapalli 
> > Sent: Monday, June 06, 2016 9:36 PM
> > To: Andrew Wang
> > Cc: Junping Du; Aaron T. Myers; common-...@hadoop.apache.org;
> > hdfs-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Why there are so many revert operations on trunk?
> >
> > Folks,
> >
> > It is truly disappointing how we are escalating situations that can be
> > resolved through basic communication.
> >
> > Things that shouldn’t have happened
> > - After a few objections were raised, commits should have simply stopped
> > before restarting again but only after consensus
> > - Reverts (or revert and move to a feature-branch) shouldn’t have been
> > unequivocally done without dropping a note / informing everyone /
> building
> > consensus. And no, not even a release-manager gets this free pass. Not on
> > branch-2, not on trunk, not anywhere.
> > - Freaking out on -1’s and reverts - we as a community need to be less
> > stigmatic about -1s / reverts.
> >
> > Trunk releases:
> > This is the other important bit about huge difference of expectations
> > between the two sides w.r.t trunk and branching. Till now, we’ve never
> made
> > releases out of trunk. So in-progress features that people deemed to not
> > need a feature branch could go into trunk without much trouble. Given
> that
> > we are now making releases off trunk, I can see (a) the RM saying "no,
> > don’t put in-progress stuff" and (b) the contributors saying “no we don’t
> > want the overhead of a branch”. I’ve raised related topics (but only
> > focusing on incompatible changes) before -
> > http://markmail.org/message/m6x73t6srlchywsn - but we never decided
> > anything.
> 

Re: Why there are so many revert operations on trunk?

2016-06-06 Thread larry mccay
inline


On Mon, Jun 6, 2016 at 4:36 PM, Vinod Kumar Vavilapalli 
wrote:

> Folks,
>
> It is truly disappointing how we are escalating situations that can be
> resolved through basic communication.
>
> Things that shouldn’t have happened
> - After a few objections were raised, commits should have simply stopped
> before restarting again but only after consensus
> - Reverts (or revert and move to a feature-branch) shouldn’t have been
> unequivocally done without dropping a note / informing everyone / building
> consensus. And no, not even a release-manager gets this free pass. Not on
> branch-2, not on trunk, not anywhere.
> - Freaking out on -1’s and reverts - we as a community need to be less
> stigmatic about -1s / reverts.
>
>
Agreed.


> Trunk releases:
> This is the other important bit about huge difference of
> expectations between the two sides w.r.t trunk and branching. Till now,
> we’ve never made releases out of trunk. So in-progress features that people
> deemed to not need a feature branch could go into trunk without much
> trouble. Given that we are now making releases off trunk, I can see (a) the
> RM saying "no, don’t put in-progress stuff and (b) the contributors saying
> “no we don’t want the overhead of a branch”. I’ve raised related topics
> (but only focusing on incompatible changes) before -
> http://markmail.org/message/m6x73t6srlchywsn <
> http://markmail.org/message/m6x73t6srlchywsn> - but we never decided
> anything.
>
> We need to at the least force a reset of expectations w.r.t how trunk and
> small / medium / incompatible changes there are treated. We should hold off
> making a release off trunk before this gets fully discussed in the
> community and we all reach a consensus.
>

+1

In essence, moving commits to a feature branch so that we can release
from trunk is creating a "trunk-branch". :)


> > * Without a user API, there's no way for people to use it, so not much
> > advantage to having it in a release
> >
> > Since the code is separate and probably won't break any existing code, I
> > won't -1 if you want to include this in a release without a user API, but
> > again, I question the utility of including code that can't be used.
>
> Clearly, there are two sides to this argument. One side claims the absence
> of user-facing public / stable APIs, and that for all purposes this is
> dead-code for everyone other than the few early adopters who want to
> experiment with it. The other argument is to not put this code before a
> user API. Again, I’d discuss with fellow community members before making
> what the other side perceives as unacceptable moves.
>
> From 2.8.0 perspective, it shouldn’t have landed there in the first place
> - I have been pushing for a release for a while with help only from a few
> members of the community. But if you say that it has no material impact on
> the user story, having a by-default switched-off feature that *doesn’t*
> destabilize the core release, I’d be willing to let it pass.
>
> +Vinod


Re: Why there are so many revert operations on trunk?

2016-06-06 Thread larry mccay
This seems like something that is probably going to happen again if we
continue to cut releases from trunk.
I know that this has been discussed at length in a separate thread, but I
think it would be good to recognize that it is the core of the issue here.

Either we:

* need to define what will happen on trunk in such circumstances and
clearly communicate an action before taking it on the dev@ list or
* we need to not introduce this sort of thrashing on trunk by releasing
from it directly

My humble 2 cents...

--larry


On Mon, Jun 6, 2016 at 1:56 PM, Andrew Wang 
wrote:

> To clarify what happened here, I moved the commits to a feature branch, not
> just reverting the commits. The intent was to make it easy to merge back in
> later, and also to unblock the 2.8 and 3.0 releases we've been trying very
> hard to wrap up for weeks. This doesn't slow down development since you can
> keep committing to a branch, and I did the git work to make it easy to
> merge back in later. I'm also happy to review the merge if the concern is
> getting three +1s.
>
> In the comments on HDFS-9924, you can see comments from a month ago raising
> concerns about the API and also that this significant expansion of the HDFS
> API is being done on release branches. There is an explicit -1 on continued
> commits to trunk, and a request to move the commits to a feature branch.
> Similar concerns have been raised by multiple contributors on that JIRA.
> Yet, the commits remained in release branches, and new patches continued to
> be committed to release branches.
>
> There's no need to attribute malicious intent to slow down feature
> development; for some reason I keep seeing this accusation thrown around
> when there are many people chiming in on HDFS-9924 with concerns about the
> feature. Considering how it's expanding the HDFS API, this is also the kind
> of work that should go through a merge vote anyway to get more eyes on it.
>
> We've been converging on the API requirements, but until the user-facing
> API is ready, I don't see the advantage of having this code in a release
> branch. As noted by the contributors on this JIRA, it's new separate code,
> so there's little to no overhead to keeping a feature branch in sync.
>
> So, to sum it up, I moved these commits to a branch because:
>
> * The discussion about the user API is still ongoing, and there is
> currently no user-facing API
> * We are very late in the 2.8 and 3.0 release cycles, trying to do blocker
> burndown
> * This code is separate and thus easy to keep in sync on a branch and merge
> in later
> * Without a user API, there's no way for people to use it, so not much
> advantage to having it in a release
>
> Since the code is separate and probably won't break any existing code, I
> won't -1 if you want to include this in a release without a user API, but
> again, I question the utility of including code that can't be used.
>
> Thanks,
> Andrew
>


Re: Guidance needed on HADOOP-13096 and HADOOP-13097

2016-05-06 Thread larry mccay
I agree with your rationale for not doing C now.
And those clean up tasks can more easily be discussed when separated from
this effort.


On Fri, May 6, 2016 at 3:11 PM, Allen Wittenauer <a...@apache.org> wrote:

> After thinking about it, I think you are correct here: I’m more
> inclined to do D w/follow-up JIRAs to fix this up. The hadoop and hdfs
> script functionality is being tested, so it isn’t like HADOOP-12930 is
> going in with zero unit tests. Never mind that large chunks of hadoop-tools
> get modified to use this “for reals” as well. The yarn and mapred tests
> don’t really bring _that_ much to the table.
>
> I think the biggest worry about doing C inside the HADOOP-12930
> feature branch is that it seems like the wrong time/place to do it.  Making
> that big of a change to the build should probably be two separate,
> orthogonal JIRAs (one for YARN, one for MR) in their own right.  But I do
> think C is the correct, long-term path.  We should probably move hdfs and
> common scripts into separate dirs as well, honestly.
>
> Thanks for the feedback!
>
>
> > On May 5, 2016, at 7:22 PM, Larry McCay <lmc...@hortonworks.com> wrote:
> >
> > I would vote for C or D with a filed JIRA to clean up the maven
> structure as a separate effort.
> > Before moving to D, could you describe any reason to not go with C?
> >
> > On May 4, 2016, at 9:51 PM, Allen Wittenauer <a...@apache.org> wrote:
> >
> >>
> >>  When the sub-projects re-merged, maven work was done, whatever,
> the shell scripts for MR and YARN were placed (effectively) outside of the
> normal maven hierarchy.  In order to add unit tests to the shell scripts
> for these sub-projects, it means effectively turning
> hadoop-yarn-project/hadoop-yarn and hadoop-mapreduce-project into “real”
> modules so that mvn test works as expected.   Doing so will likely have
> some surprising consequences, such as anyone who modifies java code and the
> shell code in a patch will trigger _all_ of the unit tests in yarn.
> >>
> >>  I think we have four options:
> >>
> >> a) Continue forward turning these into real modules with src
> directories, etc and we live with the consequences
> >>
> >> b) Move the related bits into an existing module, making them similar
> to HDFS, common, tools
> >>
> >> c) Move the related bits into a new module, using the layout that maven
> really really wants
> >>
> >> d) Skip the unit tests; we don’t have them now
> >>
> >>  This is clearly more work than what I really wanted to cover in
> this branch, but given that there was a specific request to add unit test
> code for this functionality, I’m sort of stuck here.
> >>
> >>  Thoughts?
> >> -
> >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: Guidance needed on HADOOP-13096 and HADOOP-13097

2016-05-05 Thread Larry McCay
I would vote for C or D with a filed JIRA to clean up the maven structure as a 
separate effort.
Before moving to D, could you describe any reason to not go with C?

On May 4, 2016, at 9:51 PM, Allen Wittenauer  wrote:

> 
>   When the sub-projects re-merged, maven work was done, whatever, the 
> shell scripts for MR and YARN were placed (effectively) outside of the normal 
> maven hierarchy.  In order to add unit tests to the shell scripts for these 
> sub-projects, it means effectively turning hadoop-yarn-project/hadoop-yarn 
> and hadoop-mapreduce-project into “real” modules so that mvn test works as 
> expected.   Doing so will likely have some surprising consequences, such as 
> anyone who modifies java code and the shell code in a patch will trigger 
> _all_ of the unit tests in yarn.
> 
>   I think we have four options:
> 
> a) Continue forward turning these into real modules with src directories, etc 
> and we live with the consequences
> 
> b) Move the related bits into an existing module, making them similar to 
> HDFS, common, tools
> 
> c) Move the related bits into a new module, using the layout that maven 
> really really wants
> 
> d) Skip the unit tests; we don’t have them now
> 
>   This is clearly more work than what I really wanted to cover in this 
> branch, but given that there was a specific request to add unit test code for 
> this functionality, I’m sort of stuck here.
> 
>   Thoughts?
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> 
> 


-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Re: 2.7.1 status

2015-05-26 Thread larry mccay
Hi Vinod -

I think that https://issues.apache.org/jira/browse/HADOOP-11934 should also
be added to the blocker list.
This is a critical bug in our ability to protect the LDAP connection
password in LdapGroupsMapping.

thanks!

--larry
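
For context, HADOOP-11934 concerns making the LdapGroupsMapping bind password
resolvable through the credential provider API instead of sitting in clear
text in core-site.xml. A minimal Java sketch of the Configuration.getPassword()
pattern involved (the keystore path is a hypothetical example, populated
beforehand with the hadoop credential CLI):

    import org.apache.hadoop.conf.Configuration;

    public class LdapBindPasswordLookup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical JCEKS keystore holding the bind password alias.
        conf.set("hadoop.security.credential.provider.path",
            "jceks://file/etc/hadoop/conf/ldap.jceks");

        // Consults the configured providers first; falls back to a
        // clear-text value in the Configuration only if no provider
        // holds the alias.
        char[] bindPassword =
            conf.getPassword("hadoop.security.group.mapping.ldap.bind.password");

        // Never log the secret itself - only whether resolution succeeded.
        System.out.println("resolved: " + (bindPassword != null));
      }
    }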

On Tue, May 26, 2015 at 3:32 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Tx for reporting this, Elliot.

 Made it a blocker, not with a deeper understanding of the problem. Can you
 please chime in with your opinion and perhaps code reviews?

 Thanks
 +Vinod

 On May 26, 2015, at 10:48 AM, Elliott Clark ecl...@apache.org wrote:

  HADOOP-12001 should probably be added to the blocker list since it's a
  regression that can keep ldap from working.