Re: way forward for Winutils excision from `FileSystem`

2022-11-30 Thread larry mccay
As Chris mentioned earlier, it would be wise to do this in pieces that can
be reviewed properly.
Bringing large refactorings in all at once, as Garret mentioned, is not
likely to just get a +1.

We do have a feature branch process and criteria and we could determine
specific criteria for such a critical piece of code going through such
changes.

The question of funding strikes me as odd on this list - though I
understand the motivation and relevance to you, for sure.
Feel free to look me up on LinkedIn, and if there are contacts or anything
that I can help with, let me know.

In the meantime, I would think this thread and list should discuss
solutions to problems and our ability to get there with reasonable effort
and practices that we are familiar with.

On Wed, Nov 30, 2022 at 12:46 PM Garret Wilson 
wrote:

> On 11/29/2022 8:16 AM, Gautham Banasandra wrote:
> > …
> > However, I don't see anyone stopping you from working on removing
> > winutils. I encourage you to put across a PR and I would be glad to
> review
> > the same.
>
> That's not how it works. This is an intense undertaking. If I spend six
> months with no income, just rewriting all the native `FileSystem`
> implementations, and you simply gave thumbs up to the PR, then yay,
> Apache would integrate my changes into the codebase? I hardly think so.
> There has to be official buy-in across the group and authorization to
> make such extensive changes. It's naive to say, "oh, just go rewrite it,
> and I'll review it and then it will be done".
>
> However I am interested in who funds your work. Do you work on Apache
> for free in your extra time? Or does some corporation pay you? If the
> latter, I'll be happy to submit my resume to them, so that they can fund
> me as well and I can start the work immediately. But as I've mentioned a
> couple of times already, financially I cannot justify sitting here
> rewriting Hadoop file systems without any income. If you find a creative
> way for it to be financially viable for me, I would love to do it.
> > One question I have is - how will you validate that your changes work fine
> > and don't regress the existing functionality, given that we don't yet have
> > a CI for Hadoop on Windows?
>
> It's tempting to start to give you a detailed answer here, because it's
> a legitimate question. The more general answer is that we would discuss
> and form a plan with the group; you'll likely find that 1) the existing
> code doesn't even have sufficient tests, and 2) the existing API isn't
> even sufficiently documented. But your question was formulated in a way
> completely different from how I conceptualize the issue. What I would be
> writing would be a completely native Java implementation of
> `FileSystem`. The tests accordingly should be written agnostic to the
> platform. If the tests run on Linux, they will run on Windows; if not,
> we need to file a bug against the JDK. I'm not even thinking in terms of
> a "CI for Hadoop for Windows". I just want to build the Java project,
> whether I'm running on Mac or Linux or Windows or whatever. (That was
> the point of my wanting to get rid of Winutils to begin with.)
>
> I also know that pragmatically whatever I do with the `FileSystem`
> implementation, something will initially break—not because of anything I
> did incorrectly, but because the Hadoop API is inadequate and people
> have therefore made a thousand brittle assumptions in their use of the
> API. Things will break already with or without my `FileSystem`
> implementation; that's why Hadoop is still using
> `DeprecatedRawLocalFileStatus`: someone made a new version but had to
> switch it off because something broke (HADOOP-9652, according to the
> comments).
>
> In summary, yes, if I ever get buy-in and funding to rewrite
> `FileSystem` for native Java, we need to have a discussion with the
> wider group to form a plan for improving the documentation and for
> testing. But whatever discussion or plan we do, things will eventually
> break because Hadoop doesn't have a well-documented API and doesn't
> cleanly separate the interface from the implementation. If I were to
> work on it, I would improve that situation so that things would be
> better documented and less brittle.
>
> In the meantime my Bare Naked Local FileSystem
> is meeting
> my needs pragmatically, and I'm leaving this mailing list—not to be
> antisocial, but because the unrelated (mostly automated) chatter is
> distracting to my other work.
>
> Have a wonderful holiday season, and feel free to reach out directly.
>
> Best,
>
> Garret
>
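
A rough sketch of the pure-Java, platform-agnostic direction described above, deriving local file metadata from java.nio.file instead of shelling out to winutils; the class and method names here are illustrative and this is not Hadoop's actual RawLocalFileSystem code:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Collections;
import java.util.Set;

// Illustrative only: gather the pieces of a local file status purely from the
// JDK, so the same code path runs unchanged on Linux, macOS and Windows.
public class NioFileStatusSketch {
  public static void main(String[] args) throws IOException {
    Path p = Paths.get(args.length > 0 ? args[0] : ".");
    long length = Files.size(p);
    long modTime = Files.getLastModifiedTime(p).toMillis();
    Set<PosixFilePermission> perms;
    try {
      perms = Files.getPosixFilePermissions(p);
    } catch (UnsupportedOperationException e) {
      // Default Windows file stores do not expose POSIX permissions; fall back
      // rather than shelling out to a native helper.
      perms = Collections.emptySet();
    }
    System.out.printf("len=%d mtime=%d perms=%s%n", length, modTime, perms);
  }
}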


Re: improving efficiency and reducing runtime using S3 read optimization

2021-08-25 Thread larry mccay
Hi Kumar -

This looks very promising and you should absolutely pursue contributing it
back!
Whether you initially merge into S3A or bring S3E in separately could be
determined through PR review or even on a DISCUSS thread here.

Congrats on what seem to be very positive results!

thanks,

--larry

On Wed, Aug 25, 2021 at 10:33 PM Bhalchandra Pandit
 wrote:

> Hi All,
> I work for Pinterest. I developed a technique for vastly improving read
> throughput when reading from the S3 file system. It not only helps the
> sequential read case (like reading a SequenceFile) but also significantly
> improves read throughput of a random access case (like reading Parquet).
> This technique has been very useful in significantly improving efficiency
> of the data processing jobs at Pinterest.
>
> I would like to contribute that feature to Apache Hadoop. More details on
> this technique are available in this blog I wrote recently:
>
> https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0
>
> I would like to know if you believe it to be a useful contribution. If so,
> I will follow the steps outlined on the how to contribute page.
>
> Kumar
>


[jira] [Resolved] (HADOOP-16736) Best Big data hadoop training in pune

2019-11-28 Thread Larry McCay (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay resolved HADOOP-16736.
--
Resolution: Invalid

This is spam - resolving as Invalid.

> Best Big data hadoop training in pune
> -
>
> Key: HADOOP-16736
> URL: https://issues.apache.org/jira/browse/HADOOP-16736
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: surbhi nahta
>Priority: Major
>
> At SevenMentor, we are always striving to achieve value for our candidates. 
> We provide the *Best Big Data Hadoop Training in Pune* which includes all 
> recent technologies and tools. Any candidate from an IT background or having 
> basic knowledge of programming can enroll for this course. Freshers or 
> experienced candidates can join this course to understand Hadoop analytics 
> and development practically. Big Data is the data that can not be processed 
> by traditional database systems. Big data consist of data in the structured 
> ie. Rows and Columns format, semi-structured i.e.XML records and Unstructured 
> format i.e.Text records, Twitter Comments. Hadoop is a software framework for 
> writing and running distributed applications that process a large amount of 
> data. Hadoop framework consists of Storage area known as Hadoop Distributed 
> File System(HDFS) and processing part known as the MapReduce programming 
> model. 






Re: CredentialProvider API

2019-04-24 Thread larry mccay
This is likely an issue only for cases where we need a password from
HDFS in order to access HDFS.
This should definitely be avoided by not having a static credential
provider path configured for startup that includes such a dependency.

For instance, the JIRA you cite is an example where we need to do group
lookup in order to determine whether you are allowed to access the HDFS
resource that provides the password required to do group lookup.

Storing passwords in credential stores within HDFS should be perfectly safe
for things like SSL that don't have a dependency on HDFS itself.

Those details are in the documentation page that you referenced, but if they
need to be made clearer, that completely makes sense.
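
As a hedged illustration of that point (the provider path, keystore location and password alias below are examples only, not prescribed values), credentials needed at daemon startup can be resolved from a local keystore so that reading them never requires HDFS to be reachable:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

// Sketch: resolve a startup-time secret from a local credential store via
// Configuration.getPassword(), avoiding any dependency on HDFS being up.
public class LocalCredentialExample {
  public static char[] readSslKeystorePassword() throws IOException {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.credential.provider.path",
        "localjceks://file/etc/hadoop/conf/ssl-creds.jceks");
    return conf.getPassword("ssl.server.keystore.password");
  }
}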

On Wed, Apr 24, 2019 at 9:56 PM Karthik P  wrote:

> Team,
>
> The datanode fails to restart after configuring the credential provider to
> store credentials in HDFS (jceks://hdfs@hostname:9001/credential/keys.jceks).
>
> We get a StackOverflowError in the datanode jsvc.out file, similar to
> HADOOP-11934.
>
> As per the documentation link
> (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Supported_Features),
> we support storing credentials in HDFS.
>
> *URI jceks://file|hdfs/path-to-keystore, is used to retrieve credentials
> from a Java keystore. The underlying use of the Hadoop filesystem
> abstraction allows credentials to be stored on the local filesystem or
> within HDFS.*
>
> Assume a scenario where all of our datanodes are down and we have configured
> hadoop.security.credential.provider.path to an HDFS location. When we call
> FileSystem.get() during datanode restart, we end up in a recursive call if
> HDFS is inaccessible.
>
>
> /**
>  * Check and set 'configuration' if necessary.
>  *
>  * @param theObject object for which to set configuration
>  * @param conf Configuration
>  */
> public static void setConf(Object theObject, Configuration conf) {
>   if (conf != null) {
> if (theObject instanceof Configurable) {
>   ((Configurable) theObject).setConf(conf);
> }
> setJobConf(theObject, conf);
>   }
> }
>
>
> There are no issues if we store the credential in the local filesystem
> (localjceks://file); the problem is only with jceks://hdfs/.
>
> Can I change the Hadoop docs to say that we do not support storing
> credentials in HDFS? Or should I handle this scenario only for the startup
> case?
>
>
> Thanks,
> Karthik
>


[jira] [Created] (HADOOP-16076) SPNEGO+SSL Client Connections with HttpClient Broken

2019-01-25 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-16076:


 Summary: SPNEGO+SSL Client Connections with HttpClient Broken
 Key: HADOOP-16076
 URL: https://issues.apache.org/jira/browse/HADOOP-16076
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, security
Affects Versions: 3.2.0
Reporter: Larry McCay
Assignee: Larry McCay


Client connections with HttpClient to a SPNEGO secured endpoint with TLS 
enabled break due to a misrepresentation of the SPN to include HTTPS instead of 
just HTTP.

The current use of HttpClient 4.5.2 is affected by HTTPCLIENT-1712 and breaks
SPNEGO with HTTPS endpoints since it includes HTTPS in the principal name.

We need to migrate to at least 4.5.3 as we have tested with that version and 
observed it fixing the issue. Need to do some due diligence to determine the 
cleanest version to upgrade to but will provide a patch in a day or so.






Re: HADOOP-14163 proposal for new hadoop.apache.org

2018-08-31 Thread larry mccay
+1 from me

On Fri, Aug 31, 2018, 5:30 AM Steve Loughran  wrote:

>
>
> > On 31 Aug 2018, at 09:07, Elek, Marton  wrote:
> >
> > Bumping this thread at last time.
> >
> > I have the following proposal:
> >
> > 1. I will request a new git repository hadoop-site.git and import the
> new site to there (which has exactly the same content as the existing site).
> >
> > 2. I will ask infra to use the new repository as the source of
> hadoop.apache.org
> >
> > 3. I will sync manually all of the changes in the next two months back
> to the svn site from the git (release announcements, new committers)
> >
> > IN CASE OF ANY PROBLEM we can switch back to the svn without any problem.
> >
> > If no-one objects within three days, I'll assume lazy consensus and
> start with this plan. Please comment if you have objections.
> >
> > Again: it allows immediate fallback at any time as svn repo will be kept
> as is (+ I will keep it up-to-date in the next 2 months)
> >
> > Thanks,
> > Marton
>
> sounds good to me
>
> +1
>
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS]: securing ASF Hadoop releases out of the box

2018-07-05 Thread larry mccay
+1 from me as well.

On Thu, Jul 5, 2018 at 5:19 PM, Steve Loughran 
wrote:

>
>
> > On 5 Jul 2018, at 23:15, Anu Engineer  wrote:
> >
> > +1, on the Non-Routable Idea. We like it so much that we added it to the
> Ozone roadmap.
> > https://issues.apache.org/jira/browse/HDDS-231
> >
> > If there is consensus on bringing this to Hadoop in general, we can
> build this feature in common.
> >
> > --Anu
> >
>
>
> +1 to out the box, everywhere. Web UIs included
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS]: securing ASF Hadoop releases out of the box

2018-07-05 Thread larry mccay
Hi Steve -

This is a long overdue DISCUSS thread!

Perhaps the UIs can very visibly state (in red) "WARNING: UNSECURED UI
ACCESS - OPEN TO COMPROMISE" - maybe even force a click through the warning
to get to the page like SSL exceptions in the browser do?
Similar tactic for UI access without SSL?
A new AuthenticationFilter could be added to the filter chains that blocks
API calls unless explicitly configured to be open, and obviously logs a
similar message?
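
As a rough sketch of that idea (the class name and init parameter below are illustrative, not an existing Hadoop filter):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Illustrative only: reject HTTP access unless the deployment has explicitly
// opted in to running unsecured, and make the refusal message loud and clear.
public class UnsecuredAccessFilter implements Filter {
  private boolean explicitlyOpen;

  @Override
  public void init(FilterConfig cfg) {
    explicitlyOpen =
        Boolean.parseBoolean(cfg.getInitParameter("unsecured.access.permitted"));
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    if (!explicitlyOpen) {
      ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_FORBIDDEN,
          "WARNING: UNSECURED ACCESS - OPEN TO COMPROMISE - explicitly enable to proceed");
      return;
    }
    chain.doFilter(req, resp);
  }

  @Override
  public void destroy() {
  }
}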

thanks,

--larry




On Wed, Jul 4, 2018 at 11:58 AM, Steve Loughran 
wrote:

> Bitcoins are profitable enough to justify writing malware to run on Hadoop
> clusters & schedule mining jobs: there have been a couple of incidents of
> this in the wild, generally going in through no security, well known
> passwords, open ports.
>
> Vendors of Hadoop-related products get to deal with their lockdown
> themselves, which they often do by installing kerberos from the outset,
> making users make up their own password for admin accounts, etc.
>
> The ASF releases though: we just provide something insecure out the box
> and some docs saying "use kerberos if you want security"
>
> What we can do here?
>
> Some things to think about
>
> * docs explaining IN CAPITAL LETTERS why you need to lock down your
> cluster to a private subnet or use Kerberos
> * Anything which can be done to make Kerberos easier (?). I see there are
> some outstanding patches for HADOOP-12649 which need review, but what else?
>
> Could we have Hadoop determine when it's coming up on an open network and
> start warning? And how?
>
> At the very least, single node hadoop should be locked down. You shouldn't
> have to bring up kerberos to run it like that. And for more sophisticated
> multinode deployments, should the scripts refuse to work without kerberos
> unless you pass in some argument like "--Dinsecure-clusters-permitted"
>
> Any other ideas?
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS] Branch Proposal: HADOOP 15407: ABFS

2018-05-15 Thread larry mccay
This seems like a reasonable and effective use of a feature branch and
branch committers to me.


On Tue, May 15, 2018 at 11:34 AM, Steve Loughran 
wrote:

> Hi
>
> Chris Douglas and I have a proposal for a short-lived feature branch
> for the Azure ABFS connector to go into the hadoop-azure package. This will
> connect to the new azure storage service, which will ultimately replace the
> one used by wasb. It's a big patch and, like all storage connectors, will
> inevitably take time to stabilize (i.e. nobody ever gets seek() right, even
> when we think we have).
>
> Thomas & Esfandiar will do the coding: they've already done the paperwork.
> Chris, myself & anyone else interested can be involved in the review and
> testing.
>
> Comments?
>
> -
>
> The initial HADOOP-15407 patch contains a new filesystem client for the
> forthcoming Azure ABFS, which is intended to replace Azure WASB as the
> Azure storage layer. The patch is large, as it contains the replacement
> client, tests, and generated code.
>
> We propose a feature branch, so the module can be broken into salient,
> reviewable chunks. Internal constraints prevented this feature from being
> developed in Apache, so we want to ensure that all the code is discussed,
> maintainable, and documented by the community before it merges.
>
> To effect this, we also propose adding two developers as branch
> committers: Thomas Marquardt <tm...@microsoft.com> and Esfandiar Manii <esma...@microsoft.com>
>
> Beyond normal feature branch activity and merge criteria for FS modules,
> we want to add another merge criterion for ABFS. Some of the client APIs
> are not GA. It seems reasonable to require that this client works with
> public endpoints before it merges to trunk.
>
> To test the Blob FS driver, Blob FS team (including Esfandiar Manii and
> Thomas Marquardt) in Azure Storage will need the MSDN subscription ID(s)
> for all reviewers who want to run the tests. The ABFS team will then
> whitelist the subscription ID(s) for the Blob FS Preview. At that time,
> future storage accounts created will have the Blob FS endpoint,
> .dfs.core.windows.net, which
> the Blob FS driver relies on.
>
> This is a temporary state during the (current) Private Preview and the
> early phases of Public Preview. In a few months, the whitelisting will not
> be required and anyone will be able to create a storage account with access
> to the Blob FS endpoint.
>
> Thomas and Esfandiar have been active in the Hadoop project working on the
> WASB connector (see https://issues.apache.org/jira/browse/HADOOP-14552).
> They understand the processes and requirements of the software. Working on
> the branch directly will let them bring this significant feature into the
> hadoop-azure module without disrupting existing users.
>


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-11 Thread larry mccay
No, the proposal was to only fix the NN port change - as I understood it.

On Thu, Jan 11, 2018 at 2:01 PM, Eric Yang <ey...@hortonworks.com> wrote:

> If I am reading this correctly, Daryn and Larry are in favor of complete
> revert instead of namenode only.  Please charm in if I am wrong.  This is
> the reason that I try to explore each perspective to understand the cost of
> each options.  It appears that we have a fragment of opinions, and only one
> choice will serve the need of majority of the community.  It would be good
> for a PMC to call the vote at reasonable pace to address this issue to
> reduce the pain point from either side of oppositions.
>
>
>
> Regards,
>
> Eric
>
>
>
> *From: *Chris Douglas <cdoug...@apache.org>
> *Date: *Wednesday, January 10, 2018 at 7:36 PM
> *To: *Eric Yang <ey...@hortonworks.com>
> *Cc: *"Aaron T. Myers" <a...@apache.org>, Daryn Sharp <da...@oath.com>,
> Hadoop Common <common-dev@hadoop.apache.org>, larry mccay <
> lmc...@apache.org>
>
> *Subject: *Re: When are incompatible changes acceptable (HDFS-12990)
>
>
>
> Isn't this limited to reverting the 8020 -> 9820 change? -C
>
>
>
> On Wed, Jan 10, 2018 at 6:13 PM Eric Yang <ey...@hortonworks.com> wrote:
>
> The fix in HDFS-9427 can potentially bring in new customers because there is
> less chance of a newcomer encountering the “port already in use” problem.  If
> we make the change according to HDFS-12990, this incompatible change does not
> make the original incompatible change compatible.  Other ports are not
> reverted by HDFS-12990, so users will still encounter the bad taste in the
> mouth that HDFS-9427 attempted to solve.  Please consider the negative side
> effects of reverting as well as of the incompatible minor release change.  Thanks
>
> Regards,
> Eric
>
> From: larry mccay <lmc...@apache.org>
> Date: Wednesday, January 10, 2018 at 10:53 AM
> To: Daryn Sharp <da...@oath.com>
> Cc: "Aaron T. Myers" <a...@apache.org>, Eric Yang <ey...@hortonworks.com>,
> Chris Douglas <cdoug...@apache.org>, Hadoop Common <
> common-dev@hadoop.apache.org>
> Subject: Re: When are incompatible changes acceptable (HDFS-12990)
>
> On Wed, Jan 10, 2018 at 1:34 PM, Daryn Sharp <da...@oath.com> wrote:
>
> I fully agree the port changes should be reverted.  Although
> "incompatible", the potential impact to existing 2.x deploys is huge.  I'd
> rather inconvenience 3.0 deploys that compromise <1% customers.  An
> incompatible change to revert an incompatible change is called
> compatibility.
>
> +1
>
>
>
>
> Most importantly, consider that there is no good upgrade path for existing
> deploys, esp. large and/or multi-cluster environments.  It’s only feasible
> for first-time deploys or simple single-cluster upgrades willing to take
> downtime.  Let's consider a few reasons why:
>
>
>
> 1. RU is completely broken.  Running jobs will fail.  If MR on hdfs
> bundles the configs, there's no way to transparently coordinate the switch
> to the new bundle with the port changed.  Job submissions will fail.
>
>
>
> 2. Users generally do not add the rpc port number to uris so unless their
> configs are updated they will contact the wrong port.  Seamlessly
> coordinating the conf change without massive failures is impossible.
>
>
>
> 3. Even if client confs are updated, they will break in a multi-cluster
> env with NNs using different ports.  Users/services will be forced to add
> the port.  The cited hive "issue" is not a bug since it's the only way to
> work in a multi-port env.
>
>
>
> 4. Coordinating the port add/change of uris in systems everywhere (you
> know something will be missed), updating of confs, restarting all services,
> requiring customers to redeploy their workflows in sync with the NN
> upgrade, will cause mass disruption and downtime that will be unacceptable
> for production environments.
>
>
>
> This is a solution to a non-existent problem.  Ports can be bound by
> multiple processes but only 1 can listen.  Maybe multiple listeners is an
> issue for compute nodes but not responsibly managed service nodes.  Ie. Who
> runs arbitrary services on the NNs that bind to random ports?  Besides, the
> default port is and was ephemeral so it solved nothing.
>
>
>
> This either standardizes ports to a particular customer's ports or is a
> poorly thought out whim.  In either case, the needs of the many outweigh
> the needs of the few/none (3.0 users).  The only logical conclusion is
> revert.  If a particular site wants to change default ports and deal with
> the 

Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-10 Thread larry mccay
rent values to avoid rule bending
>> and
>> > future frustrations.
>> >
>>
>> That we allow this incompatible change now does not mean that we are
>> categorically allowing more incompatible changes in the future. My point
>> is
>> that we should in all instances evaluate the merit of any incompatible
>> change on a case-by-case basis. This is not an exceptional circumstance -
>> we've made incompatible changes in the past when appropriate, e.g.
>> breaking
>> some clients to address a security issue. I and others believe that in
>> this
>> case the benefits greatly outweigh the downsides of changing this back to
>> what it has always been.
>>
>> Best,
>> Aaron
>>
>>
>> >
>> > Regards,
>> > Eric
>> >
>> > On 1/9/18, 11:21 AM, "Chris Douglas" <cdoug...@apache.org> wrote:
>> >
>> > Particularly since 9820 isn't in the contiguous range of ports in
>> > HDFS-9427, is there any value in this change?
>> >
>> > Let's change it back to prevent the disruption to users, but
>> > downstream projects should treat this as a bug in their tests.
>> Please
>> > open JIRAs in affected projects. -C
>> >
>> >
>> > On Tue, Jan 9, 2018 at 5:18 AM, larry mccay <lmc...@apache.org>
>> wrote:
>> > > On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org>
>> > wrote:
>> > >
>> > >> Thanks a lot for the response, Larry. Comments inline.
>> > >>
>> > >> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org>
>> > wrote:
>> > >>
>> > >>> Question...
>> > >>>
>> > >>> Can this be addressed in some way during or before upgrade that
>> > allows it
>> > >>> to only affect new installs?
>> > >>> Even a config based workaround prior to upgrade might make this
>> a
>> > change
>> > >>> less disruptive.
>> > >>>
>> > >>> If part of the upgrade process includes a step (maybe even a
>> > script) to
>> > >>> set the NN RPC port explicitly beforehand then it would allow
>> > existing
>> > >>> deployments and related clients to remain whole - otherwise it
>> > will uptake
>> > >>> the new default port.
>> > >>>
>> > >>
>> > >> Perhaps something like this could be done, but I think there are
>> > downsides
>> > >> to anything like this. For example, I'm sure there are plenty of
>> > >> applications written on top of Hadoop that have tests which
>> > hard-code the
>> > >> port number. Nothing we do in a setup script will help here. If
>> we
>> > don't
>> > >> change the default port back to what it was, these tests will
>> > likely all
>> > >> have to be updated.
>> > >>
>> > >>
>> > >
>> > > I may not have made my point clear enough.
>> > > What I meant to say is to fix the default port but direct folks to
>> > > explicitly set the port they are using in a deployment (the
>> current
>> > > default) so that it doesn't change out from under them - unless
>> they
>> > are
>> > > fine with it changing.
>> > >
>> > >
>> > >>
>> > >>> Meta note: we shouldn't be so pedantic about policy that we
>> can't
>> > back
>> > >>> out something that is considered a bug or even mistake.
>> > >>>
>> > >>
>> > >> This is my bigger point. Rigidly adhering to the compat
>> guidelines
>> > in this
>> > >> instance helps almost no one, while hurting many folks.
>> > >>
>> > >> We basically made a mistake when we decided to change the default
>> > NN port
>> > >> with little upside, even between major versions. We discovered
>> this
>> > very
>> > >> quickly, and we have an opportunity to fix it now and in so doing
>> > likely
>> > >> disrupt very, very few users and downstream applications. If we
>> > don't
>> > >> change it, we'll be causing difficulty for our users, downstream
>> > >> developers, and ourselves, potentially for years.
>> > >>
>> > >
>> > > Agreed.
>> > >
>> > >
>> > >>
>> > >> Best,
>> > >> Aaron
>> > >>
>> >
>> > ---
>> --
>> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> >
>> >
>> >
>> >
>>
>
>


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-09 Thread larry mccay
On Mon, Jan 8, 2018 at 11:28 PM, Aaron T. Myers <a...@apache.org> wrote:

> Thanks a lot for the response, Larry. Comments inline.
>
> On Mon, Jan 8, 2018 at 6:44 PM, larry mccay <lmc...@apache.org> wrote:
>
>> Question...
>>
>> Can this be addressed in some way during or before upgrade that allows it
>> to only affect new installs?
>> Even a config based workaround prior to upgrade might make this a change
>> less disruptive.
>>
>> If part of the upgrade process includes a step (maybe even a script) to
>> set the NN RPC port explicitly beforehand then it would allow existing
>> deployments and related clients to remain whole - otherwise it will uptake
>> the new default port.
>>
>
> Perhaps something like this could be done, but I think there are downsides
> to anything like this. For example, I'm sure there are plenty of
> applications written on top of Hadoop that have tests which hard-code the
> port number. Nothing we do in a setup script will help here. If we don't
> change the default port back to what it was, these tests will likely all
> have to be updated.
>
>

I may not have made my point clear enough.
What I meant to say is to fix the default port but direct folks to
explicitly set the port they are using in a deployment (the current
default) so that it doesn't change out from under them - unless they are
fine with it changing.


>
>> Meta note: we shouldn't be so pedantic about policy that we can't back
>> out something that is considered a bug or even mistake.
>>
>
> This is my bigger point. Rigidly adhering to the compat guidelines in this
> instance helps almost no one, while hurting many folks.
>
> We basically made a mistake when we decided to change the default NN port
> with little upside, even between major versions. We discovered this very
> quickly, and we have an opportunity to fix it now and in so doing likely
> disrupt very, very few users and downstream applications. If we don't
> change it, we'll be causing difficulty for our users, downstream
> developers, and ourselves, potentially for years.
>

Agreed.


>
> Best,
> Aaron
>


Re: When are incompatible changes acceptable (HDFS-12990)

2018-01-08 Thread larry mccay
Question...

Can this be addressed in some way during or before upgrade that allows it
to only affect new installs?
Even a config based workaround prior to upgrade might make this a change
less disruptive.

If part of the upgrade process includes a step (maybe even a script) to set
the NN RPC port explicitly beforehand then it would allow existing
deployments and related clients to remain whole - otherwise they will pick up
the new default port.

Meta note: we shouldn't be so pedantic about policy that we can't back out
something that is considered a bug or even mistake.
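
A hedged sketch of what "set the NN RPC port explicitly" could look like on the config/client side (the hostname is illustrative; 8020 is the pre-3.0 default being discussed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Illustrative only: pin the NameNode RPC port in deployment config so an
// upgrade cannot silently move clients onto a different default port.
public class PinnedNameNodePort {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address", "nn1.example.com:8020");
    try (FileSystem fs = FileSystem.get(conf)) {
      System.out.println("Using filesystem at " + fs.getUri());
    }
  }
}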

On Mon, Jan 8, 2018 at 9:17 PM, Aaron T. Myers  wrote:

> Hello all,
>
> Over in HDFS-12990 [1],
> we're having some discussion about whether or not it's ever acceptable to
> make an incompatible change in a minor or dot release. In general this is
> of course undesirable and should be avoided in almost all cases. However, I
> believe that each instance of someone desiring to make an incompatible
> change should be evaluated on a case-by-case basis to consider the costs
> and benefits of making that change. For example, I believe that we've
> historically made incompatible changes in minor or dot releases which would
> break older clients for security reasons.
>
> In this particular case linked above,  I believe that given that Hadoop
> 3.0.0 was just released, and thus very few folks are likely to have
> deployed it, it would benefit a large number of existing deployments and
> downstream applications to change the default NN RPC port number back to
> what it was in all previously-released versions of Apache Hadoop. I'd like
> to make this change in 3.0.1, and there is no question that doing so
> should be considered an incompatible change between 3.0.0 and 3.0.1.
> However, I believe this incompatible change is warranted given the
> circumstances.
>
> Would like to hear others' thoughts on this.
>
> Thanks,
> Aaron
>
> [1] For some background, it used to be the case that many of Hadoop's
> default service ports were in the ephemeral range. This could potentially
> cause a service to fail to start up on a given host if some other process
> had happened to have already bound to said port. As part of that effort, we
> also changed the default NN RPC port from 8020 to 9820. Even though 8020
> wasn't in the ephemeral range, we moved it to 9820 to be close to the new
> range of the rest of the ports. At the time this change was made, though, I
> and others didn't realize the substantial downsides that doing so would
> introduce, for example the Hive metastore will put full HDFS paths
> including the port into its database, which can be a substantial upgrade
> headache.
>


[jira] [Resolved] (HADOOP-15075) Implement KnoxSSO for hadoop web UIs (hdfs, yarn, history server etc.)

2017-11-29 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay resolved HADOOP-15075.
--
Resolution: Not A Problem

Closing as not a problem - since JWTRedirectAuthenticationHandler should cover
this use case. See HADOOP-11717.

> Implement KnoxSSO for hadoop web UIs (hdfs, yarn, history server etc.)
> --
>
> Key: HADOOP-15075
> URL: https://issues.apache.org/jira/browse/HADOOP-15075
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client, security
>Affects Versions: 3.0.0-alpha3
>Reporter: madhu raghavendra
> Fix For: site
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Need to implement Knox SSO login feature for hadoop webUIs like HDFS 
> Namenode, Yarn RM, MR2 Job history server, spark etc. I know that we have 
> SPNEGO feature enabled, however having Knox SSO login feature seems to be a 
> good option






Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-31 Thread larry mccay
create an encryption scheme or
protocol? Does it have a "novel" or "unique" use of normal crypto?  There
be dragons. Even normal-looking use of cryptography must be carefully
reviewed.
3.5 If you need random bits for a security purpose, such as for a session
token or a cryptographic key, you need a cryptographically approved place
to acquire said bits. Use the SecureRandom class. [DEFAULT]

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single point of failures?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't however want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?

*7. Log Messages*

Do not write secrets or data into log files. This sounds obvious, but
mistakes happen.

7.1 Do not log passwords, keys, security-related tokens, or any sensitive
configuration item.
7.2 Do not log any user-supplied data, ever. Not even snippets of user
data, such as “I had an error parsing this line of text: …” where the
elided part is user data. You never know, it might contain secrets like credit
card numbers.

*8. Secure By Default*

Strive to be secure by default. This means that products should ship in a
secure state, and only by human tuning be put into an insecure state.
Exhibit A here is the MongoDB ransomware fiasco, where the
insecure-by-default MongoDB installation resulted in completely open
instances of mongodb on the open internet.  Attackers removed or encrypted
the data and left ransom notes behind. We don't want that sort of notoriety
for hadoop. Granted, it's not always possible to turn on all security
features: for example you have to have a KDC set up in order to enable
Kerberos.

8.1 Are there settings or configurations that can be shipped in a
default-secure state?
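
For item 3.5, a minimal sketch of sourcing security-relevant random bits (the token length and encoding are illustrative choices):

import java.security.SecureRandom;
import java.util.Base64;

// Illustrative only: draw session-token bytes from SecureRandom rather than
// java.util.Random, timestamps or other predictable sources.
public class SessionTokenSketch {
  private static final SecureRandom RNG = new SecureRandom();

  public static String newSessionToken() {
    byte[] bytes = new byte[32];
    RNG.nextBytes(bytes);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
  }
}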


On Tue, Oct 31, 2017 at 10:36 AM, larry mccay <lmc...@apache.org> wrote:

> Thanks for the examples, Mike.
>
> I think some of those should actually just be added to the checklist in
> other places as they are best practices.
> Which raises an interesting point that some of those items can be enabled
> by default and maybe indicating so throughout the list makes sense.
>
> Then we can ask for a description of any other Secure by Default
> considerations at the end.
>
> I will work on a new revision this morning.
>
>
> On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <myo...@cloudera.com>
> wrote:
>
>> #8 is a great topic - given that Hadoop is insecure by default.
>>> Actual movement to Secure by Default would be a challenge both
>>> technically (given the need for kerberos) and discussion-wise.
>>> Asking whether you have considered any settings of configurations that
>>> can be secure by default is an interesting idea.
>>>
>>> Can you provide an example though?
>>>
>>
>> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
>> etc.  But here are some ideas:
>>
>> - Default to only listen for network traffic on the loopback interface.
>> The admin would have to take specific action to listen on a non-loopback
>> address. Hence secure by default. I've known web servers that ship like
>> this. The counter argument to this is that this is a "useless by default"
>> setting for a distributed system... which does have some validity.
>> - A more constrained version of the above is to not bind to any network
>> interface that has an internet-routable ip address. (That is, not in the
>> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
>> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
>> that's obviously headed towards the open internet.  Sure this isn't
>> perfect, but it would catch some cases. The admin could provide a specific
>> flag to override.  (I got this one from discussion with the Kudu folks.)
>> - The examples don't have to be big. Another example would be... if using
>> TLS, and if the certificate authority used to sign the certificate is in
>> the default certificate store, turn on HSTS automatically.
>> - Always turn off TLSv1 and TLSv1.1
>> - Forbid single-DES and RC4 encryption algorithms
>>
>> You get the idea.
>> -Mike
>>
>>
>>
>>>
>>>
>>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <myo...@cloudera.com>
>>> wrote:
>>>
>>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mcc

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-31 Thread larry mccay
Thanks for the examples, Mike.

I think some of those should actually just be added to the checklist in
other places as they are best practices.
Which raises an interesting point that some of those items can be enabled
by default and maybe indicating so throughout the list makes sense.

Then we can ask for a description of any other Secure by Default
considerations at the end.

I will work on a new revision this morning.


On Wed, Oct 25, 2017 at 4:56 PM, Michael Yoder <myo...@cloudera.com> wrote:

> #8 is a great topic - given that Hadoop is insecure by default.
>> Actual movement to Secure by Default would be a challenge both
>> technically (given the need for kerberos) and discussion-wise.
>> Asking whether you have considered any settings of configurations that
>> can be secure by default is an interesting idea.
>>
>> Can you provide an example though?
>>
>
> It's tough, I admit - kerberos requires a KDC, TLS requires certificates,
> etc.  But here are some ideas:
>
> - Default to only listen for network traffic on the loopback interface.
> The admin would have to take specific action to listen on a non-loopback
> address. Hence secure by default. I've known web servers that ship like
> this. The counter argument to this is that this is a "useless by default"
> setting for a distributed system... which does have some validity.
> - A more constrained version of the above is to not bind to any network
> interface that has an internet-routable ip address. (That is, not in the
> ranges <https://en.wikipedia.org/wiki/Private_network> 192.168.x.x,
> 172.16.x.x, and 10.x).  The idea is that we wouldn't want to risk traffic
> that's obviously headed towards the open internet.  Sure this isn't
> perfect, but it would catch some cases. The admin could provide a specific
> flag to override.  (I got this one from discussion with the Kudu folks.)
> - The examples don't have to be big. Another example would be... if using
> TLS, and if the certificate authority used to sign the certificate is in
> the default certificate store, turn on HSTS automatically.
> - Always turn off TLSv1 and TLSv1.1
> - Forbid single-DES and RC4 encryption algorithms
>
> You get the idea.
> -Mike
>
>
>
>>
>>
>> On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <myo...@cloudera.com>
>> wrote:
>>
>>> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lmc...@apache.org> wrote:
>>>
>>>> New Revision...
>>>>
>>>
>>> These lists are wonderful. I appreciate the split between the Tech
>>> Preview and the GA Readiness lists, with the emphasis on the former being
>>> "don't enable by default" or at least "don't enable if security is on".  I
>>> don't have any comments on that part.
>>>
>>> Additions inline below. If some of the additions are items covered by
>>> existing frameworks that any code would use, please forgive my ignorance.
>>> Also, my points aren't as succinct as yours. Feel free to reword.
>>>
>>> *GA Readiness Security Audit*
>>>> At this point, we are merging full or partial security model
>>>> implementations.
>>>> Let's inventory what is covered by the model at this point and whether
>>>> there are future merges required to be full.
>>>>
>>>> *1. UIs*
>>>>
>>>> 1.1. What sort of validation is being done on any accepted user input?
>>>> (pointers to code would be appreciated)
>>>> 1.2. What explicit protections have been built in for (pointers to code
>>>> would be appreciated):
>>>>   1.2.1. cross site scripting
>>>>   1.2.2. cross site request forgery
>>>>   1.2.3. click jacking (X-Frame-Options)
>>>>
>>>
>>> 1.2.4 If using cookies, is the secure flag for cookies
>>> <https://www.owasp.org/index.php/SecureFlag> turned on?
>>>
>>>
>>>> 1.3. What sort of authentication is required for access to the UIs?
>>>>   1.3.1. Kerberos
>>>> 1.3.1.1. has TGT renewal been accounted for
>>>> 1.3.1.2. SPNEGO support?
>>>> 1.3.1.3. Delegation token?
>>>>   1.3.2. Proxy User ACL?
>>>> 1.4. What authorization is available for determining who can access
>>>> what capabilities of the UIs for either viewing, modifying data and/or
>>>> related processes?
>>>> 1.5. Is there any input that will ultimately be persisted in
>>>> configuration for executing shell commands or processes?
>>>> 1.6. Do the UIs support the trusted proxy pattern wi

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-25 Thread larry mccay
Terrific additions, Mike!
I will spin a new revision and incorporate your additions.

#8 is a great topic - given that Hadoop is insecure by default.
Actual movement to Secure by Default would be a challenge both technically
(given the need for kerberos) and discussion-wise.
Asking whether you have considered any settings of configurations that can
be secure by default is an interesting idea.

Can you provide an example though?


On Wed, Oct 25, 2017 at 2:14 PM, Michael Yoder <myo...@cloudera.com> wrote:

> On Sat, Oct 21, 2017 at 8:47 AM, larry mccay <lmc...@apache.org> wrote:
>
>> New Revision...
>>
>
> These lists are wonderful. I appreciate the split between the Tech Preview
> and the GA Readiness lists, with the emphasis on the former being "don't
> enable by default" or at least "don't enable if security is on".  I don't
> have any comments on that part.
>
> Additions inline below. If some of the additions are items covered by
> existing frameworks that any code would use, please forgive my ignorance.
> Also, my points aren't as succinct as yours. Feel free to reword.
>
> *GA Readiness Security Audit*
>> At this point, we are merging full or partial security model
>> implementations.
>> Let's inventory what is covered by the model at this point and whether
>> there are future merges required to be full.
>>
>> *1. UIs*
>>
>> 1.1. What sort of validation is being done on any accepted user input?
>> (pointers to code would be appreciated)
>> 1.2. What explicit protections have been built in for (pointers to code
>> would be appreciated):
>>   1.2.1. cross site scripting
>>   1.2.2. cross site request forgery
>>   1.2.3. click jacking (X-Frame-Options)
>>
>
> 1.2.4 If using cookies, is the secure flag for cookies
> <https://www.owasp.org/index.php/SecureFlag> turned on?
>
>
>> 1.3. What sort of authentication is required for access to the UIs?
>>   1.3.1. Kerberos
>> 1.3.1.1. has TGT renewal been accounted for
>> 1.3.1.2. SPNEGO support?
>> 1.3.1.3. Delegation token?
>>   1.3.2. Proxy User ACL?
>> 1.4. What authorization is available for determining who can access what
>> capabilities of the UIs for either viewing, modifying data and/or related
>> processes?
>> 1.5. Is there any input that will ultimately be persisted in
>> configuration for executing shell commands or processes?
>> 1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
>> 1.7. Is there TLS/SSL support?
>>
>
> 1.7.1 Is it possible to configure TLS protocols and cipher suites?
> 1.7.2 Is it possible to configure support for HTTP Strict Transport
> Security
> <https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet>
> (HSTS)?
> 1.8 Are accesses to the UI audited? ("User X logged into Y from IP address
> Z", etc)
>
>
>> *2. REST APIs*
>>
>> 2.1. Do the REST APIs support the trusted proxy pattern with doas
>> impersonation capabilities?
>> 2.2. What explicit protections have been built in for:
>>   2.2.1. cross site scripting (XSS)
>>   2.2.2. cross site request forgery (CSRF)
>>   2.2.3. XML External Entity (XXE)
>> 2.3. What is being used for authentication - Hadoop Auth Module?
>> 2.4. Are there separate processes for the HTTP resources (UIs and REST
>> endpoints) or are they part of existing processes?
>> 2.5. Is there TLS/SSL support?
>> 2.6. Are there new CLI commands and/or clients for accessing the REST
>> APIs?
>> 2.7. What authorization enforcement points are there within the REST APIs?
>>
>
> The TLS and audit comments above apply here, too.
>
>
>> *3. Encryption*
>>
>> 3.1. Is there any support for encryption of persisted data?
>> 3.2. If so, is KMS and the hadoop key command used for key management?
>> 3.3. KMS interaction with Proxy Users?
>>
>
> 3.4 Cryptography is hard. There are more obscure pitfalls in crypto than
> any other in computer science. Standard cryptographic libraries should
> always be used. Does this work attempt to create an encryption scheme or
> protocol? Does it have a "novel" or "unique" use of normal crypto?  There
> be dragons. Even normal-looking use of cryptography must be carefully
> reviewed.
> 3.5 If you need random bits for a security purpose, such as for a session
> token or a cryptographic key, you need a cryptographically approved place
> to acquire said bits. Use the SecureRandom class.
>
> *4. Configuration*
>>
>> 4.1. Are there any passwords or secrets being added to configuration?
>> 4.2. If so, are they

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-21 Thread larry mccay
New Revision...

This revision acknowledges the reality that we often have multiple phases
of feature lifecycle and that we need to account for each phase.
It has also been made more generic.
I have created a Tech Preview Security Audit list and a GA Readiness
Security Audit list.
I've also included suggested items into the GA Readiness list.

It has also been suggested that we publish the information as part of docs
so that the state of such features can be easily determined from these
pages. We can discuss this aspect as well.

Thoughts?

*Tech Preview Security Audit*
For features that are being merged without full security model coverage,
there needs to be a baseline of assurances that they do not introduce new
attack vectors in deployments that are from actual releases or even just
built from trunk.

*1. UIs*

1.1. Are there new UIs added with this merge?
1.2. Are they enabled/accessible by default?
1.3. Are they hosted in existing processes or as part of a new
process/server?
1.4. If new process/server, is it launched by default?

*2. APIs*

2.1. Are there new REST APIs added with this merge?
2.2. Are they enabled by default?
2.3. Are there RPC based APIs added with this merge?
2.4. Are they enabled by default?

*3. Secure Clusters*

3.1. Is this feature disabled completely in secure deployments?
3.2. If not, is there some justification as to why it should be available?

*4. CVEs*

4.1. Have all dependencies introduced by this merge been checked for known
issues?


--


*GA Readiness Security Audit*
At this point, we are merging full or partial security model
implementations.
Let's inventory what is covered by the model at this point and whether
there are future merges required to be full.

*1. UIs*

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
  1.3.1. Kerberos
1.3.1.1. has TGT renewal been accounted for
1.3.1.2. SPNEGO support?
1.3.1.3. Delegation token?
  1.3.2. Proxy User ACL?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data and/or related
processes?
1.5. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.6. Do the UIs support the trusted proxy pattern with doas impersonation?
1.7. Is there TLS/SSL support?

*2. REST APIs*

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. What authorization enforcement points are there within the REST APIs?

*3. Encryption*

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?
3.3. KMS interaction with Proxy Users?

*4. Configuration*

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning to credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out command execution, etc?

*5. HA*

5.1. Are there provisions for HA?
5.2. Are there any single point of failures?

*6. CVEs*

Dependencies need to have been checked for known issues before we merge.
We don't however want to list any CVEs that have been fixed but not
released yet.

6.1. All dependencies checked for CVEs?




On Sat, Oct 21, 2017 at 10:26 AM, larry mccay <lmc...@apache.org> wrote:

> Hi Marton -
>
> I don't think there is any denying that it would be great to have such
> documentation for all of those reasons.
> If it is a natural extension of getting the checklist information as an
> assertion of security state when merging then we can certainly include it.
>
> I think that backfilling all such information across the project is a
> different topic altogether and wouldn't want to expand the scope of this
> discussion in that direction.
>
> Thanks for the great thoughts on this!
>
> thanks,
>
> --larry
>
>
>
>
>
> On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <h...@anzix.net> wrote:

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-21 Thread larry mccay
Hi Marton -

I don't think there is any denying that it would be great to have such
documentation for all of those reasons.
If it is a natural extension of getting the checklist information as an
assertion of security state when merging then we can certainly include it.

I think that backfilling all such information across the project is a
different topic altogether and wouldn't want to expand the scope of this
discussion in that direction.

Thanks for the great thoughts on this!

thanks,

--larry





On Sat, Oct 21, 2017 at 3:00 AM, Elek, Marton <h...@anzix.net> wrote:

>
>
> On 10/21/2017 02:41 AM, larry mccay wrote:
>
>>
>> "We might want to start a security section for Hadoop wiki for each of the
>>> services and components.
>>> This helps to track what has been completed."
>>>
>>
>> Do you mean to keep the audit checklist for each service and component
>> there?
>> Interesting idea, I wonder what sort of maintenance that implies and
>> whether we want to take on that burden even though it would be great
>> information to have for future reviewers.
>>
>
> I think we should care about the maintenance of the documentation anyway.
> We also need to maintain all the other documentation. I think it could be
> even part of the generated docs and not the wiki.
>
> I also suggest to fill this list about the current trunk/3.0 as a first
> step.
>
> 1. It would be very useful documentation for the end-users (some
> answers could link to the existing documentation - it exists, but I am not
> sure if all the answers are in the current documentation.)
>
> 2. It would be a good example of how the questions could be answered.
>
> 3. It would help to check, if something is missing from the list.
>
> 4. There are feature branches where some of the components are not touched.
> For example, no web ui or no REST service. A prefilled list could help to
> check if the branch doesn't break any old security functionality on trunk.
>
> 5. It helps to document the security features in one place. If we have a
> list for the existing functionality in the same format, it would be easy to
> merge the new documentation of the new features as they will be reported in
> the same form. (So it won't be so hard to maintain the list...).
>
> Marton
>


Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-20 Thread larry mccay
Hi Eric -

Thanks for the additional item suggestions!

"We might want to start a security section for Hadoop wiki for each of the
services and components.
This helps to track what has been completed."

Do you mean to keep the audit checklist for each service and component
there?
Interesting idea, I wonder what sort of maintenance that implies and
whether we want to take on that burden even though it would be great
information to have for future reviewers.

"How do we want to enforce security completeness?  Most features will not
meet all security requirements on merge day."

This is a really important question and point.
Maybe we should have started with goals and intents before the actual list.

My high level goals:

1. To have a holistic idea of what a given feature (or merge) is bringing
to the table in terms of attack surface
2. To understand the level of security that is intended for the feature in its
end state (GA)
3. To fully understand the stated level of security that is in place at the
time of each merge
4. To ensure that a merge meets some minimal bar for not adding security
vulnerabilities to deployments of a release or even builds from trunk. Not
the least of which is whether it is enabled by default and what it means to
disable it.
5. To be as unobtrusive to the branch committers as possible while still
communicating what we need for security review.
6. To have a reasonable checklist of security concerns that may or may not
apply to each merge but should be at least thought about in the final
security model design for the particular feature.

I think that feature merges often span multiple branch merges, with security
or other aspects of the feature coming in phases.
This intent should maybe be part of the checklist itself so that we can
assess the audit with the level of scrutiny appropriate for the current
merge.

I will work on another revision of the list and incorporate your
suggestions as well.

thanks!

--larry

On Fri, Oct 20, 2017 at 7:42 PM, Eric Yang <ey...@hortonworks.com> wrote:

> The check list looks good.  Some more items to add:
>
> Kerberos
>   TGT renewal
>   SPNEGO support
>   Delegation token
> Proxy User ACL
>
> CVE tracking list
>
> We might want to start a security section for Hadoop wiki for each of the
> services and components.
> This helps to track what has been completed.
>
> How do we want to enforce security completeness?  Most features will not
> meet all security requirements on merge day.
>
> Regards,
> Eric
>
> On 10/20/17, 12:41 PM, "larry mccay" <lmc...@apache.org> wrote:
>
> Adding security@hadoop list as well...
>
> On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lmc...@apache.org>
> wrote:
>
> > All -
> >
> > Given the maturity of Hadoop at this point, I would like to propose
> that
> > we start doing explicit security audits of features at merge time.
> >
> > There are a few reasons that I think this is a good place/time to do
> the
> > review:
> >
> > 1. It represents a specific snapshot of where the feature stands as a
> > whole. This means that we can more easily identify the attack
> surface of a
> > given feature.
> > 2. We can identify any security gaps that need to be fixed before a
> > release that carries the feature can be considered ready.
> > 3. We - in extreme cases - can block a feature from merging until
> some
> > baseline of security coverage is achieved.
> > 4. The folks that are interested and able to review security aspects
> can't
> > scale for every iteration over every JIRA but can review the
> checklist and
> > follow pointers for specific areas of interest.
> >
> > I have provided an impromptu security audit checklist on the DISCUSS
> > thread for merging Ozone - HDFS-7240 into trunk.
> >
> > I don't want to pick on it particularly but I think it is a good way
> to
> > bootstrap this audit process and figure out how to incorporate it
> without
> > being too intrusive.
> >
> > The questions that I provided below are a mix of general questions
> that
> > could be on a standard checklist that you provide along with the
> merge
> > thread and some that are specific to what I read about ozone in the
> > excellent docs provided. So, we should consider some subset of the
> > following as a proposal for a general checklist.
> >
> > Perhaps, a shared document can be created to iterate over the list
> to fine
> > tune it?
> >
> > Any thoughts on this, any additional datapoints to collect, etc?
> >
> >

Re: 答复: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-10-20 Thread larry mccay
All -

I broke this list of questions out into a separate DISCUSS thread where we
can iterate over how a security audit process at merge time might look and
whether it is even something that we want to take on.

I will try and continue discussion on that thread and drive that to some
conclusion before bringing it into any particular merge discussion.

thanks,

--larry

On Fri, Oct 20, 2017 at 12:37 PM, larry mccay <lmc...@apache.org> wrote:

> I previously sent this same email from my work email and it doesn't seem
> to have gone through - resending from apache account (apologizing up front
> for the length)
>
> For such sizable merges in Hadoop, I would like to start doing security
> audits in order to have an initial idea of the attack surface, the
> protections available for known threats, what sort of configuration is
> being used to launch processes, etc.
>
> I dug into the architecture documents while in the middle of this list -
> nice docs!
> I do intend to try and make a generic check list like this for such
> security audits in the future so a lot of this is from that but I tried to
> also direct specific questions from those docs as well.
>
> 1. UIs
> I see there are at least two UIs - Storage Container Manager and Key Space
> Manager. There are a number of typical vulnerabilities that we find in UIs
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data or affecting
> object stores and related processes?
> 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> headers?
> 1.6. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.8. Is there TLS/SSL support?
>
> 2. REST APIs
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing HDFS processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
> authorization is required here - is there a restrictive ACL set on creation?
> 2.8. Bucket Level API allows for deleting a bucket - I assume this is
> dependent on ACLs based access control?
> 2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
> paging available?
> 2.10. Storage Level APIs indicate “Signed with User Authorization” - what
> does this refer to exactly?
> 2.11. Object Level APIs indicate that there is no ACL support and only
> bucket owners can read and write - but there are ACL APIs on the Bucket
> Level - are they meaningless for now?
> 2.12. How does a REST client know which Ozone Handler to connect to or am
> I missing some well known NN type endpoint in the architecture doc
> somewhere?
>
> 3. Encryption
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
>
> 4. Configuration
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning in credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out any commands, etc?
>
> 5. HA
>
> 5.1. Are there provisions for HA?
> 5.2. Are we leveraging the existing HA capabilities in HDFS?
> 5.3. Is Storage Container Manager a SPOF?
> 5.4. I see HA listed in future work in the architecture doc - is this
> still an open issue?
>
> On Fri, Oct 20, 2017 at 11:19 AM, Anu Engineer <aengin...@hortonworks.com>
> wrote:
>
>> Hi Steve,
>>
>> In addition to everything Weiwei mentioned (chapter 3 of user guide), if
>> you really want to drill down to REST protocol you might want to apply this
>> patch and build ozone.
>>
>> https://issues.a

Re: [DISCUSS] Feature Branch Merge and Security Audits

2017-10-20 Thread larry mccay
Adding security@hadoop list as well...

On Fri, Oct 20, 2017 at 2:29 PM, larry mccay <lmc...@apache.org> wrote:

> All -
>
> Given the maturity of Hadoop at this point, I would like to propose that
> we start doing explicit security audits of features at merge time.
>
> There are a few reasons that I think this is a good place/time to do the
> review:
>
> 1. It represents a specific snapshot of where the feature stands as a
> whole. This means that we can more easily identify the attack surface of a
> given feature.
> 2. We can identify any security gaps that need to be fixed before a
> release that carries the feature can be considered ready.
> 3. We - in extreme cases - can block a feature from merging until some
> baseline of security coverage is achieved.
> 4. The folks that are interested and able to review security aspects can't
> scale for every iteration over every JIRA but can review the checklist and
> follow pointers for specific areas of interest.
>
> I have provided an impromptu security audit checklist on the DISCUSS
> thread for merging Ozone - HDFS-7240 into trunk.
>
> I don't want to pick on it particularly but I think it is a good way to
> bootstrap this audit process and figure out how to incorporate it without
> being too intrusive.
>
> The questions that I provided below are a mix of general questions that
> could be on a standard checklist that you provide along with the merge
> thread and some that are specific to what I read about ozone in the
> excellent docs provided. So, we should consider some subset of the
> following as a proposal for a general checklist.
>
> Perhaps, a shared document can be created to iterate over the list to fine
> tune it?
>
> Any thoughts on this, any additional datapoints to collect, etc?
>
> thanks!
>
> --larry
>
> 1. UIs
> I see there are at least two UIs - Storage Container Manager and Key Space
> Manager. There are a number of typical vulnerabilities that we find in UIs
>
> 1.1. What sort of validation is being done on any accepted user input?
> (pointers to code would be appreciated)
> 1.2. What explicit protections have been built in for (pointers to code
> would be appreciated):
>   1.2.1. cross site scripting
>   1.2.2. cross site request forgery
>   1.2.3. click jacking (X-Frame-Options)
> 1.3. What sort of authentication is required for access to the UIs?
> 1.4. What authorization is available for determining who can access what
> capabilities of the UIs for either viewing, modifying data or affecting
> object stores and related processes?
> 1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
> headers?
> 1.6. Is there any input that will ultimately be persisted in configuration
> for executing shell commands or processes?
> 1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
> 1.8. Is there TLS/SSL support?
>
> 2. REST APIs
>
> 2.1. Do the REST APIs support the trusted proxy pattern with doas
> impersonation capabilities?
> 2.2. What explicit protections have been built in for:
>   2.2.1. cross site scripting (XSS)
>   2.2.2. cross site request forgery (CSRF)
>   2.2.3. XML External Entity (XXE)
> 2.3. What is being used for authentication - Hadoop Auth Module?
> 2.4. Are there separate processes for the HTTP resources (UIs and REST
> endpoints) or are they part of existing HDFS processes?
> 2.5. Is there TLS/SSL support?
> 2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
> 2.7. Bucket Level API allows for setting of ACLs on a bucket - what
> authorization is required here - is there a restrictive ACL set on creation?
> 2.8. Bucket Level API allows for deleting a bucket - I assume this is
> dependent on ACLs based access control?
> 2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
> paging available?
> 2.10. Storage Level APIs indicate “Signed with User Authorization” - what
> does this refer to exactly?
> 2.11. Object Level APIs indicate that there is no ACL support and only
> bucket owners can read and write - but there are ACL APIs on the Bucket
> Level - are they meaningless for now?
> 2.12. How does a REST client know which Ozone Handler to connect to or am
> I missing some well known NN type endpoint in the architecture doc
> somewhere?
>
> 3. Encryption
>
> 3.1. Is there any support for encryption of persisted data?
> 3.2. If so, is KMS and the hadoop key command used for key management?
>
> 4. Configuration
>
> 4.1. Are there any passwords or secrets being added to configuration?
> 4.2. If so, are they accessed via Configuration.getPassword() to allow for
> provisioning in credential providers?
> 4.3. Are there any settings that are used to launch docker containers or
> shell out any commands, etc?
>
> 5. HA
>
> 5.1. Are there provisions for HA?
> 5.2. Are we leveraging the existing HA capabilities in HDFS?
> 5.3. Is Storage Container Manager a SPOF?
> 5.4. I see HA listed in future work in the architecture doc - is this
> still an open issue?
>


[DISCUSS] Feature Branch Merge and Security Audits

2017-10-20 Thread larry mccay
All -

Given the maturity of Hadoop at this point, I would like to propose that we
start doing explicit security audits of features at merge time.

There are a few reasons that I think this is a good place/time to do the
review:

1. It represents a specific snapshot of where the feature stands as a
whole. This means that we can more easily identify the attack surface of a
given feature.
2. We can identify any security gaps that need to be fixed before a release
that carries the feature can be considered ready.
3. We - in extreme cases - can block a feature from merging until some
baseline of security coverage is achieved.
4. The folks that are interested and able to review security aspects can't
scale for every iteration over every JIRA but can review the checklist and
follow pointers for specific areas of interest.

I have provided an impromptu security audit checklist on the DISCUSS thread
for merging Ozone - HDFS-7240 into trunk.

I don't want to pick on it particularly but I think it is a good way to
bootstrap this audit process and figure out how to incorporate it without
being too intrusive.

The questions that I provided below are a mix of general questions that
could be on a standard checklist that you provide along with the merge
thread and some that are specific to what I read about ozone in the
excellent docs provided. So, we should consider some subset of the
following as a proposal for a general checklist.

Perhaps, a shared document can be created to iterate over the list to fine
tune it?

Any thoughts on this, any additional datapoints to collect, etc?

thanks!

--larry

1. UIs
I see there are at least two UIs - Storage Container Manager and Key Space
Manager. There are a number of typical vulnerabilities that we find in UIs

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data or affecting
object stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
headers?
1.6. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. Bucket Level API allows for setting of ACLs on a bucket - what
authorization is required here - is there a restrictive ACL set on creation?
2.8. Bucket Level API allows for deleting a bucket - I assume this is
dependent on ACLs based access control?
2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
paging available?
2.10. Storage Level APIs indicate “Signed with User Authorization” - what
does this refer to exactly?
2.11. Object Level APIs indicate that there is no ACL support and only
bucket owners can read and write - but there are ACL APIs on the Bucket
Level - are they meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to or am I
missing some well known NN type endpoint in the architecture doc somewhere?

3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning in credential providers? (see the sketch after this section)
4.3. Are there any settings that are used to launch docker containers or
shell out any commands, etc?
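
To make 4.2 concrete, here is a minimal sketch of the pattern I have in mind
(the property name and keystore path are made up for illustration and are not
taken from the Ozone docs; assumes org.apache.hadoop.conf.Configuration inside
a method that can throw IOException):

    // Resolve a secret through Configuration.getPassword() so it can be served
    // from a credential provider (e.g. a jceks keystore) rather than sitting in
    // clear text in the XML files. Both names below are hypothetical.
    Configuration conf = new Configuration();
    conf.set("hadoop.security.credential.provider.path",
        "jceks://file/etc/security/example.jceks");
    char[] secret = conf.getPassword("example.ssl.keystore.password");
    if (secret == null) {
      // no alias found in any provider and no clear-text fallback in the config
    }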

5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed in future work in the architecture doc - is this still
an open issue?


Re: 答复: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-10-20 Thread larry mccay
I previously sent this same email from my work email and it doesn't seem to
have gone through - resending from apache account (apologizing up front for
the length)

For such sizable merges in Hadoop, I would like to start doing security
audits in order to have an initial idea of the attack surface, the
protections available for known threats, what sort of configuration is
being used to launch processes, etc.

I dug into the architecture documents while in the middle of this list -
nice docs!
I do intend to try and make a generic check list like this for such
security audits in the future so a lot of this is from that but I tried to
also direct specific questions from those docs as well.

1. UIs
I see there are at least two UIs - Storage Container Manager and Key Space
Manager. There are a number of typical vulnerabilities that we find in UIs

1.1. What sort of validation is being done on any accepted user input?
(pointers to code would be appreciated)
1.2. What explicit protections have been built in for (pointers to code
would be appreciated):
  1.2.1. cross site scripting
  1.2.2. cross site request forgery
  1.2.3. click jacking (X-Frame-Options)
1.3. What sort of authentication is required for access to the UIs?
1.4. What authorization is available for determining who can access what
capabilities of the UIs for either viewing, modifying data or affecting
object stores and related processes?
1.5. Are the UIs built with proxying in mind by leveraging X-Forwarded
headers?
1.6. Is there any input that will ultimately be persisted in configuration
for executing shell commands or processes?
1.7. Do the UIs support the trusted proxy pattern with doas impersonation?
1.8. Is there TLS/SSL support?

2. REST APIs

2.1. Do the REST APIs support the trusted proxy pattern with doas
impersonation capabilities?
2.2. What explicit protections have been built in for:
  2.2.1. cross site scripting (XSS)
  2.2.2. cross site request forgery (CSRF)
  2.2.3. XML External Entity (XXE)
2.3. What is being used for authentication - Hadoop Auth Module?
2.4. Are there separate processes for the HTTP resources (UIs and REST
endpoints) or are they part of existing HDFS processes?
2.5. Is there TLS/SSL support?
2.6. Are there new CLI commands and/or clients for accessing the REST APIs?
2.7. Bucket Level API allows for setting of ACLs on a bucket - what
authorization is required here - is there a restrictive ACL set on creation?
2.8. Bucket Level API allows for deleting a bucket - I assume this is
dependent on ACLs based access control?
2.9. Bucket Level API to list bucket returns up to 1000 keys - is there
paging available?
2.10. Storage Level APIs indicate “Signed with User Authorization” - what
does this refer to exactly?
2.11. Object Level APIs indicate that there is no ACL support and only
bucket owners can read and write - but there are ACL APIs on the Bucket
Level - are they meaningless for now?
2.12. How does a REST client know which Ozone Handler to connect to or am I
missing some well known NN type endpoint in the architecture doc somewhere?

3. Encryption

3.1. Is there any support for encryption of persisted data?
3.2. If so, is KMS and the hadoop key command used for key management?

4. Configuration

4.1. Are there any passwords or secrets being added to configuration?
4.2. If so, are they accessed via Configuration.getPassword() to allow for
provisioning in credential providers?
4.3. Are there any settings that are used to launch docker containers or
shell out any commands, etc?

5. HA

5.1. Are there provisions for HA?
5.2. Are we leveraging the existing HA capabilities in HDFS?
5.3. Is Storage Container Manager a SPOF?
5.4. I see HA listed in future work in the architecture doc - is this still
an open issue?

On Fri, Oct 20, 2017 at 11:19 AM, Anu Engineer 
wrote:

> Hi Steve,
>
> In addition to everything Weiwei mentioned (chapter 3 of user guide), if
> you really want to drill down to REST protocol you might want to apply this
> patch and build ozone.
>
> https://issues.apache.org/jira/browse/HDFS-12690
>
> This will generate an Open API (https://www.openapis.org ,
> http://swagger.io) based specification which can be accessed from KSM UI
> or just as a json file.
> Unfortunately, this patch is still at code review stage, so you will have
> to apply the patch and build it yourself.
>
> Thanks
> Anu
>
>
> On 10/20/17, 6:09 AM, "Yang Weiwei"  wrote:
>
> Hi Steve
>
>
> The code is available in HDFS-7240 feature branch, public git repo
> here.
>
> I am not sure if there is a "public" API for object stores, but the
> design doc 12799549/ozone_user_v0.pdf> uses most common syntax so I believe it
> should be compliant. You can find the REST API doc here<
> https://github.com/apache/hadoop/blob/HDFS-7240/
> 

Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread larry mccay
Hi Jonathan -

Thank you for bringing this up for discussion!

I would personally like to see a specific security review of features like
this - especially ones that allow for remote access to configuration.
I'll take a look at the JIRA and see whether I can come up with any
concerns or questions and I would urge others to give it a pass from a
security perspective as well.

In addition, here are a couple of questions off the top of my head:

Is this feature extending the existing YARN RM REST API?
When it isn't enabled, what is the API behavior?
Does it implement the trusted proxy pattern for proxies to be able to
impersonate users and most importantly to dictate what proxies would be
allowed to impersonate an admin for this API - which I assume will be
required?
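
For reference, the trusted proxy pattern I am referring to is the standard
Hadoop proxyuser/doAs mechanism. The wiring typically looks something like the
following in core-site.xml (the 'knox' user, host, and group values are
illustrative placeholders only, not anything from YARN-5734):

  <property>
    <name>hadoop.proxyuser.knox.hosts</name>
    <value>gateway-host.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.knox.users</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.knox.groups</name>
    <value>admins</value>
  </property>

The REST call then carries a doAs query parameter naming the end user, and the
server decides whether the authenticated proxy is allowed to act as that user -
for an API that mutates scheduler configuration, presumably only as an admin.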

--larry

On Fri, Sep 29, 2017 at 2:44 PM, Andrew Wang 
wrote:

> Hi Jonathan,
>
> I'm okay with putting this into branch-3.0 for GA if it can be merged
> within the next two weeks. Even though beta1 has slipped by a month, I want
> to stick to the targeted GA date of Nov 1st as much as possible. Of course,
> let's not sacrifice quality or stability for speed; if something's not
> ready, let's defer it to 3.1.0.
>
> Subru, have you been able to review this feature from the 2.9.0
> perspective? It'd add confidence if you think it's immediately ready for
> merging to branch-2 for 2.9.0.
>
> Thanks,
> Andrew
>
> On Thu, Sep 28, 2017 at 11:32 AM, Jonathan Hung 
> wrote:
>
> > Hi everyone,
> >
> > Starting this thread to discuss merging API-based scheduler configuration
> > to trunk/branch-2. The feature adds the framework for allowing users to
> > modify scheduler configuration via REST or CLI using a configurable
> backend
> > (leveldb/zk are currently supported), and adds capacity scheduler support
> > for this. The umbrella JIRA is YARN-5734. All the required work for this
> > feature is done and committed to branch YARN-5734, and a full diff has
> been
> > generated at YARN-7241.
> >
> > Regarding compatibility, this feature is configurable and turned off by
> > default.
> >
> > The feature has been tested locally on a couple RMs (since it is an RM
> > only change), with queue addition/removal/updates tested on single RM
> > (leveldb) and two RMs (zk). Also we verified the original configuration
> > update mechanism (via refreshQueues) is unaffected when the feature is
> > off/not configured.
> >
> > Our original plan was to merge this to trunk (which is what the YARN-7241
> > diff is based on), and port to branch-2 before the 2.9 release. @Andrew,
> > what are your thoughts on also merging this to branch-3.0?
> >
> > Thanks!
> >
> > Jonathan Hung
> >
>


Re: Moving Java Forward Faster

2017-09-07 Thread larry mccay
Interesting.
Thanks for sharing this, Allen.

Question: Does GPL licensing of the JDK/JVM affect us negatively?


On Thu, Sep 7, 2017 at 10:14 AM, Allen Wittenauer 
wrote:

>
>
> > Begin forwarded message:
> >
> > From: "Rory O'Donnell" 
> > Subject: Moving Java Forward Faster
> > Date: September 7, 2017 at 2:12:45 AM PDT
> > To: "strub...@yahoo.de >> Mark Struberg" 
> > Cc: rory.odonn...@oracle.com, abdul.kolarku...@oracle.com,
> balchandra.vai...@oracle.com, dalibor.to...@oracle.com, bui...@apache.org
> > Reply-To: bui...@apache.org
> >
> > Hi Mark & Gavin,
> >
> > Oracle is proposing a rapid release model for Java SE going-forward.
> >
> > The high points are highlighted below, details of the changes can be
> found on Mark Reinhold’s blog [1] , OpenJDK discussion email list [2].
> >
> > Under the proposed release model, after JDK 9, we will adopt a strict,
> time-based model with a new major release every six months, update releases
> every quarter, and a long-term support release every three years.
> >
> > The new JDK Project will run a bit differently than the past "JDK $N"
> Projects:
> >
> > - The main development line will always be open but fixes, enhancements,
> and features will be merged only when they're nearly finished. The main
> line will be Feature Complete [3] at all times.
> >
> > - We'll continue to use the JEP Process [4] for new features and other
> significant changes. The bar to target a JEP to a specific release will,
> however, be higher since the work must be Feature Complete in order to go
> in. Owners of large or risky features will be strongly encouraged to split
> such features up into smaller and safer parts, to integrate earlier in the
> release cycle, and to publish separate lines of early-access builds prior
> to integration.
> >
> > The JDK Updates Project will run in much the same way as the past "JDK
> $N" Updates Projects, though update releases will be strictly limited to
> fixes of security issues, regressions, and bugs in newer features.
> >
> > Related to this proposal, we intend to make a few changes in what we do:
> >
> > - Starting with JDK 9 we'll ship OpenJDK builds under the GPL [5], to
> make it easier for developers to deploy Java applications to cloud
> environments. We'll initially publish OpenJDK builds for Linux/x64,
> followed later by builds for macOS/x64 and Windows/x64.
> >
> > - We'll continue to ship proprietary "Oracle JDK" builds, which include
> "commercial features" [6] such as Java Flight Recorder and Mission Control
> [7], under a click-through binary-code license [8]. Oracle will continue to
> offer paid support for these builds.
> >
> > - After JDK 9 we'll open-source the commercial features in order to make
> the OpenJDK builds more attractive to developers and to reduce the
> differences between those builds and the Oracle JDK. This will take some
> time, but the ultimate goal is to make OpenJDK and Oracle JDK builds
> completely interchangeable.
> >
> > - Finally, for the long term we'll work with other OpenJDK contributors
> to establish an open build-and-test infrastructure. This will make it
> easier to publish early-access builds for features in development, and
> eventually make it possible for the OpenJDK Community itself to publish
> authoritative builds of the JDK.
> >
> > Questions , comments, feedback to OpenJDK discuss mailing list [2]
> >
> > Rgds,Rory
> >
> > [1]https://mreinhold.org/blog/forward-faster
> > [2]http://mail.openjdk.java.net/pipermail/discuss/2017-
> September/004281.html
> > [3]http://openjdk.java.net/projects/jdk8/milestones#Feature_Complete
> > [4]http://openjdk.java.net/jeps/0
> > [5]http://openjdk.java.net/legal/gplv2+ce.html
> > [6]http://www.oracle.com/technetwork/java/javase/terms/
> products/index.html
> > [7]http://www.oracle.com/technetwork/java/javaseproducts/mission-
> control/index.html
> > [8]http://www.oracle.com/technetwork/java/javase/terms/
> license/index.html
> >
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: [DISCUSS] Looking to Apache Hadoop 3.1 release

2017-09-06 Thread larry mccay
Hi Wangda -

Thank you for starting this conversation!

+1000 for a faster release cadence.
Quicker releases make turning around security fixes so much easier.

When we consider alpha features, let’s please ensure that they are not
delivered in a state that has known security issues and also make sure that
they are disabled by default. IMO - it is not a feature - alpha or
otherwise - unless it has some reasonable assurance of being secure. Please
don't see this as calling out any particular feature. I just think we need
to be very explicit about security expectations. Maybe this is already well
understood.

Thank you for this proposed plan and for volunteering!

—larry

On Wed, Sep 6, 2017 at 7:22 PM, Anu Engineer 
wrote:

> Hi Wangda,
>
> We are planning to start the Ozone merge discussion by the end of this
> month. I am hopeful that it will be merged pretty soon after that.
> Please add Ozone to the list of features that are being tracked for Apache
> Hadoop 3.1.
>
> We would love to release Ozone as an alpha feature in Hadoop 3.1.
>
> Thanks
> Anu
>
>
> On 9/6/17, 2:26 PM, "Arun Suresh"  wrote:
>
> >Thanks for starting this Wangda.
> >
> >I would also like to add:
> >- YARN-5972: Support Pausing/Freezing of opportunistic containers
> >
> >Cheers
> >-Arun
> >
> >On Wed, Sep 6, 2017 at 1:49 PM, Steve Loughran 
> >wrote:
> >
> >>
> >> > On 6 Sep 2017, at 19:13, Wangda Tan  wrote:
> >> >
> >> > Hi all,
> >> >
> >> > As we discussed on [1], there were proposals from Steve / Vinod etc to
> >> have
> >> > a faster cadence of releases and to start thinking of a Hadoop 3.1
> >> release
> >> > earlier than March 2018 as is currently proposed.
> >> >
> >> > I think this is a good idea. I'd like to start the process sooner, and
> >> > establish timeline etc so that we can be ready when 3.0.0 GA is out.
> With
> >> > this we can also establish faster cadence for future Hadoop 3.x
> releases.
> >> >
> >> > To this end, I propose to target Hadoop 3.1.0 for a release by mid Jan
> >> > 2018. (About 4.5 months from now and 2.5 months after 3.0-GA, instead
> of
> >> > 6.5 months from now).
> >> >
> >> > I'd also want to take this opportunity to come up with a more
> elaborate
> >> > release plan to avoid some of the confusion we had with 3.0 beta.
> General
> >> > proposal for the timeline (per this other proposal [2])
> >> > - Feature freeze date - all features should be merged by Dec 15, 2017.
> >> > - Code freeze date - blockers/critical only, no more improvements and
> non
> >> > blocker/critical bug-fixes: Jan 1, 2018.
> >> > - Release date: Jan 15, 2018
> >> >
> >> > Following is a list of features on my radar which could be candidates
> >> for a
> >> > 3.1 release:
> >> > - YARN-5734, Dynamic scheduler queue configuration. (Owner: Jonathan
> >> Hung)
> >> > - YARN-5881, Add absolute resource configuration to CapacityScheduler.
> >> > (Owner: Sunil)
> >> > - YARN-5673, Container-executor rewrite for better security,
> >> extensibility
> >> > and portability. (Owner Varun Vasudev)
> >> > - YARN-6223, GPU isolation. (Owner: Wangda)
> >> >
> >> > And from email [3] mentioned by Andrew, there’re several other HDFS
> >> > features want to be released with 3.1 as well, assuming they fit the
> >> > timelines:
> >> > - Storage Policy Satisfier
> >> > - HDFS tiered storage
> >> >
> >> > Please let me know if I missed any features targeted to 3.1 per this
> >> > timeline.
> >>
> >>
> >> HADOOP-13786 : S3Guard committer, which also adds resilience to failures
> >> talking to S3 (we barely have any today),
> >>
> >> >
> >> > And I want to volunteer myself as release manager of 3.1.0 release.
> >> Please
> >> > let me know if you have any suggestions/concerns.
> >>
> >> well volunteered :)
> >>
> >> >
> >> > Thanks,
> >> > Wangda Tan
> >> >
> >> > [1] http://markmail.org/message/hwar5f5ap654ck5o?q=
> >> > Branch+merges+and+3%2E0%2E0-beta1+scope
> >> > [2] http://markmail.org/message/hwar5f5ap654ck5o?q=Branch+
> >> > merges+and+3%2E0%2E0-beta1+scope#query:Branch%20merges%
> >> > 20and%203.0.0-beta1%20scope+page:1+mid:2hqqkhl2dymcikf5+state:results
> >> > [3] http://markmail.org/message/h35obzqrh3ag6dgn?q=Branch+merge
> >> > s+and+3%2E0%2E0-beta1+scope
>


Re: Apache Hadoop 2.8.2 Release Plan

2017-09-01 Thread larry mccay
If we do "fix" this in 2.8.2 we should seriously consider not doing so in
3.0.
This is a very poor practice.

I can see an argument for backward compatibility in the 2.8.x line, though.
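
For anyone who has not followed HADOOP-14439, the round-trip pattern that
breaks (per Steve's description quoted below) is roughly this - the access key
and secret are fake, and embedding them in the URI is exactly the practice
being discouraged:

    // org.apache.hadoop.fs.Path with credentials embedded in the URI.
    Path p = new Path("s3a://AKIAEXAMPLE:fakeSecret@mybucket/data/part-0000");
    // Downstream code sometimes assumes this round trip preserves the URI;
    // with the 2.8 secret-stripping of toString(), it may not.
    Path roundTripped = new Path(p.toString());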

On Fri, Sep 1, 2017 at 1:41 PM, Steve Loughran 
wrote:

> One thing we need to consider is
>
> HADOOP-14439: regression: secret stripping from S3x URIs breaks some
> downstream code
>
> Hadoop 2.8 has a best-effort attempt to strip out secrets from the
> toString() value of an s3a or s3n path where someone has embedded them in
> the URI; this has caused problems in some uses, specifically: when people
> use secrets this way (bad) and assume that you can round trip paths to
> string and back
>
> Should we fix this? If so, Hadoop 2.8.2 is the time to do it
>
>
> > On 1 Sep 2017, at 11:14, Junping Du  wrote:
> >
> > HADOOP-14814 got committed and HADOOP-9747 got pushed out to 2.8.3, so we
> are clean on blocker/critical issues now.
> > I finished going through the JACC report and no more incompatible
> public API changes were found between 2.8.2 and 2.7.4. Also I checked commit
> history and fixed 10+ commits which were missing from branch-2.8.2 for some
> reason. So, the current branch-2.8.2 should be good to go for the RC stage, and
> I will kick off our first RC tomorrow.
> > In the meantime, please don't land any commits to branch-2.8.2 from now
> on. If an issue really is a blocker, please ping me on the JIRA
> before doing any commits. branch-2.8 is still open for landing. Thanks for
> your cooperation!
> >
> >
> > Thanks,
> >
> > Junping
> >
> > 
> > From: Junping Du 
> > Sent: Wednesday, August 30, 2017 12:35 AM
> > To: Brahma Reddy Battula; common-dev@hadoop.apache.org;
> hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> > Subject: Re: Apache Hadoop 2.8.2 Release Plan
> >
> > Thanks Brahma for commenting on this thread. To be clear, I always update
> the branch version just before kicking off the RC.
> >
> > For the 2.8.2 release, I don't plan to involve Bigtop or other
> third-party test tools. As always, we will rely on test/verify efforts from
> the community, especially from large deployed production clusters - as far as
> I know, several companies such as Yahoo! and Alibaba have already deployed
> the 2.8 release in large production clusters for months, which gives me more
> confidence in 2.8.2.
> >
> >
> > Here is more update on 2.8.2 release:
> >
> > Blocker issues:
> >
> >-  A new blocker, YARN-7076, got reported and fixed by Jian He over
> last weekend.
> >
> >-  Another new blocker - HADOOP-14814 - got identified from my latest
> jdiff run against 2.7.4. The simple fix for an incompatible API change
> should get committed soon.
> >
> >
> > Critical issues:
> >
> >-  YARN-7083 already got committed. Thanks Jason for reporting the
> issue and delivering the fix.
> >
> >-  YARN-6091 got pushed out from 2.8.2 as the issue is not a regression
> and has been pending for a while.
> >
> >-  Daryn has been actively working on HADOOP-9747 for a while, and the
> patch is getting close to being committed. However, according to Daryn, the
> patch seems to cause some regression in some corner cases in secured
> environments (Storm auto tgt, etc.). May need some additional watch/review
> on this JIRA's fixes.
> >
> >
> >
> > My monitoring JACC report between 2.8.2 and 2.7.4 will finish
> tomorrow. If everything goes smoothly, I am planning to kick off RC0
> around the holiday (this weekend).
> >
> >
> >
> > Thanks,
> >
> >
> >
> > ​Junping
> >
> >
> >
> > 
> > From: Brahma Reddy Battula 
> > Sent: Tuesday, August 29, 2017 8:42 AM
> > To: Junping Du; common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> > Subject: Re: Apache Hadoop 2.8.2 Release Plan
> >
> >
> > Hi All
> >
> > Update on 2.8.2 release status:
> > we are down to 3 critical issues (YARN-6091, YARN-7083, HADOOP-9747), all
> are patch available and close to commit.
> > Junping is closely tracking this.
> >
> > Todo:
> >
> > 1) Update pom.xml ..?  currently it's with 2.8.3
> > https://github.com/apache/hadoop/blob/branch-2.8.2/pom.xml#L21
> > 2) Wiki
> is outdated, need to update the wiki..?
> > 3) As this is going to be a stable release, are we planning to enable Bigtop
> for 2.8.2 testing or Dynamometer testing (anybody from LinkedIn can help)..?
> >
> > @Junping Du,Please correct me,if I am wrong.
> >
> >
> > --Brahma Reddy Battula
> > 
> > From: Junping Du 
> > Sent: Monday, August 7, 2017 2:44 PM
> > To: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> > Subject: Re: Apache Hadoop 2.8.2 

Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)

2017-03-22 Thread larry mccay
+1 (non-binding)

- verified signatures
- built from source and ran tests
- deployed pseudo cluster
- ran basic tests for hdfs, wordcount, credential provider API and related
commands
- tested webhdfs with knox


On Wed, Mar 22, 2017 at 7:21 AM, Ravi Prakash  wrote:

> Thanks for all the effort Junping!
>
> +1 (binding)
> + Verified signature and MD5, SHA1, SHA256 checksum of tarball
> + Verified SHA ID in git corresponds to RC3 tag
> + Verified wordcount for one small text file produces same output as
> hadoop-2.7.3.
> + HDFS Namenode UI looks good.
>
> I agree none of the issues reported so far are blockers. Looking forward to
> another great release.
>
> Thanks
> Ravi
>
> On Tue, Mar 21, 2017 at 8:10 PM, Junping Du  wrote:
>
> > Thanks all for responding with verification work and votes!
> >
> >
> > Sounds like we are hitting several issues here, although none seems to be
> > a blocker so far. Given the large commit set - 2000+ commits first landed
> > in a branch-2 release - we should probably follow the 2.7.0 practice of
> > claiming this release is not for production clusters, per Vinod's
> > suggestion in a previous email. We should quickly come up with a 2.8.1
> > release in the next 1 or 2 months for production deployment.
> >
> >
> > We will close the vote in the next 24 hours. For people who haven't voted,
> > please keep up the verification work and report any issues if found - I
> > will check if another round of RC is needed based on your findings. Thanks!
> >
> >
> > Thanks,
> >
> >
> > Junping
> >
> >
> > 
> > From: Kuhu Shukla 
> > Sent: Tuesday, March 21, 2017 3:17 PM
> > Cc: Junping Du; common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org
> ;
> > yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> > Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC3)
> >
> >
> > +1 (non-binding)
> >
> > - Verified signatures.
> > - Downloaded and built from source tar.gz.
> > - Deployed a pseudo-distributed cluster on Mac Sierra.
> > - Ran example Sleep job successfully.
> > - Deployed latest Apache Tez 0.9 and ran sample Tez orderedwordcount
> > successfully.
> >
> > Thank you Junping and everyone else who worked on getting this release
> out.
> >
> > Warm Regards,
> > Kuhu
> > On Tuesday, March 21, 2017, 3:42:46 PM CDT, Eric Badger
> >  wrote:
> > +1 (non-binding)
> >
> > - Verified checksums and signatures of all files
> > - Built from source on MacOS Sierra via JDK 1.8.0 u65
> > - Deployed single-node cluster
> > - Successfully ran a few sample jobs
> >
> > Thanks,
> >
> > Eric
> >
> > On Tuesday, March 21, 2017 2:56 PM, John Zhuge 
> > wrote:
> >
> >
> >
> > +1. Thanks for the great effort, Junping!
> >
> >
> >   - Verified checksums and signatures of the tarballs
> >   - Built source code with Java 1.8.0_66-b17 on Mac OS X 10.12.3
> >   - Built source and native code with Java 1.8.0_111 on Centos 7.2.1511
> >   - Cloud connectors:
> >   - s3a: integration tests, basic fs commands
> >   - adl: live unit tests, basic fs commands. See notes below.
> >   - Deployed a pseudo cluster, passed the following sanity tests in
> >   both insecure and SSL mode:
> >   - HDFS: basic dfs, distcp, ACL commands
> >   - KMS and HttpFS: basic tests
> >   - MapReduce wordcount
> >   - balancer start/stop
> >
> >
> > Needs the following JIRAs to pass all ADL tests:
> >
> >   - HADOOP-14205. No FileSystem for scheme: adl. Contributed by John
> Zhuge.
> >   - HDFS-11132. Allow AccessControlException in contract tests when
> >   getFileStatus on subdirectory of existing files. Contributed by
> > Vishwajeet
> >   Dusane
> >   - HADOOP-13928. TestAdlFileContextMainOperatio
> nsLive.testGetFileContext1
> >   runtime error. (John Zhuge via lei)
> >
> >
> > On Mon, Mar 20, 2017 at 10:31 AM, John Zhuge 
> wrote:
> >
> > > Yes, it only affects ADL. There is a workaround of adding these 2
> > > properties to core-site.xml:
> > >
> > >  
> > >fs.adl.impl
> > >org.apache.hadoop.fs.adl.AdlFileSystem
> > >  
> > >
> > >  
> > >fs.AbstractFileSystem.adl.impl
> > >org.apache.hadoop.fs.adl.Adl
> > >  
> > >
> > > I have the initial patch ready but hitting these live unit test
> failures:
> > >
> > > Failed tests:
> > >
> > > TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.
> > > testListStatus:257
> > > expected:<1> but was:<10>
> > >
> > > Tests in error:
> > >
> > > TestAdlFileContextMainOperationsLive>FileContextMainOperationsBaseT
> est.
> > > testMkdirsFailsForSubdirectoryOfExistingFile:254
> > > » AccessControl
> > >
> > > TestAdlFileSystemContractLive.runTest:60->FileSystemContractBaseTest.
> > > testMkdirsFailsForSubdirectoryOfExistingFile:190
> > > » AccessControl
> > >
> > >
> > > Stay tuned...
> > >
> > > John Zhuge
> > > Software Engineer, Cloudera
> > >
> > > On Mon, Mar 20, 2017 at 10:02 AM, 

Re: [VOTE] Release Apache Hadoop 2.6.5 (RC1)

2016-10-07 Thread larry mccay
+1 (non-binding)


* Downloaded and verified signatures

* Built from source

* Deployed a standalone cluster

* Tested HDFS commands and job submit

* Tested webhdfs through Apache Knox

On Fri, Oct 7, 2016 at 10:35 PM, Karthik Kambatla 
wrote:

> Thanks for putting the RC together, Sangjin.
>
> +1 (binding)
>
> Built from source, deployed pseudo distributed cluster and ran some example
> MR jobs.
>
> On Fri, Oct 7, 2016 at 6:01 PM, Yongjun Zhang  wrote:
>
> > Hi Sangjin,
> >
> > Thanks a lot for your work here.
> >
> > My +1 (binding).
> >
> > - Downloaded both binary and src tarballs
> > - Verified md5 checksum and signature for both
> > - Build from source tarball
> > - Deployed 2 pseudo clusters, one with the released tarball and the other
> > with what I built from source, and did the following on both:
> > - Run basic HDFS operations, and distcp jobs
> > - Run pi job
> > - Examined HDFS webui, YARN webui.
> >
> > Best,
> >
> > --Yongjun
> >
> > > > > * verified basic HDFS operations and Pi job.
> > > > > * Did a sanity check for RM and NM UI.
> >
> >
> > On Fri, Oct 7, 2016 at 5:08 PM, Sangjin Lee  wrote:
> >
> > > I'm casting my vote: +1 (binding)
> > >
> > > Regards,
> > > Sangjin
> > >
> > > On Fri, Oct 7, 2016 at 3:12 PM, Andrew Wang 
> > > wrote:
> > >
> > > > Thanks to Chris and Sangjin for working on this release.
> > > >
> > > > +1 binding
> > > >
> > > > * Verified signatures
> > > > * Built from source tarball
> > > > * Started HDFS and did some basic ops
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > > On Fri, Oct 7, 2016 at 2:50 PM, Wangda Tan 
> > wrote:
> > > >
> > > > > Thanks Sangjin for cutting this release!
> > > > >
> > > > > +1 (Binding)
> > > > >
> > > > > - Downloaded binary tar ball and setup a single node cluster.
> > > > > - Submit a few applications and which can successfully run.
> > > > >
> > > > > Thanks,
> > > > > Wangda
> > > > >
> > > > >
> > > > > On Fri, Oct 7, 2016 at 10:33 AM, Zhihai Xu  >
> > > > wrote:
> > > > >
> > > > > > Thanks Sangjin for creating release 2.6.5 RC1.
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > * Downloaded and built from source
> > > > > > * Verified md5 checksums and signature
> > > > > > * Deployed a pseudo cluster
> > > > > > * verified basic HDFS operations and Pi job.
> > > > > > * Did a sanity check for RM and NM UI.
> > > > > >
> > > > > > Thanks
> > > > > > zhihai
> > > > > >
> > > > > > On Fri, Oct 7, 2016 at 8:16 AM, Sangjin Lee 
> > > wrote:
> > > > > >
> > > > > > > Thanks Masatake!
> > > > > > >
> > > > > > > Today's the last day for this vote, and I'd like to ask you to
> > try
> > > > out
> > > > > > the
> > > > > > > RC and vote on this today. So far there has been no binding
> vote.
> > > > > Thanks
> > > > > > > again.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Sangjin
> > > > > > >
> > > > > > > On Fri, Oct 7, 2016 at 6:45 AM, Masatake Iwasaki <
> > > > > > > iwasak...@oss.nttdata.co.jp> wrote:
> > > > > > >
> > > > > > > > +1(non-binding)
> > > > > > > >
> > > > > > > > * verified signature and md5.
> > > > > > > > * built with -Pnative on CentOS6 and OpenJDK7.
> > > > > > > > * built documentation and skimmed the contents.
> > > > > > > > * built rpms by bigtop and ran smoke-tests of hdfs, yarn and
> > > > > mapreduce
> > > > > > on
> > > > > > > > 3-node cluster.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Masatake Iwasaki
> > > > > > > >
> > > > > > > > On 10/3/16 09:12, Sangjin Lee wrote:
> > > > > > > >
> > > > > > > >> Hi folks,
> > > > > > > >>
> > > > > > > >> I have pushed a new release candidate (R1) for the Apache
> > Hadoop
> > > > > 2.6.5
> > > > > > > >> release (the next maintenance release in the 2.6.x release
> > > line).
> > > > > RC1
> > > > > > > >> contains fixes to CHANGES.txt, and is otherwise identical to
> > > RC0.
> > > > > > > >>
> > > > > > > >> Below are the details of this release candidate:
> > > > > > > >>
> > > > > > > >> The RC is available for validation at:
> > > > > > > >> http://home.apache.org/~sjlee/hadoop-2.6.5-RC1/.
> > > > > > > >>
> > > > > > > >> The RC tag in git is release-2.6.5-RC1 and its git commit is
> > > > > > > >> e8c9fe0b4c252caf2ebf1464220599650f119997.
> > > > > > > >>
> > > > > > > >> The maven artifacts are staged via repository.apache.org
> at:
> > > > > > > >> https://repository.apache.org/content/repositories/
> > > > > > > orgapachehadoop-1050/.
> > > > > > > >>
> > > > > > > >> You can find my public key at
> > > > > > > >> http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS.
> > > > > > > >>
> > > > > > > >> Please try the release and vote. The vote will run for the
> > > usual 5
> > > > > > > days. I
> > > > > > > >> would greatly appreciate your timely vote. Thanks!
> > > > > > > >>
> > > > > > > >> Regards,
> > > > > > > >> Sangjin
> > > > > > > 

[jira] [Created] (HADOOP-13556) Change Configuration.getPropsWithPrefix to use getProps instead of iterator

2016-08-28 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-13556:


 Summary: Change Configuration.getPropsWithPrefix to use getProps 
instead of iterator
 Key: HADOOP-13556
 URL: https://issues.apache.org/jira/browse/HADOOP-13556
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.8.0


Current implementation of getPropsWithPrefix uses the Configuration.iterator() 
method. This method is not threadsafe.

This patch will reimplement the gathering of properties that begin with a 
prefix by using the safe getProps() method.
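
A rough sketch of the reworked method (not the committed patch; it assumes the
method lives inside Configuration, that getProps() returns the backing
Properties snapshot, and that the prefix is stripped from the returned keys):

    // Gather properties that start with confPrefix from a Properties snapshot
    // obtained via getProps(), instead of walking the non-threadsafe iterator().
    // Uses java.util.{HashMap, Map, Properties}.
    public Map<String, String> getPropsWithPrefix(String confPrefix) {
      Properties props = getProps();
      Map<String, String> configMap = new HashMap<>();
      for (String name : props.stringPropertyNames()) {
        if (name.startsWith(confPrefix)) {
          configMap.put(name.substring(confPrefix.length()), get(name));
        }
      }
      return configMap;
    }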



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread larry mccay
I believe it was described as some previous audit entries have been
superseded by new ones and that the order may no longer be the same for
other entries.

For what it’s worth, I agree with the assertion that this is a backward-incompatible
change in output - especially for audit logs.

On Thu, Aug 18, 2016 at 11:32 AM, Steve Loughran 
wrote:

>
> > On 18 Aug 2016, at 14:57, Junping Du  wrote:
> >
> > I think Allen's previous comments are very misleading.
> > In my understanding, only incompatible API changes (RPC, CLIs, WebService, etc.)
> shouldn't land on branch-2, but other incompatible behaviors (logs,
> audit-log, daemon restart, etc.) should be treated more flexibly.
> Otherwise, how could 52 issues ( https://s.apache.org/xJk5) marked with
> incompatible-changes have landed on branch-2 after the 2.2.0 release? Most
> of them are already released.
> >
> > Thanks,
> >
> > Junping
>
>
> Don't get AW started on compatiblity; it'll only upset him.
>
> One thing he does care about is the ability of programs to consume the
> output of commands and logs —and for that even the output of commands and
> logs need to continue to be parseable
>
> https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/
> Compatibility.html#Command_Line_Interface_CLI
>
> " Changing the path of a command, removing or renaming command line
> options, the order of arguments, or the command return code and output
> break compatibility and may adversely affect users."
>
> I believe Allen is particularly concerned that a minor point release is
> going in as incompatible, on the basis the audit log output will change
> —that's the log that is explicitly designed for machine processing, hooking
> up to flume & kafka, etc. As example, Spotify spoke at a Hadoop Summit
> conference about how they used it to identify files which hadn't been used
> for a long time; inferring an atime attribute from the access history.
>
> What has changed in the output?
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-25 Thread larry mccay
Oops - make that:

+1 (non-binding)

On Sun, Jul 24, 2016 at 4:07 PM, larry mccay <lmc...@apache.org> wrote:

> +1 binding
>

> * downloaded and built from source
> * checked LICENSE and NOTICE files
> * verified signatures
> * ran standalone tests
> * installed pseudo-distributed instance on my mac
> * ran through HDFS and mapreduce tests
> * tested credential command
> * tested webhdfs access through Apache Knox
>
>
> On Fri, Jul 22, 2016 at 10:15 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org> wrote:
>
>> Hi all,
>>
>> I've created a release candidate RC0 for Apache Hadoop 2.7.3.
>>
>> As discussed before, this is the next maintenance release to follow up
>> 2.7.2.
>>
>> The RC is available for validation at:
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
>>
>> The RC tag in git is: release-2.7.3-RC0
>>
>> The maven artifacts are available via repository.apache.org <
>> http://repository.apache.org/> at
>> https://repository.apache.org/content/repositories/orgapachehadoop-1040/
>> <https://repository.apache.org/content/repositories/orgapachehadoop-1040/
>> >
>>
>> The release-notes are inside the tar-balls at location
>> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
>> hosted this at
>> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
>> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html>
>> for your quick perusal.
>>
>> As you may have noted, a very long fix-cycle for the License & Notice
>> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
>> to slip by quite a bit. This release's related discussion thread is linked
>> below: [1].
>>
>> Please try the release and vote; the vote will run for the usual 5 days.
>>
>> Thanks,
>> Vinod
>>
>> [1]: 2.7.3 release plan:
>> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html <
>> http://markmail.org/thread/6yv2fyrs4jlepmmr>
>
>
>


Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-24 Thread larry mccay
+1 binding

* downloaded and built from source
* checked LICENSE and NOTICE files
* verified signatures
* ran standalone tests
* installed pseudo-distributed instance on my mac
* ran through HDFS and mapreduce tests
* tested credential command
* tested webhdfs access through Apache Knox


On Fri, Jul 22, 2016 at 10:15 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> Hi all,
>
> I've created a release candidate RC0 for Apache Hadoop 2.7.3.
>
> As discussed before, this is the next maintenance release to follow up
> 2.7.2.
>
> The RC is available for validation at:
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ <
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>
>
> The RC tag in git is: release-2.7.3-RC0
>
> The maven artifacts are available via repository.apache.org <
> http://repository.apache.org/> at
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/ <
> https://repository.apache.org/content/repositories/orgapachehadoop-1040/>
>
> The release-notes are inside the tar-balls at location
> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
> hosted this at
> http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html <
> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html> for
> your quick perusal.
>
> As you may have noted, a very long fix-cycle for the License & Notice
> issues (HADOOP-12893) caused 2.7.3 (along with every other Hadoop release)
> to slip by quite a bit. This release's related discussion thread is linked
> below: [1].
>
> Please try the release and vote; the vote will run for the usual 5 days.
>
> Thanks,
> Vinod
>
> [1]: 2.7.3 release plan:
> https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html <
> http://markmail.org/thread/6yv2fyrs4jlepmmr>


Re: Why there are so many revert operations on trunk?

2016-06-07 Thread larry mccay
A -1 need not be taken as a derogatory statement; being a number should
actually make it less emotional.
It is dangerous for a community to become oversensitive to it.

I generally see language such as "I am -1 on this until this particular
thing is fixed" or a statement that it violates some common pattern or
precedent set in the project. This is perfectly reasonable language and
there is no reason to make the reviewer provide an alternative.

So, I am giving my -1 to any proposal for rule changes on -1 votes. :)


On Tue, Jun 7, 2016 at 1:15 PM, Ravi Prakash  wrote:

> +1 on being more respectful. We seem to be having a lot of distasteful
> discussions recently. If we fight each other, we are only helping our
> competitors win (and trust me, it's out there).
>
> I would also respectfully request people not to throw -1s around. I have
> faced this a few times and it's really frustrating. Everyone has opinions
> and some times different people can't fathom why someone else thinks the
> way they do. I am pretty sure none of us is acting with malicious intent,
> so perhaps a little more tolerance, faith and trust will help all of us
> improve Hadoop and the ecosystem much faster. That's not to say that
> sometimes -1s are not warranted, but we should look to it as an extreme
> measure. Unfortunately there is very little disincentive right now to vote
> -1. Maybe we should modify the rules: if you vote -1, you have to
> come up with an alternative implementation? (perhaps limit the amount of
> time you have to the amount already spent in producing the patch that you
> are against)?
>
> Just my 2 cents
> Ravi
>
>
> On Tue, Jun 7, 2016 at 7:54 AM, Junping Du  wrote:
>
> > - We need to at the least force a reset of expectations w.r.t how trunk
> > and small / medium / incompatible changes there are treated. We should
> hold
> > off making a release off trunk before this gets fully discussed in the
> > community and we all reach a consensus.
> >
> > +1. We should hold off any release work off trunk before we reach a
> > consensus. Or more and more developing work/features could be affected
> just
> > like Larry mentioned.
> >
> >
> > - Reverts (or revert and move to a feature-branch) shouldn’t have been
> > unequivocally done without dropping a note / informing everyone /
> building
> > consensus.
> >
> > Agree. To revert commits from other committers, I think we need to: 1)
> > provide technical evidence/reasons that are solid as a rock, like: breaking
> > functionality, tests, or API compatibility, or significantly offending code
> > conventions, etc. 2) Reach consensus with the related contributors/committers
> > based on these technical reasons/evidence. Unfortunately, I didn't see us
> > do either thing in this case.
> >
> >
> > - Freaking out on -1’s and reverts - we as a community need to be less
> > stigmatic about -1s / reverts.
> >
> > +1. As a community, I believe we all prefer to work in a more friendly
> > environment. In many cases, -1 without solid reason will frustrate people
> > who are making contributions. I think we should restrain our -1s unless they
> > are really necessary.
> >
> >
> >
> > Thanks,
> >
> >
> > Junping
> >
> >
> > 
> > From: Vinod Kumar Vavilapalli 
> > Sent: Monday, June 06, 2016 9:36 PM
> > To: Andrew Wang
> > Cc: Junping Du; Aaron T. Myers; common-dev@hadoop.apache.org;
> > hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org;
> > yarn-...@hadoop.apache.org
> > Subject: Re: Why there are so many revert operations on trunk?
> >
> > Folks,
> >
> > It is truly disappointing how we are escalating situations that can be
> > resolved through basic communication.
> >
> > Things that shouldn’t have happened
> > - After a few objections were raised, commits should have simply stopped
> > before restarting again but only after consensus
> > - Reverts (or revert and move to a feature-branch) shouldn’t have been
> > unequivocally done without dropping a note / informing everyone /
> building
> > consensus. And no, not even a release-manager gets this free pass. Not on
> > branch-2, not on trunk, not anywhere.
> > - Freaking out on -1’s and reverts - we as a community need to be less
> > stigmatic about -1s / reverts.
> >
> > Trunk releases:
> > This is the other important bit about huge difference of expectations
> > between the two sides w.r.t trunk and branching. Till now, we’ve never
> made
> > releases out of trunk. So in-progress features that people deemed to not
> > need a feature branch could go into trunk without much trouble. Given
> that
> > we are now making releases off trunk, I can see (a) the RM saying "no,
> > don’t put in-progress stuff and (b) the contributors saying “no we don’t
> > want the overhead of a branch”. I’ve raised related topics (but only
> > focusing on incompatible changes) before -
> > http://markmail.org/message/m6x73t6srlchywsn - but we never decided
> > anything.
> 

Re: 2.7.3 release plan

2016-05-16 Thread larry mccay
Curious on the status of 2.7.3

It seems that we still have two outstanding critical/blocker JIRAs:

   1. HADOOP-12893 - Verify LICENSE.txt and NOTICE.txt
   2. HADOOP-13154 - S3AFileSystem printAmazonServiceException/printAmazonClientException
      appear copy & paste of AWS examples


But 45-ish when we include Majors as well.

I know there are a number of critical issues with fixes that need to go out.

What is the plan?

On Tue, Apr 12, 2016 at 2:09 PM, Vinod Kumar Vavilapalli  wrote:

> Others and I committed a few, I pushed out a few.
>
> Down to just three now!
>
> +Vinod
>
> > On Apr 6, 2016, at 3:00 PM, Vinod Kumar Vavilapalli 
> wrote:
> >
> > Down to only 10 blocker / critical tickets (
> https://issues.apache.org/jira/issues/?filter=12335343 <
> https://issues.apache.org/jira/issues/?filter=12335343>) now!
> >
> > Thanks
> > +Vinod
> >
> >> On Mar 30, 2016, at 4:18 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org > wrote:
> >>
> >> Hi all,
> >>
> >> Got nudged about 2.7.3. Was previously waiting for 2.6.4 to go out
> (which did go out mid February). Got a little busy since.
> >>
> >> Following up the 2.7.2 maintenance release, we should work towards a
> 2.7.3. The focus obviously is to have blocker issues [1], bug-fixes and
> *no* features / improvements.
> >>
> >> I hope to cut an RC in a week - giving enough time for outstanding
> blocker / critical issues. Will start moving out any tickets that are not
> blockers and/or won’t fit the timeline - there are 3 blockers and 15
> critical tickets outstanding as of now.
> >>
> >> Thanks,
> >> +Vinod
> >>
> >> [1] 2.7.3 release blockers:
> https://issues.apache.org/jira/issues/?filter=12335343 <
> https://issues.apache.org/jira/issues/?filter=12335343>
> >
>
>


[jira] [Resolved] (HADOOP-12942) hadoop credential commands non-obviously use password of "none"

2016-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay resolved HADOOP-12942.
--
Resolution: Fixed

> hadoop credential commands non-obviously use password of "none"
> ---
>
> Key: HADOOP-12942
> URL: https://issues.apache.org/jira/browse/HADOOP-12942
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Mike Yoder
>Assignee: Mike Yoder
> Attachments: HADOOP-12942.001.patch, HADOOP-12942.002.patch, 
> HADOOP-12942.003.patch, HADOOP-12942.004.patch, HADOOP-12942.005.patch, 
> HADOOP-12942.006.patch, HADOOP-12942.007.patch, HADOOP-12942.008.patch
>
>
> The "hadoop credential create" command, when using a jceks provider, defaults 
> to using the value of "none" for the password that protects the jceks file.  
> This is not obvious in the command or in documentation - to users or to other 
> hadoop developers - and leads to jceks files that essentially are not 
> protected.
> In this example, I'm adding a credential entry with name of "foo" and a value 
> specified by the password entered:
> {noformat}
> # hadoop credential create foo -provider localjceks://file/bar.jceks
> Enter password: 
> Enter password again: 
> foo has been successfully created.
> org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider has been updated.
> {noformat}
> However, the password that protects the file bar.jceks is "none", and there 
> is no obvious way to change that. The practical way of supplying the password 
> at this time is something akin to
> {noformat}
> HADOOP_CREDSTORE_PASSWORD=credpass hadoop credential create --provider ...
> {noformat}
> That is, stuffing HADOOP_CREDSTORE_PASSWORD into the environment of the 
> command. 
> This is more than a documentation issue. I believe that the password ought to 
> be _required_.  We have three implementations at this point, the two 
> JavaKeystore ones and the UserCredential. The latter is "transient" which 
> does not make sense to use in this context. The former need some sort of 
> password, and it's relatively easy to envision that any non-transient 
> implementation would need a mechanism by which to protect the store that it's 
> creating.  
> The implementation gets interesting because the password in the 
> AbstractJavaKeyStoreProvider is determined in the constructor, and changing 
> it after the fact would get messy. So this probably means that the 
> CredentialProviderFactory should have another factory method like the first 
> that additionally takes the password, and an additional constructor exist in 
> all the implementations that takes the password. 
> Then we just ask for the password in getCredentialProvider() and that gets 
> passed down to via the factory to the implementation. The code does have 
> logic in the factory to try multiple providers, but I don't really see how 
> multiple providers would rationally be used in the command shell context.
> This issue was brought to light when a user stored credentials for a Sqoop 
> action in Oozie; upon trying to figure out where the password was coming from 
> we discovered it to be the default value of "none".
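
For illustration only, a rough sketch of the provisioning path being discussed; everything
here uses the existing API except the password-accepting overload, which appears only as a
comment because it is the change being proposed, not something that exists today:

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.alias.CredentialProvider;
import org.apache.hadoop.security.alias.CredentialProviderFactory;

public class CreateCredentialSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
        "localjceks://file/bar.jceks");

    // Today the keystore behind this provider is protected with the default
    // password of "none" unless HADOOP_CREDSTORE_PASSWORD is set in the
    // environment. The proposal above would add a factory method that also
    // takes the keystore password, e.g. (hypothetical overload):
    //   CredentialProviderFactory.getProviders(conf, keystorePassword);
    List<CredentialProvider> providers = CredentialProviderFactory.getProviders(conf);

    // Provision the credential entry named "foo" and persist the store.
    providers.get(0).createCredentialEntry("foo", "fooSecret".toCharArray());
    providers.get(0).flush();
  }
}
{code}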



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-12942) hadoop credential commands non-obviously use password of "none"

2016-05-16 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay reopened HADOOP-12942:
--

> hadoop credential commands non-obviously use password of "none"
> ---
>
> Key: HADOOP-12942
> URL: https://issues.apache.org/jira/browse/HADOOP-12942
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Mike Yoder
>Assignee: Mike Yoder
> Attachments: HADOOP-12942.001.patch, HADOOP-12942.002.patch, 
> HADOOP-12942.003.patch, HADOOP-12942.004.patch, HADOOP-12942.005.patch, 
> HADOOP-12942.006.patch, HADOOP-12942.007.patch, HADOOP-12942.008.patch
>
>
> The "hadoop credential create" command, when using a jceks provider, defaults 
> to using the value of "none" for the password that protects the jceks file.  
> This is not obvious in the command or in documentation - to users or to other 
> hadoop developers - and leads to jceks files that essentially are not 
> protected.
> In this example, I'm adding a credential entry with name of "foo" and a value 
> specified by the password entered:
> {noformat}
> # hadoop credential create foo -provider localjceks://file/bar.jceks
> Enter password: 
> Enter password again: 
> foo has been successfully created.
> org.apache.hadoop.security.alias.LocalJavaKeyStoreProvider has been updated.
> {noformat}
> However, the password that protects the file bar.jceks is "none", and there 
> is no obvious way to change that. The practical way of supplying the password 
> at this time is something akin to
> {noformat}
> HADOOP_CREDSTORE_PASSWORD=credpass hadoop credential create --provider ...
> {noformat}
> That is, stuffing HADOOP_CREDSTORE_PASSWORD into the environment of the 
> command. 
> This is more than a documentation issue. I believe that the password ought to 
> be _required_.  We have three implementations at this point, the two 
> JavaKeystore ones and the UserCredential. The latter is "transient" which 
> does not make sense to use in this context. The former need some sort of 
> password, and it's relatively easy to envision that any non-transient 
> implementation would need a mechanism by which to protect the store that it's 
> creating.  
> The implementation gets interesting because the password in the 
> AbstractJavaKeyStoreProvider is determined in the constructor, and changing 
> it after the fact would get messy. So this probably means that the 
> CredentialProviderFactory should have another factory method like the first 
> that additionally takes the password, and an additional constructor exist in 
> all the implementations that takes the password. 
> Then we just ask for the password in getCredentialProvider() and that gets 
> passed down to via the factory to the implementation. The code does have 
> logic in the factory to try multiple providers, but I don't really see how 
> multiple providers would rationally be used in the command shell context.
> This issue was brought to light when a user stored credentials for a Sqoop 
> action in Oozie; upon trying to figure out where the password was coming from 
> we discovered it to be the default value of "none".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: Guidance needed on HADOOP-13096 and HADOOP-13097

2016-05-06 Thread larry mccay
I agree with your rationale for not doing C now.
And those clean up tasks can more easily be discussed when separated from
this effort.


On Fri, May 6, 2016 at 3:11 PM, Allen Wittenauer <a...@apache.org> wrote:

> After thinking about it, I think you are correct here: I’m more
> inclined to do D w/follow-up JIRAs to fix this up. The hadoop and hdfs
> script functionality is being tested, so it isn’t like HADOOP-12930 is
> going in with zero unit tests. Never mind that large chunks of hadoop-tools
> get modified to use this “for reals” as well. The yarn and mapred tests
> don’t really bring _that_ much to the table.
>
> I think the biggest worry about doing C inside the HADOOP-12930
> feature branch is that it seems like the wrong time/place to do it.  Making
> that big of a change to the build should probably be two separate,
> orthogonal JIRAs (one for YARN, one for MR) in their own right.  But I do
> think C is the correct, long-term path.  We should probably move hdfs and
> common scripts into separate dirs as well, honestly.
>
> Thanks for the feedback!
>
>
> > On May 5, 2016, at 7:22 PM, Larry McCay <lmc...@hortonworks.com> wrote:
> >
> > I would vote for C or D with a filed JIRA to clean up the maven
> structure as a separate effort.
> > Before moving to D, could you describe any reason to not go with C?
> >
> > On May 4, 2016, at 9:51 PM, Allen Wittenauer <a...@apache.org> wrote:
> >
> >>
> >>  When the sub-projects re-merged, maven work was done, whatever,
> the shell scripts for MR and YARN were placed (effectively) outside of the
> normal maven hierarchy.  In order to add unit tests to the shell scripts
> for these sub-projects, it means effectively turning
> hadoop-yarn-project/hadoop-yarn and hadoop-mapreduce-project into “real”
> modules so that mvn test works as expected.   Doing so will likely have
> some surprising consequences, such as anyone who modifies java code and the
> shell code in a patch will trigger _all_ of the unit tests in yarn.
> >>
> >>  I think we have four options:
> >>
> >> a) Continue forward turning these into real modules with src
> directories, etc and we live with the consequences
> >>
> >> b) Move the related bits into an existing module, making them similar
> to HDFS, common, tools
> >>
> >> c) Move the related bits into a new module, using the layout that maven
> really really wants
> >>
> >> d) Skip the unit tests; we don’t have them now
> >>
> >>  This is clearly more work than what I really wanted to cover in
> this branch, but given that there was a specific request to add unit test
> code for this functionality, I’m sort of stuck here.
> >>
> >>  Thoughts?
> >> -
> >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: Guidance needed on HADOOP-13096 and HADOOP-13097

2016-05-05 Thread Larry McCay
I would vote for C or D with a filed JIRA to clean up the maven structure as a 
separate effort.
Before moving to D, could you describe any reason to not go with C?

On May 4, 2016, at 9:51 PM, Allen Wittenauer  wrote:

> 
>   When the sub-projects re-merged, maven work was done, whatever, the 
> shell scripts for MR and YARN were placed (effectively) outside of the normal 
> maven hierarchy.  In order to add unit tests to the shell scripts for these 
> sub-projects, it means effectively turning hadoop-yarn-project/hadoop-yarn 
> and hadoop-mapreduce-project into “real” modules so that mvn test works as 
> expected.   Doing so will likely have some surprising consequences, such as 
> anyone who modifies java code and the shell code in a patch will trigger 
> _all_ of the unit tests in yarn.
> 
>   I think we have four options:
> 
> a) Continue forward turning these into real modules with src directories, etc 
> and we live with the consequences
> 
> b) Move the related bits into an existing module, making them similar to 
> HDFS, common, tools
> 
> c) Move the related bits into a new module, using the layout that maven 
> really really wants
> 
> d) Skip the unit tests; we don’t have them now
> 
>   This is clearly more work than what I really wanted to cover in this 
> branch, but given that there was a specific request to add unit test code for 
> this functionality, I’m sort of stuck here.
> 
>   Thoughts?
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> 
> 


-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: Commit History Edit Alert

2016-04-22 Thread larry mccay
Thanks for the clarification, Andrew.
Yes, I'll add a comment about the results of my "testing". :)


On Fri, Apr 22, 2016 at 2:06 AM, Andrew Wang <andrew.w...@cloudera.com>
wrote:

> Squashing means force pushing, so please don't do that per ASF policies.
> The normal recommendation is just not to fix it, commit message typos
> aren't that big a deal. What I do is leave a comment on the JIRA to make it
> easier for people to track down the commit.
>
> I found INFRA-11136 where we supposedly protected trunk and also
> INFRA-11236 about getting this in place for the branch-Xs. Larry, could you
> update INFRA-11236 with your empirical testing? Would be good to get these
> branches protected again for the future.
>
> Thanks,
> Andrew
>
>
> On Thu, Apr 21, 2016 at 9:42 PM, larry mccay <lmc...@apache.org> wrote:
>
> > I believe that he squashed my attempted --amend into a single commit on
> > branch-2.8.
> > Not sure about trunk and branch-2.
> >
> > Thanks for the clarification on the formatting.
> > I will comply in the future.
> >
> > For such issues, is a dev@ email first better than trying to "fix" it?
> >
> > Again, sorry for the inconvenience.
> >
> > On Fri, Apr 22, 2016 at 12:10 AM, Andrew Wang <andrew.w...@cloudera.com>
> > wrote:
> >
> > > What does "fix" mean? We aren't supposed to force push to non-feature
> > > branches, and actually thought this was disabled.
> > >
> > > Also FYI for the future, we normally format our commit messages with
> > > periods, e.g.:
> > >
> > > HADOOP-13011. Clearly Document the Password Details for Keystore-based
> > > Credential Providers.
> > >
> > >
> > > On Thu, Apr 21, 2016 at 8:26 PM, larry mccay <lmc...@apache.org>
> wrote:
> > >
> > > > All -
> > > >
> > > > My first hadoop commit for HADOOP-13011 inadvertently referenced the
> > > wrong
> > > > JIRA (HADOOP-13001) in the commit message.
> > > >
> > > > Owen O'Malley helped me out by fixing the history on all 3 branches:
> > > trunk,
> > > > branch-2, branch-2.8. The message is correct now in the current
> history
> > > but
> > > > you may need to rebase to the current history for things to align
> > > properly.
> > > >
> > > > I apologize for the inconvenience.
> > > >
> > > > thanks,
> > > >
> > > > --larry
> > > >
> > >
> >
>


Commit History Edit Alert

2016-04-21 Thread larry mccay
All -

My first hadoop commit for HADOOP-13011 inadvertently referenced the wrong
JIRA (HADOOP-13001) in the commit message.

Owen O'Malley helped me out by fixing the history on all 3 branches: trunk,
branch-2, branch-2.8. The message is correct now in the current history but
you may need to rebase to the current history for things to align properly.

I apologize for the inconvenience.

thanks,

--larry


[jira] [Created] (HADOOP-13011) Clearly Document the Password Details for Keystore-based Credential Providers

2016-04-09 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-13011:


 Summary: Clearly Document the Password Details for Keystore-based 
Credential Providers
 Key: HADOOP-13011
 URL: https://issues.apache.org/jira/browse/HADOOP-13011
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: documentation
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.8.0


HADOOP-12942 discusses the unobviousness of the use of a default password for 
the keystores for keystore-based credential providers. This patch adds 
documentation to the CredentialProviderAPI.md that describes the different 
types of credential providers available and the password management details of 
the keystore-based ones.
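
For readers of this thread, a rough sketch of the password resolution order the new
documentation describes for the keystore-based providers (the conf object and the
readPasswordFromClasspathFile helper are assumed/illustrative):

{code}
// Documented resolution order for the password that protects the JCEKS store:
String password = System.getenv("HADOOP_CREDSTORE_PASSWORD");    // 1. environment variable
if (password == null) {
  String pwFile = conf.get(
      "hadoop.security.credstore.java-keystore-provider.password-file");
  if (pwFile != null) {
    password = readPasswordFromClasspathFile(pwFile);             // 2. password file on the classpath
  }
}
if (password == null) {
  password = "none";                                               // 3. the default
}
{code}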



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-13008) Add XFS Filter for UIs to Hadoop Common

2016-04-08 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-13008:


 Summary: Add XFS Filter for UIs to Hadoop Common
 Key: HADOOP-13008
 URL: https://issues.apache.org/jira/browse/HADOOP-13008
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.8.0


CSRF prevention for REST APIs can be provided through a common servlet filter. 
This filter would check for the existence of an expected (configurable) HTTP 
header - such as X-XSRF-Header.

The fact that CSRF attacks are entirely browser based means that the above 
approach can ensure that requests are coming from either: applications served 
by the same origin as the REST API or that there is explicit policy 
configuration that allows the setting of a header on XmlHttpRequest from 
another origin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Branch policy question

2016-03-23 Thread larry mccay
Thanks for digging that up, Chris.
That is completely what I would have expected but began questioning it
given this thread.

I think that Allen's use of a feature branch for this effort makes sense
and that he should have the freedom to choose his commit policy for the
branch.
The tricky part will be getting the reviews at the end but I would imagine
that it can be managed with some documentation, code review, tests and
instructions.

On Wed, Mar 23, 2016 at 5:20 PM, Chris Nauroth <cnaur...@hortonworks.com>
wrote:

> It's interesting to go back to the change in bylaws in 2011 that
> introduced the requirement for 3 binding +1s on a branch merge [1].  The
> text of that resolution suggests that it's supportive of
> commit-then-review if that's what the developers on the branch want to do.
>
> "Branches' commit requirements are determined by the branch maintainer and
> in this situation are often set up as commit-then-review."
>
> It would also be very much against the spirit of that resolution to
> combine it all down into a single patch file and get a single +1.
>
> "As such, there is no way to guarantee that the entire change set offered
> for trunk merge has had a second pair of eyes on it.  Therefore, it is
> prudent to give that final merge heightened scrutiny, particularly since
> these branches often extensively affect critical parts of the system.
> Requiring three binding +1s does not slow down the branch development
> process, but does provide a better chance of catching bugs before they
> make their way to trunk."
>
> --Chris Nauroth
>
> [1] https://s.apache.org/iW1F
>
>
>
> On 3/23/16, 2:11 PM, "Steve Loughran" <ste...@hortonworks.com> wrote:
>
> >
> >> On 22 Mar 2016, at 18:23, Andrew Wang <andrew.w...@cloudera.com> wrote:
> >>
> >> A branch sounds fine, but how are we going to get 3 +1's to merge it? If
> >> it's hard to find one reviewer, seems even harder to find two.
> >
> >Given that only one +1 is needed to merge a non-branch patch, he could in
> >theory convert the entire branch into a single .patch for review. Not
> >that I'd encourage that, just observing that its possible
> >
> >
> >>
> >> On Tue, Mar 22, 2016 at 10:56 AM, Allen Wittenauer <
> >> allenwittena...@yahoo.com.invalid> wrote:
> >>
> >>>
> >>>> On Mar 22, 2016, at 10:49 AM, larry mccay <larry.mc...@gmail.com>
> >>>>wrote:
> >>>>
> >>>> That sounds like a reasonable approach and valid use of branches to
> >>>>me.
> >>>>
> >>>> Perhaps a set of functional tests could be provided/identified that
> >>>>would
> >>>> help the review process by showing backward compatibility along with
> >>>>new
> >>>> extensions for things like dynamic commands?
> >>>>
> >>>
> >>>This is going into trunk, so no need for backward compatibility.
> >>>
> >
> >
>
>


Re: Branch policy question

2016-03-22 Thread larry mccay
Just to clarify, we are talking about a feature branch in which Allen and
others that are working on the branch could commit without requiring 3 +1’s.
Then, at some point, we will need 3 +1’s to merge the branch to trunk.
Correct?

I think that if we have a set of usecases that are being added and those
that are being reworked with test coverage for them all - maybe including
manual testing scenarios - that we could find folks that would be willing
to work with Allen to get it reviewed. I am listed as a branch committer.
If my vote would count then I would be more than willing to lend a hand
where I can - especially if we have the tests/instructions for testing.

I think that we need to strike a balance of not inhibiting progress and at
the same time not getting a patch so large that it can’t be reviewed.
Colin's point is valid and a design document that calls out the areas that
will need testing and general approach would help review volunteers.

There is always a risk with branches that the change gets to a level of
complexity that it is too hard to reasonably review. I think that the
community should allow folks to take that risk and try and provide enough
guidance and transparency for review. Any non-trivial branch effort should
have a merge strategy for getting it reviewed, tested and merged.


On Tue, Mar 22, 2016 at 9:46 PM, Gangumalla, Uma 
wrote:

> > is it possible for me to setup a branch, self review+commit to that
> >branch, then request a branch merge?
> Basically this is something like a Commit-Then-Review (here, review later)
> process, right? I have not seen us follow this approach here (not sure if
> I missed some branches that followed it). Even when the original author's code
> quality is good, there is always a chance of missing something. So
> peer review is important, because another pair of eyes can catch issues the
> original author might have overlooked (the general review advantage :-)). In this
> case, for a branch merge we need three +1s. If we face difficulty getting
> one +1, then I am afraid we may face more difficulty getting reviewers
> at the time of merge, because the code base is much larger than a normal patch.
> Sometimes we suggest contributors split patches into multiple JIRAs if the
> patch size is becoming large. It is better to find some reviewers for the
> branch, and creating the branch could then turn into a healthy merge later.
>
> Colin's suggestion sounds good to me. How about providing more details and
> finding some reviewers (whoever is more familiar with that area, etc.)?
>
> If this is a general question about branch policy, my answer is "no"
> to "self review+commit to that branch, then request a branch merge". But
> for this kind of special case where we are sure we may not have
> enough reviewers for a branch, having a dev mailing list discussion about the
> JIRA/branch proposal and seeing how to go about the changes may be a good idea,
> instead of going ahead, finishing the work and raising a merge vote thread
> (something like what you did just now :-)).
>
> Just my thoughts on this discussion.
>
> Thanks & Regards,
> Uma
>
> On 3/22/16, 9:14 AM, "Allen Wittenauer"  wrote:
>
> >Since it's nearly impossible for me to get timely reviews for some build
> >and script changes, is it possible for me to setup a branch, self
> >review+commit to that branch, then request a branch merge?
>
>


Re: Branch policy question

2016-03-22 Thread larry mccay
That sounds like a reasonable approach and valid use of branches to me.

Perhaps a set of functional tests could be provided/identified that would
help the review process by showing backward compatibility along with new
extensions for things like dynamic commands?


On Tue, Mar 22, 2016 at 12:14 PM, Allen Wittenauer  wrote:

>
> Since it’s nearly impossible for me to get timely reviews for some
> build and script changes, is it possible for me to setup a branch, self
> review+commit to that branch, then request a branch merge?  I’m basically
> looking at doing this for HADOOP-12857 + HADOOP-12930 and their subtasks
> esp since they are very order dependent.  I don’t know if a branch is
> easier to review, honestly, but at least I wouldn’t be blocked on making
> progress...
>
> Thanks.


[jira] [Created] (HADOOP-12929) JWTRedirectAuthenticationHandler must accommodate null expiration time

2016-03-19 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12929:


 Summary: JWTRedirectAuthenticationHandler must accommodate null 
expiration time
 Key: HADOOP-12929
 URL: https://issues.apache.org/jira/browse/HADOOP-12929
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


The underlying JWT token within the hadoop-jwt cookie should be able to have no 
expiration time. This allows the token lifecycle to be the same as the cookie 
that contains it.

Current validation processing of the token interprets the absence of an 
expiration time as requiring a new token to be acquired. JWT itself considers 
the exp to be an optional claim. As such, this patch will change the processing 
to accept a null expiration as valid for as long as the cookie is presented.
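
A minimal sketch of the relaxed check being described, using the nimbus-jose-jwt claims API
(the method name validateExpiration is illustrative, not necessarily the handler's actual method):

{code}
import java.util.Date;
import com.nimbusds.jwt.SignedJWT;

// Accept a token with no "exp" claim for as long as the cookie is presented.
protected boolean validateExpiration(SignedJWT jwtToken) {
  try {
    Date expires = jwtToken.getJWTClaimsSet().getExpirationTime();
    // exp is an optional claim in JWT; null means no expiration was set.
    return expires == null || new Date().before(expires);
  } catch (java.text.ParseException pe) {
    return false;   // claims could not be parsed - treat as invalid
  }
}
{code}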



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12851) S3AFileSystem Uptake of ProviderUtils.excludeIncompatibleCredentialProviders

2016-02-28 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12851:


 Summary: S3AFileSystem Uptake of 
ProviderUtils.excludeIncompatibleCredentialProviders
 Key: HADOOP-12851
 URL: https://issues.apache.org/jira/browse/HADOOP-12851
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Larry McCay
Assignee: Larry McCay


HADOOP-12846 introduced the ability for FileSystem based integration points of 
credential providers to eliminate the threat of a recursive infinite loop due 
to a provider in the same filesystem being configured.

WASB has already taken up its use in HADOOP-12846, and this patch will add 
it to the S3A integration point as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12846) Credential Provider Recursive Dependencies

2016-02-26 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12846:


 Summary: Credential Provider Recursive Dependencies
 Key: HADOOP-12846
 URL: https://issues.apache.org/jira/browse/HADOOP-12846
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay


There are a few credential provider integration points in which the use of a 
certain type of provider in a certain filesystem causes a recursive infinite 
loop. 

For instance, a component such as sqoop can be protecting a db password in a 
credential provider within the wasb/azure filesystem. Now that HADOOP-12555 has 
introduced the ability to protect the access keys for wasb we suddenly need 
access to wasb to get the database keys which initiates the attempt to get the 
access keys from wasb - since there is a provider path configured for sqoop.

For such integrations, those in which it doesn't make sense to protect the 
access keys inside the thing that we need the keys to access, we need a 
solution to avoid this recursion - other than dictating what filesystems can be 
used by other components.

This patch proposes the ability to scrub the configured provider path of any 
provider types that would be incompatible with the integration point. In other 
words, before calling Configuration.getPassword for the access keys to wasb, we 
can remove any configured providers that require access to wasb.

This will require some regex expressions that can be used to identify the 
configuration of such provider uri's within the provider path parameter.
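
A rough sketch of how an integration point might apply the proposed scrubbing before
resolving its own access keys (the excludeIncompatibleCredentialProviders signature is my
reading of the patch, and the getAccessKey helper is illustrative):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.ProviderUtils;

// Before asking Configuration.getPassword for a filesystem's own access keys,
// drop any configured credential providers that live inside that same
// filesystem, so we never recurse into the store we are trying to open.
public static char[] getAccessKey(Configuration conf,
    Class<? extends FileSystem> fsClass, String key) throws IOException {
  Configuration scrubbed =
      ProviderUtils.excludeIncompatibleCredentialProviders(conf, fsClass);
  return scrubbed.getPassword(key);
}
{code}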



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Introduce Apache Kerby to Hadoop

2016-02-22 Thread larry mccay
Replacing MiniKDC with kerby certainly makes sense.

Kerby-izing Hadoop 3 needs to be defined carefully.
As much of a JWT proponent as I am, I don't know that taking up
non-standard features such as the JWT token would necessarily serve us well.
If we are talking about client-side-only uptake in Hadoop 3 as a more
diagnosable client library, that completely makes sense.

Better algorithms and APIs would require server side compliance as well -
no?
These decisions would need to align with deployment usecases that want to go
directly to AD/MIT.
Perhaps, it just means careful configuration of algorithms to match the
server side in those cases.

+1 on the baby step of replacing MiniKDC - as this is really just alignment
with the directory project roadmap anyway.

On Mon, Feb 22, 2016 at 5:51 AM, Steve Loughran 
wrote:

>
>
> I've discussed this offline with Kai, as part of the "let's fix kerberos"
> project. Not only is it a better Kerberos engine, we can do more
> diagnostics, get better algorithms and ultimately get better APIs for doing
> Kerberos and SASL —the latter would dramatically reduce the cost of
> wire-encrypting IPC.
>
> For now, I'd like to see basic steps - upgrading MiniKDC to Kerby, see how
> it works.
>
> Long term, I'd like Hadoop 3 to be Kerby-ized
>
>
> > On 22 Feb 2016, at 06:41, Zheng, Kai  wrote:
> >
> > Hi folks,
> >
> > I'd like to mention Apache Kerby [1] here to the community and propose
> to introduce the project to Hadoop, a sub project of Apache Directory
> project.
> >
> > Apache Kerby is a Kerberos-centric project and aims to provide a first
> > Java Kerberos library that contains both client and server support. The
> > relevant features include:
> > * Full Kerberos encryption types, aligned with both MIT KDC and MS AD;
> > * Client APIs that allow login via password, credential cache, keytab file, etc.;
> > * Utilities to generate, operate on and inspect keytab and credential cache files;
> > * A simple KDC server that borrows some ideas from Hadoop-MiniKDC and can
> > be used in tests, but with minimal overhead in external dependencies;
> > * A brand-new token mechanism that can be experimentally used; using it, a JWT
> > token can be exchanged for a TGT or service ticket;
> > * Anonymous PKINIT support, which can be experimentally used - the first Java
> > library that supports this major Kerberos extension.
> >
> > The project stands alone and is ensured to only depend on JRE for easier
> usage. It has made the first release (1.0.0-RC1) and 2nd release (RC2) is
> upcoming.
> >
> >
> > As an initial step, this proposal suggests using Apache Kerby to upgrade
> the existing code related to ApacheDS for Kerberos support. The
> advantages:
> >
> > 1. The kerby-kerb library is all that is needed; it is purely Java,
> SLF4J is its only dependency, and the whole library is rather small;
> >
> > 2. There is a SimpleKDC in the library for test usage, which borrowed
> the MiniKDC idea and implemented all the support existing in MiniKDC. We
> had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works fine;
> >
> > 3. Full Kerberos encryption types (many of them are not available in JRE
> but supported by major Kerberos vendors) and more functionalities like
> credential cache support;
> >
> > 4. Perhaps the biggest concern: Hadoop MiniKDC etc. depend on the old
> Kerberos implementation in Directory Server project, but the implementation
> is stopped being maintained. Directory project has a plan to replace the
> implementation using Kerby. MiniKDC can use Kerby directly to simplify the
> deps;
> >
> > 5. Extensively tested with all kinds of unit tests, already being used
> for some time (like PSU), even in production environment;
> >
> > 6. Actively developed, and can be fixed and released in time if
> necessary, separately and independently from other components in the Apache
> Directory project. By actively developing Apache Kerby and now applying it
> to Hadoop, we wish to make Kerberos deployment, troubleshooting
> and further enhancement much easier and thereby possible.
> >
> >
> >
> > Wish this is a good beginning, and eventually Apache Kerby can benefit
> other projects in the ecosystem as well.
> >
> >
> >
> > This Kerberos related work is actually a long time effort led by Weihua
> Jiang in Intel, and had been kindly encouraged by Andrew Purtell, Steve
> Loughran, Gangumalla Uma, Andrew Wang and etc., thanks a lot for their
> great discussions and inputs in the past.
> >
> >
> >
> > Your feedback is very welcome. Thanks in advance.
> >
> >
> >
> > [1] https://github.com/apache/directory-kerby
> >
> >
> >
> > Regards,
> >
> > Kai
>
>


[jira] [Created] (HADOOP-12804) Read Proxy Password from Credential Providers in S3 FileSystem

2016-02-13 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12804:


 Summary: Read Proxy Password from Credential Providers in S3 
FileSystem
 Key: HADOOP-12804
 URL: https://issues.apache.org/jira/browse/HADOOP-12804
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Larry McCay
Assignee: Larry McCay


HADOOP-12548 added credential provider support for the AWS credentials to 
S3FileSystem. This JIRA is for considering the use of the credential providers 
for the proxy password as well.

Instead of adding the proxy password to the config file directly and in clear 
text, we could provision it in addition to the AWS credentials into a 
credential provider and keep it out of clear text.

In terms of usage, it could be added to the same credential store as the AWS 
credentials or potentially to a more universally available path - since it is 
the same for everyone. This would however require multiple providers to be 
configured in the provider.path property and more open file permissions on the 
store itself.
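
For example (a sketch, not the committed change; the fs.s3a.proxy.password key and the AWS
SDK ClientConfiguration usage are assumptions), the lookup could simply go through
Configuration.getPassword so a provisioned credential wins over a clear-text config value:

{code}
// Resolves from the configured credential providers first, then falls back
// to any clear-text value in the configuration file.
char[] proxyPass = conf.getPassword("fs.s3a.proxy.password");
if (proxyPass != null) {
  awsClientConfiguration.setProxyPassword(new String(proxyPass));
}
{code}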



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12691) Add CSRF Filter to Hadoop Common

2016-01-06 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12691:


 Summary: Add CSRF Filter to Hadoop Common
 Key: HADOOP-12691
 URL: https://issues.apache.org/jira/browse/HADOOP-12691
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0


CSRF prevention for REST APIs can be provided through a common servlet filter. 
This filter would check for the existence of an expected (configurable) HTTP 
header - such as X-Requested-By.

The fact that CSRF attacks are entirely browser based means that the above 
approach can ensure that requests are coming from either: applications served 
by the same origin as the REST API or that there is explicit policy 
configuration that allows the setting of a header on XmlHttpRequest from 
another origin.
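
A minimal sketch of the kind of servlet filter described here (header name simplified and
the class is illustrative, not the filter that was ultimately committed):

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CsrfHeaderFilter implements Filter {
  private String headerName = "X-Requested-By";   // configurable in practice

  @Override
  public void init(FilterConfig config) throws ServletException {
    String configured = config.getInitParameter("custom-header");
    if (configured != null) {
      headerName = configured;
    }
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    // Browsers will not add a custom header to a cross-origin request unless
    // explicit CORS policy allows it, so requiring the header blocks simple
    // browser-driven CSRF attempts.
    if (((HttpServletRequest) req).getHeader(headerName) == null) {
      ((HttpServletResponse) res).sendError(HttpServletResponse.SC_BAD_REQUEST,
          "Missing required header: " + headerName);
      return;
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}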



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12481) JWTRedirectAuthenticationHandler doesn't Retain Original Query String

2015-10-15 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12481:


 Summary: JWTRedirectAuthenticationHandler doesn't Retain Original 
Query String
 Key: HADOOP-12481
 URL: https://issues.apache.org/jira/browse/HADOOP-12481
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 2.8.0


Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs.

The actual authentication is done by some external service that the handler 
will redirect to when there is no hadoop.auth cookie and no JWT token found in 
the incoming request.

Using JWT provides a number of benefits:

* It is not tied to any specific authentication mechanism - so buys us many SSO 
integrations
* It is cryptographically verifiable for determining whether it can be trusted
* Checking for expiration allows for a limited lifetime and window for 
compromised use

This will introduce the use of nimbus-jose-jwt library for processing, 
validating and parsing JWT tokens.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: hadoop-hdfs-client splitoff is going to break code

2015-10-14 Thread larry mccay
Interesting...

As long as #2 provides full backward compatibility and the ability to
explicitly exclude the server dependencies that seems the best way to go.
That would get my non-binding +1.
:)

Perhaps we could add another artifact called hadoop-thin-client that would
not be backward compatible at some point?

On Wed, Oct 14, 2015 at 1:36 PM, Steve Loughran 
wrote:

> just an FYI, the split off of hadoop hdfs into client and server is going
> to break things.
>
> I know that, as my code is broken; DFSConfigKeys off the path,
> HdfsConfiguration, the class I've been loading to force pickup of
> hdfs-site.xml -all missing.
>
> This is because hadoop-client  POM now depends on hadoop-hdfs-client, not
> hadoop-hdfs, so the things I'm referencing are gone. I'm particularly sad
> about DfsConfigKeys, as everybody uses it as the one hard-coded resource of
> HDFS constants, HDFS-6566 covering the issue of making this public,
> something that's been sitting around for a year.
>
> I'm fixing my build by explicitly adding a hadoop-hdfs dependency.
>
> Any application which used stuff which has now been declared server-side
> isn't going to compile any more, which does appear to break the
> compatibility guidelines we've adopted, specifically "The hadoop-client
> artifact (maven groupId:artifactId) stays compatible within a major release"
>
>
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Build_artifacts
>
>
> We need to do one of
>
> 1. agree that this change, is considered acceptable according to policy,
> and mark it as incompatible in hdfs/CHANGES.TXT
> 2. Change the POMs to add both hdfs-client and -hdfs server in
> hadoop-client -with downstream users free to exclude the server code
>
> We unintentionally caused similar grief with the move of the s3n clients
> to hadoop-aws , HADOOP-11074 -something we should have picked up and -1'd.
> This time we know the problems going to arise, so lets explicitly make a
> decision this time, and share it with our users.
>
> -steve
>


[jira] [Created] (HADOOP-12076) Incomplete Cache Mechanism in CredentialProvider API

2015-06-08 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-12076:


 Summary: Incomplete Cache Mechanism in CredentialProvider API
 Key: HADOOP-12076
 URL: https://issues.apache.org/jira/browse/HADOOP-12076
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


The AbstractJavaKeyStoreProvider class in the CredentialProvider API has a 
cache member variable and interrogation of it during access but does not 
populate it.
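
The usual shape of the fix for this kind of gap, populating the cache on read, looks roughly
like the sketch below (field and helper names are illustrative):

{code}
// On a cache miss, load the entry from the keystore and remember it.
CredentialEntry entry = cache.get(alias);
if (entry == null) {
  entry = loadFromKeyStore(alias);
  if (entry != null) {
    cache.put(alias, entry);
  }
}
return entry;
{code}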



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: 2.7.1 status

2015-05-26 Thread larry mccay
Hi Vinod -

I think that https://issues.apache.org/jira/browse/HADOOP-11934 should also
be added to the blocker list.
This is a critical bug in our ability to protect the LDAP connection
password in LdapGroupsMapper.

thanks!

--larry

On Tue, May 26, 2015 at 3:32 PM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Tx for reporting this, Elliot.

 Made it a blocker, not with a deeper understanding of the problem. Can you
 please chime in with your opinion and perhaps code reviews?

 Thanks
 +Vinod

 On May 26, 2015, at 10:48 AM, Elliott Clark ecl...@apache.org wrote:

  HADOOP-12001 should probably be added to the blocker list since it's a
  regression that can keep ldap from working.




[jira] [Created] (HADOOP-11717) Add Redirecting WebSSO behavior with JWT Token in Hadoop Auth

2015-03-14 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-11717:


 Summary: Add Redirecting WebSSO behavior with JWT Token in Hadoop 
Auth
 Key: HADOOP-11717
 URL: https://issues.apache.org/jira/browse/HADOOP-11717
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


Extend AltKerberosAuthenticationHandler to provide WebSSO flow for UIs.

The actual authentication is done by some external service that the handler 
will redirect to when there is no hadoop.auth cookie and no JWT token found in 
the incoming request.

Using JWT provides a number of benefits:

* It is not tied to any specific authentication mechanism - so buys us many SSO 
integrations
* It is cryptographically verifiable for determining whether it can be trusted
* Checking for expiration allows for a limited lifetime and window for 
compromised use

This will introduce the use of nimbus-jose-jwt library for processing, 
validating and parsing JWT tokens.
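
For illustration, a sketch of the kind of verification the handler performs on the incoming
token with nimbus-jose-jwt (simplified; the real handler also handles audiences and the
configured public key source):

{code}
import java.security.interfaces.RSAPublicKey;
import java.text.ParseException;
import java.util.Date;
import com.nimbusds.jose.JOSEException;
import com.nimbusds.jose.crypto.RSASSAVerifier;
import com.nimbusds.jwt.SignedJWT;

// Returns true only if the serialized JWT carries a trusted RSA signature
// and has not expired.
public static boolean isTokenValid(String serializedJWT, RSAPublicKey ssoPublicKey) {
  try {
    SignedJWT jwt = SignedJWT.parse(serializedJWT);
    if (!jwt.verify(new RSASSAVerifier(ssoPublicKey))) {
      return false;                     // signature cannot be trusted
    }
    Date expires = jwt.getJWTClaimsSet().getExpirationTime();
    return expires == null || new Date().before(expires);
  } catch (ParseException | JOSEException e) {
    return false;
  }
}
{code}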





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11265) Credential and Key Shell Commands not available on Windows

2014-11-04 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-11265:


 Summary: Credential and Key Shell Commands not available on Windows
 Key: HADOOP-11265
 URL: https://issues.apache.org/jira/browse/HADOOP-11265
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Larry McCay
Assignee: Larry McCay
 Fix For: 3.0.0


Must add the credential and key commands to the hadoop.cmd file for windows 
environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-10904) Provide Alt to Clear Text Passwords through Cred Provider API

2014-10-20 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay resolved HADOOP-10904.
--
Resolution: Fixed

 Provide Alt to Clear Text Passwords through Cred Provider API
 -

 Key: HADOOP-10904
 URL: https://issues.apache.org/jira/browse/HADOOP-10904
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay

 This is an umbrella jira to track various child tasks to uptake the 
 credential provider API to enable deployments without storing 
 passwords/credentials in clear text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11200) HttpFS proxyuser, doAs param is case sensitive

2014-10-14 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay resolved HADOOP-11200.
--
Resolution: Duplicate

Didn't realize that HADOOP-11083 addressed this for HttpFS. Closing as a 
duplicate.

 HttpFS proxyuser, doAs param is case sensitive
 --

 Key: HADOOP-11200
 URL: https://issues.apache.org/jira/browse/HADOOP-11200
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.6.0
Reporter: Larry McCay
 Fix For: 2.6.0


 It appears that the doAs processing in HttpFS for proxyusers is case 
 sensitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11200) HttpFS proxyuser, doAs param is case sensitive

2014-10-13 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-11200:


 Summary: HttpFS proxyuser, doAs param is case sensitive
 Key: HADOOP-11200
 URL: https://issues.apache.org/jira/browse/HADOOP-11200
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.6.0
Reporter: Larry McCay
Assignee: Alejandro Abdelnur
 Fix For: 2.6.0


In HADOOP-10835 I've overlooked that the {{doAs}} parameter was being handled as 
case insensitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11031) Design Document for Credential Provider API

2014-08-29 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-11031:


 Summary: Design Document for Credential Provider API
 Key: HADOOP-11031
 URL: https://issues.apache.org/jira/browse/HADOOP-11031
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Larry McCay


Provide detailed overview of the design, intent and use of the credential 
management API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10933) FileBasedKeyStoresFactory Should use Configuration.getPassword for SSL Passwords

2014-08-03 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10933:


 Summary: FileBasedKeyStoresFactory Should use 
Configuration.getPassword for SSL Passwords
 Key: HADOOP-10933
 URL: https://issues.apache.org/jira/browse/HADOOP-10933
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


As part of HADOOP-10904, this jira represents the ability to leverage the 
credential provider API when clear text passwords on disk are unacceptable. By 
using the Configuration.getPassword method, the credential provider API may be 
used while maintaining backward compatibility for passwords stored in 
config/files.
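
A small sketch of the pattern this uptake implies: resolve the keystore password through
Configuration.getPassword so a credential provider can supply it, while a clear-text value
still works as a fallback (property name and helper are illustrative; sslConf is the loaded
ssl-server.xml configuration):

{code}
// Prefer a provisioned credential, fall back to the clear-text/default value.
static String getKeystorePassword(Configuration sslConf, String property,
    String defaultValue) throws IOException {
  char[] pass = sslConf.getPassword(property);   // consults credential providers first
  return pass != null ? new String(pass) : defaultValue;
}

// e.g. getKeystorePassword(sslConf, "ssl.server.keystore.password", DEFAULT_KEYSTORE_PASSWORD);
{code}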



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10929) Typo in Configuration.getPasswordFromCredentialProviders

2014-08-02 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10929:


 Summary: Typo in Configuration.getPasswordFromCredentialProviders
 Key: HADOOP-10929
 URL: https://issues.apache.org/jira/browse/HADOOP-10929
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Priority: Trivial
 Attachments: HADOOP-10929.patch

Transposed letters in getPasswordFromCredenitalProviders need to be fixed to be 
Credential.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10904) Provider Alt to Clear Text Passwords through Cred Provider API

2014-07-30 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10904:


 Summary: Provider Alt to Clear Text Passwords through Cred 
Provider API
 Key: HADOOP-10904
 URL: https://issues.apache.org/jira/browse/HADOOP-10904
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


This is an umbrella jira to track various child tasks to uptake the credential 
provider API to enable deployments without storing passwords/credentials in 
clear text.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-9534) Credential Management Framework (CMF)

2014-05-23 Thread Larry McCay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Larry McCay resolved HADOOP-9534.
-

Resolution: Duplicate

This jira has been superseded by HADOOP-10141 and HADOOP-10607.  All related 
work will be done there.

 Credential Management Framework (CMF)
 -

 Key: HADOOP-9534
 URL: https://issues.apache.org/jira/browse/HADOOP-9534
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Affects Versions: 3.0.0
Reporter: Larry McCay
Assignee: Larry McCay
  Labels: patch
 Attachments: 
 0001-HADOOP-9534-Credential-Management-Framework-initial-.patch, 
 0002-HADOOP-9534-Credential-Management-Framework-second-iteration-.patch, 
 CMF-overview.txt, HADOOP-9534.patch, HADOOP-9534.patch, HADOOP-9534.patch, 
 HADOOP-9534.patch

   Original Estimate: 504h
  Remaining Estimate: 504h

 The credential management framework consists of library for securing, 
 acquiring and rolling credentials for a given Hadoop service.
 Specifically the library will provide:
 1. Password Indirection or Aliasing
 2. Management of identity and trust keystores
 3. Rolling of key pairs and credentials
 4. Discovery of externally provisioned credentials
 5. Service specific CMF secret protection
 6. Syntax for Aliases within configuration files
 Password Indirection or Aliasing:
 By providing alias based access to actual secrets stored within a service 
 specific JCEKS keystore, we are able to eliminate the need for any secret to 
 be stored in clear text on the filesystem. This is a current red flag during 
 security reviews for many customers.
 Management of Identity and Trust Keystores:
 Service specific identity and trust keystores will be managed by a 
 combination of the HSSO service and CMF. 
 Upon registration with the HSSO service a dependent service will be able 
 discover externally provisioned keystores or have them created by the HSSO 
 service on its behalf. The public key of the HSSO service will be provided to 
 the service to be imported into its service specific trust store.
 Service specific keystores and credential stores will be protected with the 
 service specific CMF secret.
 Rolling of Keypairs and Credentials:
 The ability to automate the rolling of PKI keypairs and credentials provide 
 the services a common facility for discovering new HSSO public keys and the 
 need and means to roll their own credentials while being able to retain a 
 number of previous values (as needed).
 Discovery of Externally Provisioned Credentials:
 For environments that want control over the certificate generation and 
 provisioning, CMF provides the ability to discover preprovisioned artifacts 
 based on naming conventions of the artifacts and the use of the service 
 specific CMF secret to access the credentials within the keystores.
 Service Specific CMF Secret Protection:
 By providing a common facility to prompt for and optionally persist a service 
 specific CMF secret at service installation/startup, we enable the ability to 
 protect all the service specific security artifacts with this protected 
 secret. It is protected with a combination of AES 128 bit encryption and file 
 permissions set for only the service specific OS user.
 Syntax for Aliases within configuration files:
 In order to facilitate the use of aliases but also preserve backward 
 compatibility of config files, we will introduce a syntax for marking a value 
 in a configuration file as an alias. A getSecret(String value) type utility 
 method will encapsulate the recognition and parsing of an alias and the 
 retrieval from CMF or return the provided value as the password.
 For instance, if a properties file were to require a password to be provided 
 instead of:
 passwd=supersecret
 we would provide an alias as such:
 passwd=${ALIAS=supersecret}
 At runtime, the value from the properties file is provided to the 
 CMF.getSecret(value) method and it either resolves the alias (where it finds 
 the alias syntax) or returns the value (when there is no alias syntax).
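
For what it's worth, the alias resolution described above amounts to something like this
sketch (illustrative only; CMF itself was superseded by HADOOP-10141/HADOOP-10607, and
lookupAlias stands in for the keystore lookup):

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Resolve a value that may use the ${ALIAS=name} syntax; otherwise return it as-is.
private static final Pattern ALIAS = Pattern.compile("\\$\\{ALIAS=(\\w+)\\}");

public String getSecret(String value) {
  Matcher m = ALIAS.matcher(value);
  return m.matches() ? lookupAlias(m.group(1)) : value;   // lookupAlias: keystore lookup (assumed)
}
{code}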



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10607) Create an API to separate Credentials/Password Storage from Applications

2014-05-15 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10607:


 Summary: Create an API to separate Credentials/Password Storage 
from Applications
 Key: HADOOP-10607
 URL: https://issues.apache.org/jira/browse/HADOOP-10607
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Owen O'Malley
 Fix For: 3.0.0


As with the filesystem API, we need to provide a generic mechanism to support 
multiple key storage mechanisms that are potentially from third parties. 

An additional requirement for long term data lakes is to keep multiple versions 
of each key so that keys can be rolled periodically without requiring the 
entire data set to be re-written. Rolling keys provides containment in the 
event of keys being leaked.

Toward that end, I propose an API that is configured using a list of URLs of 
KeyProviders. The implementation will look for implementations using the 
ServiceLoader interface and thus support third party libraries.

Two providers will be included in this patch. One using the credentials cache 
in MapReduce jobs and the other using Java KeyStores from either HDFS or local 
file system. 
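
A rough sketch of the ServiceLoader-based lookup being proposed (the configuration key and
method shapes here illustrate the pattern and may not match the final API exactly):

{code}
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

// Each implementation jar registers a factory via META-INF/services; the
// framework asks every factory whether it understands a configured URI.
public static List<KeyProvider> getProviders(Configuration conf) throws Exception {
  List<KeyProvider> result = new ArrayList<KeyProvider>();
  String[] paths = conf.getStrings("hadoop.security.key.provider.path");
  if (paths == null) {
    return result;
  }
  for (String path : paths) {
    URI uri = new URI(path);
    for (KeyProviderFactory factory : ServiceLoader.load(KeyProviderFactory.class)) {
      KeyProvider provider = factory.createProvider(uri, conf);   // null if scheme not supported
      if (provider != null) {
        result.add(provider);
        break;
      }
    }
  }
  return result;
}
{code}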





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10491) Add Collection of Labels to KeyProvider API

2014-04-11 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10491:


 Summary: Add Collection of Labels to KeyProvider API
 Key: HADOOP-10491
 URL: https://issues.apache.org/jira/browse/HADOOP-10491
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


A set of arbitrary labels would provide opportunity for interesting access 
policy decisions based on things like classification, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10342) Extend UserGroupInformation to return a UGI given a preauthenticated kerberos Subject

2014-02-12 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10342:


 Summary: Extend UserGroupInformation to return a UGI given a 
preauthenticated kerberos Subject
 Key: HADOOP-10342
 URL: https://issues.apache.org/jira/browse/HADOOP-10342
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


We need the ability to use a Subject that was created inside an embedding 
application through a kerberos authentication. For example, an application that 
uses JAAS to authenticate to a KDC should be able to provide the resulting 
Subject and get a UGI instance to call doAs on.

Example: 
{code}
UserGroupInformation.setConfiguration(conf);

LoginContext context = new LoginContext("com.sun.security.jgss.login",
    new UserNamePasswordCallbackHandler(userName, password));
context.login();

Subject subject = context.getSubject();

final UserGroupInformation ugi2 =
    UserGroupInformation.getUGIFromSubject(subject);

ugi2.doAs(new PrivilegedExceptionAction<Object>() {
  @Override
  public Object run() throws Exception {
    final FileSystem fs = FileSystem.get(conf);
    int i = 0;

    for (FileStatus status : fs.listStatus(new Path("/user"))) {
      System.out.println(status.getPath());
      System.out.println(status);
      if (i++ > 10) {
        System.out.println("only first 10 showed...");
        break;
      }
    }
    return null;
  }
});
{code}




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10244) TestKeyShell improperly tests the results of a Delete

2014-01-20 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10244:


 Summary: TestKeyShell improperly tests the results of a Delete
 Key: HADOOP-10244
 URL: https://issues.apache.org/jira/browse/HADOOP-10244
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


The TestKeyShell.testKeySuccessfulKeyLifecycle test is supposed to ensure that 
the deleted key is no longer present in the output of the subsequent delete 
command. Mistakenly, it is testing that the key is STILL there.

The delete command is actually working but the stdout capture should be reset 
instead of flushed. Therefore, the test is picking up the existence of the key 
name from the deletion message in the previous command.
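
For illustration, a minimal sketch of the difference between resetting and 
flushing the captured stream (the key name and messages are made up; this is 
not the actual test code):

{code}
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class CaptureExample {
  public static void main(String[] args) {
    ByteArrayOutputStream captured = new ByteArrayOutputStream();
    System.setOut(new PrintStream(captured));

    // output from the delete command
    System.out.println("key1 has been successfully deleted.");
    System.out.flush();    // flush() leaves the bytes in the capture buffer

    captured.reset();      // reset() discards them, which is what the test needs

    // output from the next command
    System.out.println("Listing keys...");

    // with reset(), "key1" no longer appears; with only flush(), it still would
    System.err.println("key1 still visible? " + captured.toString().contains("key1"));
  }
}
{code}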



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10237) JavaKeyStoreProvider needs to set keystore permissions properly

2014-01-16 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10237:


 Summary: JavaKeyStoreProvider needs to set keystore permissions 
properly
 Key: HADOOP-10237
 URL: https://issues.apache.org/jira/browse/HADOOP-10237
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


In order to protect access to the created keystores, permissions should 
initially be set to 700 by the JavaKeyStoreProvider. Subsequent permission 
changes can then be done using FS.
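
A minimal sketch, assuming the provider writes the keystore through the Hadoop 
FileSystem API; the method and variable names are illustrative, not the actual 
patch:

{code}
import java.security.KeyStore;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

class KeystorePermissions {
  static void writeKeystore(FileSystem fs, Path keystorePath,
      KeyStore keyStore, char[] password) throws Exception {
    // create the file with 700 from the start rather than relying on a later chmod
    FsPermission perm = new FsPermission("700");
    FSDataOutputStream out = FileSystem.create(fs, keystorePath, perm);
    try {
      keyStore.store(out, password);
    } finally {
      out.close();
    }
  }
}
{code}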



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10238) Decouple the Creation of Key metadata from the creation of a key version

2014-01-16 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10238:


 Summary: Decouple the Creation of Key metadata from the creation 
of a key version
 Key: HADOOP-10238
 URL: https://issues.apache.org/jira/browse/HADOOP-10238
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


Currently, KeyProvider createKey establishes new key metadata and an initial 
version of the key in one step. These should be separated such that a key is 
created first and an initial version is then realized through the KeyProvider 
rollNewVersion method.
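
A sketch of the proposed decoupled flow, written against the KeyProvider API 
from HADOOP-10141; the key name is illustrative and the separation itself is 
the proposal, not current behavior:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;

class DecoupledKeyCreation {
  static KeyProvider.KeyVersion createThenRoll(KeyProvider provider, Configuration conf)
      throws Exception {
    // proposed: createKey establishes only the key and its metadata
    provider.createKey("datalake.key", KeyProvider.options(conf));
    // the first key material version is then realized explicitly
    return provider.rollNewVersion("datalake.key");
  }
}
{code}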



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10224) JavaKeyStoreProvider has to protect against corrupting underlying store

2014-01-10 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10224:


 Summary: JavaKeyStoreProvider has to protect against corrupting 
underlying store
 Key: HADOOP-10224
 URL: https://issues.apache.org/jira/browse/HADOOP-10224
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


Java keystores get corrupted at times. A key management operation that writes 
the store to disk could corrupt it, leaving all of the protected data 
inaccessible.
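
One possible protection, sketched here purely for illustration (not necessarily 
the eventual fix): write the store to a side file and only swap it in after the 
write succeeds, so a failure mid-write cannot clobber the existing keystore.

{code}
import java.io.OutputStream;
import java.security.KeyStore;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class SafeKeystoreFlush {
  static void flush(FileSystem fs, Path keystorePath, KeyStore keyStore, char[] password)
      throws Exception {
    Path newFile = new Path(keystorePath.toString() + "._NEW");
    OutputStream out = fs.create(newFile, true);
    try {
      keyStore.store(out, password);   // any failure here leaves the old file untouched
    } finally {
      out.close();
    }
    // only after a successful write does the new file replace the old one
    if (fs.exists(keystorePath)) {
      fs.delete(keystorePath, false);  // a backup copy could be kept here instead
    }
    fs.rename(newFile, keystorePath);
  }
}
{code}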



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HADOOP-10201) Add Listing Support to Key Management APIs

2014-01-02 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-10201:


 Summary: Add Listing Support to Key Management APIs
 Key: HADOOP-10201
 URL: https://issues.apache.org/jira/browse/HADOOP-10201
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Reporter: Larry McCay
Assignee: Larry McCay


Extend the key management APIs from HADOOP-10141 to include the ability to list 
the available keys.
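
By way of example, listing calls of roughly this shape would enable usage like 
the following; getKeys and getKeyVersions are shown as plausible method names, 
not a settled signature:

{code}
import java.util.List;

import org.apache.hadoop.crypto.key.KeyProvider;

class KeyListing {
  static void printKeys(KeyProvider provider) throws Exception {
    for (String name : provider.getKeys()) {                       // list all key names
      List<KeyProvider.KeyVersion> versions = provider.getKeyVersions(name);
      System.out.println(name + ": " + versions.size() + " version(s)");
    }
  }
}
{code}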



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-09-04 Thread Larry McCay
Chris -

I am curious whether there are any guidelines for feature branch use.

The general goals should be to:
* keep branches as small and as easily reviewable as possible for a given
feature
* decouple the pluggable framework from any specific central server
implementation
* scope specific content into iterations that can be merged into trunk on
their own and then development continued in new branches for the next
iteration

So, I guess the questions that immediately come to mind are:
1. Is there a document that describes the best way to do this?
2. How best do we leverage code being done in one feature branch within
another?

Thanks!

--larry



On Tue, Sep 3, 2013 at 10:00 PM, Zheng, Kai kai.zh...@intel.com wrote:

 This looks good and reasonable to me. Thanks Chris.

 -Original Message-
 From: Chris Douglas [mailto:cdoug...@apache.org]
 Sent: Wednesday, September 04, 2013 6:45 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components

 On Tue, Sep 3, 2013 at 5:20 AM, Larry McCay lmc...@hortonworks.com
 wrote:
  One outstanding question for me - how do we go about getting the
  branches created?

 Once a group has converged on a purpose- ideally with some initial code
 from JIRA- please go ahead and create the feature branch in svn.
 There's no ceremony. -C

  On Tue, Aug 6, 2013 at 6:22 PM, Chris Nauroth cnaur...@hortonworks.com
 wrote:
 
  Near the bottom of the bylaws, it states that addition of a New
  Branch Committer requires Lazy consensus of active PMC members.  I
  think this means that you'll need to get a PMC member to sponsor the
 vote for you.
   Regular committer votes happen on the private PMC mailing list, and
  I assume it would be the same for a branch committer vote.
 
  http://hadoop.apache.org/bylaws.html
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Tue, Aug 6, 2013 at 2:48 PM, Larry McCay lmc...@hortonworks.com
  wrote:
 
   That sounds perfect!
   I have been thinking of late that we would maybe need an incubator
  project
   or something for this - which would be unfortunate.
  
   This would allow us to move much more quickly with a set of patches
  broken
   up into consumable/understandable chunks that are made functional
   more easily within the branch.
   I assume that we need to start a separate thread for DISCUSS or
   VOTE to start that process - correct?
  
   On Aug 6, 2013, at 4:15 PM, Alejandro Abdelnur t...@cloudera.com
  wrote:
  
yep, that is what I meant. Thanks Chris
   
   
On Tue, Aug 6, 2013 at 1:12 PM, Chris Nauroth 
  cnaur...@hortonworks.com
   wrote:
   
Perhaps this is also a good opportunity to try out the new
branch committers clause in the bylaws, enabling
non-committers who are
   working
on this to commit to the feature branch.
   
   
   
  
  http://mail-archives.apache.org/mod_mbox/hadoop-general/201308.mbox/%
  3CCACO5Y4we4d8knB_xU3a=hr2gbeqo5m3vau+inba0li1i9e2...@mail.gmail.com%
  3E
   
Chris Nauroth
Hortonworks
http://hortonworks.com/
   
   
   
On Tue, Aug 6, 2013 at 1:04 PM, Alejandro Abdelnur
t...@cloudera.com
wrote:
   
Larry,
   
Sorry for the delay answering. Thanks for laying down things,
yes, it
makes
sense.
   
Given the large scope of the changes, number of JIRAs and
number of developers involved, wouldn't make sense to create a
feature branch
  for
all
this work not to destabilize (more ;) trunk?
   
Thanks again.
   
   
On Tue, Jul 30, 2013 at 9:43 AM, Larry McCay
lmc...@hortonworks.com
  
wrote:
   
The following JIRA was filed to provide a token and basic
authority implementation for this effort:
https://issues.apache.org/jira/browse/HADOOP-9781
   
I have attached an initial patch though have yet to submit it
as one
since
it is dependent on the patch for CMF that was posted to:
https://issues.apache.org/jira/browse/HADOOP-9534
and this patch still has a couple outstanding issues - javac
  warnings
for
com.sun classes for certification generation and 11 javadoc
  warnings.
   
Please feel free to review the patches and raise any questions
or
concerns
related to them.
   
On Jul 26, 2013, at 8:59 PM, Larry McCay
lmc...@hortonworks.com
wrote:
   
Hello All -
   
In an effort to scope an initial iteration that provides
value to
  the
community while focusing on the pluggable authentication
aspects,
  I've
written a description for Iteration 1. It identifies the
goal of
  the
iteration, the endstate and a set of initial usecases. It also
enumerates
the components that are required for each usecase. There is a
scope
section
that details specific things that should be kept out of the
first iteration. This is certainly up for discussion. There
may be some of
these
things that can be contributed in short order. If we can

[DISCUSS] Security Efforts and Branching

2013-09-04 Thread larry mccay
Hello Kai, Jerry and common-dev'ers -

I would like to try and get a game plan together for how we go about
getting some of these larger security changes into branches that are
manageable, reviewable and ultimately mergeable in a timely manner.

In order to even start this discussion, I think we need an inventory of the
high level projects that are underway in parallel. We can then identify
those that are at the point where patches can be used to seed a branch.
This will give us some insight into how to break it into phases.

Off the top of my head, I can think of the following high level efforts:

1. Pluggable Authentication and Token based SSO
2. CryptoFS for volume level encryption
3. Hive Table/Column Level Encryption (admittedly this is Hive work but it
will leverage common work done in Hadoop)
4. Authorization

Now, #1 and #2 above have related Jiras and a number of patches available
and are therefore early contenders for branching.

#1 has a draft for an initial iteration that was discussed in another
thread and I will attach a pdf version of the iteration-1 proposal to this
mail.

I propose that we converge on an initial plan based on further discussion
of the attached iteration and file a Jira to represent that iteration. We
can then break down the larger patches on existing Jiras to fit into the
constrained scope of the agreed upon iteration and attach them to subtasks
of the iteration Jira.

We can then seed a Pluggable Authentication and Token based SSO branch with
those related patches from H-9392, H-9534, H-9781.

Now, whether we introduce a whole central SSO service in that branch is up
for discussion but I personally think that it would violate the "keep it
small and manageable" goal. I am wondering whether a branch for security
services would do well to decouple the consumers from a specific
implementation that happens to be remote. Then within the Pluggable
Authentication branch - we can concentrate on the consumer level and local
implementations.

I assume that the CryptoFS work is also intended to be done within the
branches and we have to therefore consider how to leverage common code for
things like key access for encryption/decryption and signing/verifying.
This sort of thing is being introduced by H-9534 as part of the Pluggable
Authentication branch in support of JWT tokens. So, we will have to think
through what branches are required for Crypto in the near term.

Perhaps, we can concentrate on those portions of crypto that will be of
immediate benefit to iteration-1 and leave higher order CryptoFS stuff to
another iteration? I don't think that we want an explosion of branches at
any given time. If we can limit it to specific areas, close down on the
iteration and get it merged before creating a new set of branches that
would be best. Again, ease of review, test and merge is important for us.

I am curious how development across related branches like these would work
though. If the service work needs to leverage work from the other, how do we
do that easily? Can we branch a branch? Will that require both to be ready
to merge at the same time?

Perhaps, low-level dependencies can be duplicated for some time and then
consolidated later?

Anyway, specific questions:

Does the proposal to start with the attached iteration-1 draft to create an
iteration Jira make sense to everyone?

Does anyone have specific suggestions regarding the best way for managing
branches that should be decoupled but at the same time leverage common code?

Any other thoughts or insight?

thanks,

--larry


Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-09-03 Thread Larry McCay
All -

Given that we have moved forward with the branch committerships for the
initial set of security branch contributors, I think that we should propose
a branch for iteration-1 as described in this thread.

My proposal is that we limit the scope of this initial branch to be only
that which is required for the pluggable authentication mechanism as
described in iteration-1. We will then create a separate branch in order to
introduce whole new services - such as: TAS Server Instances and a Key
Management Service.

This will make the ability to review each branch easier and the merging of
each into trunk less destabilizing/risky.

In terms of check-in philosophy, we should take a review then check-in
approach to the branch with lazy consensus - wherein we do not need to
explicitly +1 every check-in to the branch but we will honor any -1's with
discussion to resolve before checking in. This will provide us each with
the opportunity to track the work being done and ensure that we understand
it and find that it meets the intended goals.

I am excited to get this work really moving and look forward to working on
it with you all.

One outstanding question for me - how do we go about getting the branches
created?

Off the top of my head, I believe there to be a need for 3 for the related
security efforts actually: pluggable authentication/sso, security services
and cryptographic filesystem.

thanks!

--larry


On Tue, Aug 6, 2013 at 6:22 PM, Chris Nauroth cnaur...@hortonworks.comwrote:

 Near the bottom of the bylaws, it states that addition of a New Branch
 Committer requires Lazy consensus of active PMC members.  I think this
 means that you'll need to get a PMC member to sponsor the vote for you.
  Regular committer votes happen on the private PMC mailing list, and I
 assume it would be the same for a branch committer vote.

 http://hadoop.apache.org/bylaws.html

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



 On Tue, Aug 6, 2013 at 2:48 PM, Larry McCay lmc...@hortonworks.com
 wrote:

  That sounds perfect!
  I have been thinking of late that we would maybe need an incubator
 project
  or something for this - which would be unfortunate.
 
  This would allow us to move much more quickly with a set of patches
 broken
  up into consumable/understandable chunks that are made functional more
  easily within the branch.
  I assume that we need to start a separate thread for DISCUSS or VOTE to
  start that process - correct?
 
  On Aug 6, 2013, at 4:15 PM, Alejandro Abdelnur t...@cloudera.com
 wrote:
 
   yep, that is what I meant. Thanks Chris
  
  
   On Tue, Aug 6, 2013 at 1:12 PM, Chris Nauroth 
 cnaur...@hortonworks.com
  wrote:
  
   Perhaps this is also a good opportunity to try out the new branch
   committers clause in the bylaws, enabling non-committers who are
  working
   on this to commit to the feature branch.
  
  
  
 
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201308.mbox/%3CCACO5Y4we4d8knB_xU3a=hr2gbeqo5m3vau+inba0li1i9e2...@mail.gmail.com%3E
  
   Chris Nauroth
   Hortonworks
   http://hortonworks.com/
  
  
  
   On Tue, Aug 6, 2013 at 1:04 PM, Alejandro Abdelnur t...@cloudera.com
   wrote:
  
   Larry,
  
   Sorry for the delay answering. Thanks for laying down things, yes, it
   makes
   sense.
  
   Given the large scope of the changes, number of JIRAs and number of
   developers involved, wouldn't make sense to create a feature branch
 for
   all
   this work not to destabilize (more ;) trunk?
  
   Thanks again.
  
  
   On Tue, Jul 30, 2013 at 9:43 AM, Larry McCay lmc...@hortonworks.com
 
   wrote:
  
   The following JIRA was filed to provide a token and basic authority
   implementation for this effort:
   https://issues.apache.org/jira/browse/HADOOP-9781
  
   I have attached an initial patch though have yet to submit it as one
   since
   it is dependent on the patch for CMF that was posted to:
   https://issues.apache.org/jira/browse/HADOOP-9534
   and this patch still has a couple outstanding issues - javac
 warnings
   for
   com.sun classes for certification generation and 11 javadoc
 warnings.
  
   Please feel free to review the patches and raise any questions or
   concerns
   related to them.
  
   On Jul 26, 2013, at 8:59 PM, Larry McCay lmc...@hortonworks.com
   wrote:
  
   Hello All -
  
   In an effort to scope an initial iteration that provides value to
 the
   community while focusing on the pluggable authentication aspects,
 I've
   written a description for Iteration 1. It identifies the goal of
 the
   iteration, the endstate and a set of initial usecases. It also
   enumerates
   the components that are required for each usecase. There is a scope
   section
   that details specific things that should be kept out of the first
   iteration. This is certainly up for discussion. There may be some of
   these
   things that can be contributed in short order. If we can add some
   things
   in
   without unnecessary complexity for the identified

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-09-03 Thread Larry McCay
Very good.
Thank you, Chris!


On Tue, Sep 3, 2013 at 6:44 PM, Chris Douglas cdoug...@apache.org wrote:

 On Tue, Sep 3, 2013 at 5:20 AM, Larry McCay lmc...@hortonworks.com
 wrote:
  One outstanding question for me - how do we go about getting the branches
  created?

 Once a group has converged on a purpose- ideally with some initial
 code from JIRA- please go ahead and create the feature branch in svn.
 There's no ceremony. -C

  On Tue, Aug 6, 2013 at 6:22 PM, Chris Nauroth cnaur...@hortonworks.com
 wrote:
 
  Near the bottom of the bylaws, it states that addition of a New Branch
  Committer requires Lazy consensus of active PMC members.  I think
 this
  means that you'll need to get a PMC member to sponsor the vote for you.
   Regular committer votes happen on the private PMC mailing list, and I
  assume it would be the same for a branch committer vote.
 
  http://hadoop.apache.org/bylaws.html
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Tue, Aug 6, 2013 at 2:48 PM, Larry McCay lmc...@hortonworks.com
  wrote:
 
   That sounds perfect!
   I have been thinking of late that we would maybe need an incubator
  project
   or something for this - which would be unfortunate.
  
   This would allow us to move much more quickly with a set of patches
  broken
   up into consumable/understandable chunks that are made functional more
   easily within the branch.
   I assume that we need to start a separate thread for DISCUSS or VOTE
 to
   start that process - correct?
  
   On Aug 6, 2013, at 4:15 PM, Alejandro Abdelnur t...@cloudera.com
  wrote:
  
yep, that is what I meant. Thanks Chris
   
   
On Tue, Aug 6, 2013 at 1:12 PM, Chris Nauroth 
  cnaur...@hortonworks.com
   wrote:
   
Perhaps this is also a good opportunity to try out the new branch
committers clause in the bylaws, enabling non-committers who are
   working
on this to commit to the feature branch.
   
   
   
  
 
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201308.mbox/%3CCACO5Y4we4d8knB_xU3a=hr2gbeqo5m3vau+inba0li1i9e2...@mail.gmail.com%3E
   
Chris Nauroth
Hortonworks
http://hortonworks.com/
   
   
   
On Tue, Aug 6, 2013 at 1:04 PM, Alejandro Abdelnur 
 t...@cloudera.com
wrote:
   
Larry,
   
Sorry for the delay answering. Thanks for laying down things,
 yes, it
makes
sense.
   
Given the large scope of the changes, number of JIRAs and number
 of
developers involved, wouldn't make sense to create a feature
 branch
  for
all
this work not to destabilize (more ;) trunk?
   
Thanks again.
   
   
On Tue, Jul 30, 2013 at 9:43 AM, Larry McCay 
 lmc...@hortonworks.com
  
wrote:
   
The following JIRA was filed to provide a token and basic
 authority
implementation for this effort:
https://issues.apache.org/jira/browse/HADOOP-9781
   
I have attached an initial patch though have yet to submit it as
 one
since
it is dependent on the patch for CMF that was posted to:
https://issues.apache.org/jira/browse/HADOOP-9534
and this patch still has a couple outstanding issues - javac
  warnings
for
com.sun classes for certification generation and 11 javadoc
  warnings.
   
Please feel free to review the patches and raise any questions or
concerns
related to them.
   
On Jul 26, 2013, at 8:59 PM, Larry McCay lmc...@hortonworks.com
 
wrote:
   
Hello All -
   
In an effort to scope an initial iteration that provides value
 to
  the
community while focusing on the pluggable authentication aspects,
  I've
written a description for Iteration 1. It identifies the goal
 of
  the
iteration, the endstate and a set of initial usecases. It also
enumerates
the components that are required for each usecase. There is a
 scope
section
that details specific things that should be kept out of the first
iteration. This is certainly up for discussion. There may be
 some of
these
things that can be contributed in short order. If we can add some
things
in
without unnecessary complexity for the identified usecases then
 we
should.
   
@Alejandro - please review this and see whether it satisfies
 your
point
for a definition of what we are building.
   
In addition to the document that I will paste here as text and
attach a
pdf version, we have a couple patches for components that are
identified
in
the document.
Specifically, COMP-7 and COMP-8.
   
I will be posting COMP-8 patch to the HADOOP-9534 JIRA which was
filed
specifically for that functionality.
COMP-7 is a small set of classes to introduce JsonWebToken as
 the
token
format and a basic JsonWebTokenAuthority that can issue and
 verify
these
tokens.
   
Since there is no JIRA for this yet, I will likely file a new
 JIRA
for
a
SSO token implementation.
   
Both of these patches assume to be modules

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-08-06 Thread Larry McCay
That sounds perfect!
I have been thinking of late that we would maybe need an incubator project or 
something for this - which would be unfortunate.

This would allow us to move much more quickly with a set of patches broken up 
into consumable/understandable chunks that are made functional more easily 
within the branch.
I assume that we need to start a separate thread for DISCUSS or VOTE to start 
that process - correct?

On Aug 6, 2013, at 4:15 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 yep, that is what I meant. Thanks Chris
 
 
 On Tue, Aug 6, 2013 at 1:12 PM, Chris Nauroth cnaur...@hortonworks.comwrote:
 
 Perhaps this is also a good opportunity to try out the new branch
 committers clause in the bylaws, enabling non-committers who are working
 on this to commit to the feature branch.
 
 
 http://mail-archives.apache.org/mod_mbox/hadoop-general/201308.mbox/%3CCACO5Y4we4d8knB_xU3a=hr2gbeqo5m3vau+inba0li1i9e2...@mail.gmail.com%3E
 
 Chris Nauroth
 Hortonworks
 http://hortonworks.com/
 
 
 
 On Tue, Aug 6, 2013 at 1:04 PM, Alejandro Abdelnur t...@cloudera.com
 wrote:
 
 Larry,
 
 Sorry for the delay answering. Thanks for laying down things, yes, it
 makes
 sense.
 
 Given the large scope of the changes, number of JIRAs and number of
 developers involved, wouldn't make sense to create a feature branch for
 all
 this work not to destabilize (more ;) trunk?
 
 Thanks again.
 
 
 On Tue, Jul 30, 2013 at 9:43 AM, Larry McCay lmc...@hortonworks.com
 wrote:
 
 The following JIRA was filed to provide a token and basic authority
 implementation for this effort:
 https://issues.apache.org/jira/browse/HADOOP-9781
 
 I have attached an initial patch though have yet to submit it as one
 since
 it is dependent on the patch for CMF that was posted to:
 https://issues.apache.org/jira/browse/HADOOP-9534
 and this patch still has a couple outstanding issues - javac warnings
 for
 com.sun classes for certification generation and 11 javadoc warnings.
 
 Please feel free to review the patches and raise any questions or
 concerns
 related to them.
 
 On Jul 26, 2013, at 8:59 PM, Larry McCay lmc...@hortonworks.com
 wrote:
 
 Hello All -
 
 In an effort to scope an initial iteration that provides value to the
 community while focusing on the pluggable authentication aspects, I've
 written a description for Iteration 1. It identifies the goal of the
 iteration, the endstate and a set of initial usecases. It also
 enumerates
 the components that are required for each usecase. There is a scope
 section
 that details specific things that should be kept out of the first
 iteration. This is certainly up for discussion. There may be some of
 these
 things that can be contributed in short order. If we can add some
 things
 in
 without unnecessary complexity for the identified usecases then we
 should.
 
 @Alejandro - please review this and see whether it satisfies your
 point
 for a definition of what we are building.
 
 In addition to the document that I will paste here as text and
 attach a
 pdf version, we have a couple patches for components that are
 identified
 in
 the document.
 Specifically, COMP-7 and COMP-8.
 
 I will be posting COMP-8 patch to the HADOOP-9534 JIRA which was
 filed
 specifically for that functionality.
 COMP-7 is a small set of classes to introduce JsonWebToken as the
 token
 format and a basic JsonWebTokenAuthority that can issue and verify
 these
 tokens.
 
 Since there is no JIRA for this yet, I will likely file a new JIRA
 for
 a
 SSO token implementation.
 
 Both of these patches assume to be modules within
 hadoop-common/hadoop-common-project.
 While they are relatively small, I think that they will be pulled in
 by
 other modules such as hadoop-auth which would likely not want a
 dependency
 on something larger like
 hadoop-common/hadoop-common-project/hadoop-common.
 
 This is certainly something that we should discuss within the
 community
 for this effort though - that being, exactly how to add these libraries
 so
 that they are most easily consumed by existing projects.
 
 Anyway, the following is the Iteration-1 document - it is also
 attached
 as a pdf:
 
 Iteration 1: Pluggable User Authentication and Federation
 
 Introduction
 The intent of this effort is to bootstrap the development of
 pluggable
 token-based authentication mechanisms to support certain goals of
 enterprise authentication integrations. By restricting the scope of
 this
 effort, we hope to provide immediate benefit to the community while
 keeping
 the initial contribution to a manageable size that can be easily
 reviewed,
 understood and extended with further development through follow up
 JIRAs
 and related iterations.
 
 Iteration Endstate
 Once complete, this effort will have extended the authentication
 mechanisms - for all client types - from the existing: Simple, Kerberos
 and
 Plain (for RPC) to include LDAP authentication and SAML based
 federation.
 In addition, the ability to provide additional/custom

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-30 Thread Larry McCay
The following JIRA was filed to provide a token and basic authority 
implementation for this effort:
https://issues.apache.org/jira/browse/HADOOP-9781

I have attached an initial patch though have yet to submit it as one since it 
is dependent on the patch for CMF that was posted to:
https://issues.apache.org/jira/browse/HADOOP-9534
and this patch still has a couple outstanding issues - javac warnings for 
com.sun classes for certification generation and 11 javadoc warnings.

Please feel free to review the patches and raise any questions or concerns 
related to them.

On Jul 26, 2013, at 8:59 PM, Larry McCay lmc...@hortonworks.com wrote:

 Hello All -
 
 In an effort to scope an initial iteration that provides value to the 
 community while focusing on the pluggable authentication aspects, I've 
 written a description for Iteration 1. It identifies the goal of the 
 iteration, the endstate and a set of initial usecases. It also enumerates the 
 components that are required for each usecase. There is a scope section that 
 details specific things that should be kept out of the first iteration. This 
 is certainly up for discussion. There may be some of these things that can be 
 contributed in short order. If we can add some things in without unnecessary 
 complexity for the identified usecases then we should.
 
 @Alejandro - please review this and see whether it satisfies your point for a 
 definition of what we are building.
 
 In addition to the document that I will paste here as text and attach a pdf 
 version, we have a couple patches for components that are identified in the 
 document.
 Specifically, COMP-7 and COMP-8.
 
 I will be posting COMP-8 patch to the HADOOP-9534 JIRA which was filed 
 specifically for that functionality.
 COMP-7 is a small set of classes to introduce JsonWebToken as the token 
 format and a basic JsonWebTokenAuthority that can issue and verify these 
 tokens.
 
 Since there is no JIRA for this yet, I will likely file a new JIRA for a SSO 
 token implementation.
 
 Both of these patches assume to be modules within 
 hadoop-common/hadoop-common-project.
 While they are relatively small, I think that they will be pulled in by other 
 modules such as hadoop-auth which would likely not want a dependency on 
 something larger like hadoop-common/hadoop-common-project/hadoop-common.
 
 This is certainly something that we should discuss within the community for 
 this effort though - that being, exactly how to add these libraries so that 
 they are most easily consumed by existing projects.
 
 Anyway, the following is the Iteration-1 document - it is also attached as a 
 pdf:
 
 Iteration 1: Pluggable User Authentication and Federation
 
 Introduction
 The intent of this effort is to bootstrap the development of pluggable 
 token-based authentication mechanisms to support certain goals of enterprise 
 authentication integrations. By restricting the scope of this effort, we hope 
 to provide immediate benefit to the community while keeping the initial 
 contribution to a manageable size that can be easily reviewed, understood and 
 extended with further development through follow up JIRAs and related 
 iterations.
 
 Iteration Endstate
 Once complete, this effort will have extended the authentication mechanisms - 
 for all client types - from the existing: Simple, Kerberos and Plain (for 
 RPC) to include LDAP authentication and SAML based federation. In addition, 
 the ability to provide additional/custom authentication mechanisms will be 
 enabled for users to plug in their preferred mechanisms.
 
 Project Scope
 The scope of this effort is a subset of the features covered by the overviews 
 of HADOOP-9392 and HADOOP-9533. This effort concentrates on enabling Hadoop 
 to issue, accept/validate SSO tokens of its own. The pluggable authentication 
 mechanism within SASL/RPC layer and the authentication filter pluggability 
 for REST and UI components will be leveraged and extended to support the 
 results of this effort.
 
 Out of Scope
 In order to scope the initial deliverable as the minimally viable product, a 
 handful of things have been simplified or left out of scope for this effort. 
 This is not meant to say that these aspects are not useful or not needed but 
 that they are not necessary for this iteration. We do however need to ensure 
 that we don’t do anything to preclude adding them in future iterations.
 1. Additional Attributes - the result of authentication will continue to use 
 the existing hadoop tokens and identity representations. Additional 
 attributes used for finer grained authorization decisions will be added 
 through follow-up efforts.
 2. Token revocation - the ability to revoke issued identity tokens will be 
 added later
 3. Multi-factor authentication - this will likely require additional 
 attributes and is not necessary for this iteration.
 4. Authorization changes - we will require additional attributes for the 
 fine-grained access control plans

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-10 Thread Larry McCay
All -

After combing through this thread - as well as the summit session summary 
thread, I think that we have the following two items that we can probably move 
forward with:

1. TokenAuth method - assuming this means the pluggable authentication 
mechanisms within the RPC layer (2 votes: Kai and Kyle)
2. An actual Hadoop Token format (2 votes: Brian and myself)

I propose that we attack both of these aspects as one. Let's provide the 
structure and interfaces of the pluggable framework for use in the RPC layer 
through leveraging Daryn's pluggability work and POC it with a particular token 
format (not necessarily the only format ever supported - we just need one to 
start). If there has already been work done in this area by anyone then please 
speak up and commit to providing a patch - so that we don't duplicate effort. 

@Daryn - is there a particular Jira or set of Jiras that we can look at to 
discern the pluggability mechanism details? Documentation of it would be great 
as well.
@Kai - do you have existing code for the pluggable token authentication 
mechanism - if not, we can take a stab at representing it with interfaces 
and/or POC code.
I can stand up and say that we have a token format that we have been working 
with already and can provide a patch that represents it as a contribution to 
test out the pluggable tokenAuth.

These patches will provide progress toward code being the central discussion 
vehicle. As a community, we can then incrementally build on that foundation in 
order to collaboratively deliver the common vision.

In the absence of any other home for posting such patches, let's assume that 
they will be attached to HADOOP-9392 - or a dedicated subtask for this 
particular aspect/s - I will leave that detail to Kai.

@Alejandro, being the only voice on this thread that isn't represented in the 
votes above, please feel free to agree or disagree with this direction.

thanks,

--larry

On Jul 5, 2013, at 3:24 PM, Larry McCay lmc...@hortonworks.com wrote:

 Hi Andy -
 
 Happy Fourth of July to you and yours.
 
 Same to you and yours. :-)
 We had some fun in the sun for a change - we've had nothing but rain on the 
 east coast lately.
 
 My concern here is there may have been a misinterpretation or lack of
 consensus on what is meant by clean slate
 
 
 Apparently so.
 On the pre-summit call, I stated that I was interested in reconciling the 
 jiras so that we had one to work from.
 
 You recommended that we set them aside for the time being - with the 
 understanding that work would continue on your side (and our's as well) - and 
 approach the community discussion from a clean slate.
 We seemed to do this at the summit session quite well.
 It was my understanding that this community discussion would live beyond the 
 summit and continue on this list.
 
 While closing the summit session we agreed to follow up on common-dev with 
 first a summary then a discussion of the moving parts.
 
 I never expected the previous work to be abandoned and fully expected it to 
 inform the discussion that happened here.
 
 If you would like to reframe what "clean slate" was supposed to mean or 
 describe what it means now - that would be welcome - before I waste any more 
 time trying to facilitate a community discussion that is apparently not 
 wanted.
 
 Nowhere in this
 picture are self appointed master JIRAs and such, which have been
 disappointing to see crop up, we should be collaboratively coding not
 planting flags.
 
 I don't know what you mean by self-appointed master JIRAs.
 It has certainly not been anyone's intention to disappoint.
 Any mention of a new JIRA was just to have a clear context to gather the 
 agreed upon points - previous and/or existing JIRAs would easily be linked.
 
 Planting flags… I need to go back and read my discussion point about the JIRA 
 and see how that impression was made.
 That is not how I define success. The only flags that count is code. What we 
 are lacking is the roadmap on which to put the code.
 
 I read Kai's latest document as something approaching today's consensus (or
 at least a common point of view?) rather than a historical document.
 Perhaps he and it can be given equal share of the consideration.
 
 I definitely read it as something that has evolved into something approaching 
 what we have been talking about so far. There has not however been enough 
 discussion anywhere near the level of detail in that document and more 
 details are needed for each component in the design. 
 Why the work in that document should not be fed into the community discussion 
 as anyone else's would be - I fail to understand.
 
 My suggestion continues to be that you should take that document and speak to 
 the inventory of moving parts as we agreed.
 As these are agreed upon, we will ensure that the appropriate subtasks are 
 filed against whatever JIRA is to host them - don't really care much which it 
 is.
 
 I don't really want to continue

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-10 Thread Larry McCay
It seems to me that we can have the best of both worlds here…it's all about the 
scoping.

If we were to reframe the immediate scope to the lowest common denominator of 
what is needed for accepting tokens in authentication plugins then we gain:

1. a very manageable scope to define and agree upon
2. a deliverable that should be useful in and of itself
3. a foundation for community collaboration that we build on for higher level 
solutions built on this lowest common denominator and experience as a working 
community

So, to Alejandro's point, perhaps we need to define what would make #2 above 
true - this could serve as the "what we are building" instead of the "how to 
build it".
Including:
a. project structure within hadoop-common-project/common-security or the like
b. the usecases that would need to be enabled to make it a self contained and 
useful contribution - without higher level solutions
c. the JIRA/s for contributing patches
d. what specific patches will be needed to accomplish the usecases in #b

In other words, an end-state for the lowest common denominator that enables 
code patches in the near-term is the best of both worlds.

I think this may be a good way to bootstrap the collaboration process for our 
emerging security community rather than trying to tackle a huge vision all at 
once.

@Alejandro - if you have something else in mind that would bootstrap this 
process - that would be great - please advise.

thoughts?

On Jul 10, 2013, at 1:06 PM, Brian Swan brian.s...@microsoft.com wrote:

 Hi Alejandro, all-
 
 There seems to be agreement on the broad stroke description of the components 
 needed to achieve pluggable token authentication (I'm sure I'll be corrected 
 if that isn't the case). However, discussion of the details of those 
 components doesn't seem to be moving forward. I think this is because the 
 details are really best understood through code. I also see *a* (i.e. one of 
 many possible) token format and pluggable authentication mechanisms within 
 the RPC layer as components that can have immediate benefit to Hadoop users 
 AND still allow flexibility in the larger design. So, I think the best way to 
 move the conversation of what we are aiming for forward is to start looking 
 at code for these components. I am especially interested in moving forward 
 with pluggable authentication mechanisms within the RPC layer and would love 
 to see what others have done in this area (if anything).
 
 Thanks.
 
 -Brian
 
 -Original Message-
 From: Alejandro Abdelnur [mailto:t...@cloudera.com] 
 Sent: Wednesday, July 10, 2013 8:15 AM
 To: Larry McCay
 Cc: common-dev@hadoop.apache.org; da...@yahoo-inc.com; Kai Zheng
 Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components
 
 Larry, all,
 
 Still is not clear to me what is the end state we are aiming for, or that we 
 even agree on that.
 
 IMO, instead of trying to agree on what to do, we should first agree on the final 
 state, then we see what should be changed to get there, then we see how we 
 change things to get there.
 
 The different documents out there focus more on how.
 
 We should not try to say how before we know what.
 
 Thx.
 
 
 
 
 On Wed, Jul 10, 2013 at 6:42 AM, Larry McCay lmc...@hortonworks.com wrote:
 
 All -
 
 After combing through this thread - as well as the summit session 
 summary thread, I think that we have the following two items that we 
 can probably move forward with:
 
 1. TokenAuth method - assuming this means the pluggable authentication 
 mechanisms within the RPC layer (2 votes: Kai and Kyle) 2. An actual 
 Hadoop Token format (2 votes: Brian and myself)
 
 I propose that we attack both of these aspects as one. Let's provide 
 the structure and interfaces of the pluggable framework for use in the 
 RPC layer through leveraging Daryn's pluggability work and POC it with 
 a particular token format (not necessarily the only format ever 
 supported - we just need one to start). If there has already been work 
 done in this area by anyone then please speak up and commit to 
 providing a patch - so that we don't duplicate effort.
 
 @Daryn - is there a particular Jira or set of Jiras that we can look 
 at to discern the pluggability mechanism details? Documentation of it 
 would be great as well.
 @Kai - do you have existing code for the pluggable token 
 authentication mechanism - if not, we can take a stab at representing 
 it with interfaces and/or POC code.
 I can standup and say that we have a token format that we have been 
 working with already and can provide a patch that represents it as a 
 contribution to test out the pluggable tokenAuth.
 
 These patches will provide progress toward code being the central 
 discussion vehicle. As a community, we can then incrementally build on 
 that foundation in order to collaboratively deliver the common vision.
 
 In the absence of any other home for posting such patches, let's 
 assume that they will be attached to HADOOP-9392 - or a dedicated 
 subtask

Re: Hadoop Summit: Security Design Lounge Session

2013-07-09 Thread Larry McCay
Adding additional takeaways that were articulated by Alejandro and expanded by 
me in another thread - so that we have it all in one place…thanks again, 
Alejandro!



Hi Alejandro -

I missed your #4 in my summary and takeaways of the session in another thread 
on this list.

I believe that the points of discussion were along the lines of:

* put common security libraries into common much the same way as hadoop-auth is 
today making each available as separate maven modules to be used across the 
ecosystem
* there was a concern raised that we need to be cognizant of not using common as 
a dumping ground
- I believe this to mean that we need to ensure that the libraries that 
are added there are truly cross cutting and can be used by the other projects 
across Hadoop
- I think that security related things will largely be of that nature 
but we need to keep it in mind

I'm not sure whether #3 is represented in the other summary or not…

There were certainly discussions around the emerging work from Daryn related to 
pluggable authentication mechanisms within that layer, and we will immediately 
have the options of Kerberos, simple and plain. There was also talk of how this 
can be leveraged to introduce a Hadoop token mechanism as well. 

At the same time, there was talk of the possibility of simply making Kerberos 
easy and a non-issue for intra-cluster use. Certainly we need both of these 
approaches.
I believe someone used ApacheDS' KDC support as an example - if we could 
stand up an ApacheDS-based KDC and configure it and related keytabs easily, then 
the end-to-end story is more palatable to a broader user base. That story being 
the choice of authentication mechanisms for user authentication and easy 
provisioning and management of kerberos for intra-cluster service 
authentication.

If you agree with this extended summary then I can update the other thread with 
that recollection.
Thanks for providing it!

--larry

On Jul 4, 2013, at 4:09 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 Leaving JIRAs and design docs aside, my recollection from the f2f lounge
 discussion could be summarized as:
 
 --
 1* Decouple users-services authentication from (intra) services-services
 authentication.
 
 The main motivation for this is to get pluggable authentication and
 integrated SSO experience for users.
 
 (we never discussed if this is needed for external-apps talking with Hadoop)
 
 2* We should leave the Hadoop delegation tokens alone
 
 No need to make this pluggable as this is an internal authentication
 mechanism after the 'real' authentication happened.
 
 (this is independent from factoring out all classes we currently have into
 a common implementation for Hadoop and other projects to use)
 
 3* Being able to replace kerberos with something else for (intra)
 services-services authentication.
 
 It was suggested that to support deployments where stock Kerberos may not
 be an option (i.e. cloud) we should make sure that UserGroupInformation and
 RPC security logic work with a pluggable GSS implementation.
 
 4* Create a common security component ie 'hadoop-security' to be 'the'
 security lib for all projects to use.
 
 Create a component/project that would provide the common security pieces
 for all projects to use.
 
 --
 
 If we agree with this, after any necessary corrections, I think we could
 distill clear goals from it and start from there.
 
 Thanks.
 
 Tucu  Alejandro


On Jul 1, 2013, at 5:40 PM, Larry McCay lmc...@hortonworks.com wrote:

 All -
 
 Last week at Hadoop Summit there was a room dedicated as the summit Design 
 Lounge.
 This was a place where like folks could get together and talk about design 
 issues with other contributors with a simple flip board and some beanbag 
 chairs.
 We used this as an opportunity to bootstrap some discussions within 
 common-dev for security related topics. I'd like to summarize the security 
 session and takeaways here for everyone.
 
 This summary and set of takeaways are largely from memory. 
 Please - anyone that attended - feel free to correct anything that is 
 inaccurate or omitted.
 
 Pretty well attended - companies represented:
 * Yahoo!
 * Microsoft
 * Hortonworks
 * Cloudera
 * Intel
 * eBay
 * Voltage Security
 * Flying Penguins
 * EMC
 * others...
 
 Most folks were pretty engaged throughout the session.
 We set expectations as a meet and greet/project kickoff - project being the 
 emerging security development community.
 
 In order to keep the scope of conversations manageable we tried to keep 
 focused on authentication and the ideas around SSO and tokens.
 
 We discussed kerberos as:
 1. major pain point and barrier to entry for some
 2. seemingly perfect for others
   a. obviously requiring backward compatibility
 
 It seemed to be consensus that:
 1. user authentication should be easily integrated with alternative 
 enterprise identity solutions
 2

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-05 Thread Larry McCay
Hi Alejandro -

I missed your #4 in my summary and takeaways of the session in another thread 
on this list.

I believe that the points of discussion were along the lines of:

* put common security libraries into common much the same way as hadoop-auth is 
today making each available as separate maven modules to be used across the 
ecosystem
* there was a concern raised that we need to be cognizant of not using common as 
a dumping ground
- I believe this to mean that we need to ensure that the libraries that 
are added there are truly cross cutting and can be used by the other projects 
across Hadoop
- I think that security related things will largely be of that nature 
but we need to keep it in mind

I'm not sure whether #3 is represented in the other summary or not…

There were certainly discussions around the emerging work from Daryn related to 
pluggable authentication mechanisms within that layer, and we will immediately 
have the options of Kerberos, simple and plain. There was also talk of how this 
can be leveraged to introduce a Hadoop token mechanism as well. 

At the same time, there was talk of the possibility of simply making Kerberos 
easy and a non-issue for intra-cluster use. Certainly we need both of these 
approaches.
I believe someone used ApacheDS' KDC support as an example - if we could 
stand up an ApacheDS-based KDC and configure it and related keytabs easily, then 
the end-to-end story is more palatable to a broader user base. That story being 
the choice of authentication mechanisms for user authentication and easy 
provisioning and management of kerberos for intra-cluster service 
authentication.

If you agree with this extended summary then I can update the other thread with 
that recollection.
Thanks for providing it!

--larry

On Jul 4, 2013, at 4:09 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 Leaving JIRAs and design docs aside, my recollection from the f2f lounge
 discussion could be summarized as:
 
 --
 1* Decouple users-services authentication from (intra) services-services
 authentication.
 
 The main motivation for this is to get pluggable authentication and
 integrated SSO experience for users.
 
 (we never discussed if this is needed for external-apps talking with Hadoop)
 
 2* We should leave the Hadoop delegation tokens alone
 
 No need to make this pluggable as this is an internal authentication
 mechanism after the 'real' authentication happened.
 
 (this is independent from factoring out all classes we currently have into
 a common implementation for Hadoop and other projects to use)
 
 3* Being able to replace kerberos with something else for (intra)
 services-services authentication.
 
 It was suggested that to support deployments where stock Kerberos may not
 be an option (i.e. cloud) we should make sure that UserGroupInformation and
 RPC security logic work with a pluggable GSS implementation.
 
 4* Create a common security component ie 'hadoop-security' to be 'the'
 security lib for all projects to use.
 
 Create a component/project that would provide the common security pieces
 for all projects to use.
 
 --
 
 If we agree with this, after any necessary corrections, I think we could
 distill clear goals from it and start from there.
 
 Thanks.
 
 Tucu  Alejandro
 
 On Thu, Jul 4, 2013 at 11:40 AM, Andrew Purtell apurt...@apache.org wrote:
 
 Hi Larry (and all),
 
 Happy Fourth of July to you and yours.
 
 In our shop Kai and Tianyou are already doing the coding, so I'd defer to
 them on the detailed points.
 
 My concern here is there may have been a misinterpretation or lack of
 consensus on what is meant by clean slate. Hopefully that can be quickly
 cleared up. Certainly we did not mean ignore all that came before. The idea
 was to reset discussions to find common ground and new direction where we
 are working together, not in conflict, on an agreed upon set of design
 points and tasks. There's been a lot of good discussion and design
 preceeding that we should figure out how to port over. Nowhere in this
 picture are self appointed master JIRAs and such, which have been
 disappointing to see crop up, we should be collaboratively coding not
 planting flags.
 
 I read Kai's latest document as something approaching today's consensus (or
 at least a common point of view?) rather than a historical document.
 Perhaps he and it can be given equal share of the consideration.
 
 
 On Wednesday, July 3, 2013, Larry McCay wrote:
 
 Hey Andrew -
 
 I largely agree with that statement.
 My intention was to let the differences be worked out within the
 individual components once they were identified and subtasks created.
 
 My reference to HSSO was really referring to a SSO *server* based design
 which was not clearly articulated in the earlier documents.
 We aren't trying to compare and contrast one design over another anymore.
 
 Let's move this collaboration along as we've mapped out and the
 differences in the details will reveal

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-04 Thread Larry McCay
 
 there was any discussion of abandoning the current JIRAs which tracks a lot 
 of good input from others in the community and important for us to consider 
 as we move forward with the work. Recommend we continue to move forward with 
 the two JIRAs that we have already been respectively working on, as well 
 other JIRAs that others in the community continue to work on.
 
 Your latest design revision actually makes it clear that you are now 
 targeting exactly what was described as HSSO - so comparing and contrasting 
 is not going to add any value.
 That is not my understanding. As Kai has pointed out in response to your 
 comment on HADOOP-9392, a lot of these updates predate last week's discussion 
 at the summit. Fortunately the discussion at the summit was in line with our 
 thinking on the required revisions from discussing with others in the 
 community prior to the summit. Our updated design doc clearly addresses the 
 authorization and proxy flow which are important for users. HSSO can continue 
 to be layered on top of TAS via federation.
 
 Personally, I think that continuing the separation of 9533 and 9392 will do 
 this effort a disservice. There doesn't seem to be enough differences between 
 the two to justify separate jiras anymore.
 Actually I see many key differences between 9392 and 9533. Andrew and Kai has 
 also pointed out there are key differences when comparing 9392 and 9533. 
 Please review the design doc we have uploaded to understand the differences. 
 I am sure Kai will also add more details about the differences between these 
 JIRAs.
 
 The work proposed by us on 9392 addresses additional user needs beyond what 
 9533 proposes to implement. We should figure out some of the implementation 
 specifics for those JIRAs so both of us can keep moving on the code without 
 colliding. Kai has also recommended the same as his preference in response to 
 your comment on 9392.
 
 Let's work that out as a community of peers so we can all agree on an 
 approach to move forward collaboratively.
 
 Thanks,
 Tianyou
 
 -Original Message-
 From: Larry McCay [mailto:lmc...@hortonworks.com] 
 Sent: Thursday, July 04, 2013 4:10 AM
 To: Zheng, Kai
 Cc: common-dev@hadoop.apache.org
 Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components
 
 Hi Kai -
 
 I think that I need to clarify something...
 
 This is not an update for 9533 but a continuation of the discussions that are 
 focused on a fresh look at a SSO for Hadoop.
 We've agreed to leave our previous designs behind and therefore we aren't 
 really seeing it as an HSSO layered on top of TAS approach or an HSSO vs TAS 
 discussion.
 
 Your latest design revision actually makes it clear that you are now 
 targeting exactly what was described as HSSO - so comparing and contrasting 
 is not going to add any value.
 
 What we need you to do at this point, is to look at those high-level 
 components described on this thread and comment on whether we need additional 
 components or any that are listed that don't seem necessary to you and why.
 In other words, we need to define and agree on the work that has to be done.
 
 We also need to determine those components that need to be done before 
 anything else can be started.
 I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the 
 other components and should probably be defined and POC'd in short order.
 
 Personally, I think that continuing the separation of 9533 and 9392 will do 
 this effort a disservice. There doesn't seem to be enough differences between 
 the two to justify separate jiras anymore. It may be best to file a new one 
 that reflects a single vision without the extra cruft that has built up in 
 either of the existing ones. We would certainly reference the existing ones 
 within the new one. This approach would align with the spirit of the 
 discussions up to this point.
 
 I am prepared to start a discussion around the shape of the two Hadoop SSO 
 tokens: identity and access. If this is what others feel the next topic 
 should be.
 If we can identify a jira home for it, we can do it there - otherwise we can 
 create another DISCUSS thread for it.
 
 thanks,
 
 --larry
 
 
 On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote:
 
 Hi Larry,
 
 Thanks for the update. Good to see that with this update we are now aligned 
 on most points.
 
 I have also updated our TokenAuth design in HADOOP-9392. The new revision 
 incorporates feedback and suggestions in related discussion with the 
 community, particularly from Microsoft and others attending the Security 
 design lounge session at the Hadoop summit. Summary of the changes:
 1. Revised the approach to now use two tokens, Identity Token plus Access 
 Token, particularly considering our authorization framework and 
 compatibility with HSSO;
 2.Introduced Authorization Server (AS) from our authorization framework 
 into the flow that issues access tokens for clients with identity tokens to 
 access

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-04 Thread Larry McCay
 authorization models like RBAC, ABAC 
 and even XACML standards.
 
 TokenAuth targets support for domain based authN & authZ to allow 
 multi-tenant deployments. Authentication and authorization rules can be 
 configured and enforced per domain, which allows organizations to manage 
 their individual policies separately while sharing a common large pool of 
 resources.
 
 TokenAuth addresses the proxy/impersonation case with the flow Tianyou 
 mentioned, where a service can proxy a client to access another service in a 
 secured and constrained way.
 
 Regarding token based authentication plus SSO and the unified authorization 
 framework, let's continue to use HADOOP-9392 and HADOOP-9466 as umbrella 
 JIRAs for these efforts. HSSO targets support for a centralized SSO server 
 for multiple clusters and, as we have pointed out before, is a nice subset 
 of the work proposed on HADOOP-9392. Let's align these two JIRAs and address 
 the question Kevin raised multiple times in the 9392/9533 JIRAs: "How can 
 HSSO and TAS work together? What is the relationship?" The design update I 
 provided was meant to provide the necessary details so we can nail down that 
 relationship and collaborate on the implementation of these JIRAs.
 
 As you have also confirmed, this design aligns with related community 
 discussions, so let's continue our collaborative effort to contribute code to 
 these JIRAs.
 
 Regards,
 Kai
 
 -Original Message-
 From: Larry McCay [mailto:lmc...@hortonworks.com] 
 Sent: Thursday, July 04, 2013 4:10 AM
 To: Zheng, Kai
 Cc: common-dev@hadoop.apache.org
 Subject: Re: [DISCUSS] Hadoop SSO/Token Server Components
 
 Hi Kai -
 
 I think that I need to clarify something...
 
 This is not an update for 9533 but a continuation of the discussions that are 
 focused on a fresh look at an SSO for Hadoop.
 We've agreed to leave our previous designs behind and therefore we aren't 
 really seeing it as an HSSO layered on top of TAS approach or an HSSO vs TAS 
 discussion.
 
 Your latest design revision actually makes it clear that you are now 
 targeting exactly what was described as HSSO - so comparing and contrasting 
 is not going to add any value.
 
 What we need you to do at this point is to look at those high-level 
 components described on this thread and comment on whether we need additional 
 components or any that are listed that don't seem necessary to you and why.
 In other words, we need to define and agree on the work that has to be done.
 
 We also need to determine those components that need to be done before 
 anything else can be started.
 I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the 
 other components and should probably be defined and POC'd in short order.
 
 Personally, I think that continuing the separation of 9533 and 9392 will do 
 this effort a disservice. There doesn't seem to be enough differences between 
 the two to justify separate jiras anymore. It may be best to file a new one 
 that reflects a single vision without the extra cruft that has built up in 
 either of the existing ones. We would certainly reference the existing ones 
 within the new one. This approach would align with the spirit of the 
 discussions up to this point.
 
 I am prepared to start a discussion around the shape of the two Hadoop SSO 
 tokens (identity and access), if that is what others feel the next topic 
 should be.
 If we can identify a jira home for it, we can do it there - otherwise we can 
 create another DISCUSS thread for it.
 
 thanks,
 
 --larry
 
 
 On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote:
 
 Hi Larry,
 
 Thanks for the update. Good to see that with this update we are now aligned 
 on most points.
 
 I have also updated our TokenAuth design in HADOOP-9392. The new revision 
 incorporates feedback and suggestions in related discussion with the 
 community, particularly from Microsoft and others attending the Security 
 design lounge session at the Hadoop summit. Summary of the changes:
 1. Revised the approach to now use two tokens, Identity Token plus Access 
 Token, particularly considering our authorization framework and 
 compatibility with HSSO;
 2. Introduced Authorization Server (AS) from our authorization framework 
 into the flow that issues access tokens for clients with identity tokens to 
 access services;
 3. Refined proxy access token and the proxy/impersonation flow;
 4. Refined the browser web SSO flow regarding access to Hadoop web 
 services;
 5. Added Hadoop RPC access flow regarding CLI clients accessing Hadoop 
 services via RPC/SASL;
 6. Added client authentication integration flow to illustrate how desktop 
 logins can be integrated into the authentication process to TAS to exchange 
 identity token;
 7. Introduced fine grained access control flow from the authorization 
 framework; I have put it in the appendices section for reference;
 8. Added a detailed flow to illustrate Hadoop Simple authentication

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-03 Thread Larry McCay
Hi Kai -

I think that I need to clarify something…

This is not an update for 9533 but a continuation of the discussions that are 
focused on a fresh look at an SSO for Hadoop.
We've agreed to leave our previous designs behind and therefore we aren't 
really seeing it as an HSSO layered on top of TAS approach or an HSSO vs TAS 
discussion.

Your latest design revision actually makes it clear that you are now targeting 
exactly what was described as HSSO - so comparing and contrasting is not going 
to add any value.

What we need you to do at this point is to look at those high-level components 
described on this thread and comment on whether we need additional components 
or any that are listed that don't seem necessary to you and why.
In other words, we need to define and agree on the work that has to be done.

We also need to determine those components that need to be done before anything 
else can be started.
I happen to agree with Brian that #4 Hadoop SSO Tokens are central to all the 
other components and should probably be defined and POC'd in short order.

Personally, I think that continuing the separation of 9533 and 9392 will do 
this effort a disservice. There doesn't seem to be enough differences between 
the two to justify separate jiras anymore. It may be best to file a new one 
that reflects a single vision without the extra cruft that has built up in 
either of the existing ones. We would certainly reference the existing ones 
within the new one. This approach would align with the spirit of the 
discussions up to this point.

I am prepared to start a discussion around the shape of the two Hadoop SSO 
tokens (identity and access), if that is what others feel the next topic should 
be.
If we can identify a jira home for it, we can do it there - otherwise we can 
create another DISCUSS thread for it.

thanks,

--larry


On Jul 3, 2013, at 2:39 PM, Zheng, Kai kai.zh...@intel.com wrote:

 Hi Larry,
 
 Thanks for the update. Good to see that with this update we are now aligned 
 on most points.
 
 I have also updated our TokenAuth design in HADOOP-9392. The new revision 
 incorporates feedback and suggestions in related discussion with the 
 community, particularly from Microsoft and others attending the Security 
 design lounge session at the Hadoop summit. Summary of the changes:
 1. Revised the approach to now use two tokens, Identity Token plus Access 
 Token, particularly considering our authorization framework and compatibility 
 with HSSO;
 2. Introduced Authorization Server (AS) from our authorization framework 
 into the flow that issues access tokens for clients with identity tokens to 
 access services;
 3. Refined proxy access token and the proxy/impersonation flow;
 4. Refined the browser web SSO flow regarding access to Hadoop web 
 services;
 5. Added Hadoop RPC access flow regarding CLI clients accessing Hadoop 
 services via RPC/SASL;
 6. Added client authentication integration flow to illustrate how desktop 
 logins can be integrated into the authentication process to TAS to exchange 
 identity token;
 7. Introduced fine grained access control flow from the authorization 
 framework; I have put it in the appendices section for reference;
 8. Added a detailed flow to illustrate Hadoop Simple authentication over 
 TokenAuth, in the appendices section;
 9. Added secured task launcher in appendices as a possible solution for the 
 Windows platform;
 10. Moved low level contents and less relevant parts from the main body into 
 the appendices section.
 
 As we all think about how to layer HSSO on TAS in the TokenAuth framework, 
 please take some time to look at the doc and then let's discuss the gaps we 
 might have. I would like to discuss these gaps with a focus on the 
 implementation details so we are all moving towards getting code done. Let's 
 continue this part of the discussion in HADOOP-9392 to allow for better 
 tracking on the JIRA itself. For discussions related to the centralized SSO 
 server, I suggest we continue to use HADOOP-9533 to consolidate all 
 discussion related to that JIRA. That way we don't need extra umbrella JIRAs.
 
 I agree we should speed up these discussions and agree on some of the 
 implementation specifics so both of us can get moving on the code without 
 stepping on each other's work.
 
 Look forward to your comments and comments from others in the community. 
 Thanks.
 
 Regards,
 Kai
 
 -Original Message-
 From: Larry McCay [mailto:lmc...@hortonworks.com] 
 Sent: Wednesday, July 03, 2013 4:04 AM
 To: common-dev@hadoop.apache.org
 Subject: [DISCUSS] Hadoop SSO/Token Server Components
 
 All -
 
 As a follow up to the discussions that were had during Hadoop Summit, I would 
 like to introduce the discussion topic around the moving parts of a Hadoop 
 SSO/Token Service.
 There are a couple of related Jiras that can be referenced and may or may 
 not be updated as a result of this discuss thread.
 
 https://issues.apache.org

Re: [DISCUSS] Hadoop SSO/Token Server Components

2013-07-03 Thread Larry McCay
Thanks, Brian!
Look at that - the power of collaboration - the numbering is correct already! 
;-)

I am inclined to agree that we should start with the Hadoop SSO Tokens and am 
leaning toward a new jira that leaves behind the cruft, but I don't feel very 
strongly about it being new.
I do feel, especially given Kai's new document, that we have only one.

On Jul 3, 2013, at 2:32 PM, Brian Swan brian.s...@microsoft.com wrote:

 Thanks, Larry, for starting this conversation (and thanks for the great 
 Summit meeting summary you sent out a couple of days ago). To weigh in on 
 your specific discussion points (and renumber them :-))...
 
 1. Are there additional components that would be required for a Hadoop SSO 
 service?
 Not that I can see.
 
 2. Should any of the above described components be considered not actually 
 necessary or poorly described?
 I think this will be determined as we get into the details of each component. 
 What you've described here is certainly an excellent starting point.
 
 3. Should we create a new umbrella Jira to identify each of these as a 
 subtask?
 4. Should we just continue to use 9533 for the SSO server and add additional 
 subtasks?
 What is described here seems to fit with 9533, though 9533 may contain some 
 details that need further discussion. IMHO, it may be better to file a new 
 umbrella Jira, though I'm not 100% convinced of that. Would be very 
 interested in input from others.
 
 5. What are the natural seams of separation between these components and any 
 dependencies between one and another that affect priority?
 Is 4 the right place to start? (4. Hadoop SSO Tokens: the exact shape and 
 form of the sso tokens...) It seemed in some 1:1 conversations after the 
 Summit meeting that others may agree with this. Would like to hear if that is 
 the case more broadly.
 
 -Brian
 
 -Original Message-
 From: Larry McCay [mailto:lmc...@hortonworks.com] 
 Sent: Tuesday, July 2, 2013 1:04 PM
 To: common-dev@hadoop.apache.org
 Subject: [DISCUSS] Hadoop SSO/Token Server Components
 
 All -
 
 As a follow up to the discussions that were had during Hadoop Summit, I would 
 like to introduce the discussion topic around the moving parts of a Hadoop 
 SSO/Token Service.
 There are a couple of related Jiras that can be referenced and may or may 
 not be updated as a result of this discuss thread.
 
 https://issues.apache.org/jira/browse/HADOOP-9533
 https://issues.apache.org/jira/browse/HADOOP-9392
 
 As the first aspect of the discussion, we should probably state the overall 
 goals and scoping for this effort:
 * An alternative authentication mechanism to Kerberos for user authentication
 * A broader capability for integration into enterprise identity and SSO 
 solutions
 * Possibly the advertisement/negotiation of available authentication 
 mechanisms
 * Backward compatibility for the existing use of Kerberos
 * No (or minimal) changes to existing Hadoop tokens (delegation, job, block 
 access, etc)
 * Pluggable authentication mechanisms across: RPC, REST and webui enforcement 
 points
 * Continued support for existing authorization policy/ACLs, etc
 * Keeping more fine grained authorization policies in mind - like attribute 
 based access control
   - fine grained access control is a separate but related effort that we 
 must not preclude with this effort
 * Cross cluster SSO
 
 In order to tease out the moving parts here are a couple high level and 
 simplified descriptions of SSO interaction flow:
   +--------+  credentials 1   +--------+
   | CLIENT |----------------->|  SSO   |
   +--------+     :tokens      | SERVER |
        |                      +--------+
      2 | access token
        | :requested resource
        V
   +---------+
   | HADOOP  |
   | SERVICE |
   +---------+
   
 The above diagram represents the simplest interaction model for an SSO 
 service in Hadoop.
 1. client authenticates to SSO service and acquires an access token
  a. client presents credentials to an authentication service endpoint exposed 
 by the SSO server (AS) and receives a token representing the authentication 
 event and verified identity
  b. client then presents the identity token from 1.a. to the token endpoint 
 exposed by the SSO server (TGS) to request an access token to a particular 
 Hadoop service and receives an access token
 2. client presents the Hadoop access token to the Hadoop service for which 
 the access token has been granted and requests the desired resource or services
  a. access token is presented as appropriate for the service endpoint 
 protocol being used
  b. Hadoop service token validation handler validates the token and verifies 
 its integrity and the identity of the issuer
 
+--------+
|  IdP   |
+--------+
     ^
   1 | credentials
     | :idp_token
     |
+--------+  idp_token 2     +--------+
| CLIENT |----------------->|  SSO   |
+--------+     :tokens

[DISCUSS] Hadoop SSO/Token Server Components

2013-07-02 Thread Larry McCay
All -

As a follow up to the discussions that were had during Hadoop Summit, I would 
like to introduce the discussion topic around the moving parts of a Hadoop 
SSO/Token Service.
There are a couple of related Jiras that can be referenced and may or may not 
be updated as a result of this discuss thread.

https://issues.apache.org/jira/browse/HADOOP-9533
https://issues.apache.org/jira/browse/HADOOP-9392

As the first aspect of the discussion, we should probably state the overall 
goals and scoping for this effort:
* An alternative authentication mechanism to Kerberos for user authentication
* A broader capability for integration into enterprise identity and SSO 
solutions
* Possibly the advertisement/negotiation of available authentication mechanisms
* Backward compatibility for the existing use of Kerberos
* No (or minimal) changes to existing Hadoop tokens (delegation, job, block 
access, etc)
* Pluggable authentication mechanisms across: RPC, REST and webui enforcement 
points
* Continued support for existing authorization policy/ACLs, etc
* Keeping more fine grained authorization policies in mind - like attribute 
based access control
- fine grained access control is a separate but related effort that we 
must not preclude with this effort
* Cross cluster SSO

In order to tease out the moving parts here are a couple high level and 
simplified descriptions of SSO interaction flow:
+--------+  credentials 1   +--------+
| CLIENT |----------------->|  SSO   |
+--------+     :tokens      | SERVER |
     |                      +--------+
   2 | access token
     | :requested resource
     V
+---------+
| HADOOP  |
| SERVICE |
+---------+

The above diagram represents the simplest interaction model for an SSO service 
in Hadoop.
1. client authenticates to SSO service and acquires an access token
  a. client presents credentials to an authentication service endpoint exposed 
by the SSO server (AS) and receives a token representing the authentication 
event and verified identity
  b. client then presents the identity token from 1.a. to the token endpoint 
exposed by the SSO server (TGS) to request an access token to a particular 
Hadoop service and receives an access token
2. client presents the Hadoop access token to the Hadoop service for which the 
access token has been granted and requests the desired resource or services
  a. access token is presented as appropriate for the service endpoint protocol 
being used
  b. Hadoop service token validation handler validates the token and verifies 
its integrity and the identity of the issuer
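
To make the contract implied by this flow concrete, here is a minimal, purely 
illustrative Java sketch. None of these type or method names are proposed API; 
they simply mirror steps 1.a, 1.b and 2 above and are assumptions for 
discussion only.

public interface SsoClient {
    // Step 1.a: present credentials and receive a token representing the
    // authentication event and verified identity.
    IdentityToken authenticate(String principal, char[] credentials);

    // Step 1.b: present the identity token and receive an access token
    // scoped to one particular Hadoop service.
    AccessToken requestAccessToken(IdentityToken identity, String serviceName);
}

interface IdentityToken { byte[] encoded(); }

// Step 2: the access token is what the client hands to the Hadoop service,
// where a token validation handler verifies its integrity and issuer.
interface AccessToken { byte[] encoded(); String serviceName(); }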

+--------+
|  IdP   |
+--------+
     ^
   1 | credentials
     | :idp_token
     |
+--------+  idp_token 2     +--------+
| CLIENT |----------------->|  SSO   |
+--------+     :tokens      | SERVER |
     |                      +--------+
   3 | access token
     | :requested resource
     V
+---------+
| HADOOP  |
| SERVICE |
+---------+


The above diagram represents a slightly more complicated interaction model for 
an SSO service in Hadoop that removes Hadoop from the credential collection 
business.
1. client authenticates to a trusted identity provider within the enterprise 
and acquires an IdP specific token
  a. client presents credentials to an enterprise IdP and receives a token 
representing the authentication identity
2. client authenticates to SSO service and acquires an access token
  a. client presents idp_token to an authentication service endpoint exposed by 
the SSO server (AS) and receives a token representing the authentication event 
and verified identity
  b. client then presents the identity token from 2.a. to the token endpoint 
exposed by the SSO server (TGS) to request an access token to a particular 
Hadoop service and receives an access token
3. client presents the Hadoop access token to the Hadoop service for which the 
access token has been granted and requests the desired resource or services
  a. access token is presented as appropriate for the service endpoint protocol 
being used
  b. Hadoop service token validation handler validates the token and verifies 
its integrity and the identity of the issuer
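
The federated variant only changes how the identity token is obtained; a 
hypothetical sketch of step 2.a, again with illustrative names only (nothing 
here is an agreed interface):

public interface FederatingSsoClient {
    // Step 2.a: exchange a token issued by a trusted enterprise IdP for a
    // token representing the verified identity; Hadoop never collects the
    // raw credentials in this flow.
    FederatedIdentityToken federate(byte[] idpToken, String idpName);

    // Step 2.b: exchange the identity token for a service-scoped access token.
    FederatedAccessToken requestAccessToken(FederatedIdentityToken identity,
                                            String serviceName);
}

interface FederatedIdentityToken { byte[] encoded(); }
interface FederatedAccessToken { byte[] encoded(); String serviceName(); }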

Considering the above set of goals and high level interaction flow description, 
we can start to discuss the component inventory required to accomplish this 
vision:

1. SSO Server Instance: this component must be able to expose endpoints for 
both authentication of users by collecting and validating credentials and 
federation of identities represented by tokens from trusted IdPs within the 
enterprise. The endpoints should be composable so as to allow for multifactor 
authentication mechanisms. They will also need to return tokens that represent 
the authentication event and verified identity as well as access tokens for 
specific Hadoop services.

2. Authentication 

Hadoop Summit: Security Design Lounge Session

2013-07-01 Thread Larry McCay
All -

Last week at Hadoop Summit there was a room dedicated as the summit Design 
Lounge.
This was a place where like folks could get together and talk about design 
issues with other contributors with a simple flip board and some beanbag chairs.
We used this as an opportunity to bootstrap some discussions within common-dev 
for security related topics. I'd like to summarize the security session and 
takeaways here for everyone.

This summary and set of takeaways are largely from memory. 
Please - anyone that attended - feel free to correct anything that is 
inaccurate or omitted.

Pretty well attended - companies represented:
* Yahoo!
* Microsoft
* Hortonworks
* Cloudera
* Intel
* eBay
* Voltage Security
* Flying Penguins
* EMC
* others...

Most folks were pretty engaged throughout the session.
We set expectations as a meet and greet/project kickoff - project being the 
emerging security development community.

In order to keep the scope of conversations manageable we tried to keep focused 
on authentication and the ideas around SSO and tokens.

We discussed kerberos as:
1. major pain point and barrier to entry for some
2. seemingly perfect for others
a. obviously requiring backward compatibility

It seemed to be consensus that:
1. user authentication should be easily integrated with alternative enterprise 
identity solutions
2. that service identity issues should not require thousands of service 
identities added to enterprise user repositories
3. that customers should not be forced to install/deploy and manage a KDC for 
services - this implies a couple options:
a. alternatives to kerberos for service identities
b. hadoop KDC implementation - ie. ApacheDS?

There was active discussion around:
1. Hadoop SSO server
a. acknowledgement of Hadoop SSO tokens as something that can be 
standardized for representing both the identity and authentication event data 
as well as access tokens representing a verifiable means for the authenticated 
identity to access resources or services
b. a general understanding of Hadoop SSO as being an analogue and 
alternative for the kerberos KDC and the related tokens being analogous to TGTs 
and service tickets
c. an agreement that there are interesting attributes about the 
authentication event that may be useful in cross cluster trust for SSO - such 
as a rating of authentication strength and number of factors, etc
d. that existing Hadoop tokens - ie. delegation, job, block access - 
will all continue to work and that we are initially looking at alternatives to 
the KDC, TGTs and service tickets
2. authentication mechanism discovery by clients - Daryn Sharp has done a bunch 
of work around this and our SSO solution may want to consider a similar 
mechanism for discovering trusted IDPs and service endpoints
3. backward compatibility - kerberos shops need to just continue to work
4. some insight into where/how folks believe that token based authentication 
can be accomplished within existing contracts - SASL/GSSAPI, REST, web ui
5. the establishment of a cross cutting concern community around security 
and what that means in terms of the Apache way - email lists, wiki, Jiras 
across projects, etc
6. dependencies, rolling updates, patching and how they relate to hadoop 
projects versus packaging
7. collaboration road ahead

A number of breakout discussions were had outside of the designated design 
lounge session as well.

Takeaways for the immediate road ahead:
1. common-dev may be sufficient to discuss security related topics
a. many developers are already subscribed to it
b. there is not that much traffic there anyway
c. we can discuss a more security focused list if we like
2. we will discuss the establishment of a wiki space for a holistic view of 
security model, patterns, approaches, etc
3. we will begin discussion on common-dev in near-term for the following:
a. discuss and agree on the high level moving parts required for our 
goals for authentication: SSO service, tokens, token validation handlers, 
credential management tools, etc
b. discuss and agree on the natural seams across these moving parts and 
agree on collaboration by tackling various pieces in a divide and conquer 
approach
c. more than likely - the first piece that will need some immediate 
discussion will be the shape and form of the tokens
d. we will follow up or supplement discussions with POC code patches 
and/or specs attached to jiras

Overall, design lounge was rather effective for what we wanted to do - which 
was to bootstrap discussions and collaboration within the community at large. 
As always, no specific decisions have been made during this session and we can 
discuss any or all of this within common-dev and on related jiras.

Jiras related to the security development group and these discussions:

Centralized SSO/Token Server 

Re: Fostering a Hadoop security dev community

2013-06-20 Thread Larry McCay
It would be great to have dedicated resources like these.
One thing missing for cross cutting concerns like security is a source of
truth for a holistic view of the entire model.
A dedicated wiki space would allow for this view and facilitate the filing
of Jiras that align with the big picture.

On Thu, Jun 20, 2013 at 12:31 PM, Kevin Minder kevin.min...@hortonworks.com
 wrote:

 Hi PMCs & Everyone,

 There are a number of significant, complex and overlapping efforts
 underway to improve the Hadoop security model.  Many involved are
 struggling to form this into a cohesive whole across the numerous Jiras and
 within the traffic of common-dev.  There has been a suggestion made that
 having two additional pieces of infrastructure might help.

 1) Establish a security-dev mailing list similar to hdfs-dev, yarn-dev,
 mapreduce-dev, etc. that would help us have more focused interaction on
 non-vulnerability security topics.  I understand that this might devalue
 common-dev somewhat but the benefits might outweigh that.

 2) Establish a corner of the wiki where cross cutting security design could
 be worked out more collaboratively than a doc rev upload mechanism.  I fear
 if we don't have this we will end up collaborating outside Apache
 infrastructure which seems inappropriate.  I understand the risk of losing
 context in the individual Jiras but again my sense is that the cohesiveness
 provided will outweigh the risk.

 I'm open to and interested in other suggestions for how others have solved
 these types of cross cutting collaboration challenges.

 Thanks.
 Kevin.



Re: Fostering a Hadoop security dev community

2013-06-20 Thread Larry McCay
That's a good question

I think that we could let the security vulnerability list know about it for
one thing.
There should be representatives of many - if not all - of the projects in
the ecosystem.

I suppose we could file a Jira for each to have someone represent their
security concerns to the larger security community?

Any suggestions or thoughts on those ideas would be great.


On Thu, Jun 20, 2013 at 1:31 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 This sounds great,

 Is this restricted to the Hadoop project itself or the intention is to
 cover the whole Hadoop ecosystem? If the latter, how are you planning to
 engage and sync up with the different projects?

 Thanks.


 On Thu, Jun 20, 2013 at 9:45 AM, Larry McCay lmc...@hortonworks.com
 wrote:

  It would be great to have dedicated resources like these.
  One thing missing for cross cutting concerns like security is a source of
  truth for a holistic view of the entire model.
  A dedicated wiki space would allow for this view and facilitate the
 filing
  of Jiras that align with the big picture.
 
  On Thu, Jun 20, 2013 at 12:31 PM, Kevin Minder 
  kevin.min...@hortonworks.com
   wrote:
 
   Hi PMCs & Everyone,
  
   There are a number of significant, complex and overlapping efforts
   underway to improve the Hadoop security model.  Many involved are
   struggling to form this into a cohesive whole across the numerous Jiras
  and
   within the traffic of common-dev.  There has been a suggestion made
 that
   having two additional pieces of infrastructure might help.
  
   1) Establish a security-dev mailing list similar to hdfs-dev, yarn-dev,
   mapreduce-dev, etc. that would help us have more focused interaction on
   non-vulnerability security topics.  I understand that this might
  devalue
   common-dev somewhat but the benefits might outweigh that.
  
   2) Establish a corner of the wiki where cross cutting security design
  could
   be worked out more collaboratively than a doc rev upload mechanism.  I
  fear
   if we don't have this we will end up collaborating outside Apache
   infrastructure which seems inappropriate.  I understand the risk of
  losing
   context in the individual Jiras but again my sense is that the
  cohesiveness
   provided will outweigh the risk.
  
   I'm open to and interested in other suggestions for how others have
  solved
   these types of cross cutting collaboration challenges.
  
   Thanks.
   Kevin.
  
 



 --
 Alejandro



Re: Fostering a Hadoop security dev community

2013-06-20 Thread Larry McCay
Yes, sorry for not explicitly stating it in my previous reply - this should
be a community built from representatives across the entire ecosystem.
My previous email was speaking to how we reach out to them.


On Thu, Jun 20, 2013 at 1:49 PM, Zheng, Kai kai.zh...@intel.com wrote:

 In my view it should be for the whole ecosystem. One inspiration for this
 is to ease the collaboration and discussion for the ongoing work on
 token based authentication and SSO, which absolutely targets the ecosystem,
 although the resulting libraries and facilities might reside under the
 Hadoop Common umbrella.

 -Original Message-
 From: Alejandro Abdelnur [mailto:t...@cloudera.com]
 Sent: Friday, June 21, 2013 1:32 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: Fostering a Hadoop security dev community

 This sounds great,

 Is this restricted to the Hadoop project itself or the intention is to
 cover the whole Hadoop ecosystem? If the latter, how are you planning to
 engage and sync up with the different projects?

 Thanks.


 On Thu, Jun 20, 2013 at 9:45 AM, Larry McCay lmc...@hortonworks.com
 wrote:

  It would be great to have dedicated resources like these.
  One thing missing for cross cutting concerns like security is a source
  of truth for a holistic view of the entire model.
  A dedicated wiki space would allow for this view and facilitate the
  filing of Jiras that align with the big picture.
 
  On Thu, Jun 20, 2013 at 12:31 PM, Kevin Minder 
  kevin.min...@hortonworks.com
   wrote:
 
   Hi PMCs & Everyone,
  
   There are a number of significant, complex and overlapping efforts
   underway to improve the Hadoop security model.  Many involved are
   struggling to form this into a cohesive whole across the numerous
   Jiras
  and
   within the traffic of common-dev.  There has been a suggestion made
   that having two additional pieces of infrastructure might help.
  
   1) Establish a security-dev mailing list similar to hdfs-dev,
   yarn-dev, mapreduce-dev, etc. that would help us have more focused
   interaction on non-vulnerability security topics.  I understand that
   this might
  devalue
   common-dev somewhat but the benefits might outweigh that.
  
   2) Establish a corner of the wiki where cross cutting security design
  could
   be worked out more collaboratively than a doc rev upload mechanism.
   I
  fear
   if we don't have this we will end up collaborating outside Apache
   infrastructure which seems inappropriate.  I understand the risk of
  losing
   context in the individual Jiras but again my sense is that the
  cohesiveness
   provided will outweigh the risk.
  
   I'm open to and interested in other suggestions for how others have
  solved
   these types of cross cutting collaboration challenges.
  
   Thanks.
   Kevin.
  
 



 --
 Alejandro



[jira] [Created] (HADOOP-9533) Hadoop SSO/Token Service

2013-05-01 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-9533:
---

 Summary: Hadoop SSO/Token Service
 Key: HADOOP-9533
 URL: https://issues.apache.org/jira/browse/HADOOP-9533
 Project: Hadoop Common
  Issue Type: New Feature
  Components: security
Reporter: Larry McCay


This is an umbrella Jira filing to oversee a set of proposals for introducing a 
new master service for Hadoop Single Sign On (HSSO).

There is an increasing need for pluggable authentication providers that 
authenticate both users and services as well as validate tokens in order to 
federate identities authenticated by trusted IDPs. These IDPs may be deployed 
within the enterprise or third-party IDPs that are external to the enterprise.

These needs speak to a specific pain point: a narrow integration path into 
the enterprise identity infrastructure. Kerberos is a fine solution for 
those that already have it in place or are willing to adopt its use but there 
remains a class of user that finds this unacceptable and needs to integrate 
with a wider variety of identity management solutions.

Another specific pain point is that of rolling and distributing keys. A related 
and integral part of the HSSO server is a library called the Credential 
Management Framework (CMF), which will be a common library for easing the 
management of secrets, keys and credentials.

Initially, the existing delegation, block access and job tokens will continue 
to be utilized. There may be some changes required to leverage a PKI based 
signature facility rather than shared secrets. This is a means to simplify the 
solution for the pain point of distributing shared secrets.

This project will primarily centralize the responsibility of authentication and 
federation into a single service that is trusted across the Hadoop cluster and 
optionally across multiple clusters. This greatly simplifies a number of things 
in the Hadoop ecosystem:

1.  a single token format that is used across all of Hadoop regardless of 
authentication method
2.  a single service to have pluggable providers instead of all services
3.  a single token authority that would be trusted across the cluster/s and 
through PKI encryption be able to easily issue cryptographically verifiable 
tokens
4.  automatic rolling of the token authority’s keys and publishing of the 
public key for easy access by those parties that need to verify incoming tokens
5.  use of PKI for signatures eliminates the need for securely sharing and 
distributing shared secrets
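
As a rough illustration of points 3 through 5 (not an agreed design; the 
signature algorithm and key handling here are my own assumptions), token 
signing and verification with the authority's key pair could look roughly like 
this in plain JDK code:

import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Illustrative only: the token authority signs opaque token bytes with its
// private key; any service holding the published public key can verify them
// without a shared secret.
public class TokenSignatureSketch {

    public static byte[] sign(byte[] tokenBytes, PrivateKey authorityKey)
            throws GeneralSecurityException {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initSign(authorityKey);
        sig.update(tokenBytes);
        return sig.sign();
    }

    public static boolean verify(byte[] tokenBytes, byte[] signature,
                                 PublicKey authorityPublicKey)
            throws GeneralSecurityException {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(authorityPublicKey);
        sig.update(tokenBytes);
        return sig.verify(signature);
    }
}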

In addition to serving as the internal Hadoop SSO service this service will be 
leveraged by the Knox Gateway from the cluster perimeter in order to acquire 
the Hadoop cluster tokens. The same token mechanism that is used for internal 
services will be used to represent user identities, providing for interesting 
scenarios such as SSO across Hadoop clusters within an enterprise and/or into 
the cloud.

The HSSO service will be comprised of three major components and capabilities:

1.  Federating IDP – authenticates users/services and issues the common 
Hadoop token
2.  Federating SP – validates the token of trusted external IDPs and issues 
the common Hadoop token
3.  Token Authority – management of the common Hadoop tokens – including: 
a.  Issuance 
b.  Renewal
c.  Revocation
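
Purely as a sketch of those three responsibilities (the names below are 
illustrative, not proposed API):

public interface TokenAuthority {
    HadoopToken issue(String principal, String targetService);  // a. Issuance
    HadoopToken renew(HadoopToken token);                        // b. Renewal
    void revoke(HadoopToken token);                              // c. Revocation
}

interface HadoopToken { byte[] encoded(); long expiryMillis(); }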

As this is a meta Jira for tracking this overall effort, the details of the 
individual efforts will be submitted along with the child Jira filings.

Hadoop-Common would seem to be the most appropriate home for such a service and 
its related common facilities. We will also leverage and extend existing common 
mechanisms as appropriate.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9534) Credential Management Framework (CMF)

2013-05-01 Thread Larry McCay (JIRA)
Larry McCay created HADOOP-9534:
---

 Summary: Credential Management Framework (CMF)
 Key: HADOOP-9534
 URL: https://issues.apache.org/jira/browse/HADOOP-9534
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Reporter: Larry McCay


The credential management framework consists of a library for securing, 
acquiring and rolling credentials for a given Hadoop service.

Specifically the library will provide:

1. Password Indirection or Aliasing
2. Management of identity and trust keystores
3. Rolling of key pairs and credentials
4. Discovery of externally provisioned credentials
5. Service specific CMF secret protection

Password Indirection or Aliasing:
By providing alias based access to actual secrets stored within a service 
specific JCEKS keystore, we are able to eliminate the need for any secret to be 
stored in clear text on the filesystem. This is currently a red flag during 
security reviews for many customers.
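
To make the aliasing idea concrete, here is a minimal sketch using a plain JDK 
JCEKS keystore. The store path, alias convention and the way the CMF secret is 
supplied are all assumptions for illustration, not the proposed implementation:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyStore;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: store and resolve passwords by alias in a JCEKS keystore
// protected by the service-specific CMF secret.
public class AliasedPasswordStoreSketch {

    public static void put(String storePath, char[] cmfSecret, String alias,
                           char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, cmfSecret);  // start from an empty store for brevity
        byte[] bytes = new String(password).getBytes(StandardCharsets.UTF_8);
        ks.setEntry(alias,
                new KeyStore.SecretKeyEntry(new SecretKeySpec(bytes, "AES")),
                new KeyStore.PasswordProtection(cmfSecret));
        try (FileOutputStream out = new FileOutputStream(storePath)) {
            ks.store(out, cmfSecret);
        }
    }

    public static char[] get(String storePath, char[] cmfSecret, String alias)
            throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        try (FileInputStream in = new FileInputStream(storePath)) {
            ks.load(in, cmfSecret);
        }
        KeyStore.SecretKeyEntry entry = (KeyStore.SecretKeyEntry)
                ks.getEntry(alias, new KeyStore.PasswordProtection(cmfSecret));
        return new String(entry.getSecretKey().getEncoded(),
                StandardCharsets.UTF_8).toCharArray();
    }
}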

Management of Identity and Trust Keystores:
Service specific identity and trust keystores will be managed by a combination 
of the HSSO service and CMF. 

Upon registration with the HSSO service a dependent service will be able to 
discover externally provisioned keystores or have them created by the HSSO 
service on its behalf. The public key of the HSSO service will be provided to 
the service to be imported into its service specific trust store.

Service specific keystores and credential stores will be protected with the 
service specific CMF secret.

Rolling of Keypairs and Credentials:
The ability to automate the rolling of PKI keypairs and credentials provides 
services with a common facility for discovering new HSSO public keys and the 
means to roll their own credentials while retaining a number of previous 
values (as needed).
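
A hedged sketch of what rolling with retention might look like in plain JDK 
terms; the key size, retention policy and class names are assumptions, not the 
design:

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: roll a signing key pair while retaining a bounded history
// of previous pairs so older signatures can still be verified.
public class RollingKeyPairsSketch {

    private final Deque<KeyPair> history = new ArrayDeque<>();
    private final int retained;

    public RollingKeyPairsSketch(int retained) {
        this.retained = retained;
    }

    public synchronized KeyPair roll() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair current = kpg.generateKeyPair();
        history.addFirst(current);
        while (history.size() > retained) {
            history.removeLast();  // drop pairs beyond the retention window
        }
        return current;
    }

    public synchronized Iterable<KeyPair> verificationCandidates() {
        return new ArrayDeque<>(history);  // newest first; try each when verifying
    }
}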

Discovery of Externally Provisioned Credentials:
For environments that want control over the certificate generation and 
provisioning, CMF provides the ability to discover preprovisioned artifacts 
based on naming conventions of the artifacts and the use of the service 
specific CMF secret to access the credentials within the keystores.

Service Specific CMF Secret Protection:
By providing a common facility to prompt for and optionally persist a service 
specific CMF secret at service installation/startup, we are able to protect 
all the service specific security artifacts with this secret. It is protected 
with a combination of AES 128 bit encryption and file permissions restricted 
to the service specific OS user.
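
A minimal sketch of that protection, assuming AES-128 and POSIX permissions; 
real key derivation and storage of the encryption key are deliberately left 
out, so treat this as illustration only:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Illustrative only: encrypt a service-specific secret with AES-128 and make
// the resulting file readable and writable by the service's OS user alone.
public class CmfSecretWriterSketch {

    public static void persist(Path target, byte[] secret) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();  // in practice derived or stored elsewhere
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        Files.write(target, cipher.doFinal(secret));
        Files.setPosixFilePermissions(target,
                PosixFilePermissions.fromString("rw-------"));
    }
}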


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira