2017-09-20 Hadoop 3 release status update

2017-09-29 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-29

After about a month of slip, RC0 has been sent out for a VOTE. Focus now
turns to GA, where we will attempt to keep the original GA target date
(early November).

Highlights:

   - RC0 vote was sent out on Thursday; two binding +1s so far.

Red flags:

   - Resource profiles still has a number of pending subtasks, which is
   concerning from a schedule perspective. I emailed Wangda about this, and we
   need to discuss with other key contributors.
   - Native services has one pending subtask but we haven't gotten
   follow-on reviews from Allen (who -1'd the earlier merge vote). Need to
   confirm that we've satisfied his feedback.

Previously tracked beta1 blockers that have been resolved or dropped:

   - YARN-6623 was pushed out of beta1 to GA and has been committed, so we
   can drop it from tracking.
   - HADOOP-14897 (Loosen compatibility guidelines for native
   dependencies): Patch committed!

beta1 blockers:

   - None, RC0 is out

GA blockers:

   - YARN-7134 - AppSchedulingInfo has a dependency on capacity scheduler
   (OPEN): this one popped out of nowhere, I don't have an update yet.
   - YARN-7178 - Add documentation for Container Update API (OPEN): this
   also popped out of nowhere, no update yet.
   - YARN-7275 - NM Statestore cleanup for Container updates (OPEN): Ditto
   - YARN-4859 - [Bug] Unable to submit a job to a reservation when using
   FairScheduler (OPEN): Ditto
   - YARN-4827 - Document configuration of ReservationSystem for
   FairScheduler (OPEN): Ditto

Features merged for GA:

   - Erasure coding
      - People are looking more at the flaky tests and nice-to-haves
      - Some bugs reported and being fixed based on testing at Cloudera
      - Need to finish the 3.0 must-do's.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
      - Sean has posted a new rev of the rolling upgrade script
      - Some YARN PB backward compat issues that we decided weren't
      blockers and are scheduled for GA
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13917 (Ensure nightly builds run the integration tests for
      the shaded client): Resolved, Sean retriggered and determined that
      this works.
      - HADOOP-14771 is still floating, along with adding documentation.
   - Compat guide (HADOOP-13714)
      - A few subtasks are targeted at GA
   - TSv2 alpha 2
      - This was merged, no problems thus far.

Unmerged features:

   - Resource profiles (YARN-3926 and YARN-7069) (Wangda Tan)
      - This has been merged for 3.1.0; YARN-7069 tracks follow-on work.
      - ~7 patch available subtasks; I asked Wangda to set up a JIRA query
      for tracking this.
   - HDFS router-based federation (HDFS-10467) (Inigo Goiri and Chris
   Douglas)
      - Inigo sent out the merge vote.
   - API-based scheduler configuration (Jonathan Hung)
      - Jonathan sent out a discuss thread for merge; thinking is early
      next week. Larry did a security-oriented review.
   - YARN native services (YARN-5079) (Jian He)
      - Subtasks were filed to address Allen's review comments from the
      previous merge vote; only one is pending.
      - We need to confirm with Allen that this is ready to go; he hasn't
      been reviewing.


Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread Jonathan Hung
Thanks Andrew and Larry for the feedback. I was hoping to start a merge
vote early next week, because of the 2.9 deadline. (I suppose meeting this
deadline depends on the outcome of this DISCUSS thread.) Appreciate any
questions you have on the JIRA.

To answer your questions Larry:
*Is this feature extending the existing YARN RM REST API?*
Yes, this feature adds another endpoint to the YARN RM REST API, for users
to send their configuration change requests.
*When it isn't enabled, what is the API behavior?*
When the feature is disabled and the API is called, nothing happens; it
returns HTTP 400 Bad Request.
*Does it implement the trusted proxy pattern for proxies to be able to
impersonate users and most importantly to dictate what proxies would be
allowed to impersonate an admin for this API - which I assume will be
required?*
Right now there's a pluggable policy which controls which users can make
which configuration changes (see YARN-5949). The default policy is to only
allow YARN admins (i.e. users in yarn.admin.acl) to make changes. There's
also an implementation of a more relaxed policy which allows admins of
queues to make configuration modifications to their own queue. Not sure if
this answers your question.
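The behavior described above (HTTP 400 when the feature is off, and a pluggable policy that decides who may change which configuration) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not YARN's implementation: `MutationPolicy`, `AdminOnlyPolicy`, `QueueAdminPolicy` and `SchedConfEndpoint` are hypothetical names, the sets of user names stand in for yarn.admin.acl and per-queue admin ACLs, and the relaxed policy is assumed to still admit YARN admins. The thread does not say which status a denied request gets, so the sketch only reports it as DENIED.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical policy hook; YARN's real interface and class names may differ. */
interface MutationPolicy {
  /** Returns true if {@code user} may change the configuration of {@code queue}. */
  boolean isAllowed(String user, String queue);
}

/** Default policy: only YARN admins (standing in for yarn.admin.acl) may make changes. */
final class AdminOnlyPolicy implements MutationPolicy {
  private final Set<String> yarnAdmins;

  AdminOnlyPolicy(Set<String> yarnAdmins) {
    this.yarnAdmins = Set.copyOf(yarnAdmins);
  }

  @Override
  public boolean isAllowed(String user, String queue) {
    return yarnAdmins.contains(user);
  }
}

/** Relaxed policy: queue admins may modify their own queue (assumed to also admit YARN admins). */
final class QueueAdminPolicy implements MutationPolicy {
  private final Set<String> yarnAdmins;
  private final Map<String, Set<String>> queueAdmins;

  QueueAdminPolicy(Set<String> yarnAdmins, Map<String, Set<String>> queueAdmins) {
    this.yarnAdmins = Set.copyOf(yarnAdmins);
    this.queueAdmins = Map.copyOf(queueAdmins);
  }

  @Override
  public boolean isAllowed(String user, String queue) {
    return yarnAdmins.contains(user)
        || queueAdmins.getOrDefault(queue, Set.of()).contains(user);
  }
}

/** Decision logic of a hypothetical mutation endpoint; nothing is persisted here. */
final class SchedConfEndpoint {
  enum Result { BAD_REQUEST, DENIED, ACCEPTED }

  private final boolean enabled;
  private final MutationPolicy policy;

  SchedConfEndpoint(boolean enabled, MutationPolicy policy) {
    this.enabled = enabled;
    this.policy = policy;
  }

  Result handle(String user, String queue) {
    if (!enabled) {
      // Feature off: reported as HTTP 400 Bad Request, configuration untouched.
      return Result.BAD_REQUEST;
    }
    return policy.isAllowed(user, queue) ? Result.ACCEPTED : Result.DENIED;
  }
}
```

Keeping the policy behind an interface is what makes the "pluggable" part cheap: swapping the strict policy for the relaxed one changes who is admitted without touching the endpoint logic.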

Thanks,

Jonathan Hung

On Fri, Sep 29, 2017 at 12:01 PM, larry mccay  wrote:

> Hi Jonathan -
>
> Thank you for bringing this up for discussion!
>
> I would personally like to see a specific security review of features like
> this - especially ones that allow for remote access to configuration.
> I'll take a look at the JIRA and see whether I can come up with any
> concerns or questions and I would urge others to give it a pass from a
> security perspective as well.
>
> In addition, here are a couple of questions off the top of my head:
>
> Is this feature extending the existing YARN RM REST API?
> When it isn't enabled what is the API behavior?
> Does it implement the trusted proxy pattern for proxies to be able to
> impersonate users and most importantly to dictate what proxies would be
> allowed to impersonate an admin for this API - which I assume will be
> required?
>
> --larry


Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread larry mccay
Hi Jonathan -

Thank you for bringing this up for discussion!

I would personally like to see a specific security review of features like
this - especially ones that allow for remote access to configuration.
I'll take a look at the JIRA and see whether I can come up with any
concerns or questions and I would urge others to give it a pass from a
security perspective as well.

In addition, here are a couple of questions off the top of my head:

Is this feature extending the existing YARN RM REST API?
When it isn't enabled what is the API behavior?
Does it implement the trusted proxy pattern for proxies to be able to
impersonate users and most importantly to dictate what proxies would be
allowed to impersonate an admin for this API - which I assume will be
required?

--larry

On Fri, Sep 29, 2017 at 2:44 PM, Andrew Wang 
wrote:

> Hi Jonathan,
>
> I'm okay with putting this into branch-3.0 for GA if it can be merged
> within the next two weeks. Even though beta1 has slipped by a month, I want
> to stick to the targeted GA date of Nov 1st as much as possible. Of course,
> let's not sacrifice quality or stability for speed; if something's not
> ready, let's defer it to 3.1.0.
>
> Subru, have you been able to review this feature from the 2.9.0
> perspective? It'd add confidence if you think it's immediately ready for
> merging to branch-2 for 2.9.0.
>
> Thanks,
> Andrew
>


Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread Andrew Wang
Hi Jonathan,

I'm okay with putting this into branch-3.0 for GA if it can be merged
within the next two weeks. Even though beta1 has slipped by a month, I want
to stick to the targeted GA date of Nov 1st as much as possible. Of course,
let's not sacrifice quality or stability for speed; if something's not
ready, let's defer it to 3.1.0.

Subru, have you been able to review this feature from the 2.9.0
perspective? It'd add confidence if you think it's immediately ready for
merging to branch-2 for 2.9.0.

Thanks,
Andrew

On Thu, Sep 28, 2017 at 11:32 AM, Jonathan Hung 
wrote:

> Hi everyone,
>
> Starting this thread to discuss merging API-based scheduler configuration
> to trunk/branch-2. The feature adds the framework for allowing users to
> modify scheduler configuration via REST or CLI using a configurable backend
> (leveldb/zk are currently supported), and adds capacity scheduler support
> for this. The umbrella JIRA is YARN-5734. All the required work for this
> feature is done and committed to branch YARN-5734, and a full diff has been
> generated at YARN-7241.
>
> Regarding compatibility, this feature is configurable and turned off by
> default.
>
> The feature has been tested locally on a couple of RMs (since it is an
> RM-only change), with queue addition/removal/updates tested on a single RM
> (leveldb) and two RMs (zk). Also we verified the original configuration
> update mechanism (via refreshQueues) is unaffected when the feature is
> off/not configured.
>
> Our original plan was to merge this to trunk (which is what the YARN-7241
> diff is based on), and port to branch-2 before the 2.9 release. @Andrew,
> what are your thoughts on also merging this to branch-3.0?
>
> Thanks!
>
> Jonathan Hung
>
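A rough sketch of the "configurable backend, turned off by default" design in the proposal quoted above, assuming a simple string-keyed configuration map. The property name `example.scheduler.conf-store` and the types `SchedulerConfStore`, `InMemorySchedulerConfStore` and `SchedulerConfStores` are hypothetical and are not YARN's actual names; only an in-memory store is modelled, while the leveldb and zk choices mentioned in the proposal are recognized but deliberately not implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical backing store for scheduler-configuration changes. */
interface SchedulerConfStore {
  void put(String key, String value);

  Optional<String> get(String key);
}

/** In-memory stand-in for the leveldb/zk backends named in the proposal. */
final class InMemorySchedulerConfStore implements SchedulerConfStore {
  private final Map<String, String> entries = new HashMap<>();

  @Override
  public void put(String key, String value) {
    entries.put(key, value);
  }

  @Override
  public Optional<String> get(String key) {
    return Optional.ofNullable(entries.get(key));
  }
}

final class SchedulerConfStores {
  /** Hypothetical property name, chosen for this sketch only. */
  static final String STORE_KEY = "example.scheduler.conf-store";

  private SchedulerConfStores() {}

  /** Returns the configured store, or empty when the key is unset (feature off by default). */
  static Optional<SchedulerConfStore> fromConf(Map<String, String> conf) {
    String kind = conf.get(STORE_KEY);
    if (kind == null) {
      return Optional.empty();
    }
    switch (kind) {
      case "memory":
        return Optional.of(new InMemorySchedulerConfStore());
      case "leveldb":
      case "zk":
        throw new UnsupportedOperationException(kind + " backend is not modelled in this sketch");
      default:
        throw new IllegalArgumentException("Unknown store: " + kind);
    }
  }
}
```

Returning an empty Optional when nothing is configured is meant to mirror the compatibility claim: with the feature off, no store is created and the existing refreshQueues path is left alone.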


[jira] [Created] (MAPREDUCE-6973) Remove comment reference of "_done" that was replaced with "_SUCCESS"

2017-09-29 Thread Mehul (JIRA)
Mehul created MAPREDUCE-6973:


 Summary: Remove comment reference of "_done" that was replaced with "_SUCCESS"
 Key: MAPREDUCE-6973
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6973
 Project: Hadoop Map/Reduce
  Issue Type: Wish
  Components: documentation
Affects Versions: 3.0.0-beta1
Reporter: Mehul
Priority: Trivial
 Fix For: 3.0.0-beta1


I went through a couple of old JIRA issues and understood that the application
used to create a "_done" file when a job completed successfully. After some
discussion, the group decided to create "_SUCCESS" instead of "_done". However,
while learning the code, I found one comment that still refers to "_done", and
I would like to start with a small contribution by fixing it.

Note: I would like to work on this trivial issue so that I can learn the
standard contribution process, which will help me with the future
contributions I would like to make.
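For context on the marker this issue refers to: a common pattern is for downstream consumers to treat the presence of `_SUCCESS` in a job's output directory as the signal that the job finished successfully. A minimal sketch of that check, assuming the output sits on a local filesystem reachable through java.nio (real jobs would normally go through Hadoop's FileSystem API instead); `SuccessMarker` is a hypothetical helper.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: checks a job output directory for the success marker file. */
final class SuccessMarker {
  /** Current marker name; older code created "_done" instead. */
  static final String MARKER = "_SUCCESS";

  private SuccessMarker() {}

  /** True only if the marker exists as a regular file inside {@code outputDir}. */
  static boolean jobSucceeded(Path outputDir) {
    return Files.isRegularFile(outputDir.resolve(MARKER));
  }
}
```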



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-09-29 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/542/

[Sep 28, 2017 12:10:26 PM] (weichiu) HDFS-12458. TestReencryptionWithKMS fails regularly. Contributed by Xiao
[Sep 28, 2017 5:22:27 PM] (cliang) HDFS-12560. Remove the extra word it in HdfsUserGuide.md. Contributed by
[Sep 28, 2017 6:52:56 PM] (stevel) HADOOP-14768. Honoring sticky bit during Deletion when authorization is
[Sep 28, 2017 7:10:15 PM] (jlowe) YARN-7248. NM returns new SCHEDULED container status to older clients.
[Sep 28, 2017 8:04:03 PM] (subru) YARN-6962. Add support for updateContainers when allocating using
[Sep 28, 2017 9:38:30 PM] (jlowe) HADOOP-14902. LoadGenerator#genFile write close timing is incorrectly
[Sep 28, 2017 10:28:06 PM] (ctrezzo) YARN-7250. Update Shared cache client api to use URLs.
[Sep 28, 2017 11:41:09 PM] (wangda) YARN-6623. Add support to turn off launching privileged containers in


[Error replacing 'FILE' - Workspace is not accessible]


[jira] [Created] (MAPREDUCE-6972) enable try-with-resources for RecordReader

2017-09-29 Thread Zoltan Haindrich (JIRA)
Zoltan Haindrich created MAPREDUCE-6972:
---

 Summary: enable try-with-resources for RecordReader
 Key: MAPREDUCE-6972
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6972
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Zoltan Haindrich



{{org.apache.hadoop.mapred.RecordReader}} has a close method but doesn't
implement Closeable; it would be nice to add that, since it would enable:

{code}
try (org.apache.hadoop.mapred.RecordReader recordReader =
    inputFormat.getRecordReader(...)) {
  [...]
}
{code}

...supporting try-with-resources makes it easier to throw exceptions safely, since the reader is then closed on every exit path.
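The proposal can be illustrated without Hadoop on the classpath. In this sketch, `LegacyReader` is a hypothetical stand-in for a reader that has close() but is not Closeable, `CloseableReader` is the proposed shape (same contract, plus Closeable), and `Readers` is a hypothetical helper. The second method shows a workaround that should apply to any type whose close() throws at most IOException: adapting it to Closeable with a method reference so it can still be used in try-with-resources.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical reader with close() that does not implement Closeable. */
interface LegacyReader {
  boolean next() throws IOException;

  void close() throws IOException;
}

/** The proposed shape: the same contract, but also Closeable. */
interface CloseableReader extends LegacyReader, Closeable {}

final class Readers {
  private Readers() {}

  /** Counts records; the reader is closed even if next() throws. */
  static int count(CloseableReader reader) throws IOException {
    int n = 0;
    try (CloseableReader r = reader) {
      while (r.next()) {
        n++;
      }
    }
    return n;
  }

  /** Workaround for non-Closeable readers: adapt close() via a method reference. */
  static int countLegacy(LegacyReader reader) throws IOException {
    int n = 0;
    try (Closeable closer = reader::close) {
      while (reader.next()) {
        n++;
      }
    }
    return n;
  }
}
```

Making the interface itself Closeable removes the need for the adapter, which is the point of the request; the adapter is only a stopgap for existing code.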


