Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Carlo Aldo Curino
I haven't tested this, but I support the merge as the patch is very much
needed for MS use cases as well... Can this be cherry-picked to 2.9 easily?

Thanks for this contribution!

Cheers,
Carlo

On Nov 29, 2017 6:34 PM, "Weiwei Yang"  wrote:

> Hi Sunil
>
> +1 from my side.
> Actually, we have applied some of these patches to our production cluster
> since Sep this year, on 2000+ nodes, and they work nicely. +1 for the
> merge. I am pretty sure this feature will help a lot of users, especially
> those on cloud. Thanks for getting this done, great job!
>
> --
> Weiwei
>
> On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S <
> rohithsharm...@apache.org>, wrote:
> +1, thanks Sunil for working on this feature!
>
> -Rohith Sharma K S
>
> On 24 November 2017 at 23:19, Sunil G  wrote:
>
> Hi All,
>
> We would like to bring up the discussion of merging “absolute min/max
> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
> in a few weeks. The goal is to get it in for Hadoop 3.1.
>
> *Major work happened in this branch*
>
> - YARN-6471. Support to add min/max resource configuration for a queue
> - YARN-7332. Compute effectiveCapacity per each resource vector
> - YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
> handle absolute resources.
>
> *Regarding design details*
>
> Please refer to [1] for the detailed design document.
>
> *Regarding testing:*
>
> We did extensive tests for the feature in the last couple of months,
> comparing to latest trunk.
>
> - For SLS benchmark: We didn't see an observable performance gap in a
> simulated test based on 8K-node SLS traces (1 PB memory). We got 3k+
> containers allocated per second.
>
> - For microbenchmark: We used the performance test cases added by
> YARN-6775; they did not show much performance regression compared to trunk.
>
> *YARN-5881* 
> #ResourceTypes = 2. Avg of fastest 20: 55294.52
> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>
> *trunk*
> #ResourceTypes = 2. Avg of fastest 20: 55865.92
> #ResourceTypes = 2. Avg of fastest 20: 55096.418
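
As a rough sanity check on the figures quoted above, the two runs per branch can be averaged and compared. This is only arithmetic on the four numbers in the mail; the unit of the metric is not stated there, so the sketch reports a relative difference and does not say which direction is better. The helper name `relative_difference` is made up for this illustration.

```python
from statistics import mean

# Figures copied from the quoted mail ("Avg of fastest 20", #ResourceTypes = 2).
branch_runs = [55294.52, 55401.66]   # YARN-5881
trunk_runs = [55865.92, 55096.418]   # trunk


def relative_difference(candidate, baseline):
    """Difference of the means, expressed as a fraction of the baseline mean."""
    return (mean(candidate) - mean(baseline)) / mean(baseline)


diff = relative_difference(branch_runs, trunk_runs)
trunk_spread = max(trunk_runs) - min(trunk_runs)
mean_gap = abs(mean(branch_runs) - mean(trunk_runs))

print(f"branch vs trunk: {diff:+.2%}")
print(f"gap between means: {mean_gap:.3f}, spread of trunk runs: {trunk_spread:.3f}")
```

The gap between the branch and trunk averages comes out at roughly a quarter of a percent, which is smaller than the spread between the two trunk runs themselves.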
>
> *Regarding API stability:*
>
> All newly added @Public APIs are @Unstable.
>
> Documentation jira [3] should help by providing detailed configuration
> information. This feature works end-to-end; we have been running it in our
> development cluster for the last couple of months, and it has undergone a
> good amount of testing. Branch code is run against trunk and tracked via [4].
>
> We would love to get your thoughts before opening a voting thread.
>
> Special thanks to the team of folks who worked hard and contributed to
> this effort, including design discussion / patch / reviews, etc.: Wangda
> Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>
> [1] :
> https://issues.apache.org/jira/secure/attachment/
> 12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.
> Capacity.Scheduler.design-doc.v1.pdf
> [2] : https://issues.apache.org/jira/browse/YARN-5881
>
> [3] : https://issues.apache.org/jira/browse/YARN-7533
>
> [4] : https://issues.apache.org/jira/browse/YARN-7510
>
> Thanks,
>
> Sunil G and Wangda Tan
>
>


Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Weiwei Yang
Hi Sunil

+1 from my side.
Actually, we have applied some of these patches to our production cluster since
Sep this year, on 2000+ nodes, and they work nicely. +1 for the merge. I am
pretty sure this feature will help a lot of users, especially those on cloud.
Thanks for getting this done, great job!

--
Weiwei

On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S , 
wrote:
+1, thanks Sunil for working on this feature!

-Rohith Sharma K S

On 24 November 2017 at 23:19, Sunil G  wrote:

Hi All,

We would like to bring up the discussion of merging “absolute min/max
resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
in a few weeks. The goal is to get it in for Hadoop 3.1.

*Major work happened in this branch*

- YARN-6471. Support to add min/max resource configuration for a queue
- YARN-7332. Compute effectiveCapacity per each resource vector
- YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
handle absolute resources.

*Regarding design details*

Please refer to [1] for the detailed design document.

*Regarding testing:*

We did extensive tests for the feature in the last couple of months,
comparing to latest trunk.

- For SLS benchmark: We didn't see an observable performance gap in a
simulated test based on 8K-node SLS traces (1 PB memory). We got 3k+
containers allocated per second.

- For microbenchmark: We used the performance test cases added by
YARN-6775; they did not show much performance regression compared to trunk.

*YARN-5881* 

[VOTE] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Sunil G
Hi All,


Based on the discussion at [1], I'd like to start a vote to merge feature
branch YARN-5881 to trunk. The vote will run for 7 days, ending Wednesday
Dec 6 at 6:00PM PDT.

This branch adds support for configuring queue capacity as an absolute
resource in the capacity scheduler. This will help admins who want fine
control over the resources of queues.
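
To make the difference concrete, here is a minimal sketch of how a queue capacity value could be told apart as a percentage or an absolute resource. It assumes the bracketed syntax `[memory=10240,vcores=12]` used in the Capacity Scheduler documentation; `parse_capacity` is a hypothetical helper for illustration only, it is not the scheduler's own parser, and it handles plain integer amounts only.

```python
def parse_capacity(value):
    """Classify a queue capacity setting.

    Returns ("absolute", {resource_name: amount}) for a bracketed value such as
    "[memory=10240,vcores=12]", otherwise ("percentage", float_value).
    Hypothetical helper: units and validation are deliberately left out.
    """
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        resources = {}
        for pair in text[1:-1].split(","):
            name, _, amount = pair.partition("=")
            resources[name.strip()] = int(amount.strip())
        return ("absolute", resources)
    return ("percentage", float(text))


print(parse_capacity("[memory=10240,vcores=12]"))
print(parse_capacity("12.5"))
```

A queue configured with a plain number keeps the existing percentage behaviour; only a bracketed value is read as an absolute resource in this sketch.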


Feature development is done at YARN-5881 [2]; the jenkins build is here
(YARN-7510 [3]).

All required tasks for this feature are committed. This feature changes
RM’s Capacity Scheduler only, and we did extensive tests for the feature in
the last couple of months, including performance tests.

Key points:

- The feature is turned off by default; a queue has to be configured with
absolute resources to enable it.
- Detailed documentation about how to use this feature is done as part of
[4].
- No major performance degradation is observed with this branch. SLS
and UT performance tests are done.


There were 11 subtasks completed for this feature.


Huge thanks to everyone who helped with reviews, commits, guidance, and
technical discussion/design, including Wangda Tan, Vinod Vavilapalli,
Rohith Sharma K S, and Eric Payne.


[1] :
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhKhF1JCtR7ZFuZSEKQ4sBvN_n_tV5GHsbJ3YeyJP%2BP4Q%40mail.gmail.com%3E

[2] : https://issues.apache.org/jira/browse/YARN-5881

[3] : https://issues.apache.org/jira/browse/YARN-7510

[4] : https://issues.apache.org/jira/browse/YARN-7533


Regards

Sunil and Wangda


Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Rohith Sharma K S
+1, thanks Sunil for working on this feature!

-Rohith Sharma K S

On 24 November 2017 at 23:19, Sunil G  wrote:

> Hi All,
>
> We would like to bring up the discussion of merging “absolute min/max
> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
> in a few weeks. The goal is to get it in for Hadoop 3.1.
>
> *Major work happened in this branch*
>
>- YARN-6471. Support to add min/max resource configuration for a queue
>- YARN-7332. Compute effectiveCapacity per each resource vector
>- YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
>handle absolute resources.
>
> *Regarding design details*
>
> Please refer to [1] for the detailed design document.
>
> *Regarding testing:*
>
> We did extensive tests for the feature in the last couple of months,
> comparing to latest trunk.
>
> - For SLS benchmark: We didn't see an observable performance gap in a
> simulated test based on 8K-node SLS traces (1 PB memory). We got 3k+
> containers allocated per second.
>
> - For microbenchmark: We used the performance test cases added by
> YARN-6775; they did not show much performance regression compared to trunk.
>
> *YARN-5881* 
>
> #ResourceTypes = 2. Avg of fastest 20: 55294.52
> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>
> *trunk*
> #ResourceTypes = 2. Avg of fastest 20: 55865.92
> #ResourceTypes = 2. Avg of fastest 20: 55096.418
>
> *Regarding API stability:*
>
> All newly added @Public APIs are @Unstable.
>
> Documentation jira [3] should help by providing detailed configuration
> information. This feature works end-to-end; we have been running it in our
> development cluster for the last couple of months, and it has undergone a
> good amount of testing. Branch code is run against trunk and tracked via [4].
>
> We would love to get your thoughts before opening a voting thread.
>
> Special thanks to the team of folks who worked hard and contributed to
> this effort, including design discussion / patch / reviews, etc.: Wangda
> Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>
> [1] :
> https://issues.apache.org/jira/secure/attachment/
> 12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.
> Capacity.Scheduler.design-doc.v1.pdf
> [2] : https://issues.apache.org/jira/browse/YARN-5881
>
> [3] : https://issues.apache.org/jira/browse/YARN-7533
>
> [4] : https://issues.apache.org/jira/browse/YARN-7510
>
> Thanks,
>
> Sunil G and Wangda Tan
>


[jira] [Resolved] (HADOOP-15075) Implement KnoxSSO for hadoop web UIs (hdfs, yarn, history server etc.)

2017-11-29 Thread Larry McCay (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Larry McCay resolved HADOOP-15075.
--
Resolution: Not A Problem

Closing as not a problem - since JWTRedirectAuthenticationHandler should cover 
this usecase. See HADOOP-11717.

> Implement KnoxSSO for hadoop web UIs (hdfs, yarn, history server etc.)
> --
>
> Key: HADOOP-15075
> URL: https://issues.apache.org/jira/browse/HADOOP-15075
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: hdfs-client, security
>Affects Versions: 3.0.0-alpha3
>Reporter: madhu raghavendra
> Fix For: site
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Need to implement Knox SSO login feature for hadoop webUIs like HDFS 
> Namenode, Yarn RM, MR2 Job history server, spark etc. I know that we have 
> SPNEGO feature enabled, however having Knox SSO login feature seems to be a 
> good option



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15075) Implement KnoxSSO for hadoop web UIs (hdfs, yarn, history server etc.)

2017-11-29 Thread madhu raghavendra (JIRA)
madhu raghavendra created HADOOP-15075:
--

 Summary: Implement KnoxSSO for hadoop web UIs (hdfs, yarn, history 
server etc.)
 Key: HADOOP-15075
 URL: https://issues.apache.org/jira/browse/HADOOP-15075
 Project: Hadoop Common
  Issue Type: New Feature
  Components: hdfs-client
Affects Versions: 3.0.0-alpha3
Reporter: madhu raghavendra
 Fix For: site


Need to implement Knox SSO login feature for hadoop webUIs like HDFS Namenode, 
Yarn RM, MR2 Job history server, spark etc. I know that we have SPNEGO feature 
enabled, however having Knox SSO login feature seems to be a good feature.






Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-11-29 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/54/

[Nov 27, 2017 10:52:18 PM] (kihwal) HDFS-12754. Lease renewal can hit a deadlock. Contributed by Kuhu
[Nov 27, 2017 10:54:27 PM] (yufei) YARN-7363. ContainerLocalizer don't have a valid log4j config in case of
[Nov 28, 2017 5:42:41 AM] (yqlin) HDFS-12858. RBF: Add router admin commands usage in HDFS commands
[Nov 28, 2017 11:57:51 AM] (stevel) HADOOP-15042. Azure PageBlobInputStream.skip() can return negative value




-1 overall


The following subsystems voted -1:
asflicense unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Unreaped Processes :

   hadoop-common:1 
   hadoop-hdfs:12 
   bkjournal:5 
   hadoop-yarn-server-nodemanager:1 
   hadoop-yarn-server-timelineservice:1 
   hadoop-yarn-client:8 
   hadoop-yarn-applications-distributedshell:1 
   hadoop-mapreduce-client-app:1 
   hadoop-mapreduce-client-jobclient:15 
   hadoop-distcp:4 
   hadoop-extras:1 
   hadoop-sls:1 

Failed junit tests :

   hadoop.crypto.key.kms.server.TestKMS 
   hadoop.yarn.server.nodemanager.webapp.TestNMWebServer 
   hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy 
   hadoop.yarn.server.TestDiskFailures 
   hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken 
   hadoop.mapred.TestReduceFetch 
   hadoop.fs.slive.TestSlive 
   hadoop.mapred.TestLazyOutput 
   hadoop.fs.TestFileSystem 
   hadoop.conf.TestNoDefaultsJobConf 
   hadoop.fs.TestDFSIO 
   hadoop.mapred.TestJobSysDirWithDFS 
   hadoop.tools.TestDistCpSystem 
   hadoop.tools.TestIntegration 
   hadoop.tools.TestDistCpViewFs 
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   hadoop.yarn.sls.TestReservationSystemInvariants 
   hadoop.resourceestimator.solver.impl.TestLpSolver 
   hadoop.resourceestimator.service.TestResourceEstimatorService 

Timed out junit tests :

   org.apache.hadoop.log.TestLogLevel 
   org.apache.hadoop.hdfs.TestLeaseRecovery2 
   org.apache.hadoop.hdfs.TestRead 
   org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream 
   org.apache.hadoop.hdfs.TestDatanodeLayoutUpgrade 
   org.apache.hadoop.hdfs.TestReadWhileWriting 
   org.apache.hadoop.hdfs.TestDFSMkdirs 
   org.apache.hadoop.hdfs.TestDFSOutputStream 
   org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs 
   org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs 
   org.apache.hadoop.hdfs.TestDistributedFileSystem 
   org.apache.hadoop.hdfs.TestReplaceDatanodeFailureReplication 
   org.apache.hadoop.hdfs.TestDFSShell 
   org.apache.hadoop.contrib.bkjournal.TestBootstrapStandbyWithBKJM 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperJournalManager 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperAsHASharedDir 
   org.apache.hadoop.contrib.bkjournal.TestBookKeeperSpeculativeRead 
   org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater 
   org.apache.hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServices 
   org.apache.hadoop.yarn.client.TestRMFailover 
   org.apache.hadoop.yarn.client.cli.TestYarnCLI 
   org.apache.hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA 
   org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClientWithReservation 
   org.apache.hadoop.yarn.client.api.impl.TestYarnClient 
   org.apache.hadoop.yarn.client.api.impl.TestAMRMClient 
   org.apache.hadoop.yarn.client.api.impl.TestNMClient 
   org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell 
   org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup 
   org.apache.hadoop.mapred.lib.TestDelegatingInputFormat 
   org.apache.hadoop.mapred.TestClusterMRNotification 
   org.apache.hadoop.mapred.TestMiniMRClasspath 
   org.apache.hadoop.mapred.TestMRCJCFileInputFormat 
   org.apache.hadoop.mapred.TestClusterMapReduceTestCase 
   org.apache.hadoop.mapred.TestMRIntermediateDataEncryption 
   org.apache.hadoop.mapred.TestMRTimelineEventHandling 
   org.apache.hadoop.mapred.join.TestDatamerge 
   org.apache.hadoop.mapred.TestJobName 
   org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers 
   org.apache.hadoop.mapred.TestNetworkedJob 
   org.apache.hadoop.mapred.TestReduceFetchFromPartialMem 
   org.apache.hadoop.mapred.TestMROpportunisticMaps 
   org.apache.hadoop.mapred.TestMerge 
   

[jira] [Created] (HADOOP-15074) SequenceFile#Writer flush does not update the length of the written file.

2017-11-29 Thread Mukul Kumar Singh (JIRA)
Mukul Kumar Singh created HADOOP-15074:
--

 Summary: SequenceFile#Writer flush does not update the length of 
the written file.
 Key: HADOOP-15074
 URL: https://issues.apache.org/jira/browse/HADOOP-15074
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh


SequenceFile#Writer flush does not update the length of the file. This happens 
because as part of the flush, {{UPDATE_LENGTH}} flag is not passed to the 
DFSOutputStream#hsync.






Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-11-29 Thread Sunil G
Thanks Eric. Appreciate the support in verifying the feature.
YARN-7575 is closed now.

- Sunil


On Tue, Nov 28, 2017 at 11:15 PM Eric Payne
 wrote:

> Thanks Sunil for the great work on this feature.
> I looked through the design document, reviewed the code, and tested out
> branch YARN-5881. The design makes sense and the code looks like it is
> implementing the design in a sensible way. However, I have encountered a
> couple of bugs. I opened https://issues.apache.org/jira/browse/YARN-7575
> to track my findings. Basically, here's a summary:
>
> The design document from YARN-5881 says that for max-capacity:
> 3)  For each queue, we require: a) if max-resource not set, it
> automatically set to parent.max-resource
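
The defaulting rule quoted from the design document can be sketched as a walk up the queue hierarchy. This is a toy model of the rule as stated, not the code in the branch; the queue names, the two dictionaries, and `effective_max_resource` are all hypothetical.

```python
def effective_max_resource(queue, configured_max, parent_of):
    """Return the queue's own max-resource, or the nearest ancestor's if unset.

    configured_max maps queue name -> resource dict for queues that set a max.
    parent_of maps queue name -> parent queue name (the root has no entry).
    """
    current = queue
    while current is not None:
        if current in configured_max:
            return configured_max[current]
        current = parent_of.get(current)
    return None  # nothing configured anywhere on the path to the root


# Hypothetical hierarchy: root -> root.a -> root.a.b
parent_of = {"root.a": "root", "root.a.b": "root.a"}
configured_max = {
    "root": {"memory": 102400, "vcores": 100},
    "root.a": {"memory": 40960, "vcores": 40},
}

# root.a.b sets no max-resource, so it falls back to root.a's value.
print(effective_max_resource("root.a.b", configured_max, parent_of))
```

Under this reading, a leaf queue with no maximum configured should still end up with a well-defined limit, which is the expectation the two findings below are checked against.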
>
> When I try not setting any
> yarn.scheduler.capacity..maximum-capacity, the RM UI
> scheduler page refuses to render. It looks like it's in
> CapacitySchedulerPage$LeafQueueInfoBlock.
>
> Also... A job will run in the leaf queue with no max capacity set and it
> will grow to the max capacity of the cluster, but if I add resources to the
> node, the job won't grow any more even though it has pending resources.
>
> Thanks,
> Eric
>
>
>   From: Sunil G 
>  To: "yarn-...@hadoop.apache.org" ; Hadoop
> Common ; Hdfs-dev <
> hdfs-...@hadoop.apache.org>; "mapreduce-...@hadoop.apache.org" <
> mapreduce-...@hadoop.apache.org>
>  Sent: Friday, November 24, 2017 11:49 AM
>  Subject: [DISCUSS] Merge Absolute resource configuration support in
> Capacity Scheduler (YARN-5881) to trunk
>
> Hi All,
>
> We would like to bring up the discussion of merging “absolute min/max
> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
> in a few weeks. The goal is to get it in for Hadoop 3.1.
>
> *Major work happened in this branch*
>
>   - YARN-6471. Support to add min/max resource configuration for a queue
>   - YARN-7332. Compute effectiveCapacity per each resource vector
>   - YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
>   handle absolute resources.
>
> *Regarding design details*
>
> Please refer to [1] for the detailed design document.
>
> *Regarding testing:*
>
> We did extensive tests for the feature in the last couple of months,
> comparing to latest trunk.
>
> - For SLS benchmark: We didn't see an observable performance gap in a
> simulated test based on 8K-node SLS traces (1 PB memory). We got 3k+
> containers allocated per second.
>
> - For microbenchmark: We used the performance test cases added by
> YARN-6775; they did not show much performance regression compared to trunk.
>
> *YARN-5881* 
>
> #ResourceTypes = 2. Avg of fastest 20: 55294.52
> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>
> *trunk*
> #ResourceTypes = 2. Avg of fastest 20: 55865.92
> #ResourceTypes = 2. Avg of fastest 20: 55096.418
>
> *Regarding API stability:*
>
> All newly added @Public APIs are @Unstable.
>
> Documentation jira [3] should help by providing detailed configuration
> information. This feature works end-to-end; we have been running it in our
> development cluster for the last couple of months, and it has undergone a
> good amount of testing. Branch code is run against trunk and tracked via [4].
>
> We would love to get your thoughts before opening a voting thread.
>
> Special thanks to the team of folks who worked hard and contributed to
> this effort, including design discussion / patch / reviews, etc.: Wangda
> Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>
> [1] :
>
> https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
> [2] : https://issues.apache.org/jira/browse/YARN-5881
>
> [3] : https://issues.apache.org/jira/browse/YARN-7533
>
> [4] : https://issues.apache.org/jira/browse/YARN-7510
>
> Thanks,
>
> Sunil G and Wangda Tan
>
>


[jira] [Created] (HADOOP-15073) SequenceFile.Reader will unexpectedly quit next() iterator while the file ends with sync and appended

2017-11-29 Thread Howard Lee (JIRA)
Howard Lee created HADOOP-15073:
---

 Summary: SequenceFile.Reader will unexpectedly quit next() 
iterator while the file ends with sync and appended
 Key: HADOOP-15073
 URL: https://issues.apache.org/jira/browse/HADOOP-15073
 Project: Hadoop Common
  Issue Type: Bug
  Components: common
Affects Versions: 3.0.0-alpha3, 2.8.2, 2.7.4, 2.6.3
Reporter: Howard Lee


The SequenceFile.Writer inserts a SYNC marker into the file every SYNC_INTERVAL.
If a SequenceFile happens to end with a SYNC marker and another Writer
opens it with mode AppendIfExists, the bug occurs.
Because AppendIfExists sets _appendMode_ to _true_, the init method inserts
another SYNC marker, so the file then contains two consecutive SYNC marks.
In SequenceFile.Reader, the method readRecordLength tests for SYNC only once.
When there are two SYNC marks, the _length_ it returns is -1 (the beginning of
the second SYNC), which is taken to mean end of file and causes the next()
method to quit unexpectedly.
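
The failure mode can be reproduced in a small toy model. The sketch below is not Hadoop code: a "file" is a flat Python list in which a positive integer stands for a record length (payloads are omitted) and the pair `-1, "SYNC"` stands for a sync escape followed by its marker. The function names are hypothetical, and the looping variant is just one possible way to tolerate repeated markers, not necessarily the fix adopted for this issue.

```python
SYNC_ESCAPE = -1  # sentinel length that announces a sync marker


def read_length_check_once(tokens, pos):
    """Skip at most one sync marker, mirroring the behaviour described above."""
    if pos >= len(tokens):
        return -1, pos
    length = tokens[pos]
    pos += 1
    if length == SYNC_ESCAPE:
        pos += 1  # consume the marker itself
        if pos >= len(tokens):
            return -1, pos
        length = tokens[pos]  # a second escape here is returned as -1
        pos += 1
    return length, pos


def read_length_check_loop(tokens, pos):
    """Keep skipping sync markers until a real record length or end of data."""
    while pos < len(tokens):
        length = tokens[pos]
        pos += 1
        if length != SYNC_ESCAPE:
            return length, pos
        pos += 1  # consume the marker and look again
    return -1, pos


def read_all(tokens, read_length):
    """Collect record lengths until the reader reports -1 (treated as EOF)."""
    lengths, pos = [], 0
    while True:
        length, pos = read_length(tokens, pos)
        if length == -1:
            return lengths
        lengths.append(length)


# One record, two consecutive sync marks (file ended with SYNC, then appended), one record.
double_sync = [10, SYNC_ESCAPE, "SYNC", SYNC_ESCAPE, "SYNC", 20]

print(read_all(double_sync, read_length_check_once))
print(read_all(double_sync, read_length_check_loop))
```

In this model the single-check reader stops after the first record because the second escape is mistaken for end of file, while the looping reader also returns the appended record.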


