[jira] [Created] (YARN-8020) when DRF is used, preemption does not trigger due to incorrect idealAssigned

2018-03-09 Thread kyungwan nam (JIRA)
kyungwan nam created YARN-8020:
--

 Summary: when DRF is used, preemption does not trigger due to 
incorrect idealAssigned
 Key: YARN-8020
 URL: https://issues.apache.org/jira/browse/YARN-8020
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: kyungwan nam


I’ve encountered a case where Inter-Queue Preemption does not work.
It happens when DRF is used and an application is submitted with a large 
number of vcores.

IMHO, idealAssigned can be set incorrectly by the following code.
{code}
// This function "accepts" all the resources it can (pending) and return
// the unused ones
Resource offer(Resource avail, ResourceCalculator rc,
    Resource clusterResource, boolean considersReservedResource) {
  Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
      Resources.subtract(getMax(), idealAssigned),
      Resource.newInstance(0, 0));
  // accepted = min{avail,
  //               max - assigned,
  //               current + pending - assigned,
  //               # Make sure a queue will not get more than max of its
  //               # used/guaranteed, this is to make sure preemption won't
  //               # happen if all active queues are beyond their guaranteed
  //               # This is for leaf queue only.
  //               max(guaranteed, used) - assigned}
  // remain = avail - accepted
  Resource accepted = Resources.min(rc, clusterResource,
      absMaxCapIdealAssignedDelta,
      Resources.min(rc, clusterResource, avail, Resources
          /*
           * When we're using FifoPreemptionSelector (considerReservedResource
           * = false).
           *
           * We should deduct reserved resource from pending to avoid excessive
           * preemption:
           *
           * For example, if an under-utilized queue has used = reserved = 20.
           * Preemption policy will try to preempt 20 containers (which is not
           * satisfied) from different hosts.
           *
           * In FifoPreemptionSelector, there's no guarantee that preempted
           * resource can be used by pending request, so policy will preempt
           * resources repeatedly.
           */
          .subtract(Resources.add(getUsed(),
              (considersReservedResource ? pending : pendingDeductReserved)),
              idealAssigned)));
{code}

Let’s say:

* cluster resource : 
* idealAssigned(assigned): 
* avail: 
* current: 
* pending: 

current + pending - assigned: 
min ( avail, (current + pending - assigned) ) : 
accepted: 

As a result, idealAssigned will be , which does not trigger preemption.
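
Since Resources.min with the DominantResourceCalculator compares whole vectors
by dominant share rather than component-wise, the "accepted" resource can end
up larger than avail in one dimension, which in turn inflates idealAssigned. A
minimal sketch of that comparison behavior (the class name and all the numbers
below are made up for illustration, not taken from this report):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfMinDemo {
  public static void main(String[] args) {
    ResourceCalculator drf = new DominantResourceCalculator();
    // Hypothetical cluster: 100 GB memory, 100 vcores.
    Resource cluster = Resource.newInstance(100 * 1024, 100);

    // avail is memory-poor but vcore-rich; the pending delta is the opposite.
    Resource avail = Resource.newInstance(10 * 1024, 70);
    Resource delta = Resource.newInstance(30 * 1024, 4);

    // Under DRF, min() picks the vector with the smaller dominant share:
    // avail's dominant share is 0.7 (vcores), delta's is 0.3 (memory), so
    // min() returns delta even though delta has more memory than avail.
    // The result is not a component-wise minimum.
    Resource accepted = Resources.min(drf, cluster, avail, delta);
    System.out.println("accepted = " + accepted); // <memory:30720, vCores:4>
  }
}
{code}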






Re: [EVENT] HDFS Bug Bash: March 12

2018-03-09 Thread 俊平堵
I cannot join this in person as I am currently overseas, but +1 on this
event.
Thanks for organizing this, Chris! Please let me know if anything else I
can help here.

Thanks,

Junping

2018-03-05 16:03 GMT-08:00 Chris Douglas :

> [Cross-posting, as this affects the rest of the project]
>
> Hey folks-
>
> As discussed last month [1], the HDFS build hasn't been healthy
> recently. We're dedicating a bug bash to stabilize the build and
> address some longstanding issues with our unit tests. We rely on our
> CI infrastructure to keep the project releasable, and in its current
> state, it's not protecting us from regressions. While we probably
> won't achieve all our goals in this session, we can develop the
> conditions for reestablishing a firm foundation.
>
> If you're new to the project, please consider attending and
> contributing. Committers often prioritize large or complicated
> patches, and the issues that make the project livable don't get enough
> attention. A bug bash is a great opportunity to pull reviewers'
> collars, and fix the annoyances that slow us all down.
>
> If you're a committer, please join us! While some of the proposed
> repairs are rote, many unit tests rely on implementation details and
> non-obvious invariants. We need domain experts to help untangle
> complex dependencies and to prevent breakage of deliberate, but
> counter-intuitive code.
>
> We're collecting tasks in the wiki [2] and will include a dial-in option
> for folks who aren't local.
>
> Meetup has started charging for creating new events, so we'll have to
> find another way to get an approximate headcount and publish the
> address. Please ping me if you have a preferred alternative. -C
>
> [1]: https://s.apache.org/nEoQ
> [2]: https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=75965105
>


Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-09 Thread Owen O'Malley
Hi Joep,

On Tue, Mar 6, 2018 at 6:50 PM, J. Rottinghuis 
wrote:

Obviously when people do want to use Ozone, then having it in the same repo
> is easier. The flipside is that, separate top-level project in the same
> repo or not, it adds to the Hadoop releases.
>

Apache projects are about the group of people who are working together.
There is a large overlap between the team working on HDFS and Ozone, which
is a lot of the motivation to keep project overhead to a minimum and not
start a new project.

Using the same releases or separate releases is a distinct choice. Many
Apache projects, such as Commons and Maven, have multiple artifacts that
release independently. In Hive, we have two sub-projects that release
independently: Hive Storage API and Hive.

One thing we did during that split to minimize the challenges to the
developers was that Storage API and Hive have the same master branch.
However, since they have different releases, they have their own release
branches and release numbers.

If there is a change in Ozone and a new release needed, it would have to
> wait for a Hadoop release. Ditto if there is a Hadoop release and there is
> an issue with Ozone. The case that one could turn off Ozone through a Maven
> profile works only to some extent.
> If we have done a 3.x release with Ozone in it, would it make sense to do
> a 3.y release with y>x without Ozone in it? That would be weird.
>

Actually, if Ozone is marked as unstable/evolving (we should actually have
an even stronger warning for a feature preview), we could remove it in a
3.x. If a user picks up a feature before it is stable, we try to provide a
stable platform, but mistakes happen. Introducing an incompatible change to
the Ozone API between 3.1 and 3.2 wouldn't be good, but it wouldn't be the
end of the world.

.. Owen


[jira] [Created] (YARN-8019) RM webproxy uses the client truststore specified in ssl-client.xml

2018-03-09 Thread Aki Tanaka (JIRA)
Aki Tanaka created YARN-8019:


 Summary: RM webproxy uses the client truststore specified in 
ssl-client.xml
 Key: YARN-8019
 URL: https://issues.apache.org/jira/browse/YARN-8019
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.0
Reporter: Aki Tanaka


A Yarn ResourceManager's web proxy launches with the Java default SSL trust 
store. Due to this behavior, the web proxy fails to validate a backend server's 
SSL certificate when the backend server listens on HTTPS using a custom SSL 
certificate.

 

For example, Spark launches the Spark context web UI with a custom SSL 
certificate when we enable SSL with the "spark.ssl.trustStore" and 
"spark.ssl.keyStore" properties. In this case, the Yarn web proxy cannot 
connect to the Spark context web UI since the web proxy cannot verify the SSL 
cert (a "javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path building failed" error is 
returned).

 

We should add an option to set the SSL trust store for the Yarn RM web proxy. I 
have attached a patch that lets the web proxy use a custom SSL trust store if 
one is configured in ssl-client.xml.
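
For reference, a minimal sketch of how a client-side component can pick up the
trust store defined in ssl-client.xml (ssl.client.truststore.location and
ssl.client.truststore.password) via Hadoop's SSLFactory. This illustrates the
mechanism only; it is not the attached patch, and the URL is a placeholder:
{code:java}
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

public class TrustStoreClientSketch {
  public static void main(String[] args) throws Exception {
    // In CLIENT mode, SSLFactory loads the resource named by
    // hadoop.ssl.client.conf (default: ssl-client.xml) from the classpath.
    Configuration conf = new Configuration();
    SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
    sslFactory.init();
    try {
      HttpsURLConnection conn = (HttpsURLConnection)
          new URL("https://backend.example.com:4440/").openConnection();
      // Use the socket factory backed by the configured trust store instead
      // of the JVM default, so custom certificates validate.
      conn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());
      conn.setHostnameVerifier(sslFactory.getHostnameVerifier());
      System.out.println("HTTP " + conn.getResponseCode());
    } finally {
      sslFactory.destroy();
    }
  }
}
{code}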






Re: [VOTE] Merging branch HDFS-7240 to trunk

2018-03-09 Thread sanjay Radia
Joep,  You raise a number of points:

(1) Ozone vs. other object stores. “Some users would choose Ozone as that layer, 
some might use S3, others GCS, or Azure, or something else”.
(2) How HDSL/Ozone fits into Hadoop and whether it is necessary.
(3) You raise the release issue, to which we will respond in a separate email.

Let me respond to 1 & 2:
***Wrt to (1) Ozone vs. other object stores***
Neither HDFS nor Ozone has any real role in the cloud except for temp data. The 
cost of local disk or EBS is so high that long-term data storage on HDFS or 
even Ozone is prohibitive.
So why the hell create the KV namespace? We need to stabilize HDSL, where the 
data is stored. We are targeting Hive and Spark apps to stabilize HDSL using 
real Hadoop apps over OzoneFS.
But HDSL/Ozone is not feature-compatible with HDFS, so how will users actually 
use it while it stabilizes? Users can run HDFS and Ozone side by side in the 
same cluster with two namespaces (just like in Federation) and run apps on 
both: run some Hive and Spark apps on Ozone, and others that need full HDFS 
features (e.g. encryption) on HDFS. As it becomes stable, they can start using 
HDSL/Ozone in production for a portion of their data.



***Wrt to (2) HDSL/Ozone fitting into Hadoop and why the same repository***
Ozone KV is a temporary step. The real goal is to put the NN on top of HDSL; we 
have shown how to do that in the roadmap that Konstantine and Chris D asked 
for. Milestone 1 is feasible and doesn't require removal of the FSN lock. We 
have also shown several cases of sharing other code in the future (the protocol 
engine). This co-development will be easier if it is in the same repo. Over 
time HDSL + the ported NN will create a new HDFS and become feature-compatible: 
some of the features will come for free because they are in the NN and will 
port over to the new NN; some are in the block layer (erasure coding) and will 
have to be added to HDSL.

--- You compare with Yarn, HDFS and Common. HDFS and Yarn are independent, but 
both depend on Hadoop Common (e.g. HBase runs on HDFS without Yarn). HDSL and 
Ozone will depend on Hadoop Common; indeed, the new protocol engine of HDSL 
might move to Hadoop Common or HDFS. We have made sure that there are currently 
no dependencies of HDFS on HDSL or Ozone.


***The Repo issue and conclusion***
The HDFS community will need to work together as we evolve old HDFS to use 
HDSL, the new protocol engine, and Raft, and together evolve to a newer, more 
powerful set of sub-components. It is important that they are in the same repo 
and that we can share code through private interfaces. We are not trying to 
build a competing object store but to improve HDFS; fixing scalability 
fundamentally is hard, and we are asking for an environment in which that can 
happen easily over the next year while heeding the stability concerns of HDFS 
developers (e.g. we removed the compile-time dependency and added a Maven 
profile). This work is not being done by members of a foreign project trying to 
insert code into Hadoop, but by Hadoop/HDFS developers with proven track 
records and active participation in Hadoop and HDFS. Our jobs depend on 
HDFS/Hadoop stability - destabilizing it is the last thing we want to do; we 
have responded to every piece of constructive feedback.


sanjay


> On Mar 6, 2018, at 6:50 PM, J. Rottinghuis  wrote:
> 
> Sorry for jumping in late into the fray of this discussion.
> 
> It seems Ozone is a large feature. I appreciate the development effort and
> the desire to get this into the hands of users.
> I understand the need to iterate quickly and to reduce overhead for
> development.
> I also agree that Hadoop can benefit from a quicker release cycle. For our
> part, this is a challenge as we have a large installation with multiple
> clusters and thousands of users. It is a constant balance between jumping
> to the newest release and the cost of this integration and test at our
> scale, especially when things aren't backwards compatible. We try to be
> good citizens and upstream our changes and contribute back.
> 
> The point was made that splitting the projects such as common and Yarn
> didn't work and had to be reverted. That was painful and a lot of work for
> those involved for sure. This project may be slightly different in that
> hadoop-common, Yarn and HDFS made for one consistent whole. One couldn't
> run a project without the other.
> 
> Having a separate block management layer with possibly multiple block
> implementation as pluggable under the covers would be a good future
> development for HDFS. Some users would choose Ozone as that layer, some
> might use S3, others GCS, or Azure, or something else.
> If the argument is made that nobody will be able to run Hadoop as a
> consistent stack without Ozone, then that would be a strong case to keep
> things in the same repo.
> 
> Obviously when people do want to use Ozone, then having it in the same repo
> is easier. The flipside is that, separate top-level project in the same
> repo or not, it adds to the Hadoop 

[jira] [Created] (YARN-8018) Yarn service: Add support for initiating service upgrade

2018-03-09 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8018:
---

 Summary: Yarn service: Add support for initiating service upgrade
 Key: YARN-8018
 URL: https://issues.apache.org/jira/browse/YARN-8018
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


Add support for initiating service upgrade which includes the following main 
changes:
 # Service API to initiate the upgrade
 # Persist the service version on HDFS
 # Start the upgraded version of the service






[jira] [Created] (YARN-8017) Validate the application ID has been persisted to the service definition prior to use

2018-03-09 Thread Shane Kumpf (JIRA)
Shane Kumpf created YARN-8017:
-

 Summary: Validate the application ID has been persisted to the 
service definition prior to use
 Key: YARN-8017
 URL: https://issues.apache.org/jira/browse/YARN-8017
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Shane Kumpf


The service definition is persisted to disk prior to launching the application. 
Once the application is launched, the service definition is updated to include 
the application ID. If submission fails, the application ID is never added to 
the previously persisted service definition.

When this occurs, attempting to stop or destroy the application results in an 
NPE while trying to get the application ID from the service definition, making 
it impossible to clean up.
{code:java}
2018-03-02 18:28:05,512 INFO 
org.apache.hadoop.yarn.service.utils.ServiceApiUtil: Loading service definition 
from 
hdfs://y7001.yns.hortonworks.com:8020/user/hadoopuser/.yarn/services/skumpfcents/skumpfcents.json
2018-03-02 18:28:05,525 WARN 
org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.api.records.ApplicationId.fromString(ApplicationId.java:111)
at 
org.apache.hadoop.yarn.service.client.ServiceClient.getAppId(ServiceClient.java:1106)
at 
org.apache.hadoop.yarn.service.client.ServiceClient.actionStop(ServiceClient.java:363)
at 
org.apache.hadoop.yarn.service.webapp.ApiServer$4.run(ApiServer.java:251)
at 
org.apache.hadoop.yarn.service.webapp.ApiServer$4.run(ApiServer.java:243)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422){code}
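
A minimal sketch of the kind of guard this implies. The helper below mirrors
the getAppId step in the stack trace, but the class, method, and message are
illustrative, not the actual fix:
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.exceptions.YarnException;

public final class AppIdGuard {
  // Fail with a clear message instead of an NPE when the application ID was
  // never persisted into the service definition (e.g. submit failed).
  public static ApplicationId parsePersistedAppId(String serviceName,
      String appIdStr) throws YarnException {
    if (appIdStr == null || appIdStr.isEmpty()) {
      throw new YarnException("No application ID persisted for service "
          + serviceName + "; submission may have failed before it was saved");
    }
    return ApplicationId.fromString(appIdStr);
  }
}
{code}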






[jira] [Created] (YARN-8016) Provide a common interface for queues mapping rules

2018-03-09 Thread Zian Chen (JIRA)
Zian Chen created YARN-8016:
---

 Summary: Provide a common interface for queues mapping rules
 Key: YARN-8016
 URL: https://issues.apache.org/jira/browse/YARN-8016
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Zian Chen
Assignee: Zian Chen









Re: [EVENT] HDFS Bug Bash: March 12

2018-03-09 Thread Chris Douglas
Hey folks-

Remember we're getting together on Monday to fix some "quality of
life" issues in the build. Please RSVP [1] so we can preprint badges,
that sort of thing. The meetup site only allows for 6 hour events, but
we have the room through 6:00PM PST and can keep the bridge/channel
open for other timezones that want to keep going.

We'll post remote access information when we get it. Please update the
wiki [2] as you think of issues we should address. -C

[1]: https://meetingstar.io/event/fk13172f1d75KN
[2]: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75965105

On Tue, Mar 6, 2018 at 7:48 PM, Chris Douglas  wrote:
> Found a meetup alternative (thanks Subru):
> https://meetingstar.io/event/fk13172f1d75KN
>
> So we can get a rough headcount, please add (local) if you plan to
> attend in-person. -C
>
>
> On Mon, Mar 5, 2018 at 4:03 PM, Chris Douglas  wrote:
>> [Cross-posting, as this affects the rest of the project]
>>
>> Hey folks-
>>
>> As discussed last month [1], the HDFS build hasn't been healthy
>> recently. We're dedicating a bug bash to stabilize the build and
>> address some longstanding issues with our unit tests. We rely on our
>> CI infrastructure to keep the project releasable, and in its current
>> state, it's not protecting us from regressions. While we probably
>> won't achieve all our goals in this session, we can develop the
>> conditions for reestablishing a firm foundation.
>>
>> If you're new to the project, please consider attending and
>> contributing. Committers often prioritize large or complicated
>> patches, and the issues that make the project livable don't get enough
>> attention. A bug bash is a great opportunity to pull reviewers'
>> collars, and fix the annoyances that slow us all down.
>>
>> If you're a committer, please join us! While some of the proposed
>> repairs are rote, many unit tests rely on implementation details and
>> non-obvious invariants. We need domain experts to help untangle
>> complex dependencies and to prevent breakage of deliberate, but
>> counter-intuitive code.
>>
>> We're collecting tasks in the wiki [2] and will include a dial-in option
>> for folks who aren't local.
>>
>> Meetup has started charging for creating new events, so we'll have to
>> find another way to get an approximate headcount and publish the
>> address. Please ping me if you have a preferred alternative. -C
>>
>> [1]: https://s.apache.org/nEoQ
>> [2]: 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75965105




Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-03-09 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/

[Mar 8, 2018 8:02:34 AM] (yqlin) HADOOP-15296. Fix a wrong link for RBF in the 
top page. Contributed by
[Mar 8, 2018 10:13:36 AM] (wwei) YARN-8011.
[Mar 8, 2018 11:15:46 AM] (stevel) HADOOP-15292. Distcp's use of pread is 
slowing it down. Contributed by
[Mar 8, 2018 11:24:06 AM] (stevel) HADOOP-15273.distcp can't handle remote 
stores with different checksum
[Mar 8, 2018 5:32:05 PM] (inigoiri) HDFS-13232. RBF: ConnectionManager's 
cleanup task will compare each
[Mar 8, 2018 6:17:02 PM] (xiao) HADOOP-15280. TestKMS.testWebHDFSProxyUserKerb 
and
[Mar 8, 2018 6:23:36 PM] (sunilg) YARN-7944. [UI2] Remove master node link from 
headers of application




-1 overall


The following subsystems voted -1:
findbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
   org.apache.hadoop.yarn.api.records.Resource.getResources() may expose 
internal representation by returning Resource.resources At Resource.java:by 
returning Resource.resources At Resource.java:[line 234] 
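
For readers unfamiliar with this FindBugs pattern ("may expose internal
representation"), a generic illustration; the Holder class below is made up
and unrelated to the actual Resource internals:
{code:java}
// Returning an internal array lets callers mutate the object's state.
class Holder {
  private final int[] data = {1, 2, 3};

  int[] getData() {        // flagged style: exposes the internal array
    return data;
  }

  int[] getDataCopy() {    // common fix: return a defensive copy
    return data.clone();
  }
}
{code}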

Failed junit tests :

   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage 
   hadoop.yarn.server.TestDiskFailures 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-compile-javac-root.txt
  [296K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/whitespace-eol.txt
  [9.2M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/whitespace-tabs.txt
  [288K]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/xml.txt
  [4.0K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/diff-javadoc-javadoc-root.txt
  [760K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [320K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [48K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/716/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [84K]

Powered by Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org
