[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-14 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934867#comment-13934867 ]

Hudson commented on YARN-1771:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/509/])
YARN-1771. Reduce the number of NameNode operations during localization of
public resources using a cache. Contributed by Sangjin Lee (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Fix For: 3.0.0, 2.4.0

 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, 
 yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}
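
The path sequence in the example above can be replayed in a small sketch to show where a per-path cache would help. This is an illustrative model, not Hadoop code: the class and method names are hypothetical, and the only thing taken from the report is the order of paths in the audit log (the file, then the file and each of its ancestors up to the root, then the file again). Under that assumption, memoizing results per path would reduce the seven lookups to five, one per distinct path.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the lookup pattern in the audit log; not Hadoop code.
class StatCallModel {
    // Returns an absolute path followed by each of its ancestors, ending at "/".
    static List<String> selfAndAncestors(String path) {
        List<String> out = new ArrayList<>();
        String p = path;
        while (true) {
            out.add(p);
            if (p.equals("/")) {
                break;
            }
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash);
        }
        return out;
    }

    // Path order as it appears in the example log: file, file plus ancestors, file.
    static List<String> observedSequence(String file) {
        List<String> seq = new ArrayList<>();
        seq.add(file);
        seq.addAll(selfAndAncestors(file));
        seq.add(file);
        return seq;
    }

    // Number of lookups that would remain if each distinct path were looked up once.
    static int lookupsWithMemoization(List<String> seq) {
        Set<String> seen = new HashSet<>();
        int lookups = 0;
        for (String p : seq) {
            if (seen.add(p)) {
                lookups++;
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        List<String> seq =
            observedSequence("/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar");
        System.out.println("lookups without a cache: " + seq.size());
        System.out.println("lookups with a per-path cache: " + lookupsWithMemoization(seq));
    }
}
```

The two lookups that disappear are the repeated requests for the file itself; the ancestor lookups remain, since each ancestor is a distinct path within a single localization.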



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-14 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935021#comment-13935021 ]

Hudson commented on YARN-1771:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/])
YARN-1771. Reduce the number of NameNode operations during localization of
public resources using a cache. Contributed by Sangjin Lee (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java




[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-14 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935100#comment-13935100 ]

Hudson commented on YARN-1771:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/])
YARN-1771. Reduce the number of NameNode operations during localization of
public resources using a cache. Contributed by Sangjin Lee (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java




[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-14 Thread Chris Douglas (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935310#comment-13935310 ]

Chris Douglas commented on YARN-1771:
-

bq. It would be great if you could commit this to branch-2.4 too...

Sure, np. Done

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Fix For: 3.0.0, 2.4.0, 2.5.0

 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, 
 yarn-1771.patch




[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-14 Thread Sangjin Lee (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935315#comment-13935315 ]

Sangjin Lee commented on YARN-1771:
---

Thanks!



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-13 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934396#comment-13934396 ]

Hudson commented on YARN-1771:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5325 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5325/])
YARN-1771. Reduce the number of NameNode operations during localization of
public resources using a cache. Contributed by Sangjin Lee (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577391)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java




[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-13 Thread Sangjin Lee (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934426#comment-13934426 ]

Sangjin Lee commented on YARN-1771:
---

Thanks Chris! It would be great if you could commit this to branch-2.4 too...



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-12 Thread Chris Douglas (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932203#comment-13932203 ]

Chris Douglas commented on YARN-1771:
-

I just skimmed the patch, but it lgtm. The LoadingCache impl is very clean, and 
only caching over the course of a container localization relieves one of any 
practical responsibility to limit the cache size (that said, might as well add 
a fixed bound). Only minor, optional nits: if a path is invalid/inaccessible, 
it might make sense to memoize the failure as well. {{FSDownload::isPublic}} can 
be package-private (and annotated w/ {{@VisibleForTesting}} for the unit test) 
rather than public.
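
The suggestion to memoize failures can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the patch: it uses a plain ConcurrentHashMap from the JDK in place of the Guava LoadingCache mentioned above, and MemoizedStat, Loader, and Outcome are hypothetical names. The idea shown is that a failed lookup is recorded just like a successful one, so a repeated request for an inaccessible path does not trigger a second remote call.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; uses a JDK map rather than Guava's LoadingCache.
class MemoizedStat<V> {
    // Stand-in for a getFileStatus-style call that may fail.
    interface Loader<V> {
        V load(String path) throws IOException;
    }

    // Records either the loaded value or the failure from the first attempt.
    private static final class Outcome<V> {
        final V value;
        final IOException error;

        Outcome(V value, IOException error) {
            this.value = value;
            this.error = error;
        }
    }

    private final Map<String, Outcome<V>> cache = new ConcurrentHashMap<>();
    private final Loader<V> loader;

    MemoizedStat(Loader<V> loader) {
        this.loader = loader;
    }

    // Loads each path at most once; a memoized failure is rethrown on later calls.
    V get(String path) throws IOException {
        Outcome<V> outcome = cache.computeIfAbsent(path, p -> {
            try {
                return new Outcome<V>(loader.load(p), null);
            } catch (IOException e) {
                return new Outcome<V>(null, e);
            }
        });
        if (outcome.error != null) {
            throw outcome.error;
        }
        return outcome.value;
    }

    // Demo: two successful and two failing requests; returns how often the loader ran.
    static int demoLoaderCalls() {
        AtomicInteger calls = new AtomicInteger();
        MemoizedStat<Integer> stats = new MemoizedStat<>(p -> {
            calls.incrementAndGet();
            if (p.equals("/missing")) {
                throw new IOException("no such path: " + p);
            }
            return p.length();
        });
        for (String p : new String[] {"/a", "/a", "/missing", "/missing"}) {
            try {
                stats.get(p);
            } catch (IOException expected) {
                // Both failures surface here; only the first one invoked the loader.
            }
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("loader invocations: " + demoLoaderCalls());
    }
}
```

Storing the exception alongside the value keeps the sketch to a single map; a real cache would also need to decide how long a remembered failure stays valid.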

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch




[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-12 Thread Sangjin Lee (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932441#comment-13932441 ]

Sangjin Lee commented on YARN-1771:
---

Good point about making isPublic() package-private. I'll make that change.

I'm also going to make changes to memoize failures. My only slight hesitation 
is that such failures would normally be quite rare, but I think it's a good 
thing to have.

I did think about making the stat cache longer-lived. But the complexity of 
managing its size as well as the values getting quite stale dissuaded me from 
it. Let me know if you agree...

I'll post a new patch soon.
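
One way to picture a short-lived cache with a fixed size bound is the sketch below. It is an illustration under stated assumptions rather than the patch itself: BoundedStatCache is a hypothetical name, and it relies on the JDK's LinkedHashMap in access order instead of any cache class from the patch. The intent is that a new instance is created for each localization and discarded afterwards, so values are never carried from one localization to the next, while the bound keeps memory use predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a small LRU map; not thread-safe without external locking.
class BoundedStatCache<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedStatCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order, least recent first
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; evicts once the bound is exceeded.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
    }

    // Demo: the size never exceeds the bound, however many paths are inserted.
    static int demoSizeAfterInserts(int maxEntries, int inserts) {
        BoundedStatCache<Boolean> cache = new BoundedStatCache<>(maxEntries);
        for (int i = 0; i < inserts; i++) {
            cache.put("/path-" + i, Boolean.TRUE);
        }
        return cache.size();
    }

    // Demo: a recently read entry survives, the least recently used one is evicted.
    static boolean demoEvictsLeastRecentlyUsed() {
        BoundedStatCache<Boolean> cache = new BoundedStatCache<>(2);
        cache.put("/a", Boolean.TRUE);
        cache.put("/b", Boolean.TRUE);
        cache.get("/a");               // marks /a as recently used
        cache.put("/c", Boolean.TRUE); // exceeds the bound, so /b is evicted
        return cache.containsKey("/a") && !cache.containsKey("/b") && cache.containsKey("/c");
    }

    public static void main(String[] args) {
        System.out.println("size after 10 inserts with bound 3: " + demoSizeAfterInserts(3, 10));
        System.out.println("least recently used evicted: " + demoEvictsLeastRecentlyUsed());
    }
}
```

Because the instance lives only for one localization, the bound is a safety net rather than a tuning parameter.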



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-12 Thread Chris Douglas (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932651#comment-13932651 ]

Chris Douglas commented on YARN-1771:
-

bq. I'm also going to make changes to memoize failures. My only slight 
hesitation is that such failures would normally be quite rare, but I think it's 
a good thing to have.

Agreed, I doubt it will have a significant impact here. In a 
shared/longer-lived cache it might be marginally more useful, but still rare.

bq. I did think about making the stat cache longer-lived. But the complexity of 
managing its size as well as the values getting quite stale dissuaded me from 
it. Let me know if you agree...

*nod* Since the goal is to reduce stress on the NN, deferring that complexity 
until necessary is a good plan.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-12 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932806#comment-13932806 ]

Hadoop QA commented on YARN-1771:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634327/yarn-1771.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3340//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3340//console

This message is automatically generated.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, 
 yarn-1771.patch




[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-11 Thread Sangjin Lee (JIRA)

[ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930487#comment-13930487 ]

Sangjin Lee commented on YARN-1771:
---

Thoughts on the patch? Thanks!

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}





[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923670#comment-13923670
 ] 

Hadoop QA commented on YARN-1771:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633321/yarn-1771.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3295//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3295//console

This message is automatically generated.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923176#comment-13923176
 ] 

Sangjin Lee commented on YARN-1771:
---

I have been looking into this from the perspective of reducing the number of 
unnecessary getFileStatus calls (and thereby reducing the pressure on the name 
node). So for now I'm gravitating towards a solution that caches the 
getFileStatus calls for the duration of a container initialization (i.e. 
resource localization). It would be pretty effective (reducing the number of 
calls from (m + 3)*n to n + (small constant)).
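
To make the estimate concrete, here is a minimal sketch. The comment does not define m and n; this reading assumes n is the number of resources and m is the number of ancestor directories per resource, which is consistent with the audit log in the issue description (4 ancestors plus 3 calls on the file itself gives 7). The "small constant" is taken to be m, under the further assumption that all resources share the same ancestors. The class and method names are hypothetical.

```java
/** Back-of-the-envelope model of the call counts (hypothetical helper). */
final class CallCountEstimate {
    /** Without a cache: (m + 3) getFileStatus calls for each of the n resources. */
    static long uncached(int m, int n) {
        return (long) (m + 3) * n;
    }

    /** With a per-path cache, ASSUMING all n resources share the same m ancestors. */
    static long cached(int m, int n) {
        return (long) n + m;
    }

    public static void main(String[] args) {
        System.out.println(uncached(4, 1));   // 7, as in the audit log
        System.out.println(uncached(4, 10));  // 70
        System.out.println(cached(4, 10));    // 14
    }
}
```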

I'll upload a patch for your review shortly. Thanks!



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923226#comment-13923226
 ] 

Sangjin Lee commented on YARN-1771:
---

I have created a status cache at the LocalizerContext level, and let FSDownload 
utilize the cache when querying the file status for the parent directories.

I considered using a simple synchronized map and ConcurrentHashMap, but settled 
on using guava's LoadingCache. The issue with the localization pattern is that 
it is bursty. Most of the downloads happen in parallel, and thus most of these 
getFileStatus calls also go out in a burst. With a synchronized map, the 
problem is that these calls would be unnecessarily serialized (as it needs to 
acquire a global lock for this map). With a ConcurrentHashMap, calls can be 
concurrent, but with a simple ConcurrentMap usage it becomes harder to avoid 
extra getFileStatus calls.

The LoadingCache maintains concurrency *and* limits the getFileStatus calls to 
strictly one call per path (I added the unit test to verify that).
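
The once-per-path property can be illustrated without Guava. The sketch below is not the patch: it swaps in the JDK's ConcurrentHashMap.computeIfAbsent (Java 8 and later), which applies the loader at most once per key while letting lookups for different keys proceed concurrently. The class name, the String keys, and the load counter are assumptions made for the illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical per-path status cache; V stands in for a file status type. */
final class StatusCacheSketch<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader;            // must not return null
    private final AtomicInteger loads = new AtomicInteger();

    StatusCacheSketch(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader at most once per path. */
    V get(String path) {
        return cache.computeIfAbsent(path, p -> {
            loads.incrementAndGet();                      // counts real lookups
            return loader.apply(p);
        });
    }

    /** Number of times the loader actually ran. */
    int loadCount() {
        return loads.get();
    }
}
```

In this sketch, concurrent callers asking for the same path block until the first load finishes, so a burst of identical requests results in a single lookup.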



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923294#comment-13923294
 ] 

Hadoop QA commented on YARN-1771:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633247/yarn-1771.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/3282//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3282//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3282//console

This message is automatically generated.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918112#comment-13918112
 ] 

Jason Lowe commented on YARN-1771:
--

Here's a thought to possibly avoid checking each directory level individually: 
what if the NM simply tried to read the file as the user requesting it to be 
public?  The NM should already have the necessary tokens to access the 
resource, so it should be able to use doAs to read the file as the requesting 
user.  The rationale for this approach is that if the user can read the 
resource and is asking for it to be public, then they can trivially make the 
data public themselves by copying it to /tmp and making the copy publicly 
accessible.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918247#comment-13918247
 ] 

Sangjin Lee commented on YARN-1771:
---

Would that be a slightly weaker condition than the current public check? The 
current check requires the READ permission for others.

One possible case here is if the user has a group READ permission on the file 
(but others' READ permission is off). Then the user's doAs would succeed even 
though others do not have the READ permission.
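
The difference between the two checks can be shown with plain permission bits. This is a minimal sketch, assuming POSIX-style mode bits in which exactly one class (owner, then group, then others) applies to a given user; it looks only at the file itself and ignores the ancestor-directory checks. The class and method names are hypothetical and are not Hadoop APIs.

```java
/** Hypothetical comparison of the strict public check and a per-user read check. */
final class PublicCheckSketch {
    static final int OWNER_READ = 0400;
    static final int GROUP_READ = 0040;
    static final int OTHER_READ = 0004;

    /** Stricter check: the "others" class must hold READ. */
    static boolean othersCanRead(int mode) {
        return (mode & OTHER_READ) != 0;
    }

    /** Weaker check: can this particular user read the file? */
    static boolean userCanRead(int mode, boolean isOwner, boolean inGroup) {
        if (isOwner) {
            return (mode & OWNER_READ) != 0;
        }
        if (inGroup) {
            return (mode & GROUP_READ) != 0;
        }
        return (mode & OTHER_READ) != 0;
    }

    public static void main(String[] args) {
        int mode = 0640; // rw-r-----: group may read, others may not
        System.out.println(userCanRead(mode, false, true)); // true
        System.out.println(othersCanRead(mode));            // false
    }
}
```

With mode 0640 and a requesting user in the file's group, the per-user check passes while the others-READ check fails, which is the case described above.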



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918301#comment-13918301
 ] 

Jason Lowe commented on YARN-1771:
--

Yes, it would be a weaker condition check, but I'm wondering if the weaker 
check still meets the security needs of the dist cache.

A user is requesting a resource to be publicly localized.  If they have read 
permission on it, then even if others lack access, the original user can 
trivially work around that obstacle by copying it to a publicly accessible 
location (e.g. /tmp).  So in that sense the user has a legitimate way to make 
the resource data public even if it isn't right now.

A subsequent request for the same resource would check the timestamp using the 
same doAs logic, so if another user doesn't have access then they won't 
localize.  It's true that the other user's container can still access the 
resource by avoiding explicit localization and instead scanning/scraping the 
local public distcache area directly once it runs.  However the original user 
who requested the resource asked for it to be public and has the means to make 
it public, so they probably aren't concerned that the public can access it.

This approach would also be useful to the shared cache design in YARN-1492, 
where it was calling for the ability to make something a public resource 
directly from a user's staging area.

There may be some security concerns that I've missed, but if this ends up being 
a possibility then it would eliminate all of the parent directory stat calls on 
public localization.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918548#comment-13918548
 ] 

Chris Douglas commented on YARN-1771:
-

The simpler check doesn't seem to have any practical issues. Since the cache is 
keyed on Paths, the case where a user can refer to an object without access to 
it seems pretty esoteric. That holds as long as the public cache runs with 
lowered privileges and the check isn't needed to verify that the public 
resource isn't private to YARN. Copying with the user's HDFS credentials avoids 
that, though that seems like a heavyweight solution if reducing getFileStatus 
calls is the only motivation.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918565#comment-13918565
 ] 

Jason Lowe commented on YARN-1771:
--

Today the public cache localizes as the NM user, so the public checking is 
important to avoid a security problem where the user could convince the NM to 
localize a file for which the user does not have privileges but the NM user 
does (e.g.: please localize that other job's .jhist file, aggregated logs, 
etc.).  So I think we need some kind of access check, either as the requesting 
user or explicit access checks like it does today, to avoid a malicious client 
obtaining access to private files via the NM.



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918605#comment-13918605
 ] 

Gera Shegalov commented on YARN-1771:
-

Orthogonal to this we have been discussing adding a FileStatus[] 
getFileStatus(Path f) API that returns FileStatus for each path component of f 
in a single RPC. Interested in comments about this idea.
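
To show what "each path component" would cover, the sketch below enumerates a path and its ancestors. It is only an illustration of the set of paths such a call would need to report on, not the proposed API; it assumes an absolute, normalized path given as a plain string, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper listing a path followed by each of its ancestors up to "/". */
final class PathComponentsSketch {
    static List<String> selfAndAncestors(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + absolutePath);
        }
        List<String> out = new ArrayList<>();
        String p = absolutePath;
        while (true) {
            out.add(p);
            if (p.equals("/")) {
                break;                       // reached the root
            }
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same path as in the audit log from the issue description.
        for (String p : selfAndAncestors("/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar")) {
            System.out.println(p);
        }
    }
}
```

For the path in the audit log this yields five entries (the file, three directories, and the root), the distinct paths that are currently queried one RPC at a time.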



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918625#comment-13918625
 ] 

Chris Douglas commented on YARN-1771:
-

bq. Orthogonal to this we have been discussing adding a FileStatus[] 
getFileStatus(Path f) API that returns FileStatus for each path component of f 
in a single RPC.

Symlinks might be awkward to support, but that discussion is for a separate 
ticket. Do you have a JIRA ref?

bq. So I think we need some kind of access check, either as the requesting user 
or explicit access checks like it does today, to avoid a malicious client 
obtaining access to private files via the NM.

An HDFS nobody account?

A cache would probably be correct in almost all cases, though. Since the check 
is only performed when the resource is localized, there could be cases where 
the filesystem is never in the cached state, but those are rare (and as Sandy 
points out, already in the current design). To attack the cache, the writer 
would need to take an unprotected directory, change its permissions, then 
populate it with private data (whose attributes are guessable). Expiring 
entries after short intervals and not populating the cache with failed 
localization attempts could help limit the effectiveness of such an attack.
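
The two mitigations (short expiry, no caching of failures) could look roughly like the following. This is a minimal sketch under stated assumptions, not code from any patch: the class name, the Optional-returning loader (empty meaning a failed lookup), and the injectable clock are all hypothetical, and unlike a loading cache it does not deduplicate concurrent loads of the same path.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical status cache with a time-to-live that never stores failed lookups. */
final class ExpiringStatusCacheSketch<V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;                     // e.g. System::currentTimeMillis
    private final Function<String, Optional<V>> loader;   // empty Optional = failed lookup

    ExpiringStatusCacheSketch(long ttlMillis, LongSupplier clock,
                              Function<String, Optional<V>> loader) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.loader = loader;
    }

    Optional<V> get(String path) {
        long now = clock.getAsLong();
        Entry<V> e = cache.get(path);
        if (e != null && now - e.loadedAtMillis < ttlMillis) {
            return Optional.of(e.value);                  // still fresh
        }
        Optional<V> loaded = loader.apply(path);
        if (loaded.isPresent()) {
            cache.put(path, new Entry<>(loaded.get(), now));
        } else {
            cache.remove(path);                           // failures are never cached
        }
        return loaded;
    }
}
```

A short time-to-live bounds how long a stale "public" answer can survive a permission change, at the cost of extra lookups once entries expire.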



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918649#comment-13918649
 ] 

Jason Lowe commented on YARN-1771:
--

Agreed, a nobody account would make the check similarly cheap.

I also like the idea of caching these a bit more rather than pinging the 
namenode each time a new container arrives with an existing resource requested. 
That latter idea is similar to what Koji was asking for way back in 
MAPREDUCE-2011.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}





[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-03 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918661#comment-13918661
 ] 

Gera Shegalov commented on YARN-1771:
-

bq. Symlinks might be awkward to support, but that discussion is for a separate 
ticket. Do you have a JIRA ref?

Now I do: HDFS-6045



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-02-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916516#comment-13916516
 ] 

Sandy Ryza commented on YARN-1771:
--

MAPREDUCE-4907 fixed a similar issue (redundant getFileInfos for parent 
directories) on the client side.  



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-02-28 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916658#comment-13916658
 ] 

Sangjin Lee commented on YARN-1771:
---

Thanks [~sandyr], I'll consider doing a similar fix. First off, the same file 
is checked redundantly (3 times). Then we could cache the statuses of the 
parent directories (for the duration of the resource localization).

Do you worry about the stats for directories possibly changing? I know it's 
highly unlikely, and it may even be impossible, but just curious.
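
To make the caching idea concrete, here is a minimal sketch of memoizing status lookups for the duration of one localization. It is not the actual patch: Status, PublicCheck and the lookup function are hypothetical stand-ins (not Hadoop's FileStatus or FileSystem), the lookup is assumed to return a non-null status, and the public-ness rule used (file readable by others, every ancestor directory executable by others) is stated here as an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a file status; not Hadoop's FileStatus. */
final class Status {
    final boolean directory;
    final boolean othersRead;    // "other" read permission bit
    final boolean othersExecute; // "other" execute permission bit

    Status(boolean directory, boolean othersRead, boolean othersExecute) {
        this.directory = directory;
        this.othersRead = othersRead;
        this.othersExecute = othersExecute;
    }
}

/** Memoizes status lookups so each path is fetched at most once per instance. */
final class PublicCheck {
    // The expensive call, e.g. a getFileStatus round trip (assumption).
    private final Function<String, Status> lookup;
    private final Map<String, Status> cache = new HashMap<>();
    // Number of times the expensive lookup was actually invoked.
    int remoteCalls = 0;

    PublicCheck(Function<String, Status> lookup) {
        this.lookup = lookup;
    }

    private Status status(String path) {
        Status s = cache.get(path);
        if (s == null) {
            remoteCalls++;
            s = lookup.apply(path);
            cache.put(path, s);
        }
        return s;
    }

    /** Assumed rule: file is world-readable and all ancestors are world-executable. */
    boolean isPublic(String file) {
        if (!status(file).othersRead) {
            return false;
        }
        for (String dir = parent(file); dir != null; dir = parent(dir)) {
            if (!status(dir).othersExecute) {
                return false;
            }
        }
        return true;
    }

    /** Returns the parent of an absolute path, or null for the root. */
    static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }
}
```

With one PublicCheck instance per localization, a file four levels deep costs five lookups (the file plus four ancestors), and a second file in the same directory costs only one more, since the directory statuses are already cached.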



[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-02-28 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1391#comment-1391
 ] 

Sandy Ryza commented on YARN-1771:
--

bq. Do you worry about the stats for directories possibly changing?
That will cause problems no matter what.  There's still a lag between when the 
file info is checked and when other operations are performed.
