[jira] [Commented] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-23 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237689#comment-17237689
 ] 

Wangda Tan commented on MAPREDUCE-7309:
---

Thanks [~pbacsko], the latest patch looks better than v2. Can you check the 
checkstyle and junit results and see whether they are related to the patch?

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7309-003.patch, MAPREDUCE-7309.001.patch, 
> MAPREDUCE-7309.002.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Assigned] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-23 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7309:
-

Assignee: Peter Bacsko  (was: Wangda Tan)

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Wangda Tan
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7309-003.patch, MAPREDUCE-7309.001.patch, 
> MAPREDUCE-7309.002.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Commented] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-20 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236497#comment-17236497
 ] 

Wangda Tan commented on MAPREDUCE-7309:
---

Attached the ver.2 patch. [~snemeth], [~shuzirra], could you help review it?

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: MAPREDUCE-7309.001.patch, MAPREDUCE-7309.002.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Updated] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-20 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7309:
--
Attachment: MAPREDUCE-7309.002.patch

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: MAPREDUCE-7309.001.patch, MAPREDUCE-7309.002.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Created] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-20 Thread Wangda Tan (Jira)
Wangda Tan created MAPREDUCE-7309:
-

 Summary: Improve performance of reading resource request for 
mapper/reducers from config
 Key: MAPREDUCE-7309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Affects Versions: 3.3.0, 3.2.0, 3.1.0, 3.0.0
Reporter: Wangda Tan
 Attachments: MAPREDUCE-7309.001.patch

This issue could affect all releases that include YARN-6927. 

Basically, we run a regex match repeatedly when we read the mapper/reducer 
resource request from config files. With a large config file and a large number 
of splits, this can take a long time.

We saw the AM take hours to parse the config when there are 200k+ splits and a 
large config file (hundreds of KBs).

We should properly cache the pre-configured resource requests.
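
A minimal sketch of the caching idea, assuming a plain regex-plus-map approach 
(the class, method, and value pattern below are illustrative, not the actual 
patch):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: run the expensive regex once per distinct config value
// and reuse the parsed result for every subsequent split, instead of
// re-matching for each of the 200k+ splits.
public class ResourceRequestCache {
  // Hypothetical value format, e.g. "2048m" -> amount 2048, units "m"
  private static final Pattern RESOURCE_VALUE =
      Pattern.compile("(\\d+)\\s*([a-zA-Z]*)");
  private final Map<String, Long> cache = new ConcurrentHashMap<>();

  public long parseAmount(String configValue) {
    // computeIfAbsent guarantees the regex match runs at most once per
    // distinct string; later lookups are a single hash probe.
    return cache.computeIfAbsent(configValue, v -> {
      Matcher m = RESOURCE_VALUE.matcher(v.trim());
      if (!m.matches()) {
        throw new IllegalArgumentException("Cannot parse resource value: " + v);
      }
      return Long.parseLong(m.group(1));
    });
  }
}
{code}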






[jira] [Updated] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-20 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7309:
--
Attachment: MAPREDUCE-7309.001.patch

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Wangda Tan
>Priority: Major
> Attachments: MAPREDUCE-7309.001.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Updated] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-20 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7309:
--
Status: Patch Available  (was: Open)

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.3.0, 3.2.0, 3.1.0, 3.0.0
>Reporter: Wangda Tan
>Priority: Major
> Attachments: MAPREDUCE-7309.001.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Assigned] (MAPREDUCE-7309) Improve performance of reading resource request for mapper/reducers from config

2020-11-20 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7309:
-

Assignee: Wangda Tan

> Improve performance of reading resource request for mapper/reducers from 
> config
> ---
>
> Key: MAPREDUCE-7309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: MAPREDUCE-7309.001.patch
>
>
> This issue could affect all releases that include YARN-6927. 
> Basically, we run a regex match repeatedly when we read the mapper/reducer 
> resource request from config files. With a large config file and a large 
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config when there are 200k+ splits and 
> a large config file (hundreds of KBs).
> We should properly cache the pre-configured resource requests.






[jira] [Commented] (MAPREDUCE-7298) Distcp doesn't close the job after the job is completed

2020-10-01 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205866#comment-17205866
 ] 

Wangda Tan commented on MAPREDUCE-7298:
---

Thanks [~aasha], I just added you to the contributor list.

> Distcp doesn't close the job after the job is completed
> ---
>
> Key: MAPREDUCE-7298
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7298
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: MAPREDUCE-7298.01.patch, MAPREDUCE-7298.02.patch
>
>
> Distcp doesn't close the job after it completes. This leads to leaked 
> Truststore Reloader threads.
> The fix is to close the job once it is complete. job.close() internally calls 
> yarnClient.close(), which then calls timelineConnector.serviceStop(). This 
> destroys the sslFactory, cleaning up the ReloadingX509TrustManager.
> Without the patch, each distcp job creates a new ReloadingX509TrustManager, 
> which spawns a new thread. These threads are never killed and remain until 
> HS2 is restarted. With the close, the thread is cleaned up once the job 
> completes.
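
A minimal sketch of the fix described above, assuming a driver that submits the 
job itself (the helper shape is illustrative; job.close() is the call named in 
this issue):
{code:java}
import org.apache.hadoop.mapreduce.Job;

public class ClosingJobRunner {
  // Always close the Job after it finishes, so the YARN client and timeline
  // connector are stopped and the ReloadingX509TrustManager thread goes away.
  public static boolean runAndClose(Job job) throws Exception {
    try {
      return job.waitForCompletion(true);
    } finally {
      // Per the issue text: job.close() -> yarnClient.close()
      //   -> timelineConnector.serviceStop() -> sslFactory destroyed
      job.close();
    }
  }
}
{code}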






[jira] [Assigned] (MAPREDUCE-7298) Distcp doesn't close the job after the job is completed

2020-10-01 Thread Wangda Tan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7298:
-

Assignee: Aasha Medhi

> Distcp doesn't close the job after the job is completed
> ---
>
> Key: MAPREDUCE-7298
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7298
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: MAPREDUCE-7298.01.patch, MAPREDUCE-7298.02.patch
>
>
> Distcp doesn't close the job after it completes. This leads to leaked 
> Truststore Reloader threads.
> The fix is to close the job once it is complete. job.close() internally calls 
> yarnClient.close(), which then calls timelineConnector.serviceStop(). This 
> destroys the sslFactory, cleaning up the ReloadingX509TrustManager.
> Without the patch, each distcp job creates a new ReloadingX509TrustManager, 
> which spawns a new thread. These threads are never killed and remain until 
> HS2 is restarted. With the close, the thread is cleaned up once the job 
> completes.






[jira] [Commented] (MAPREDUCE-7298) Distcp doesn't close the job after the job is completed

2020-10-01 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205770#comment-17205770
 ] 

Wangda Tan commented on MAPREDUCE-7298:
---

+1, thanks for the patch.

> Distcp doesn't close the job after the job is completed
> ---
>
> Key: MAPREDUCE-7298
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7298
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>Reporter: Aasha Medhi
>Priority: Major
> Attachments: MAPREDUCE-7298.01.patch, MAPREDUCE-7298.02.patch
>
>
> Distcp doesn't close the job after it completes. This leads to leaked 
> Truststore Reloader threads.
> The fix is to close the job once it is complete. job.close() internally calls 
> yarnClient.close(), which then calls timelineConnector.serviceStop(). This 
> destroys the sslFactory, cleaning up the ReloadingX509TrustManager.
> Without the patch, each distcp job creates a new ReloadingX509TrustManager, 
> which spawns a new thread. These threads are never killed and remain until 
> HS2 is restarted. With the close, the thread is cleaned up once the job 
> completes.






[jira] [Commented] (MAPREDUCE-7172) Wildcard functionality of -libjar is broken when jars are located in same remote FS

2018-12-12 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719260#comment-16719260
 ] 

Wangda Tan commented on MAPREDUCE-7172:
---

Thanks [~templedf] for trying this.  

Per my understanding, the code flow is: 

Inside org.apache.hadoop.mapreduce.JobResourceUploader#uploadLibJars

1) Mkdir for the job submission libjar dir (on the remote FS path):
{code}
Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
{code}

2) Copy remote files:
{code}
if (newURI == null) {
  Path newPath =
  copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);
{code}

Important: the shared cache for libjars should be disabled:
{code}
if (scConfig.isSharedCacheLibjarsEnabled()) {
{code} 

3) Inside copyRemoteFiles: 
{code}
if (FileUtil.compareFs(remoteFs, jtFs)) {
  return originalPath;
}
{code} 
If remoteFS == jtFS, nothing is copied and originalPath is returned (so the 
returned newPath is the same as originalPath).

4) Call addFileToClassPath, but the last parameter will be false (so the file 
will be added to the classpath, but won't be downloaded by the NM):
{code}
DistributedCache.addFileToClassPath(new Path(newURI.getPath()), conf,
jtFs, false);
{code}

5) When wildcard is enabled and there is no fragment (please make sure there is no fragment):
{code}
  if (useWildcard && !foundFragment) {
// Add the whole directory to the cache using a wild card
Path libJarsDirWildcard =
jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));
DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
  } 
{code}
The whole remote libjar directory will be localized. But because of 3), the 
files were never copied into that remote libjar directory.

I will try to spend some time today to reproduce this issue on a local cluster; 
if you have time, could you check my comment as well? :)
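
For reference, a hedged sketch of applying the 
{{mapreduce.client.libjars.wildcard}} workaround from the issue description 
below in a job driver (the class is illustrative; only the config key comes 
from this issue):
{code:java}
import org.apache.hadoop.conf.Configuration;

public class LibjarsWildcardWorkaround {
  public static Configuration confWithWorkaround() {
    Configuration conf = new Configuration();
    // Disable the wildcard path so each -libjars entry gets its own
    // distributed-cache entry even when it lives on the same remote FS.
    conf.setBoolean("mapreduce.client.libjars.wildcard", false);
    return conf;
  }
}
{code}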

> Wildcard functionality of -libjar is broken when jars are located in same 
> remote FS
> ---
>
> Key: MAPREDUCE-7172
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7172
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> We recently found that when -libjars specifies jars on the same remote FS, 
> the jars are not properly added to the classpath. 
> The reason is that MAPREDUCE-6719 added the wildcard functionality, but the 
> following logic assumes all files are placed under the job's submission 
> directory (inside JobResourceUploader):
> {code:java}
> if (useWildcard && !foundFragment) {
>   // Add the whole directory to the cache using a wild card
>   Path libJarsDirWildcard =
>   jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));
>   DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
> }{code}
> However, in the same method, the specified resources are only uploaded when 
> the two FSes are different; see copyRemoteFiles:
> {code:java}
> if (FileUtil.compareFs(remoteFs, jtFs)) {
>   return originalPath;
> } {code}
> The workaround is to set
> mapreduce.client.libjars.wildcard = false
> when the MR job is launched. 
> An example command line to reproduce this issue: 
> {code:java}
> hadoop jar abc.jar org.ABC -libjars 
> "wasb://host/path1/jar1,wasb://host/path2/jar2..."{code}
>  






[jira] [Created] (MAPREDUCE-7172) Wildcard functionality of -libjar is broken when jars are located in same remote FS

2018-12-11 Thread Wangda Tan (JIRA)
Wangda Tan created MAPREDUCE-7172:
-

 Summary: Wildcard functionality of -libjar is broken when jars are 
located in same remote FS
 Key: MAPREDUCE-7172
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7172
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wangda Tan


We recently found that when -libjars specifies jars on the same remote FS, the 
jars are not properly added to the classpath. 

The reason is that MAPREDUCE-6719 added the wildcard functionality, but the 
following logic assumes all files are placed under the job's submission 
directory (inside JobResourceUploader):
{code:java}
if (useWildcard && !foundFragment) {
  // Add the whole directory to the cache using a wild card
  Path libJarsDirWildcard =
  jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));
  DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
}{code}
However, in the same method, the specified resources are only uploaded when the 
two FSes are different; see copyRemoteFiles:
{code:java}
if (FileUtil.compareFs(remoteFs, jtFs)) {
  return originalPath;
} {code}
The workaround is to set

mapreduce.client.libjars.wildcard = false

when the MR job is launched. 

An example command line to reproduce this issue: 
{code:java}
hadoop jar abc.jar org.ABC -libjars 
"wasb://host/path1/jar1,wasb://host/path2/jar2..."{code}
 






[jira] [Commented] (MAPREDUCE-7172) Wildcard functionality of -libjar is broken when jars are located in same remote FS

2018-12-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718010#comment-16718010
 ] 

Wangda Tan commented on MAPREDUCE-7172:
---

[~templedf], [~sjlee0], if you get a chance, could you take a look at this issue? 

> Wildcard functionality of -libjar is broken when jars are located in same 
> remote FS
> ---
>
> Key: MAPREDUCE-7172
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7172
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> We recently found that when -libjars specifies jars on the same remote FS, 
> the jars are not properly added to the classpath. 
> The reason is that MAPREDUCE-6719 added the wildcard functionality, but the 
> following logic assumes all files are placed under the job's submission 
> directory (inside JobResourceUploader):
> {code:java}
> if (useWildcard && !foundFragment) {
>   // Add the whole directory to the cache using a wild card
>   Path libJarsDirWildcard =
>   jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));
>   DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
> }{code}
> However, in the same method, the specified resources are only uploaded when 
> the two FSes are different; see copyRemoteFiles:
> {code:java}
> if (FileUtil.compareFs(remoteFs, jtFs)) {
>   return originalPath;
> } {code}
> The workaround is to set
> mapreduce.client.libjars.wildcard = false
> when the MR job is launched. 
> An example command line to reproduce this issue: 
> {code:java}
> hadoop jar abc.jar org.ABC -libjars 
> "wasb://host/path1/jar1,wasb://host/path2/jar2..."{code}
>  






[jira] [Commented] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-11-30 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705006#comment-16705006
 ] 

Wangda Tan commented on MAPREDUCE-7167:
---

Thanks [~saurabhpant], I couldn't see this diff in my vim setup; however, the 
logic in your screenshot makes sense.

I'm not entirely sure whether this is an incompatible change. [~vinodkv], 
[~jlowe], [~ajisakaa], would you mind sharing your thoughts here? See: 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?focusedCommentId=16703957=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16703957

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Assignee: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, 
> image-2018-11-29-14-53-58-176.png, image-2018-11-29-14-54-28-254.png, 
> nremoved.txt, nremoved.txt, patch1128.patch, patch1128.patch, 
> patch1128trunk.patch, withn.txt, withn.txt
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77], 
> so this patch removes the {{out.writeBytes("\n");}} statement. Because the 
> encoder is buffered, out.writeBytes bypasses the buffer and writes directly to 
> the underlying output stream, which can corrupt the JSON output, hence it must 
> be removed.
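
A self-contained sketch of the buffering hazard described above, using plain 
java.io types rather than the actual Avro/EventWriter classes: bytes written 
directly to the underlying stream can land before buffered content that was 
logically written earlier.
{code:java}
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class BufferedInterleavingDemo {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Writer buffered = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
    buffered.write("{\"event\":1}"); // still sitting in the writer's buffer
    out.write('\n');                 // bypasses the buffer, hits the stream first
    buffered.flush();
    // Prints the newline *before* the JSON object: corrupted ordering,
    // analogous to out.writeBytes("\n") racing the buffered JsonEncoder.
    System.out.print(out.toString("UTF-8"));
  }
}
{code}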






[jira] [Assigned] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-11-29 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7167:
-

Assignee: Saurabh

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Assignee: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, nremoved.txt, 
> nremoved.txt, patch1128.patch, patch1128.patch, patch1128trunk.patch, 
> withn.txt, withn.txt
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77], 
> so this patch removes the {{out.writeBytes("\n");}} statement. Because the 
> encoder is buffered, out.writeBytes bypasses the buffer and writes directly to 
> the underlying output stream, which can corrupt the JSON output, hence it must 
> be removed.






[jira] [Commented] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-11-29 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703898#comment-16703898
 ] 

Wangda Tan commented on MAPREDUCE-7167:
---

[~saurabhpant], sorry, I couldn't find the empty lines between events in either 
file; would you mind double-checking?

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, nremoved.txt, 
> nremoved.txt, patch1128.patch, patch1128.patch, patch1128trunk.patch, 
> withn.txt, withn.txt
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77], 
> so this patch removes the {{out.writeBytes("\n");}} statement. Because the 
> encoder is buffered, out.writeBytes bypasses the buffer and writes directly to 
> the underlying output stream, which can corrupt the JSON output, hence it must 
> be removed.






[jira] [Commented] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-11-29 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703566#comment-16703566
 ] 

Wangda Tan commented on MAPREDUCE-7167:
---

[~saurabhpant], thanks for filing the JIRA. 

Could you upload a job history file from before and after the patch? The jhist 
file is a public contract between the MR app and other metrics/monitoring 
frameworks. Even if there are extra line breaks inside the file, removing them 
could be considered an incompatible change, so uploading files with the patch 
applied would be very helpful.

And btw, please don't set the fix version; that is set by a committer once the 
patch is committed. 

If you provide your JIRA id, I can add you to the contributors list.

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, patch1128.patch, 
> patch1128.patch, patch1128trunk.patch
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77], 
> so this patch removes the {{out.writeBytes("\n");}} statement. Because the 
> encoder is buffered, out.writeBytes bypasses the buffer and writes directly to 
> the underlying output stream, which can corrupt the JSON output, hence it must 
> be removed.






[jira] [Updated] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-11-29 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7167:
--
Fix Version/s: (was: 3.2.1)
   (was: 3.3.0)
   (was: 3.1.2)

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, patch1128.patch, 
> patch1128.patch, patch1128trunk.patch
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77], 
> so this patch removes the {{out.writeBytes("\n");}} statement. Because the 
> encoder is buffered, out.writeBytes bypasses the buffer and writes directly to 
> the underlying output stream, which can corrupt the JSON output, hence it must 
> be removed.






[jira] [Updated] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-11-29 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7167:
--
Target Version/s: 3.3.0, 3.2.1

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, patch1128.patch, 
> patch1128.patch, patch1128trunk.patch
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77], 
> so this patch removes the {{out.writeBytes("\n");}} statement. Because the 
> encoder is buffered, out.writeBytes bypasses the buffer and writes directly to 
> the underlying output stream, which can corrupt the JSON output, hence it must 
> be removed.






[jira] [Commented] (MAPREDUCE-7162) TestEvents#testEvents fails

2018-11-27 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700685#comment-16700685
 ] 

Wangda Tan commented on MAPREDUCE-7162:
---

Apologies for introducing the issue. Thanks [~uranus] and [~ajisakaa] for 
getting it resolved.

> TestEvents#testEvents fails
> ---
>
> Key: MAPREDUCE-7162
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7162
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Critical
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: MAPREDUCE-7162.001.patch, MAPREDUCE-7162.002.patch, 
> MAPREDUCE-7162.003.patch
>
>
> The MapReduce unit test is broken by 
> https://issues.apache.org/jira/browse/MAPREDUCE-7158. 
> *I think we should keep the data consistent to avoid corrupting the output, 
> so I rolled back the previous code and attached the patch.*
> The broken test is 
> _org.apache.hadoop.mapreduce.jobhistory.TestEvents#testEvents_.
> {code:java}
> org.codehaus.jackson.JsonParseException: Illegal unquoted character 
> ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in 
> name
> at [Source: java.io.DataInputStream@25618e91; line: 23, column: 418]
> at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
> at 
> org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
> at 
> org.codehaus.jackson.impl.JsonParserMinimalBase._throwUnquotedSpace(JsonParserMinimalBase.java:482)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1446)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.parseFieldName(Utf8StreamParser.java:1410)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1283)
> at 
> org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
> at org.apache.avro.io.JsonDecoder.doArrayNext(JsonDecoder.java:367)
> at org.apache.avro.io.JsonDecoder.arrayNext(JsonDecoder.java:361)
> at org.apache.avro.io.ValidatingDecoder.arrayNext(ValidatingDecoder.java:189)
> at 
> org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:222)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
> at 
> org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
> at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> at 
> org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:101)
> at 
> org.apache.hadoop.mapreduce.jobhistory.TestEvents.testEvents(TestEvents.java:177)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-13 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7158:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.1
   3.3.0
   3.1.2
   Status: Resolved  (was: Patch Available)

Committed to branch-3.1/3.2/trunk, thanks [~zichensun]!

> Inefficient Flush Logic in JobHistory EventWriter
> -
>
> Key: MAPREDUCE-7158
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Zichen Sun
>Assignee: Zichen Sun
>Priority: Major
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: MAPREDUCE-7158-001.patch
>
>
> If flush is implemented to send a server request that actually commits the 
> pending writes on the storage service side (as it is in HDFS), benchmark runs 
> show the MR jobs taking much longer. From investigation, the current 
> implementation for writing events doesn't look right:
> In EventWriter#write(), the flush is redundant and the statement should be 
> removed; it defeats the purpose of having a separate flush function, since 
> Encoder.flush already calls flush on the underlying output stream.
> After applying the fix, the MR jobs complete normally; please find the patch 
> attached.
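
A minimal sketch of the flush-on-demand pattern described above, with plain 
java.io types standing in for the actual EventWriter/Encoder classes:
{code:java}
import java.io.IOException;
import java.io.Writer;

public class FlushOnDemandWriter {
  // Write a batch of events without flushing per event; each flush can force
  // a round trip to the storage service, which is what slowed the MR jobs.
  public static void writeEvents(Writer buffered, String[] events)
      throws IOException {
    for (String event : events) {
      buffered.write(event);
      buffered.write('\n');
      // no per-event flush here: it would defeat the buffering entirely
    }
    buffered.flush(); // a single flush once the batch is complete
  }
}
{code}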






[jira] [Commented] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-13 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685550#comment-16685550
 ] 

Wangda Tan commented on MAPREDUCE-7158:
---

+1, patch LGTM, thanks [~zichensun]. 
Also added you to the contributor list so you can assign tickets to yourself in 
the future.

> Inefficient Flush Logic in JobHistory EventWriter
> -
>
> Key: MAPREDUCE-7158
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Zichen Sun
>Assignee: Zichen Sun
>Priority: Major
> Attachments: MAPREDUCE-7158-001.patch
>
>
> If flush is implemented to send a server request that actually commits the 
> pending writes on the storage service side (as it is in HDFS), benchmark runs 
> show the MR jobs taking much longer. From investigation, the current 
> implementation for writing events doesn't look right:
> In EventWriter#write(), the flush is redundant and the statement should be 
> removed; it defeats the purpose of having a separate flush function, since 
> Encoder.flush already calls flush on the underlying output stream.
> After applying the fix, the MR jobs complete normally; please find the patch 
> attached.






[jira] [Assigned] (MAPREDUCE-7158) Inefficient Flush Logic in JobHistory EventWriter

2018-11-13 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7158:
-

Assignee: Zichen Sun

> Inefficient Flush Logic in JobHistory EventWriter
> -
>
> Key: MAPREDUCE-7158
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7158
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Zichen Sun
>Assignee: Zichen Sun
>Priority: Major
> Attachments: MAPREDUCE-7158-001.patch
>
>
> If flush is implemented to send a server request that actually commits the 
> pending writes on the storage service side (as it is in HDFS), benchmark runs 
> show the MR jobs taking much longer. From investigation, the current 
> implementation for writing events doesn't look right:
> In EventWriter#write(), the flush is redundant and the statement should be 
> removed; it defeats the purpose of having a separate flush function, since 
> Encoder.flush already calls flush on the underlying output stream.
> After applying the fix, the MR jobs complete normally; please find the patch 
> attached.






[jira] [Updated] (MAPREDUCE-7125) JobResourceUploader creates LocalFileSystem when it's not necessary

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7125:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk/branch-3.1, thanks [~gezapeti].

> JobResourceUploader creates LocalFileSystem when it's not necessary 
> 
>
> Key: MAPREDUCE-7125
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7125
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Reporter: Peter Cseh
>Assignee: Peter Cseh
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: MAPREDUCE-7125.001.patch
>
>
> When the property {{mapreduce.job.log4j-properties-file}} is set, a local 
> filesystem is created even if it's never used:
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobResourceUploader.java#L858-L866
> The localFS should only be created when required.
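
A minimal sketch of the lazy-creation idea, assuming a helper shaped roughly 
like the linked code (the method is illustrative, not the actual 
JobResourceUploader change):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LazyLocalFs {
  static Path qualifyLog4jFile(String log4jPropertyFile, Configuration conf)
      throws IOException {
    if (log4jPropertyFile == null || log4jPropertyFile.isEmpty()) {
      return null; // property not set: never touch the local filesystem
    }
    // Create the LocalFileSystem only on the branch that actually needs it.
    FileSystem localFs = FileSystem.getLocal(conf);
    return localFs.makeQualified(new Path(log4jPropertyFile));
  }
}
{code}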






[jira] [Updated] (MAPREDUCE-7125) JobResourceUploader creates LocalFileSystem when it's not necessary

2018-09-24 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7125:
--
Fix Version/s: 3.1.2
   3.2.0

> JobResourceUploader creates LocalFileSystem when it's not necessary 
> 
>
> Key: MAPREDUCE-7125
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7125
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Reporter: Peter Cseh
>Assignee: Peter Cseh
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: MAPREDUCE-7125.001.patch
>
>
> When the property {{mapreduce.job.log4j-properties-file}} is set, a local 
> filesystem is created even if it's never used:
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobResourceUploader.java#L858-L866
> The localFS should only be created when required.






[jira] [Commented] (MAPREDUCE-7125) JobResourceUploader creates LocalFileSystem when it's not necessary

2018-09-24 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626420#comment-16626420
 ] 

Wangda Tan commented on MAPREDUCE-7125:
---

Fix looks good, +1, committing the patch.

> JobResourceUploader creates LocalFileSystem when it's not necessary 
> 
>
> Key: MAPREDUCE-7125
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7125
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Reporter: Peter Cseh
>Assignee: Peter Cseh
>Priority: Major
> Attachments: MAPREDUCE-7125.001.patch
>
>
> When the property {{mapreduce.job.log4j-properties-file}} is set, a local 
> filesystem is created even if it's never used:
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobResourceUploader.java#L858-L866
> The localFS should only be created when required.






[jira] [Updated] (MAPREDUCE-7118) Distributed cache conflicts break backwards compatibility

2018-07-19 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7118:
--
Fix Version/s: 3.1.1
   3.2.0

> Distributed cache conflicts break backwards compatibility
> --
>
> Key: MAPREDUCE-7118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but 
> this was quickly downgraded to a warning in MAPREDUCE-4549. Unfortunately, 
> the latter did not go into trunk, so the fix is only in 0.23 and 2.x. When 
> Oozie, Pig, and other downstream projects that can occasionally generate 
> distributed cache conflicts move to Hadoop 3.x, the workflows that used to 
> work on 0.23 and 2.x no longer function.






[jira] [Updated] (MAPREDUCE-7118) Distributed cache conflicts break backwards compatibility

2018-07-19 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7118:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-3.1/trunk, thanks [~jlowe]

> Distributed cache conflicts break backwards compatibility
> --
>
> Key: MAPREDUCE-7118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but 
> this was quickly downgraded to a warning in MAPREDUCE-4549. Unfortunately, 
> the latter did not go into trunk, so the fix is only in 0.23 and 2.x. When 
> Oozie, Pig, and other downstream projects that can occasionally generate 
> distributed cache conflicts move to Hadoop 3.x, the workflows that used to 
> work on 0.23 and 2.x no longer function.






[jira] [Commented] (MAPREDUCE-7118) Distributed cache conflicts break backwards compatibility

2018-07-16 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545608#comment-16545608
 ] 

Wangda Tan commented on MAPREDUCE-7118:
---

Thanks [~jlowe]. To me this is a valid issue, and we should fix it in the 3.x 
releases. Patch LGTM. +1; I will push to branch-3.1 and trunk if there are no 
objections.

[~yzhangal], any objections to putting this in the 3.0.4 release? 

> Distributed cache conflicts break backwards compatibility
> --
>
> Key: MAPREDUCE-7118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but 
> this was quickly downgraded to a warning in MAPREDUCE-4549. Unfortunately, 
> the latter did not go into trunk, so the fix is only in 0.23 and 2.x. When 
> Oozie, Pig, and other downstream projects that can occasionally generate 
> distributed cache conflicts move to Hadoop 3.x, the workflows that used to 
> work on 0.23 and 2.x no longer function.






[jira] [Commented] (MAPREDUCE-7101) Add config parameter to allow JHS to always scan user dir irrespective of modTime

2018-06-15 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514312#comment-16514312
 ] 

Wangda Tan commented on MAPREDUCE-7101:
---

Pushed to branch-3.1 as well.

> Add config parameter to allow JHS to always scan user dir irrespective of 
> modTime
> 
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Thomas Marquardt
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 3.1.1
>
> Attachments: MAPREDUCE-7101.001.patch, MAPREDUCE-7101.001.patch
>
>
> Currently, the JHS scans a directory only if the *directory's* modification 
> time has changed: 
> {code} 
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
> <... omitted some logics ...>
> // reset scanTime before scanning happens
> scanTime = System.currentTimeMillis();
> Path p = fs.getPath();
> try {
>   scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time 
> will be updated when a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues with 
> truncated modification times, and HADOOP-12837 mentioned that on S3 a 
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly on 
> different file systems.






[jira] [Updated] (MAPREDUCE-7101) Add config parameter to allow JHS to always scan user dir irrespective of modTime

2018-06-15 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7101:
--
Fix Version/s: 3.1.1

> Add config parameter to allow JHS to always scan user dir irrespective of 
> modTime
> 
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Thomas Marquardt
>Priority: Critical
> Fix For: 2.10.0, 3.2.0, 3.1.1
>
> Attachments: MAPREDUCE-7101.001.patch, MAPREDUCE-7101.001.patch
>
>
> Currently, the JHS scans a directory only if the *directory's* modification 
> time has changed: 
> {code} 
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
> <... omitted some logics ...>
> // reset scanTime before scanning happens
> scanTime = System.currentTimeMillis();
> Path p = fs.getPath();
> try {
>   scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time 
> will be updated when a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues with 
> truncated modification times, and HADOOP-12837 mentioned that on S3 a 
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly on 
> different file systems.






[jira] [Commented] (MAPREDUCE-7101) Revisit behavior of JHS scan file behavior

2018-06-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508468#comment-16508468
 ] 

Wangda Tan commented on MAPREDUCE-7101:
---

Reported offline by [~deepesh]: 

After removing the timestamp check completely, we saw an NPE when trying to get 
the job: 
{code:java}
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:324)
at 
org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:302)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:440)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:637)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184)
at org.apache.hadoop.mapreduce.tools.CLI.getJob(CLI.java:530)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:268)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1274){code}
I think there are some corner cases to take care of when getting the job status.

Given that both Rohith and I are quite busy with other work, as discussed 
offline, [~asuresh], could you help take a look at this issue? I can help with 
reviews.

> Revisit behavior of JHS scan file behavior
> --
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, the JHS scans a directory only if the *directory's* modification 
> time has changed: 
> {code} 
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
> <... omitted some logics ...>
> // reset scanTime before scanning happens
> scanTime = System.currentTimeMillis();
> Path p = fs.getPath();
> try {
>   scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time 
> will be updated when a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues with 
> truncated modification times, and HADOOP-12837 mentioned that on S3 a 
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly on 
> different file systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7101) Revisit behavior of JHS scan file behavior

2018-06-09 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507097#comment-16507097
 ] 

Wangda Tan commented on MAPREDUCE-7101:
---

[~ste...@apache.org] / [~ehiggs] / [~rohithsharma]

Thanks for your suggestions. To me there are two major options.

1. As mentioned by [~rohithsharma], make the behavior (skipping the directory 
timestamp check) configurable, to avoid surprising users of HDFS-backed clusters. 
This config can be marked as private/unstable so we can change it in the future. 

2. Considering that cloud storages are not identical, as mentioned, another 
approach is to turn the whole {{scanIntermediateDirectory}} into a pluggable 
folder-scan policy, with {{UserLogDir}} private to the policy. We could implement 
listing with recursive=true for some file systems and recursive=false for others, 
poll special files, etc. (see the sketch below).

I would prefer not to simply {{turn off the directory timestamp check}}, and I 
prefer #1 over #2. 

Thoughts? Can we reach a conclusion on this so we can start fixing the problem 
sooner, if possible?
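
To make #2 concrete, a rough sketch of what a pluggable policy could look like 
(the interface and method names here are hypothetical):

{code:java}
// Hypothetical sketch of option #2: each file system decides when a rescan
// is needed and how the history files are listed.
public interface HistoryDirScanPolicy {
  /** Decide whether the intermediate directory should be rescanned now. */
  boolean shouldScan(FileStatus dirStatus, long lastScanTime);

  /** List candidate history files. An HDFS policy could keep today's mtime
   *  check and shallow listing, while an object-store policy could ignore
   *  directory mtime (always 0 on S3) and list recursively or on a fixed
   *  interval instead. */
  RemoteIterator<LocatedFileStatus> listHistoryFiles(FileSystem fs, Path dir)
      throws IOException;
}
{code}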

 

> Revisit behavior of JHS scan file behavior
> --
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, the JHS rescans a directory only if the modification time of the 
> *directory* has changed: 
> {code} 
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
> <... some logic omitted ...>
> // reset scanTime before scanning happens
> scanTime = System.currentTimeMillis();
> Path p = fs.getPath();
> try {
>   scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time 
> will be updated whenever a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues 
> with truncated modification times, and HADOOP-12837 mentioned that on S3, a 
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly 
> on different file systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7101) Revisit behavior of JHS scan file behavior

2018-05-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496959#comment-16496959
 ] 

Wangda Tan commented on MAPREDUCE-7101:
---

Regarding the semantics of a directory's modification time, adding [~steve_l] / 
[~arpitagarwal] for suggestions.

Regarding this scan behavior, I propose to entirely remove the if check: 
{code}
  if (modTime != newModTime
  || (scanTime/1000) == (modTime/1000)
  || (scanTime/1000 + 1) == (modTime/1000)) {
  // ...
  }
{code} 

I am not sure how badly this could impact performance. I want to hear thoughts 
from [~jlowe] / [~vinodkv].
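
For clarity, this is roughly what {{scanIfNeeded}} would reduce to with the 
check removed (a simplified sketch; the catch clause stands in for the omitted 
error handling):

{code:java}
public synchronized void scanIfNeeded(FileStatus fs) {
  // Always rescan: no comparison of directory mtime against the last scan.
  scanTime = System.currentTimeMillis();
  Path p = fs.getPath();
  try {
    scanIntermediateDirectory(p);
  } catch (IOException e) {
    LOG.error("Error while scanning intermediate directory " + p, e);
  }
}
{code}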

> Revisit behavior of JHS scan file behavior
> --
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, the JHS rescans a directory only if the modification time of the 
> *directory* has changed: 
> {code} 
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
> <... some logic omitted ...>
> // reset scanTime before scanning happens
> scanTime = System.currentTimeMillis();
> Path p = fs.getPath();
> try {
>   scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time 
> will be updated whenever a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues 
> with truncated modification times, and HADOOP-12837 mentioned that on S3, a 
> directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly 
> on different file systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7101) Revisit behavior of JHS scan file behavior

2018-05-31 Thread Wangda Tan (JIRA)
Wangda Tan created MAPREDUCE-7101:
-

 Summary: Revisit behavior of JHS scan file behavior
 Key: MAPREDUCE-7101
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wangda Tan


Currently, the JHS rescans a directory only if the modification time of the 
*directory* has changed: 

{code} 
public synchronized void scanIfNeeded(FileStatus fs) {
  long newModTime = fs.getModificationTime();
  if (modTime != newModTime) {
<... some logic omitted ...>
// reset scanTime before scanning happens
scanTime = System.currentTimeMillis();
Path p = fs.getPath();
try {
  scanIntermediateDirectory(p);
{code}

This logic relies on the assumption that the directory's modification time will 
be updated whenever a file is placed under the directory.

However, the semantics of a directory's modification time are not consistent 
across FS implementations. For example, MAPREDUCE-6680 fixed some issues with 
truncated modification times, and HADOOP-12837 mentioned that on S3, a 
directory's modification time is always 0.

I think we need to revisit this logic so that it works more robustly on 
different file systems.
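
A small illustration (paths hypothetical) of why the assumption is fragile: on 
HDFS the parent directory's modification time changes when a file is added, but 
on some object stores it may never change at all.

{code:java}
// conf is an ordinary Hadoop Configuration; paths are made up.
FileSystem fs = FileSystem.get(conf);
Path dir = new Path("/mr-history/intermediate/user");      // hypothetical dir
long before = fs.getFileStatus(dir).getModificationTime();
fs.create(new Path(dir, "job_123.jhist")).close();         // hypothetical file
long after = fs.getFileStatus(dir).getModificationTime();
// true on HDFS; can be false on object stores (e.g. always-0 mtime on S3,
// see HADOOP-12837)
System.out.println("dir mtime changed: " + (before != after));
{code}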



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448926#comment-16448926
 ] 

Wangda Tan commented on MAPREDUCE-7086:
---

[~sershe], oh, that's my bad: moving the JIRA somehow cleared the assignee, 
apologies for that. Jason has assigned this to you.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to the results: 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However, the getSplits code after that computes the size like this:
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which always fails when combined with the above code.
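
A minimal sketch of the direction implied above (not the committed fix): skip 
directories entirely when recursion is disabled, so {{getSplits}} never sees one.

{code:java}
if (stat.isDirectory()) {
  if (recursive) {
    result.dirsNeedingRecursiveCalls.add(stat);
  }
  // recursive == false: silently ignore the directory
} else {
  result.locatedFileStatuses.add(stat);
}
{code}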



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7086:
-

Assignee: (was: Sergey Shelukhin)
 Key: MAPREDUCE-7086  (was: HADOOP-15403)
 Project: Hadoop Map/Reduce  (was: Hadoop Common)

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to the results: 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However, the getSplits code after that computes the size like this:
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which always fails when combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448750#comment-16448750
 ] 

Wangda Tan commented on MAPREDUCE-7086:
---

I just moved this JIRA to the MAPREDUCE project.

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to the results: 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However, the getSplits code after that computes the size like this:
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which always fails when combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7077) Pipe mapreduce job fails with Permission denied for jobTokenPassword

2018-04-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7077:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.1
   3.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~ajisakaa], and thanks for the help from [~tasanuma0829] / [~suma.shivaprasad].

> Pipe mapreduce job fails with Permission denied for jobTokenPassword
> 
>
> Key: MAPREDUCE-7077
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7077
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Akira Ajisaka
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: MAPREDUCE-7077.01.patch
>
>
> Steps:
> Launch the wordcount example with pipes:
> {code}
> /usr/hdp/current/hadoop-client/bin/hadoop pipes 
> "-Dhadoop.pipes.java.recordreader=true" 
> "-Dhadoop.pipes.java.recordwriter=true" -input pipeInput -output pipeOutput 
> -program bin/wordcount{code}
> The application fails with the stack trace below:
> {code:title=AM}
> attempt_1517534613368_0041_r_00_2 is : 0.0
> 2018-02-02 02:40:51,071 ERROR [IPC Server handler 16 on 43391] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1517534613368_0041_r_00_2 - exited : 
> java.io.FileNotFoundException: 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1517534613368_0041/jobTokenPassword
>  (Permission denied)
>  at java.io.FileOutputStream.open0(Native Method)
>  at java.io.FileOutputStream.open(FileOutputStream.java:270)
>  at java.io.FileOutputStream.(FileOutputStream.java:213)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:236)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:219)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:401)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1026)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:703)
>  at 
> org.apache.hadoop.mapred.pipes.Application.writePasswordToLocalFile(Application.java:173)
>  at org.apache.hadoop.mapred.pipes.Application.(Application.java:109)
>  at 
> org.apache.hadoop.mapred.pipes.PipesReducer.startApplication(PipesReducer.java:87)
>  at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:65)
>  at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:38)
>  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7077) Pipe mapreduce job fails with Permission denied for jobTokenPassword

2018-04-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436338#comment-16436338
 ] 

Wangda Tan commented on MAPREDUCE-7077:
---

+1, will commit shortly. 

Thanks [~ajisakaa] for the quick fix.

> Pipe mapreduce job fails with Permission denied for jobTokenPassword
> 
>
> Key: MAPREDUCE-7077
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7077
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Akira Ajisaka
>Priority: Critical
> Attachments: MAPREDUCE-7077.01.patch
>
>
> Steps:
> Launch the wordcount example with pipes:
> {code}
> /usr/hdp/current/hadoop-client/bin/hadoop pipes 
> "-Dhadoop.pipes.java.recordreader=true" 
> "-Dhadoop.pipes.java.recordwriter=true" -input pipeInput -output pipeOutput 
> -program bin/wordcount{code}
> The application fails with the stack trace below:
> {code:title=AM}
> attempt_1517534613368_0041_r_00_2 is : 0.0
> 2018-02-02 02:40:51,071 ERROR [IPC Server handler 16 on 43391] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1517534613368_0041_r_00_2 - exited : 
> java.io.FileNotFoundException: 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1517534613368_0041/jobTokenPassword
>  (Permission denied)
>  at java.io.FileOutputStream.open0(Native Method)
>  at java.io.FileOutputStream.open(FileOutputStream.java:270)
>  at java.io.FileOutputStream.(FileOutputStream.java:213)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:236)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:219)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:401)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1026)
>  at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:703)
>  at 
> org.apache.hadoop.mapred.pipes.Application.writePasswordToLocalFile(Application.java:173)
>  at org.apache.hadoop.mapred.pipes.Application.(Application.java:109)
>  at 
> org.apache.hadoop.mapred.pipes.PipesReducer.startApplication(PipesReducer.java:87)
>  at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:65)
>  at org.apache.hadoop.mapred.pipes.PipesReducer.reduce(PipesReducer.java:38)
>  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:445)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7036) ASF License warning in hadoop-mapreduce-client

2018-04-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436337#comment-16436337
 ] 

Wangda Tan commented on MAPREDUCE-7036:
---

Thanks [~tasanuma0829] / [~ajisakaa]. 

> ASF License warning in hadoop-mapreduce-client
> --
>
> Key: MAPREDUCE-7036
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7036
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: MAPREDUCE-7036.1.patch
>
>
> It occurred in MAPREDUCE-7021 and MAPREDUCE-7034.
> {noformat}
> Lines that start with ? in the ASF License report indicate files that do 
> not have an Apache license header: !? 
> /testptch/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/jobTokenPassword
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7036) ASF License warning in hadoop-mapreduce-client

2018-04-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434516#comment-16434516
 ] 

Wangda Tan commented on MAPREDUCE-7036:
---

Thanks [~suma.shivaprasad], and thanks for the offline help from [~vinodkv] in 
figuring out the root cause. 

I suggest reverting this patch: the ASF license warning does not come from 
Hadoop code but is accidentally introduced by Yetus, so I think the fix should 
land in Yetus.

Thoughts? [~ajisakaa] / [~tasanuma0829]

> ASF License warning in hadoop-mapreduce-client
> --
>
> Key: MAPREDUCE-7036
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7036
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: MAPREDUCE-7036.1.patch
>
>
> It occurred in MAPREDUCE-7021 and MAPREDUCE-7034.
> {noformat}
> Lines that start with ? in the ASF License report indicate files that do 
> not have an Apache license header: !? 
> /testptch/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/jobTokenPassword
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6823) FileOutputFormat to support configurable PathOutputCommitter factory

2018-03-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409057#comment-16409057
 ] 

Wangda Tan commented on MAPREDUCE-6823:
---

While doing a JIRA scan, I could not find the commit message in the git log. 
[~ste...@apache.org], could you help double-check?

> FileOutputFormat to support configurable PathOutputCommitter factory
> 
>
> Key: MAPREDUCE-6823
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0-alpha2
> Environment: Targeting S3 as the output of work
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> MAPREDUCE-6823-002.patch, MAPREDUCE-6823-002.patch, MAPREDUCE-6823-004.patch
>
>
> In HADOOP-13786 I'm adding a custom subclass for FileOutputFormat, one which 
> can talk directly to the S3A filesystem for more efficient operations, better 
> failure modes, and, most critically, as part of HADOOP-13345, atomic commit 
> of output. The normal committer relies on directory rename() being atomic for 
> this; for S3 we don't have that luxury.
> To support a custom committer, we need to be able to tell FileOutputFormat 
> (and implicitly, all subclasses which don't have their own custom committer), 
> to use our new {{S3AOutputCommitter}}.
> I propose: 
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory to take a URI and {{TaskAttemptContext}} and return a committer
> # the default implementation always returns a {{FileOutputCommitter}}
> # A configuration option allows a new factory to be named
> # An {{S3AOutputCommitterFactory}} to return a {{FileOutputCommitter}} or a 
> new {{S3AOutputCommitter}}, depending upon the URI of the destination.
> Note that MRv1 already supports configurable committers; this is only for the 
> v2 API.
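
A rough sketch of the proposal (the names and the config key are illustrative; 
see the attached patches for the actual API):

{code:java}
// The factory FileOutputFormat would consult, keyed off the destination URI.
public interface PathOutputCommitterFactory {
  PathOutputCommitter createOutputCommitter(Path outputPath,
      TaskAttemptContext context) throws IOException;
}

// Default implementation preserving today's behavior.
public class FileOutputCommitterFactory implements PathOutputCommitterFactory {
  @Override
  public PathOutputCommitter createOutputCommitter(Path outputPath,
      TaskAttemptContext context) throws IOException {
    return new FileOutputCommitter(outputPath, context);
  }
}
{code}

FileOutputFormat would then instantiate the factory class named by a 
configuration option (e.g. {{mapreduce.outputcommitter.factory.class}}) via 
reflection and delegate committer creation to it.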



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7059) Downward Compatibility issue: MR job fails because of unknown setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7059:
--
Fix Version/s: (was: 3.2.0)

> Downward Compatibility issue: MR job fails because of unknown 
> setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster
> ---
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Critical
> Fix For: 3.1.0, 3.0.2
>
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch, 
> MAPREDUCE-7059.006.patch
>
>
> Running teragen fails with a hadoop-3.1 client when the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The reason for the failure is that 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to catch the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>   throws IOException {
> try {
>   if (jtFs instanceof DistributedFileSystem) {
> LOG.info("Disabling Erasure Coding for path: " + path);
> DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
> dfs.setErasureCodingPolicy(path,
> SystemErasureCodingPolicies.getReplicationPolicy().getName());
>   }
> } catch (RemoteException e) {
>   if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) 
> {
> throw e;
>   } else {
> LOG.warn(
> "hdfs server does not have method disableErasureCodingForPath," 
> + " and skip disableErasureCodingForPath", e);
>   }
> }
>   }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> 

[jira] [Updated] (MAPREDUCE-7061) SingleCluster setup document needs to be updated

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7061:
--
Fix Version/s: (was: 3.2.0)
   3.1.0

> SingleCluster setup document needs to be updated
> 
>
> Key: MAPREDUCE-7061
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7061
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HDFS-13160.00.patch
>
>
> The following document needs an update:
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
> We need to set mapreduce.application.classpath; without it we cannot launch 
> MR jobs using YARN.
>   CLASSPATH for MR applications. A comma-separated list of CLASSPATH entries. 
> If mapreduce.application.framework is set then this must specify the 
> appropriate classpath for that archive, and the name of the archive must be 
> present in the classpath. If mapreduce.app-submission.cross-platform is 
> false, platform-specific environment variable expansion syntax would be used 
> to construct the default CLASSPATH entries.
>  
> So, we should add the default value for the tarball download setup (see the 
> example below).
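
A hedged example of what that could look like in mapred-site.xml for a tarball 
install (the exact value depends on where the tarball is extracted):

{code:xml}
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
{code}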



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7047) Make HAR tool support IndexedLogAggregtionController

2018-03-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7047:
--
Fix Version/s: (was: 3.2.0)
   3.1.0

> Make HAR tool support IndexedLogAggregtionController
> 
>
> Key: MAPREDUCE-7047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7047
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: MAPREDUCE-7047.trunk.1.patch, 
> MAPREDUCE-7047.trunk.2.patch, MAPREDUCE-7047.trunk.3.patch
>
>
> In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a 
> tool to combine aggregated logs into HAR files which currently only works for 
> TFileLogAggregationFileController. We should make it support 
> IndexedLogAggregtionController as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7047) Make HAR tool support IndexedLogAggregtionController

2018-03-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7047:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~xgong], and thanks for the review, [~rkanter]!

> Make HAR tool support IndexedLogAggregtionController
> 
>
> Key: MAPREDUCE-7047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7047
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7047.trunk.1.patch, 
> MAPREDUCE-7047.trunk.2.patch, MAPREDUCE-7047.trunk.3.patch
>
>
> In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a 
> tool to combine aggregated logs into HAR files which currently only works for 
> TFileLogAggregationFileController. We should make it support 
> IndexedLogAggregtionController as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7047) Make HAR tool support IndexedLogAggregtionController

2018-03-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399541#comment-16399541
 ] 

Wangda Tan commented on MAPREDUCE-7047:
---

Thanks  [~xgong]/[~rkanter], will commit tomorrow if no objections.

> Make HAR tool support IndexedLogAggregtionController
> 
>
> Key: MAPREDUCE-7047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7047
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: MAPREDUCE-7047.trunk.1.patch, 
> MAPREDUCE-7047.trunk.2.patch, MAPREDUCE-7047.trunk.3.patch
>
>
> In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a 
> tool to combine aggregated logs into HAR files which currently only works for 
> TFileLogAggregationFileController. We should make it support 
> IndexedLogAggregtionController as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7047) Make HAR tool support IndexedLogAggregtionController

2018-03-09 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393879#comment-16393879
 ] 

Wangda Tan commented on MAPREDUCE-7047:
---

+1, thanks [~xgong]. [~rkanter], do you want to take a look at the patch? I 
think you're more familiar with the code :)

> Make HAR tool support IndexedLogAggregtionController
> 
>
> Key: MAPREDUCE-7047
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7047
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Major
> Attachments: MAPREDUCE-7047.trunk.1.patch
>
>
> In https://issues.apache.org/jira/browse/MAPREDUCE-6415, we have created a 
> tool to combine aggregated logs into HAR files which currently only works for 
> TFileLogAggregationFileController. We should make it support 
> IndexedLogAggregtionController as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster

2018-02-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379344#comment-16379344
 ] 

Wangda Tan commented on MAPREDUCE-7059:
---

[~yangjiandan], thanks for reporting the issue and working on the fix. I just set 
the target version to 3.1.0/3.0.1 and assigned the JIRA to you; you now have 
permission to assign JIRAs to yourself in the future.

> Compatibility issue: job submission fails with RpcNoSuchMethodException when 
> submitting to 2.x cluster
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Critical
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch
>
>
> Running teragen fails with a hadoop-3.1 client when the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The reason for the failure is that 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to catch the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>   throws IOException {
> try {
>   if (jtFs instanceof DistributedFileSystem) {
> LOG.info("Disabling Erasure Coding for path: " + path);
> DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
> dfs.setErasureCodingPolicy(path,
> SystemErasureCodingPolicies.getReplicationPolicy().getName());
>   }
> } catch (RemoteException e) {
>   if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) 
> {
> throw e;
>   } else {
> LOG.warn(
> "hdfs server does not have method disableErasureCodingForPath," 
> + " and skip disableErasureCodingForPath", e);
>   }
> }
>   }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> 

[jira] [Assigned] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster

2018-02-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7059:
-

Assignee: Jiandan Yang 

> Compatibility issue: job submission fails with RpcNoSuchMethodException when 
> submitting to 2.x cluster
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Critical
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch
>
>
> Running teragen fails with a hadoop-3.1 client when the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The reason for the failure is that 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to catch the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>   throws IOException {
> try {
>   if (jtFs instanceof DistributedFileSystem) {
> LOG.info("Disabling Erasure Coding for path: " + path);
> DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
> dfs.setErasureCodingPolicy(path,
> SystemErasureCodingPolicies.getReplicationPolicy().getName());
>   }
> } catch (RemoteException e) {
>   if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) 
> {
> throw e;
>   } else {
> LOG.warn(
> "hdfs server does not have method disableErasureCodingForPath," 
> + " and skip disableErasureCodingForPath", e);
>   }
> }
>   }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2665)
>   at 
> 

[jira] [Updated] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster

2018-02-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-7059:
--
Target Version/s: 3.1.0

> Compatibility issue: job submission fails with RpcNoSuchMethodException when 
> submitting to 2.x cluster
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Critical
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch
>
>
> Running teragen fails with a hadoop-3.1 client when the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The reason for the failure is that 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to catch the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>   throws IOException {
> try {
>   if (jtFs instanceof DistributedFileSystem) {
> LOG.info("Disabling Erasure Coding for path: " + path);
> DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
> dfs.setErasureCodingPolicy(path,
> SystemErasureCodingPolicies.getReplicationPolicy().getName());
>   }
> } catch (RemoteException e) {
>   if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) 
> {
> throw e;
>   } else {
> LOG.warn(
> "hdfs server does not have method disableErasureCodingForPath," 
> + " and skip disableErasureCodingForPath", e);
>   }
> }
>   }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2665)
>   at 
> 

[jira] [Commented] (MAPREDUCE-7055) MR jobs are failing with Could not find or load main class for MRAppMaster

2018-02-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368761#comment-16368761
 ] 

Wangda Tan commented on MAPREDUCE-7055:
---

Thanks [~rohithsharma] for reporting this issue. I can reproduce the same issue 
with Rohith's config on the latest trunk, and the problem goes away after 
reverting YARN-7677. 

[~jlowe], [~ebadger], can we revert YARN-7677? 

> MR jobs are failing with Could not find or load main class for MRAppMaster
> --
>
> Key: MAPREDUCE-7055
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7055
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Rohith Sharma K S
>Priority: Blocker
> Attachments: app-logs.zip, conf.zip
>
>
> It is observed that MR jobs are failing with *Error: Could not find or load 
> main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster* even though 
> HADOOP_MAPRED_HOME is set in mapred-site.xml.
> I tried building the tar.gz in branch-3.0 and it seems to work fine with the 
> same configurations, but in branch-3.1 and trunk it is failing. I got 
> launch_container.sh for both and compared the classpath exported before 
> launching the AM. Both classpath entries are the same, but the AM launch fails 
> with the above-mentioned error. 
> It's better to confirm this, as the 3.1 release is going to happen soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6654) Possible NPE in JobHistoryEventHandler#handleEvent

2018-02-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6654:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> Possible NPE in JobHistoryEventHandler#handleEvent
> --
>
> Key: MAPREDUCE-6654
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6654
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6654-v2.1.patch, MAPREDUCE-6654-v2.patch, 
> MAPREDUCE-6654.patch
>
>
> I have seen an NPE thrown from {{JobHistoryEventHandler#handleEvent}}:
> {noformat}
> 2016-03-14 16:42:15,231 INFO [Thread-69] 
> org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:382)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1651)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1147)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:573)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:620)
> {noformat}
> In the version where this exception is thrown, the 
> [line|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L586]
>  is:
> {code:java}mi.writeEvent(historyEvent);{code}
> IMHO, this may be caused by an exception in a previous step. Specifically, in 
> a kerberized environment, when creating the event writer (which calls out to 
> decrypt the EEK), the connection to KMS failed. Exception below:
> {noformat} 
> 2016-03-14 16:41:57,559 ERROR [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error 
> JobHistoryEventHandler in handleEvent: EventType: AM_STARTED
> java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:520)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:505)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:779)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:185)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>   at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1420)
>   at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedOutputStream(DFSClient.java:1522)
>   at 
> 
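
Reading the two traces together (writer setup failing on the KMS read timeout, 
then the NPE at {{mi.writeEvent(historyEvent)}}), a defensive check along these 
lines would avoid the NPE. This is a hedged sketch with assumed names, not the 
actual JobHistoryEventHandler code:

{code:java}
// Minimal sketch of a defensive check, not the actual JobHistoryEventHandler
// code: if the writer was never created (e.g. the KMS call timed out while
// opening the encrypted output stream), skip writing instead of
// dereferencing a null writer.
MetaInfo mi = fileMap.get(jobId);           // lookup as done in handleEvent
if (mi == null || !mi.isWriterActive()) {   // isWriterActive() is assumed here
  LOG.warn("History writer unavailable, dropping event "
      + event.getEventType());
} else {
  mi.writeEvent(event);
}
{code}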

[jira] [Commented] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2018-02-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361777#comment-16361777
 ] 

Wangda Tan commented on MAPREDUCE-6315:
---

Thanks [~jira.shegalov], I temporarily moved it to 3.2.0; if you have cycles to 
finish it, please feel free to move it back to 3.1.0.

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch, MAPREDUCE-6315.002.patch, 
> MAPREDUCE-6315.003.patch
>
>
> When all AM attempts crash, there is no record of them in JHS. Thus no easy 
> way to get the logs. This JIRA automates the procedure by utilizing the jhist 
> file in the staging directory. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2018-02-12 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6315:
--
Target Version/s: 3.2.0  (was: 3.1.0)

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch, MAPREDUCE-6315.002.patch, 
> MAPREDUCE-6315.003.patch
>
>
> When all AM attempts crash, there is no record of them in JHS. Thus no easy 
> way to get the logs. This JIRA automates the procedure by utilizing the jhist 
> file in the staging directory. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6654) Possible NPE in JobHistoryEventHandler#handleEvent

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353295#comment-16353295
 ] 

Wangda Tan commented on MAPREDUCE-6654:
---

We plan to start the merge vote for 3.1.0 on Feb 18; please let me know if 
there is a plan to finish this by then, otherwise we need to move it to 3.2.0.

> Possible NPE in JobHistoryEventHandler#handleEvent
> --
>
> Key: MAPREDUCE-6654
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6654
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6654-v2.1.patch, MAPREDUCE-6654-v2.patch, 
> MAPREDUCE-6654.patch
>
>
> I have seen NPE thrown from {{JobHistoryEventHandler#handleEvent}}:
> {noformat}
> 2016-03-14 16:42:15,231 INFO [Thread-69] 
> org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:382)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1651)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1147)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:573)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:620)
> {noformat}
> In the version this exception is thrown, the 
> [line|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L586]
>  is:
> {code:java}mi.writeEvent(historyEvent);{code}
> IMHO, this may be caused by an exception in a previous step. Specifically, in 
> the kerberized environment, when creating event writer which calls to decrypt 
> EEK, the connection to KMS failed. Exception below:
> {noformat} 
> 2016-03-14 16:41:57,559 ERROR [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error 
> JobHistoryEventHandler in handleEvent: EventType: AM_STARTED
> java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:520)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:505)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:779)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:185)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>   at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1420)
>   at 
> 

[jira] [Commented] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2018-02-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353294#comment-16353294
 ] 

Wangda Tan commented on MAPREDUCE-6315:
---

We plan to start the merge vote for 3.1.0 on Feb 18; please let me know if 
there is a plan to finish this by then, otherwise we need to move it to 3.2.0.

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch, MAPREDUCE-6315.002.patch, 
> MAPREDUCE-6315.003.patch
>
>
> When all AM attempts crash, there is no record of them in JHS. Thus no easy 
> way to get the logs. This JIRA automates the procedure by utilizing the jhist 
> file in the staging directory. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-7017) Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts

2018-01-12 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned MAPREDUCE-7017:
-

Assignee: jiayuhan-it

> Too many times of meaningless invocation in TaskAttemptImpl#resolveHosts
> 
>
> Key: MAPREDUCE-7017
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7017
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 3.0.0-alpha4
>Reporter: jiayuhan-it
>Assignee: jiayuhan-it
> Attachments: MAPREDUCE-7017.001.patch
>
>
> MRAppMaster uses {{TaskAttemptImpl::resolveHosts}} to determine the 
> dataLocalHosts for each task when the location of a data split is an IP 
> address. This results in many calls (taskNum * dfsReplication) to 
> {{InetAddress::getByName}}, and most of those calls are redundant. When a job 
> has a great number of tasks and DNS resolution is slow, this stage can take a 
> long time before the job starts running.
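
A minimal sketch of the memoization idea (an illustrative class, not the actual 
TaskAttemptImpl change): resolve each distinct address once and reuse the 
result across the taskNum * dfsReplication lookups.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: memoize DNS lookups so repeated resolutions of
// the same address hit the cache instead of InetAddress.getByName each time.
public final class CachingHostResolver {
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  public String resolve(String hostOrIp) {
    return cache.computeIfAbsent(hostOrIp, key -> {
      try {
        // One real DNS round trip per distinct address.
        return InetAddress.getByName(key).getCanonicalHostName();
      } catch (UnknownHostException e) {
        return key; // fall back to the raw value on resolution failure
      }
    });
  }
}
{code}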



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6895) Job end notification not send due to YarnRuntimeException

2017-06-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043083#comment-16043083
 ] 

Wangda Tan commented on MAPREDUCE-6895:
---

I just realized Jason might be traveling; I'm going to commit the patch 
tomorrow if there are no objections.

> Job end notification not send due to YarnRuntimeException
> -
>
> Key: MAPREDUCE-6895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.4.1, 2.8.0, 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: MAPREDUCE-6895.001.patch
>
>
> MRAppMaster.this.stop() throws a YarnRuntimeException, as the log below 
> shows, which causes the job end notification not to be sent.
> {quote}
> 2017-05-24 12:14:02,165 WARN [Thread-693] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.nio.channels.ClosedChannelException
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:531)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:360)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1476)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1090)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:554)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:605)
> Caused by: java.nio.channels.ClosedChannelException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1528)
> at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at 
> org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754)
> at 
> org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1088)
> at org.apache.avro.io.JsonEncoder.flush(JsonEncoder.java:67)
> at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:67)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:886)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:520)
> ... 11 more
> 2017-05-24 12:14:02,165 INFO [Thread-693] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye!
> {quote}
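
A hedged sketch of the guard being discussed (helper names are hypothetical, 
not the actual MRAppMaster code): a YarnRuntimeException thrown by stop() must 
not prevent the job-end notification from being sent.

{code:java}
// Illustrative sketch only; sendJobEndNotificationIfConfigured() is a
// hypothetical helper standing in for whatever sends the notification.
try {
  MRAppMaster.this.stop(); // may throw on a closed HDFS output stream
} catch (Exception e) {
  LOG.warn("Graceful stop failed", e);
}
sendJobEndNotificationIfConfigured(); // must still run after a failed stop
{code}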



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6895) Job end notification not send due to YarnRuntimeException

2017-06-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037542#comment-16037542
 ] 

Wangda Tan commented on MAPREDUCE-6895:
---

Thanks [~zhaoyunjiong], however I'm not sure whether this behavior is 
intentional or not. Hoping to get another set of eyes on this. [~jlowe], do you 
know any context for this logic?

> Job end notification not send due to YarnRuntimeException
> -
>
> Key: MAPREDUCE-6895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.4.1, 2.8.0, 2.7.3
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
> Attachments: MAPREDUCE-6895.001.patch
>
>
> MRAppMaster.this.stop() throws a YarnRuntimeException, as the log below 
> shows, which causes the job end notification not to be sent.
> {quote}
> 2017-05-24 12:14:02,165 WARN [Thread-693] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Graceful stop failed
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.nio.channels.ClosedChannelException
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:531)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:360)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
> at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1476)
> at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1090)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:554)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:605)
> Caused by: java.nio.channels.ClosedChannelException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1528)
> at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> at 
> org.codehaus.jackson.impl.Utf8Generator._flushBuffer(Utf8Generator.java:1754)
> at 
> org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1088)
> at org.apache.avro.io.JsonEncoder.flush(JsonEncoder.java:67)
> at 
> org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:67)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:886)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:520)
> ... 11 more
> 2017-05-24 12:14:02,165 INFO [Thread-693] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye!
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

2016-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6803:
--
Summary: MR AppMaster should assign container that is closest to the data  
(was: YARN should allocate container that is closest to the data)

> MR AppMaster should assign container that is closest to the data
> 
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with the following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if a Hadoop cluster is deployed on 
> hosts in a multi-level network hierarchy (which is typical). For example, 
> suppose a network architecture with one core switch, two aggregate switches, 
> four ToR switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data on h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 and h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, the current implementation chooses 
> one of h3~h8 with equal probability.
> This limitation is even more obvious for Hadoop clusters deployed on VMs or 
> containers. In that case, only the VMs or containers running on the same 
> physical host are considered rack-local, and actual rack-local hosts are 
> chosen with the same probability as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on the rack ID to find a rack-local host. Alternatively, we can 
> perform longest-prefix matching to find the closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts would be 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8
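
A self-contained sketch of the longest-prefix idea on rack paths such as 
"/c/a1/t1" (illustrative code only, not the RMContainerAllocator patch):

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: prefer the candidate whose rack path shares the
// longest prefix with the target, instead of exact rack-ID matching.
public final class RackPrefixMatch {

  // Number of leading path segments two rack IDs have in common.
  static int commonSegments(String a, String b) {
    String[] x = a.split("/");
    String[] y = b.split("/");
    int n = 0;
    while (n < Math.min(x.length, y.length) && x[n].equals(y[n])) {
      n++;
    }
    return n;
  }

  // Pick the candidate rack with the longest shared prefix.
  static String closestRack(String targetRack, List<String> candidates) {
    return candidates.stream()
        .max(Comparator.comparingInt(r -> commonSegments(targetRack, r)))
        .orElse(null);
  }

  public static void main(String[] args) {
    List<String> racks = Arrays.asList("/c/a1/t2", "/c/a2/t3", "/c/a2/t4");
    // "/c/a1/t2" shares the "/c/a1" prefix with the target, so it wins over
    // the "/c/a2/..." racks, matching the h3/h4-before-h5~h8 ordering above.
    System.out.println(closestRack("/c/a1/t1", racks)); // prints /c/a1/t2
  }
}
{code}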



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6803) MR AppMaster should assign container that is closest to the data

2016-10-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612792#comment-15612792
 ] 

Wangda Tan commented on MAPREDUCE-6803:
---

Moved to MR. [~djp], if you have a chance, could you look at the patch?

Thanks

> MR AppMaster should assign container that is closest to the data
> 
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with the following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if a Hadoop cluster is deployed on 
> hosts in a multi-level network hierarchy (which is typical). For example, 
> suppose a network architecture with one core switch, two aggregate switches, 
> four ToR switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data on h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 and h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, the current implementation chooses 
> one of h3~h8 with equal probability.
> This limitation is even more obvious for Hadoop clusters deployed on VMs or 
> containers. In that case, only the VMs or containers running on the same 
> physical host are considered rack-local, and actual rack-local hosts are 
> chosen with the same probability as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on the rack ID to find a rack-local host. Alternatively, we can 
> perform longest-prefix matching to find the closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts would be 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6803) YARN should allocate container that is closest to the data

2016-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6803:
--
Labels: oct16-medium  (was: oct16-hard)

> YARN should allocate container that is closest to the data
> -
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with the following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if a Hadoop cluster is deployed on 
> hosts in a multi-level network hierarchy (which is typical). For example, 
> suppose a network architecture with one core switch, two aggregate switches, 
> four ToR switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data on h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 and h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, the current implementation chooses 
> one of h3~h8 with equal probability.
> This limitation is even more obvious for Hadoop clusters deployed on VMs or 
> containers. In that case, only the VMs or containers running on the same 
> physical host are considered rack-local, and actual rack-local hosts are 
> chosen with the same probability as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on the rack ID to find a rack-local host. Alternatively, we can 
> perform longest-prefix matching to find the closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts would be 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Moved] (MAPREDUCE-6803) YARN should allocate container that is closest to the data

2016-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan moved YARN-3856 to MAPREDUCE-6803:
-

Affects Version/s: (was: 2.7.0)
  Component/s: (was: scheduler)
   applicationmaster
  Key: MAPREDUCE-6803  (was: YARN-3856)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> YARN should allocate container that is closest to the data
> -
>
> Key: MAPREDUCE-6803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
> Environment: Hadoop cluster with multi-level network hierarchy
>Reporter: jaehoon ko
>  Labels: oct16-medium
> Attachments: YARN-3856.001.patch, YARN-3856.002.patch
>
>
> Currently, given a Container request for a host, ResourceManager allocates a 
> Container with the following priorities (RMContainerAllocator.java):
>  - Requested host
>  - a host in the same rack as the requested host
>  - any host
> This can lead to a sub-optimal allocation if a Hadoop cluster is deployed on 
> hosts in a multi-level network hierarchy (which is typical). For example, 
> suppose a network architecture with one core switch, two aggregate switches, 
> four ToR switches, and 8 hosts. Each switch has two downlinks. Rack IDs of hosts are 
> as follows:
> h1, h2: /c/a1/t1
> h3, h4: /c/a1/t2
> h5, h6: /c/a2/t3
> h7, h8: /c/a2/t4
> To allocate a container for data on h1, Hadoop first tries h1 itself, then 
> h2, then any of h3 ~ h8. Clearly, h3 and h4 are better than h5~h8 in terms of 
> network distance and bandwidth. However, the current implementation chooses 
> one of h3~h8 with equal probability.
> This limitation is even more obvious for Hadoop clusters deployed on VMs or 
> containers. In that case, only the VMs or containers running on the same 
> physical host are considered rack-local, and actual rack-local hosts are 
> chosen with the same probability as far hosts.
> The root cause of this limitation is that RMContainerAllocator.java performs 
> exact matching on the rack ID to find a rack-local host. Alternatively, we can 
> perform longest-prefix matching to find the closest host. Using the same 
> network architecture as above, with longest-prefix matching, hosts would be 
> selected with the following priorities:
>  h1
>  h2
>  h3 or h4
>  h5 or h6 or h7 or h8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6310:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk/branch-2/branch-2.8, thanks [~vinodkv] and [~gtCarrera9]!

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt, MAPREDUCE-6310-06132018.txt
>
>
> Previously we used jdiff for Hadoop Common and HDFS, and we're now extending 
> jdiff support to YARN. We'd probably like to do the same for MapReduce? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428813#comment-15428813
 ] 

Wangda Tan commented on MAPREDUCE-6310:
---

+1 to latest patch, unit test failures are not related, will commit shortly.

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt, MAPREDUCE-6310-06132018.txt
>
>
> Previously we used jdiff for Hadoop Common and HDFS, and we're now extending 
> jdiff support to YARN. We'd probably like to do the same for MapReduce? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-07-26 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6310:
--
Target Version/s: 2.8.0, 3.0.0-alpha1  (was: 2.8.0)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop Common and HDFS, and we're now extending 
> jdiff support to YARN. We'd probably like to do the same for MapReduce? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6513) MR job hangs forever when one NM is unstable for some time

2016-05-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283210#comment-15283210
 ] 

Wangda Tan commented on MAPREDUCE-6513:
---

Credit to [~varun_saxena] for working on this patch!

> MR job hangs forever when one NM is unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED.
> The maps that were running on the unstable node were rescheduled, and all of 
> them stayed in the scheduled state waiting for the RM to assign containers. 
> Ask requests for the maps were seen until the node became good again (all of 
> those failed); there were no ask requests after that. But the AM kept 
> preempting the reducers (recycling them).
> Finally, the reducers were waiting for the mappers to complete, and the 
> mappers never got containers.
> My question is:
> 
> Why were map requests not sent by the AM once the node recovered?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6108) ShuffleError OOM while reserving memory by MergeManagerImpl

2016-05-13 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved MAPREDUCE-6108.
---
Resolution: Cannot Reproduce

No responses; closing as cannot-reproduce. Please reopen it if anybody sees 
this issue again. 

Thanks.

> ShuffleError OOM while reserving memory by MergeManagerImpl
> ---
>
> Key: MAPREDUCE-6108
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6108
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.5.1
>Reporter: Dongwook Kwon
>Priority: Critical
>
> Shuffle hits an OOM issue from time to time, such as the one reported in this 
> email:
> http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201408.mbox/%3ccabwxxjnk-on0xtrmurijd8sdgjjtamsvqw2czpm3oekj3ym...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6108) ShuffleError OOM while reserving memory by MergeManagerImpl

2016-05-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281053#comment-15281053
 ] 

Wangda Tan commented on MAPREDUCE-6108:
---

[~kasha], [~vinodkv], is this still an issue in the existing code base? Can we 
close it as not reproducible if it cannot be reproduced?

Thanks,

> ShuffleError OOM while reserving memory by MergeManagerImpl
> ---
>
> Key: MAPREDUCE-6108
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6108
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.5.1
>Reporter: Dongwook Kwon
>Priority: Critical
>
> Shuffle hits an OOM issue from time to time, such as the one reported in this 
> email:
> http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-dev/201408.mbox/%3ccabwxxjnk-on0xtrmurijd8sdgjjtamsvqw2czpm3oekj3ym...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests

2016-05-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274660#comment-15274660
 ] 

Wangda Tan commented on MAPREDUCE-6689:
---

Yeah, this is mainly caused by inaccurate headroom calculation. IMO, the 
headroom returned by the scheduler is not trustworthy: it misses many 
constraints, for example blacklists, hard locality, etc. And it would be 
expensive to calculate everything inside the scheduler. So MAPREDUCE-6302 is a 
good solution to me. However, we shouldn't "forget" this decision and 
re-schedule the reducers in the same shot.

Thanks for reviews, [~jlowe]/[~varun_saxena].

> MapReduce job can infinitely increase number of reducer resource requests
> -
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: MAPREDUCE-6689.1.patch
>
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps up all reducers (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we got 
> 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which made the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with analysis. 
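
A hedged sketch of the "don't forget the decision" idea from the comment above 
(field and method names are hypothetical, not the actual RMContainerAllocator 
members):

{code:java}
// Illustrative fragment only: remember the ramp-down so scheduleReduces()
// does not re-add the same 1024 requests in the same heartbeat, which is
// what inflates the ask on the RM side.
boolean reducersRampedDownThisHeartbeat = false;

void preemptReducesIfNeeded() {
  if (mapsAreStarving()) {                    // hypothetical predicate
    rampDownAllScheduledReduces();            // cancel + move to pending
    reducersRampedDownThisHeartbeat = true;   // record the decision
  }
}

void scheduleReduces() {
  if (reducersRampedDownThisHeartbeat) {
    return; // don't undo the preemption decision in the same pass
  }
  rampUpPendingReduces();                     // hypothetical helper
}
{code}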



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273364#comment-15273364
 ] 

Wangda Tan commented on MAPREDUCE-6514:
---

[~vinodkv],

This behavior was described by [~rohithsharma] in the comment above:
bq. As a result, number of containers count in the ask is increased as 
explained in the below...

And because of MAPREDUCE-6302, it is possible for the MR AM to cancel all 
reducer requests and re-add them all in the same heartbeat, so the #containers 
increases quickly on the RM side and eventually becomes an enormous number. 
Since MAPREDUCE-6302 is included in branch-2.6/2.7, we need to backport this 
patch to branch-2.6/2.7 as well.


> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers to pending. This is not reflected in the 
> ask, so the RM keeps assigning containers while the AM cannot use them, as no 
> reducer is scheduled (check the logs below the code).
> If the ask is updated immediately, the RM will be able to schedule mappers 
> immediately, which is anyway the intention when we ramp down reducers: the 
> scheduler need not allocate for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}
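
A sketch of the fix idea against the snippet quoted above, assuming the 
existing {{decContainerReq}} helper in RMContainerRequestor is what updates the 
ask table (a minimal sketch, not the final patch):

{code:java}
// Sketch: when ramping down, shrink the ask as well, so the RM stops
// allocating containers for reducers the AM has already cancelled.
LOG.info("Ramping down all scheduled reduces:"
    + scheduledRequests.reduces.size());
for (ContainerRequest req : scheduledRequests.reduces.values()) {
  pendingReduces.add(req);
  decContainerReq(req); // update the ask, not just local bookkeeping
}
scheduledRequests.reduces.clear();
{code}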



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6689:
--
Attachment: MAPREDUCE-6689.1.patch

> MapReduce job can infinitely increase number of reducer resource requests
> -
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: MAPREDUCE-6689.1.patch
>
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps up all reducers (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we got 
> 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which made the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with analysis. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests

2016-05-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272988#comment-15272988
 ] 

Wangda Tan commented on MAPREDUCE-6689:
---

Uploaded patch for this (on top of MAPREDUCE-6514)

> MapReduce job can infinitely increase number of reducer resource requests
> -
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: MAPREDUCE-6689.1.patch
>
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps up all reducers (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we got 
> 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which made the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with analysis. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6514:
--
Target Version/s: 2.6.4, 2.8.0, 2.7.3  (was: 2.8.0)

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers to pending. This is not reflected in the 
> ask, so the RM keeps assigning containers while the AM cannot use them, as no 
> reducer is scheduled (check the logs below the code).
> If the ask is updated immediately, the RM will be able to schedule mappers 
> immediately, which is anyway the intention when we ramp down reducers: the 
> scheduler need not allocate for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6514:
--
Status: Patch Available  (was: Open)

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers to pending. This is not reflected in the 
> ask, so the RM keeps assigning containers while the AM cannot use them, as no 
> reducer is scheduled (check the logs below the code).
> If the ask is updated immediately, the RM will be able to schedule mappers 
> immediately, which is anyway the intention when we ramp down reducers: the 
> scheduler need not allocate for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6514:
--
Affects Version/s: (was: 2.7.1)
   2.7.2
   2.6.3

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers to pending. This is not reflected in the 
> ask, so the RM keeps assigning containers while the AM cannot use them, as no 
> reducer is scheduled (check the logs below the code).
> If the ask is updated immediately, the RM will be able to schedule mappers 
> immediately, which is anyway the intention when we ramp down reducers: the 
> scheduler need not allocate for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6514:
--
Affects Version/s: (was: 2.6.3)
   (was: 2.7.2)

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and put these reducers back into pending. The ask is not updated, 
> so the RM keeps on assigning while the AM cannot assign because no reducer is 
> scheduled (see the logs below the code).
> If the ask is updated immediately, the RM can schedule mappers right away, 
> which is the intention when we ramp down reducers anyway.
> The scheduler need not allocate containers for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}






[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6514:
--
Priority: Blocker  (was: Critical)

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and put these reducers back into pending. The ask is not updated, 
> so the RM keeps on assigning while the AM cannot assign because no reducer is 
> scheduled (see the logs below the code).
> If the ask is updated immediately, the RM can schedule mappers right away, 
> which is the intention when we ramp down reducers anyway.
> The scheduler need not allocate containers for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}






[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6514:
--
Attachment: MAPREDUCE-6514.02.patch

Rebased the patch against the latest trunk and addressed comments from [~vinodkv]:

Use rampDownReduces to cancel all reducer requests.
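
For readers following the discussion, here is a minimal sketch of the intended behavior, assuming RMContainerAllocator-style fields and helpers ({{scheduledRequests}}, {{pendingReduces}}, {{decContainerReq}}); the attached patch is the authoritative version:

{code}
  // Sketch: move scheduled reducers back to pending while also cancelling
  // their outstanding asks, so the RM-side request table shrinks accordingly.
  void rampDownReduces(int rampDown) {
    while (rampDown > 0) {
      ContainerRequest request = scheduledRequests.removeReduce();
      if (request == null) {
        break; // nothing left to ramp down
      }
      // removeReduce() is assumed to call decContainerReq(), which updates
      // the ask sent to the RM instead of silently dropping the request.
      pendingReduces.add(request);
      rampDown--;
    }
  }
{code}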

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and put these reducers back into pending. The ask is not updated, 
> so the RM keeps on assigning while the AM cannot assign because no reducer is 
> scheduled (see the logs below the code).
> If the ask is updated immediately, the RM can schedule mappers right away, 
> which is the intention when we ramp down reducers anyway.
> The scheduler need not allocate containers for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513.
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}






[jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests

2016-05-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272774#comment-15272774
 ] 

Wangda Tan commented on MAPREDUCE-6689:
---

Thanks [~haibochen] for pointing to MAPREDUCE-6514.

MAPREDUCE-6514 is one cause of this problem, but it started causing serious 
trouble after MAPREDUCE-6302 was committed. 

I discussed this offline with [~varun_saxena]; I will rebase & upload a patch 
to MAPREDUCE-6514 later. In this JIRA, I will fix the cancel-all-then-add-all 
reducer request behavior.

Application log is available at: 
https://www.dropbox.com/s/ckx1z993lt4ymh2/app.log.zip?dl=0. (It is too large to 
be uploaded to JIRA)

> MapReduce job can infinitely increase number of reducer resource requests
> -
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps all reducers back up (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we 
> get 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which makes the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with the analysis. 






[jira] [Updated] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests

2016-05-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6689:
--
Summary: MapReduce job can infinitely increase number of reducer resource 
requests  (was: MapReduce job can infinitely increasing number of reducer 
resource requests)

> MapReduce job can infinitely increase number of reducer resource requests
> -
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps all reducers back up (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we 
> get 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which makes the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with the analysis. 






[jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increasing number of reducer resource requests

2016-05-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271713#comment-15271713
 ] 

Wangda Tan commented on MAPREDUCE-6689:
---

Also, the following logic does not seem correct to me:
{code}
  private void clearAllPendingReduceRequests() {
LOG.info("Ramping down all scheduled reduces:"
+ scheduledRequests.reduces.size());
for (ContainerRequest req : scheduledRequests.reduces.values()) {
  pendingReduces.add(req);
}
scheduledRequests.reduces.clear();
  }
{code}

Instead of just calling {{scheduledRequests.reduces.clear()}}, it should call 
{{decContainerReq}} for each request in {{scheduledRequests.reduces}}. The 
existing logic does not modify the remote request table.
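
A minimal sketch of the corrected shape (names taken from the snippet above; not the committed patch):

{code}
  private void clearAllPendingReduceRequests() {
    LOG.info("Ramping down all scheduled reduces:"
        + scheduledRequests.reduces.size());
    for (ContainerRequest req : scheduledRequests.reduces.values()) {
      // Cancel the outstanding ask so the RM stops allocating for it;
      // a bare clear() leaves the remote request table untouched.
      decContainerReq(req);
      pendingReduces.add(req);
    }
    scheduledRequests.reduces.clear();
  }
{code}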

> MapReduce job can infinitely increasing number of reducer resource requests
> ---
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps all reducers back up (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we 
> get 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which makes the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with the analysis. 






[jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increasing number of reducer resource requests

2016-05-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271710#comment-15271710
 ] 

Wangda Tan commented on MAPREDUCE-6689:
---

One quick solution for this issue is to modify {{preemptReducesIfNeeded}} to 
return whether preemption happened. If preemption happens, skip the next 
{{scheduleReduces}}.
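
A sketch of that quick fix (assumed shape, not the committed patch): make the preemption step report what it did, and skip the ramp-up in the same heartbeat.

{code}
  // Inside the heartbeat: ramp-down and ramp-up must not both run, otherwise
  // each heartbeat cancels 1024 reducer asks and immediately re-adds them.
  boolean preempted = preemptReducesIfNeeded();  // assumed to return a flag
  if (!preempted) {
    scheduleReduces();  // ramp reducers back up only if nothing was preempted
  }
{code}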

CC: [~kasha], [~jlowe].

> MapReduce job can infinitely increasing number of reducer resource requests
> ---
>
> Key: MAPREDUCE-6689
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
>
> We have seen this issue on one of our clusters: when running a terasort 
> map-reduce job, some mappers failed after the reducers started, and the MR AM 
> then tried to preempt reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop; on every 
> RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
> (total scheduled reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps all reducers back up (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every 
> MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we 
> get 18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.
> And this bug also triggered YARN-4844, which makes the RM stop scheduling 
> anything.
> Thanks to [~sidharta-s] for helping with the analysis. 






[jira] [Created] (MAPREDUCE-6689) MapReduce job can infinitely increasing number of reducer resource requests

2016-05-04 Thread Wangda Tan (JIRA)
Wangda Tan created MAPREDUCE-6689:
-

 Summary: MapReduce job can infinitely increasing number of reducer 
resource requests
 Key: MAPREDUCE-6689
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker


We have seen this issue on one of our clusters: when running a terasort 
map-reduce job, some mappers failed after the reducers started, and the MR AM 
then tried to preempt reducers to schedule these failed mappers.

After that, the MR AM enters an infinite loop; on every 
RMContainerAllocator#heartbeat run, it:

- In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests 
(total scheduled reducers = 1024).
- Then, in {{scheduleReduces}}, ramps all reducers back up (total = 1024).

As a result, the total #requested-containers increases by 1024 on every 
MRAM-RM heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we get 
18 * 3600 * 1024 ~ 66M+ requested containers on the RM side.

And this bug also triggered YARN-4844, which makes the RM stop scheduling 
anything.

Thanks to [~sidharta-s] for helping with the analysis. 
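
As a quick sanity check of the arithmetic above (a sketch only, using the numbers from the description):

{code}
// 1,024 new reducer asks per AM-RM heartbeat, one heartbeat per second,
// sustained for 18 hours:
long totalAsks = 18L * 3600L * 1024L;  // 66,355,200 ~= 66M asks on the RM side
{code}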






[jira] [Updated] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-19 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-5817:
--
Fix Version/s: (was: 2.8.0)
   2.7.3

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.7.3
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job's runtime. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy nodes, etc.) for some 
> time, then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.
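
A minimal sketch of the guard the description suggests (hypothetical helper names; not the committed patch):

{code}
  private void actOnUnusableNode(NodeId nodeId, NodeState nodeState) {
    // If every reducer has finished, the map outputs on this node can no
    // longer be consumed, so rescheduling those mappers would only burn time.
    if (allReducersComplete()) {  // hypothetical helper
      LOG.info("All reducers complete; not rescheduling maps on " + nodeId);
      return;
    }
    // ... existing logic: re-run succeeded map attempts that ran on nodeId ...
  }
{code}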





[jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248114#comment-15248114
 ] 

Wangda Tan commented on MAPREDUCE-5817:
---

Updated fix version.

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.7.3
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job's runtime. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy nodes, etc.) for some 
> time, then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.





[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-18 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6513:
--
Attachment: MAPREDUCE-6513.3_1.branch-2.7.patch

Rebased branch-2.7 patch.

Since MAPREDUCE-6513 is on top of MAPREDUCE-5465, and the scope of 
MAPREDUCE-5465 seems too big to pull into branch-2.7, I just manually resolved 
a couple of conflicts. I ran the related unit tests; all passed.

[~varun_saxena], [~vinodkv], could you take a final look at the attached patch?

Thanks,

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.7.patch, MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for the maps are seen until the node is healthy again (all of those 
> failed); there are no ask requests after this. But the AM keeps preempting 
> the reducers (it keeps recycling them).
> Finally the reducers wait for the mappers to complete, and the mappers never 
> get containers.
> My question is:
> 
> why were map requests not sent by the AM once the node recovered?





[jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247181#comment-15247181
 ] 

Wangda Tan commented on MAPREDUCE-5817:
---

Thanks [~sjlee0], committing now.

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job's runtime. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy nodes, etc.) for some 
> time, then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.





[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242220#comment-15242220
 ] 

Wangda Tan commented on MAPREDUCE-6513:
---

Committed to branch-2.8.

We need to backport MAPREDUCE-5817 to branch-2.7 before this patch; otherwise 
it will cause a couple of conflicts. Waiting for suggestions from Sangjin and 
Karthik regarding the backport of MAPREDUCE-5817.

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for the maps are seen until the node is healthy again (all of those 
> failed); there are no ask requests after this. But the AM keeps preempting 
> the reducers (it keeps recycling them).
> Finally the reducers wait for the mappers to complete, and the mappers never 
> get containers.
> My question is:
> 
> why were map requests not sent by the AM once the node recovered?





[jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242217#comment-15242217
 ] 

Wangda Tan commented on MAPREDUCE-5817:
---

Forgot to mention: MAPREDUCE-6513 depends on this patch. It applies cleanly to 
branch-2.7 once this patch is backported.

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job's runtime. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy nodes, etc.) for some 
> time, then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.





[jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed

2016-04-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242211#comment-15242211
 ] 

Wangda Tan commented on MAPREDUCE-5817:
---

[~sjlee0]/[~kasha],

Should we backport this patch to branch-2.7? Any concerns?

Thanks,

> Mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, 
> mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job's runtime. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy nodes, etc.) for some 
> time, then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.





[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6513:
--
Attachment: MAPREDUCE-6513.3_1.branch-2.8.patch

Committed MAPREDUCE-4785 to branch-2.7/branch-2.8. Attached a new patch.

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch, 
> MAPREDUCE-6513.3_1.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for the maps are seen until the node is healthy again (all of those 
> failed); there are no ask requests after this. But the AM keeps preempting 
> the reducers (it keeps recycling them).
> Finally the reducers wait for the mappers to complete, and the mappers never 
> get containers.
> My question is:
> 
> why were map requests not sent by the AM once the node recovered?





[jira] [Commented] (MAPREDUCE-4785) TestMRApp occasionally fails

2016-04-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242000#comment-15242000
 ] 

Wangda Tan commented on MAPREDUCE-4785:
---

Backported this fix to branch-2.8 and branch-2.7; MAPREDUCE-6513 depends on it.

> TestMRApp occasionally fails
> 
>
> Key: MAPREDUCE-4785
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4785
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, test
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Haibo Chen
> Fix For: 2.7.3
>
> Attachments: mapreduce4785.001.patch, mapreduce4785.prelim.patch
>
>
> TestMRApp is failing occasionally with this error:
> {noformat}
> testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestMRApp): Expecting 2 
> more completion events for killed expected:<4> but was:<2>
> {noformat}





[jira] [Updated] (MAPREDUCE-4785) TestMRApp occasionally fails

2016-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-4785:
--
Fix Version/s: (was: 2.9.0)
   2.7.3

> TestMRApp occasionally fails
> 
>
> Key: MAPREDUCE-4785
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4785
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, test
>Affects Versions: 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Haibo Chen
> Fix For: 2.7.3
>
> Attachments: mapreduce4785.001.patch, mapreduce4785.prelim.patch
>
>
> TestMRApp is failing occasionally with this error:
> {noformat}
> testUpdatedNodes(org.apache.hadoop.mapreduce.v2.app.TestMRApp): Expecting 2 
> more completion events for killed expected:<4> but was:<2>
> {noformat}





[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6513:
--
Attachment: MAPREDUCE-6513.3.branch-2.8.patch

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for the maps are seen until the node is healthy again (all of those 
> failed); there are no ask requests after this. But the AM keeps preempting 
> the reducers (it keeps recycling them).
> Finally the reducers wait for the mappers to complete, and the mappers never 
> get containers.
> My question is:
> 
> why were map requests not sent by the AM once the node recovered?





[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6513:
--
Attachment: (was: MAPREDUCE-6513-1-branch-2.8.patch)

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch, MAPREDUCE-6513.3.branch-2.8.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for the maps are seen until the node is healthy again (all of those 
> failed); there are no ask requests after this. But the AM keeps preempting 
> the reducers (it keeps recycling them).
> Finally the reducers wait for the mappers to complete, and the mappers never 
> get containers.
> My question is:
> 
> why were map requests not sent by the AM once the node recovered?





[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-14 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated MAPREDUCE-6513:
--
Attachment: MAPREDUCE-6513-1-branch-2.8.patch

Committed to branch-2 / trunk.

Thanks [~varun_saxena] for working on the patch, and thanks 
[~devaraj.k]/[~cchen317]/[~sunilg]/[~vinodkv]/[~rohithsharma] for reviews!

Rebased & attached patch for branch-2.8, pending Jenkins.

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513-1-branch-2.8.patch, 
> MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, MAPREDUCE-6513.03.patch
>
>
> While a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the status of the maps on this 
> node changed to KILLED. 
> The maps that were running on the unstable node are rescheduled; all of them 
> are in the scheduled state, waiting for the RM to assign containers. Ask 
> requests for the maps are seen until the node is healthy again (all of those 
> failed); there are no ask requests after this. But the AM keeps preempting 
> the reducers (it keeps recycling them).
> Finally the reducers wait for the mappers to complete, and the mappers never 
> get containers.
> My question is:
> 
> why were map requests not sent by the AM once the node recovered?




