[jira] [Assigned] (HADOOP-1540) Support file exclusion list in distcp

2020-04-17 Thread Steven Rand (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand reassigned HADOOP-1540:
---

Assignee: (was: Steven Rand)

> Support file exclusion list in distcp
> -
>
> Key: HADOOP-1540
> URL: https://issues.apache.org/jira/browse/HADOOP-1540
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 2.6.0
>Reporter: Senthil Subramanian
>Priority: Minor
>  Labels: BB2015-05-TBR, distcp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-1540.009.patch
>
>
> There should be a way to ignore specific paths (eg: those that have already 
> been copied over under the current srcPath). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16569) improve TLS performance for NativeAzureFileSystem

2019-09-12 Thread Steven Rand (Jira)
Steven Rand created HADOOP-16569:


 Summary: improve TLS performance for NativeAzureFileSystem
 Key: HADOOP-16569
 URL: https://issues.apache.org/jira/browse/HADOOP-16569
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Steven Rand


Several tickets have already discussed the performance issues of GCM ciphers on 
jdk8 in relation to cloud connectors:

* HADOOP-16050, HADOOP-16371 (s3a)
* HADOOP-15669 (abfs)
* HADOOP-15965 (adls)

However, we haven't gotten around to fixing this for Azure Blob Store yet (the 
wasbs scheme).

It would be helpful to reuse the {{SSLSocketFactoryEx}} class that was added in 
HADOOP-15669. The {{getInstrumentedContext}} method in 
{{AzureNativeFileSystemStore}} can already mutate the connection to the blob 
store and inject that socket factory, so it's not a hard change to make.

This isn't necessarily blocked on HADOOP-16371, but it would  be helpful if we 
merged that PR first, since it moves the custom {{SSLSocketFactory}} that uses 
wildfly-openssl out of the code that's specific to ABFS and into a place where 
the Azure Blob Store code can access it.

Tagging [~stakiar] since you've been working on this for the other cloud 
connectors



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16264) [JDK11] Track failing Hadoop unit tests

2019-04-23 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824699#comment-16824699
 ] 

Steven Rand commented on HADOOP-16264:
--

[~smeng], would it makes more sense to run the tests on trunk instead of 
branch-3.1.2? My guess is that on branch-3.1.2 you're going to run into a lot 
of failures that have already been fixed, e.g., HADOOP-12760 like you mention 
above.

> [JDK11] Track failing Hadoop unit tests
> ---
>
> Key: HADOOP-16264
> URL: https://issues.apache.org/jira/browse/HADOOP-16264
> Project: Hadoop Common
>  Issue Type: Task
>Affects Versions: 3.1.2
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: test-run1.tgz
>
>
> Although there are still a lot of work before we could compile Hadoop with 
> JDK 11 (HADOOP-15338), it is possible to compile Hadoop with JDK 8 and run 
> (e.g. HDFS NN/DN,YARN NM/RM) on JDK 11 at this moment.
> But after compiling branch-3.1.2 with JDK 8, I ran unit tests with JDK 11 and 
> there are a LOT of unit test failures (44 out of 96 maven projects contain at 
> least one unit test failures according to maven reactor summary). This may 
> well indicate some functionalities are actually broken on JDK 11. Some of 
> them already have a jira number. Some of them might have been fixed in 3.2.0. 
> Some of them might share the same root cause.
> By definition, this jira should be part of HADOOP-15338. But the goal of this 
> one is just to keep track of unit test failures and (hopefully) resolve all 
> of them soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16000) Remove TLSv1 and SSLv2Hello from the default value of hadoop.ssl.enabled.protocols

2018-12-11 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718486#comment-16718486
 ] 

Steven Rand commented on HADOOP-16000:
--

+1, though I think it's worth noting that {{hadoop.ssl.enabled.protocols}} has 
a limited impact until HADOOP-15169 is fixed.

> Remove TLSv1 and SSLv2Hello from the default value of 
> hadoop.ssl.enabled.protocols
> --
>
> Key: HADOOP-16000
> URL: https://issues.apache.org/jira/browse/HADOOP-16000
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Akira Ajisaka
>Assignee: Gabor Bota
>Priority: Major
>
> {code:title=core-default.xml}
>   public static final String SSL_ENABLED_PROTOCOLS_DEFAULT =
>   "TLSv1,SSLv2Hello,TLSv1.1,TLSv1.2";
> {code}
> TLSv1 and SSLv2Hello are considered to be vulnerable. Let's remove these by 
> default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15811) Optimizations for Java's TLS performance

2018-10-02 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636316#comment-16636316
 ] 

Steven Rand commented on HADOOP-15811:
--

Worth noting, the intrinsic methods mentioned above are on by default starting 
in 8u152 ([https://bugs.openjdk.java.net/browse/JDK-8154945,] 
[https://www.oracle.com/technetwork/java/javase/8u152-relnotes-3850503.html).]  

> Optimizations for Java's TLS performance
> 
>
> Key: HADOOP-15811
> URL: https://issues.apache.org/jira/browse/HADOOP-15811
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 1.0.0
>Reporter: Daryn Sharp
>Priority: Major
>
> Java defaults to using /dev/random and disables intrinsic methods used in hot 
> code paths.  Both cause highly synchronized impls to be used that 
> significantly degrade performance.
> * -Djava.security.egd=file:/dev/urandom
> * -XX:+UseMontgomerySquareIntrinsic
> * -XX:+UseMontgomeryMultiplyIntrinsic
> * -XX:+UseSquareToLenIntrinsic
> * -XX:+UseMultiplyToLenIntrinsic
> These settings significantly boost KMS server performance.  Under load, 
> threads are not jammed in the SSLEngine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2018-07-23 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553421#comment-16553421
 ] 

Steven Rand commented on HADOOP-14937:
--

Sorry, I've dropped the ball on this one, and haven't investigated further. I 
don't imagine I'll be able to get to it anytime soon either – feel free to 
reassign or close.

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> 
>
> Key: HADOOP-14937
> URL: https://issues.apache.org/jira/browse/HADOOP-14937
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running a {{hadoop 
> fs -put file:///... s3a://...}}, it seems that the first part in the 
> multipart upload doesn't begin to upload until n of the 
> {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is 
> the value of {{fs.s3a.fast.upload.active.blocks}}.
> To hopefully clarify a bit, the series of events that I expected to see with 
> {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
> off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, 
> the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate 
> thread uploads that part, and so on.
> Whereas what I think the YK snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
> off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically 
> the times at which the first four {{s3a-transfer-shared-pool}} threads start 
> to upload are roughly the same, whereas I would've expected them to be more 
> staggered.
> I'm actually not sure whether this is the expected behavior or not, so feel 
> free to close if this doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense for roughly which values of 
> {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
> that I found confusing is that a part size of 5 MB seems to outperform a part 
> size of 64 MB up until files that are upwards of about 500 MB in size. This 
> seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
> would've expected the overhead of those to become costly at small part sizes. 
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
> to wait until 256 MB are buffered before we can start uploading, while with 5 
> MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
> gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check 
> to make sure I'm not just misunderstanding something.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2017-10-09 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196808#comment-16196808
 ] 

Steven Rand commented on HADOOP-14937:
--

Understood, thanks for the info. I'll try to have a look through the code over 
the next few days.

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> 
>
> Key: HADOOP-14937
> URL: https://issues.apache.org/jira/browse/HADOOP-14937
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
> Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running a {{hadoop 
> fs -put file:///... s3a://...}}, it seems that the first part in the 
> multipart upload doesn't begin to upload until n of the 
> {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is 
> the value of {{fs.s3a.fast.upload.active.blocks}}.
> To hopefully clarify a bit, the series of events that I expected to see with 
> {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
> off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, 
> the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate 
> thread uploads that part, and so on.
> Whereas what I think the YK snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
> off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically 
> the times at which the first four {{s3a-transfer-shared-pool}} threads start 
> to upload are roughly the same, whereas I would've expected them to be more 
> staggered.
> I'm actually not sure whether this is the expected behavior or not, so feel 
> free to close if this doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense for roughly which values of 
> {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
> that I found confusing is that a part size of 5 MB seems to outperform a part 
> size of 64 MB up until files that are upwards of about 500 MB in size. This 
> seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
> would've expected the overhead of those to become costly at small part sizes. 
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
> to wait until 256 MB are buffered before we can start uploading, while with 5 
> MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
> gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check 
> to make sure I'm not just misunderstanding something.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2017-10-08 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14937:
-
Attachment: yjp_threads.png

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> 
>
> Key: HADOOP-14937
> URL: https://issues.apache.org/jira/browse/HADOOP-14937
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
> Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running a {{hadoop 
> fs -put file:///... s3a://...}}, it seems that the first part in the 
> multipart upload doesn't begin to upload until n of the 
> {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is 
> the value of {{fs.s3a.fast.upload.active.blocks}}.
> To hopefully clarify a bit, the series of events that I expected to see with 
> {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
> off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, 
> the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate 
> thread uploads that part, and so on.
> Whereas what I think the YK snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
> off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically 
> the times at which the first four {{s3a-transfer-shared-pool}} threads start 
> to upload are roughly the same, whereas I would've expected them to be more 
> staggered.
> I'm actually not sure whether this is the expected behavior or not, so feel 
> free to close if this doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense for roughly which values of 
> {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
> that I found confusing is that a part size of 5 MB seems to outperform a part 
> size of 64 MB up until files that are upwards of about 500 MB in size. This 
> seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
> would've expected the overhead of those to become costly at small part sizes. 
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
> to wait until 256 MB are buffered before we can start uploading, while with 5 
> MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
> gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check 
> to make sure I'm not just misunderstanding something.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2017-10-08 Thread Steven Rand (JIRA)
Steven Rand created HADOOP-14937:


 Summary: initial part uploads seem to block unnecessarily in 
S3ABlockOutputStream
 Key: HADOOP-14937
 URL: https://issues.apache.org/jira/browse/HADOOP-14937
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-beta1
Reporter: Steven Rand


>From looking at a YourKit snapshot of an FsShell process running a {{hadoop fs 
>-put file:///... s3a://...}}, it seems that the first part in the multipart 
>upload doesn't begin to upload until n of the {{s3a-transfer-shared-pool}} 
>threads are able to start uploading, where n is the value of 
>{{fs.s3a.fast.upload.active.blocks}}.

To hopefully clarify a bit, the series of events that I expected to see with 
{{fs.s3a.fast.upload.active.blocks}} set to 4 is:

1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
2. As soon as that happens, a thread begins to upload that part. Meanwhile, the 
main thread continues to buffer data into off-heap memory.
3. Once another part has been buffered into off-heap memory, a separate thread 
uploads that part, and so on.

Whereas what I think the YK snapshot shows happening is:

1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
off-heap memory.
2. Four threads start to upload one part each at the same time.

I've attached a picture of the "Threads" tab to show what I mean. Basically the 
times at which the first four {{s3a-transfer-shared-pool}} threads start to 
upload are roughly the same, whereas I would've expected them to be more 
staggered.

I'm actually not sure whether this is the expected behavior or not, so feel 
free to close if this doesn't come as a surprise to anyone.

For some context, I've been trying to get a sense for roughly which values of 
{{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
that I found confusing is that a part size of 5 MB seems to outperform a part 
size of 64 MB up until files that are upwards of about 500 MB in size. This 
seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
would've expected the overhead of those to become costly at small part sizes. 
My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
to wait until 256 MB are buffered before we can start uploading, while with 5 
MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
gives the smaller parts the advantage for smaller files.

I'm happy to submit a patch if this is in fact a problem, but wanted to check 
to make sure I'm not just misunderstanding something.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13712) S3A open to avoid needless HEAD on the successful execution path

2017-09-11 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162390#comment-16162390
 ] 

Steven Rand commented on HADOOP-13712:
--

[~ste...@apache.org], I'm wondering whether it would be reasonable to add a new 
method to S3AFileSystem which is similar to {{open()}}, except that:

* The caller is responsible for providing the length of the file.
* The caller accepts that not all guarantees of {{FileSystem.open}} apply, 
i.e., we won't raise an FNFE if the file doesn't exist.
* We don't call {{getFileStatus}}, and instead just use the given length when 
constructing the S3AInputStream.

That way most callers can continue to call S3AFileSystem.open (and won't be 
affected), while callers who already know the length of the file and are okay 
with the weaker guarantees can use the new method and skip the getFileStatus 
call. The use case I have in mind is applications that already make a call to 
an external metastore/catalog type thing before trying to read a file, and get 
the info about its length and existence from there.

Do you think this would be a reasonable addition? If so I'm happy to submit a 
patch.

> S3A open to avoid needless HEAD on the successful execution path
> 
>
> Key: HADOOP-13712
> URL: https://issues.apache.org/jira/browse/HADOOP-13712
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>
> S3A's open() operation does a {{getFileStatus()}} check to see if a file is 
> not a directory before opening with a GET. That initial check will take up at 
> least one HEAD request if the file is present, more if it isn't.
> As the GET itself performs the existence check, it is needless. A successful 
> GET of a path which doesn't end in "/" means a file was there. The only 
> reason a getFileStatus call is needed is to choose which error message to 
> display if the path isn't there: is it an FNFE or is it path-is-directory.
> Proposed: reorder the code to do the GET; only if that fails fallback to 
> getFileStatus()



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-1540) Support file exclusion list in distcp

2017-03-27 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand reassigned HADOOP-1540:
---

Assignee: Steven Rand  (was: Rich Haase)

> Support file exclusion list in distcp
> -
>
> Key: HADOOP-1540
> URL: https://issues.apache.org/jira/browse/HADOOP-1540
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: util
>Affects Versions: 2.6.0
>Reporter: Senthil Subramanian
>Assignee: Steven Rand
>Priority: Minor
>  Labels: BB2015-05-TBR, distcp
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-1540.009.patch
>
>
> There should be a way to ignore specific paths (eg: those that have already 
> been copied over under the current srcPath). 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14030) PreCommit TestKDiag failure

2017-03-08 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902534#comment-15902534
 ] 

Steven Rand commented on HADOOP-14030:
--

Also seeing this in HADOOP-14062.

> PreCommit TestKDiag failure
> ---
>
> Key: HADOOP-14030
> URL: https://issues.apache.org/jira/browse/HADOOP-14030
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.0.0-alpha3
>Reporter: John Zhuge
>
> https://builds.apache.org/job/PreCommit-HADOOP-Build/11523/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
> {noformat}
> Tests run: 13, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 2.175 sec 
> <<< FAILURE! - in org.apache.hadoop.security.TestKDiag
> testKeytabAndPrincipal(org.apache.hadoop.security.TestKDiag)  Time elapsed: 
> 0.05 sec  <<< ERROR!
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> f...@example.com from keytab 
> /testptch/hadoop/hadoop-common-project/hadoop-common/target/keytab 
> javax.security.auth.login.LoginException: Unable to obtain password from user
>   at 
> com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1355)
>   at org.apache.hadoop.security.KDiag.loginFromKeytab(KDiag.java:630)
>   at org.apache.hadoop.security.KDiag.execute(KDiag.java:396)
>   at org.apache.hadoop.security.KDiag.run(KDiag.java:236)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.security.KDiag.exec(KDiag.java:1047)
>   at org.apache.hadoop.security.TestKDiag.kdiag(TestKDiag.java:119)
>   at 
> org.apache.hadoop.security.TestKDiag.testKeytabAndPrincipal(TestKDiag.java:162)
> testFileOutput(org.apache.hadoop.security.TestKDiag)  Time elapsed: 0.033 sec 
>  <<< ERROR!
> org.apache.hadoop.security.KerberosAuthException: Login failure for user: 
> f...@example.com from keytab 
> /testptch/hadoop/hadoop-common-project/hadoop-common/target/keytab 
> javax.security.auth.login.LoginException: Unable to obtain password from user
>   at 
> com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1355)
>   at org.apache.hadoop.security.KDiag.loginFromKeytab(KDiag.java:630)
>   at org.apache.hadoop.security.KDiag.execute(KDiag.java:396)
>   at org.apache.hadoop.security.KDiag.run(KDiag.java:236)
>   at 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-08 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902526#comment-15902526
 ] 

Steven Rand commented on HADOOP-14062:
--

It looks like HADOOP-14030 strikes again. Should I re-trigger Jenkins to try to 
get a green build?

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062.004.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.dummy.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-08 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Status: Patch Available  (was: Reopened)

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062.004.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.dummy.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-08 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: HADOOP-14062.004.patch

Attaching a new patch for trunk.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062.004.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.dummy.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-08 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902008#comment-15902008
 ] 

Steven Rand commented on HADOOP-14062:
--

Yes, will do, hopefully by the end of today.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.dummy.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-08 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901942#comment-15901942
 ] 

Steven Rand commented on HADOOP-14062:
--

Yes, it looks like the changes in YARN-6218 conflict with this change. For 
example, in that commit, the method {{tearDown}} was renamed to {{teardown}}, 
so in this patch we call a method that doesn't exist anymore. Want me to make a 
separate patch to fix this?

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.dummy.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-08 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901826#comment-15901826
 ] 

Steven Rand commented on HADOOP-14062:
--

Thanks [~jianhe] for reviewing!

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.dummy.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-07 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900470#comment-15900470
 ] 

Steven Rand commented on HADOOP-14062:
--

Looks like tests are also broken with the dummy patch?

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.dummy.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-06 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897814#comment-15897814
 ] 

Steven Rand commented on HADOOP-14062:
--

Looks like Jenkins hasn't tested the dummy patch -- do I need to click "Submit 
Patch" to make it do that?

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> HADOOP-14062-branch-2.8.0.dummy.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-03-02 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892915#comment-15892915
 ] 

Steven Rand commented on HADOOP-14062:
--

[~jianhe], I'm not really sure what to do about this since we can't get the 
tests to pass, but the failures don't seem related to the patch as far as I can 
tell. Do you have any suggestions for how to proceed?

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-23 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881130#comment-15881130
 ] 

Steven Rand commented on HADOOP-14062:
--

For me when I run with the patch on branch-2.8.0 I get:

{code}
  TestApplicationClientProtocolOnHA.testGetApplicationReportOnHA:69 »  test 
time...
  TestApplicationClientProtocolOnHA.testCancelDelegationTokenOnHA:213 »  test 
ti...
  
TestApplicationClientProtocolOnHA.testGetClusterNodesOnHA:104->Object.wait:502->Object.wait:-2
 »
{code}

And then without the patch:

{code}
  TestApplicationClientProtocolOnHA.testCancelDelegationTokenOnHA:213 »  test 
ti...
  TestApplicationClientProtocolOnHA.testGetQueueInfoOnHA:113 »  test timed out 
a...
  TestApplicationClientProtocolOnHA.testForceKillApplicationOnHA:189 »  test 
tim...
  TestApplicationClientProtocolOnHA.testGetContainerReportOnHA:150 »  test 
timed...
  TestResourceTrackerOnHA.testResourceTrackerOnHA:64 »  test timed out after 
150..
{code}

But it doesn't seem deterministic -- I get different failures on different test 
runs, both with and without the patch.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-23 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880957#comment-15880957
 ] 

Steven Rand commented on HADOOP-14062:
--

[~jianhe], I can't tell how real the test failures are. I ran the ones 
mentioned above locally with the same patch for branch-2.8.0 and in most cases 
they succeeded:

* I ran TestGetGroups once locally and it succeeded.
* I ran TestAMRMProxy three times locally. Once it failed, twice it succeeded.
* I ran TestYarnCLI once locally and it succeeded.
* I ran TestAMRMClient several times locally both with and without my patch. 
Every time at least one test failed, but it was inconsistent which test or 
tests timed out.
* I ran TestYarnClient once locally and it succeeded.
* I ran TestNMClient once locally and it succeed.

How do you recommend I proceed? 

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-22 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878293#comment-15878293
 ] 

Steven Rand commented on HADOOP-14062:
--

[~jianhe], it looks like the test failure is unrelated, since other people are 
seeing it too in HADOOP-14030?

The patch {{HADOOP-14062.003.patch}} is for trunk, and 
{{HADOOP-14062-branch-2.8.0.005.patch}} is for branch-2.8.0. Should I also make 
a separate patch file for branch-2?

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-17 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872658#comment-15872658
 ] 

Steven Rand commented on HADOOP-14062:
--

For me the tests also succeed if I comment out that code, but I think it's only 
because the new test happens to run last. When I add 
{{@FixMethodOrder(MethodSorters.NAME_ASCENDING)}} to the class, the test that 
runs immediately after the new test ({{testAllocationWithBlacklist}}) fails if 
I comment out that code. I think it's because that test calls 
{{amClient.init(conf)}}, and since we call 
{{conf.unset(CommonConfigurationKeysPublic.HADOOP_RPC_PROTECTION);}} in the new 
test, there's a mismatch between the client and the server.

So I think the two options are:

1. Don't remove the code in your above comment
2. Do remove that code, but also remove 
{{conf.unset(CommonConfigurationKeysPublic.HADOOP_RPC_PROTECTION);}}. When I do 
this all tests succeed regardless of order. 

[~jianhe] I'll defer to you on which option you prefer. Both seem okay to me. 
The first is a smaller change, since if we do the second, all tests after 
{{testAMRMClientWithSaslEncryption}} run with SASL RPC.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-15 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: HADOOP-14062.003.patch

[~jianhe] I've attached new patches which create a single new test in 
{{TestAMRMClient}}. There's some weirdness in that they have to create a new 
instance of {{MiniYARNCluster}} just for that one test, and I wasn't sure of 
the best way to handle it, so happy to further edit the patches if there's a 
better way.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062.003.patch, HADOOP-14062-branch-2.8.0.004.patch, 
> HADOOP-14062-branch-2.8.0.005.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-14 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: HADOOP-14062-branch-2.8.0.005.patch

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, HADOOP-14062-branch-2.8.0.005.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-09 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: (was: YARN-6013-branch-2.8.0.003.patch)

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-09 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: (was: YARN-6013-branch-2.8.0.002.patch)

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-09 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: HADOOP-14062.002.patch
HADOOP-14062-branch-2.8.0.004.patch

Attaching updated patches for branch-2.8.0 and trunk.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, HADOOP-14062.002.patch, 
> HADOOP-14062-branch-2.8.0.004.patch, YARN-6013-branch-2.8.0.002.patch, 
> YARN-6013-branch-2.8.0.003.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-08 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858853#comment-15858853
 ] 

Steven Rand commented on HADOOP-14062:
--

No, I unfortunately have no clue. Honestly I'm fairly perplexed by the whole 
situation.

I was hoping that [~daryn] might be able to weigh in, since from looking at the 
history for {{Client}} and {{SaslRpcClient}} it seems like you have a good 
understanding of this code, and were able to fix a break in SASL RPC encryption 
in HADOOP-9816.

In any case, I'll add a comment and will fix the patch.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, 
> YARN-6013-branch-2.8.0.002.patch, YARN-6013-branch-2.8.0.003.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-08 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: HADOOP-14062.001.patch

Reproduced the issue on trunk, and also that the same change to {{Client}} 
makes the tests pass. Attaching the same patch for trunk.

Today I'll try to find a nicer way of writing a unit test.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: HADOOP-14062.001.patch, 
> YARN-6013-branch-2.8.0.002.patch, YARN-6013-branch-2.8.0.003.patch, 
> yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-07 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857540#comment-15857540
 ] 

Steven Rand commented on HADOOP-14062:
--

Sorry, I can't get trunk to compile cleanly in IntelliJ, so I can't run any 
tests. I'll try again tomorrow, and if I can't run the tests I can just make a 
dist and deploy that + run TestDFSIO with hadoop.rpc.protection=privacy set.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, 
> YARN-6013-branch-2.8.0.003.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-07 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857477#comment-15857477
 ] 

Steven Rand commented on HADOOP-14062:
--

I'm not sure, but I'll check. I was targeting 2.8 since the 2.8.0 release is 
happening soon.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, 
> YARN-6013-branch-2.8.0.003.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
>   at 
> 

[jira] [Updated] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-07 Thread Steven Rand (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rand updated HADOOP-14062:
-
Attachment: YARN-6013-branch-2.8.0.003.patch

I'm attaching a patch with a unit test, but would definitely want to refactor 
the way the test is done. Ideally I think I'd add a test to either 
{{TestSaslRPC}} or {{TestAMRMClient}}, but I haven't found a way of doing so 
yet, so the patch in its current form duplicates lots of code in 
{{TestAMRMClient}}. 

I'll work on improving it, but wanted to have at least something by the end of 
today, and this approach was the most obvious / easiest to implement.

The test in {{TestAMRMClientWithEncryptedRpc}} fails without the change to 
{{Client}}, and succeeds with the change.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, 
> YARN-6013-branch-2.8.0.003.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-06 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855464#comment-15855464
 ] 

Steven Rand commented on HADOOP-14062:
--

[~jianhe], I haven't found a good way to add a unit test for this issue to 
{{org.apache.hadoop.ipc.TestSaslRPC}} yet. I will keep trying tomorrow.

I did notice that in {{org.apache.hadoop.yarn.client.api.impl.TestAMRMClient}}, 
if I add {{conf.set(CommonConfigurationKeysPublic.HADOOP_RPC_PROTECTION, 
"privacy");}} to the {{setup()}} method, then these tests fail with the same 
EOFException as in the description of this JIRA:

* testAMRMClient
* testAMRMClientMatchStorage

However, after I apply my patch, all of the tests in {{TestAMRMClient}} 
succeed, which seem like a good sign.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-06 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855312#comment-15855312
 ] 

Steven Rand commented on HADOOP-14062:
--

Correct, TestDFSIO fails without the patch and succeeds with the patch. The 
reason why TestDFSIO fails is because of the issue with 
{{ApplicationMasterProtocolPBClientImpl.allocate}} that I initially reported -- 
it's the same problem, just reproduced with MapReduce instead of with Spark.

I am working on a unit test now and will hopefully have a new patch by the end 
of tonight.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> 

[jira] [Comment Edited] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-06 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855170#comment-15855170
 ] 

Steven Rand edited comment on HADOOP-14062 at 2/7/17 3:25 AM:
--

[~jianhe], the server logs are for a different time range, but correspond to 
another instance of the same problem happening. I have never seen any errors or 
warning in the RM's log when this problem occurs -- it appears to be entirely 
client-side. I can reproduce the issue again and attach AM and RM logs from the 
same time if it would be helpful, but the contents of the RM log will be the 
same as they are in the current attachment.

I will write a unit test, but am also hoping for feedback on whether the 
approach taken in the current patch even makes sense. I can't tell whether the 
problem is that the unwrapped input stream needs to be wrapped in a 
{{BufferedInputStream}}, or whether it's that the first four bytes of the 
unwrapped input stream are supposed to be the length of the stream but instead 
are something else.

EDIT: I forgot to say that yes, I have tested my patch using TestDFSIO and the 
patch resolves the issue as far as I can tell.


was (Author: steven rand):
[~jianhe], the server logs are for a different time range, but correspond to 
another instance of the same problem happening. I have never seen any errors or 
warning in the RM's log when this problem occurs -- it appears to be entirely 
client-side. I can reproduce the issue again and attach AM and RM logs from the 
same time if it would be helpful, but the contents of the RM log will be the 
same as they are in the current attachment.

I will write a unit test, but am also hoping for feedback on whether the 
approach taken in the current patch even makes sense. I can't tell whether the 
problem is that the unwrapped input stream needs to be wrapped in a 
{{BufferedInputStream}}, or whether it's that the first four bytes of the 
unwrapped input stream are supposed to be the length of the stream but instead 
are something else.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at 

[jira] [Commented] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled

2017-02-06 Thread Steven Rand (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855170#comment-15855170
 ] 

Steven Rand commented on HADOOP-14062:
--

[~jianhe], the server logs are for a different time range, but correspond to 
another instance of the same problem happening. I have never seen any errors or 
warning in the RM's log when this problem occurs -- it appears to be entirely 
client-side. I can reproduce the issue again and attach AM and RM logs from the 
same time if it would be helpful, but the contents of the RM log will be the 
same as they are in the current attachment.

I will write a unit test, but am also hoping for feedback on whether the 
approach taken in the current patch even makes sense. I can't tell whether the 
problem is that the unwrapped input stream needs to be wrapped in a 
{{BufferedInputStream}}, or whether it's that the first four bytes of the 
unwrapped input stream are supposed to be the length of the stream but instead 
are something else.

> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when 
> RPC privacy is enabled
> --
>
> Key: HADOOP-14062
> URL: https://issues.apache.org/jira/browse/HADOOP-14062
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Steven Rand
>Priority: Critical
> Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), 
> {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) 
> fails with an EOFException. I've reproduced this with Spark 2.0.2 built 
> against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, 
> org.apache.commons.lang.RandomStringUtils.random(1024, 
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData 
> hdfs:///tmp/testDataCopy
> {code}
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
> setup for JobId: job_1482189777425_0004, File: 
> hdfs://:8020/tmp/hadoop-yarn/staging//.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482189777425_0004: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit= knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking 
> ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after 
> sleeping for 3ms.
> java.io.EOFException: End of File Exception between local host is: 
> "/"; destination host is: "":8030; 
> : java.io.EOFException; For more details see:  
> http://wiki.apache.org/hadoop/EOFException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1428)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1338)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy80.allocate(Unknown Source)
>   at 
>