[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2020-02-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033751#comment-17033751
 ] 

Steve Loughran commented on HADOOP-13811:
-

HADOOP-16823 shows we get this under the load of large DeleteObjects requests; we 
are treating that as a throttle event and retrying.
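A minimal sketch of that treatment (my own illustration; class and method names are not the actual S3A retry-policy code): the sanitize failure is recognized by its message and classified as retryable, the way a throttle event would be.

```java
// Sketch only: classify the SDK's "Failed to sanitize XML document" error as
// throttle-like and hence retryable. Names here are illustrative, not the
// actual S3A code.
public class SanitizeRetry {
    static final String SANITIZE_MARKER = "Failed to sanitize XML document";

    /** Treat the sanitize failure like a throttle event: eligible for retry with backoff. */
    public static boolean isRetryableSanitizeFailure(Exception e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(SANITIZE_MARKER);
    }
}
```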

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Steve Loughran
>Priority: Major
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-09-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164603#comment-16164603
 ] 

Steve Loughran commented on HADOOP-13811:
-

I should add: "com.amazonaws.AmazonClientException: Failed to sanitize XML 
document destined for handler class" is a tricky one to decide what to do about. 
On idempotent calls it's retriable; on non-idempotent calls, given that this is 
the response being parsed, it is not.
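That decision rule can be sketched as follows (illustrative names, not the real S3A retry logic): the failure means the *response* could not be parsed, so the request may already have taken effect, and only idempotent calls are safe to replay.

```java
// Sketch of the idempotency-based retry decision described above.
// Names are illustrative; this is not the actual S3A retry code.
public class RetryDecision {
    public static boolean shouldRetry(boolean idempotent, Exception failure) {
        String msg = failure.getMessage();
        boolean sanitizeFailure = msg != null
            && msg.contains("Failed to sanitize XML document");
        // The request may have succeeded server-side; only replay if replaying is harmless.
        return sanitizeFailure && idempotent;
    }
}
```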




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-09-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164598#comment-16164598
 ] 

Steve Loughran commented on HADOOP-13811:
-

translateException() will consider AbortedException to be an interruption in 
HADOOP-14531; it is converted to an InterruptedIOException, so the retry logic 
will not attempt to retry on it. So: no recovery, but hopefully better reporting.
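A self-contained sketch of that mapping (illustrative only, not the actual S3AUtils.translateException code; a local stand-in replaces the SDK's com.amazonaws.AbortedException):

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Sketch of the translation described above: map a thread-interrupt abort to an
// InterruptedIOException so retry logic knows not to replay it.
public class Translate {
    /** Stand-in for the SDK's com.amazonaws.AbortedException. */
    public static class AbortedException extends RuntimeException {
        public AbortedException(String m) { super(m); }
    }

    public static IOException translate(String operation, RuntimeException e) {
        if (e instanceof AbortedException) {
            // Raised only on thread interrupt: surface as an interruption, no retry.
            return new InterruptedIOException(operation + " interrupted: " + e.getMessage());
        }
        return new IOException(operation + " failed", e);
    }
}
```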




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-03-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932838#comment-15932838
 ] 

Steve Loughran commented on HADOOP-13811:
-

Looking at the AWS SDK, 

1. {{AbortedException}} is only ever raised on a thread interrupt; it could be 
translated.
2. That log that [~fabbri] saw, "Unable to close response InputStream ...", is 
just a log at error of the exception raised when the XML parser closes the input 
stream: it's not the actual point where something was thrown, just the 
errors in the close() call. The stuff we'd log at debug in our own code.

I propose translateException gets a special handler for an AbortedException at 
the base of the call chain; if found, it raises an InterruptedIOException. Or we 
could actually set the interrupted bit on the thread again? That'd be purer, but 
potentially more of a change in how the system operates.
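The two options can be sketched like this (illustrative names, with a self-contained stand-in for the SDK's AbortedException): search the cause chain for an abort, and optionally restore the thread's interrupt bit before rethrowing.

```java
// Sketch of the proposal above; not the actual translateException implementation.
public class AbortProbe {
    /** Stand-in for the SDK's com.amazonaws.AbortedException. */
    public static class AbortedException extends RuntimeException {}

    /** Is there an AbortedException anywhere at or below this exception? */
    public static boolean containsAborted(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof AbortedException) {
                return true;
            }
        }
        return false;
    }

    /** The "purer" option: mark the thread interrupted again so callers can see it. */
    public static void restoreInterrupt() {
        Thread.currentThread().interrupt();
    }
}
```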






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-03-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932806#comment-15932806
 ] 

Steve Loughran commented on HADOOP-13811:
-

+stack trace from Spark dataframes, this time on Hadoop 2.8.0 RC3. This is the 
same situation as before: the error arises during stream interrupt/teardown, 
where I think an interrupted exception is being converted to an abort.
{code}
2017-03-20 15:09:31,440 [JobGenerator] WARN  dstream.FileInputDStream 
(Logging.scala:logWarning(87)) - Error finding new files
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on 
spark-cloud/S3AStreamingSuite/streaming/streaming/: 
com.amazonaws.AmazonClientException: Failed to sanitize XML document destined 
for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
 Failed to sanitize XML document destined for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1638)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1393)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1369)
at org.apache.hadoop.fs.Globber.listStatus(Globber.java:76)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:234)
at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1704)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:2030)
at 
org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:205)
at 
org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:149)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
at scala.Option.orElse(Option.scala:289)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
at 
org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:36)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
at scala.Option.orElse(Option.scala:289)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
at 
org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:36)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
at 
org.apache.spark.s
{code}

[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-03-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925120#comment-15925120
 ] 

Steve Loughran commented on HADOOP-13811:
-

Looks like it's how the XML parser in the SDK handles an EOF. For now, document 
in the s3a troubleshooting guide. 


It shows there's nothing in terms of recovery going on in some of the ops. We 
could consider doing some retries, but we'd have to be clear about what we 
would try to retry (read ops) and what we would not (write ops). You can't even 
say delete is idempotent once you start allowing for concurrent callers.




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-03-14 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925084#comment-15925084
 ] 

Aaron Fabbri commented on HADOOP-13811:
---

Adding a related stack trace in case it helps.  This was not on trunk but on a 
CDH build with the latest s3a / s3guard code pulled in.  I was playing with a 
concurrent rename stress test.  Looks like I just lost my TCP connection?

{noformat}
2017-03-14 14:37:01,036 [pool-2-thread-114] ERROR 
transform.XmlResponsesSaxParser 
(XmlResponsesSaxParser.java:sanitizeXmlDocument(210)) - Unable to close 
response InputStream after failure sanitizing XML document
java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:191)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.read(InputRecord.java:480)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:944)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:901)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
at 
com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at 
com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at 
com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at 
com.amazonaws.thirdparty.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:266)
at 
com.amazonaws.thirdparty.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
at 
com.amazonaws.thirdparty.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
at 
com.amazonaws.thirdparty.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
at 
com.amazonaws.thirdparty.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
at 
com.amazonaws.thirdparty.apache.http.impl.execchain.ResponseEntityProxy.streamClosed(ResponseEntityProxy.java:140)
at 
com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
at 
com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
at 
com.amazonaws.internal.SdkFilterInputStream.close(SdkFilterInputStream.java:89)
at 
com.amazonaws.event.ProgressInputStream.close(ProgressInputStream.java:211)
at 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:207)
at 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)
at 
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)
at 
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)
at 
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at 
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at 
com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4126)
at 
com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:845)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1043)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFi
{noformat}

[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-02-21 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876307#comment-15876307
 ] 

Luke Miner commented on HADOOP-13811:
-

Steve,

I'll give the new JARs a shot. I changed the setting and it seems to have fixed 
things, albeit at the cost of speed. Here's the new issue: 
https://issues.apache.org/jira/browse/HADOOP-14101




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-02-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874539#comment-15874539
 ] 

Steve Loughran commented on HADOOP-13811:
-

Luke, 

* grab the HDP 2.5 sandbox and take the JARs from there .. they have the input 
stream speedup, and it'd be interesting to see if that's enough to make the 
problems go away
* that new stack trace is different: can you file a new bug against it and 
we'll take a look at how to handle it. It looks like we will need to add some 
handling for what is presumably a race condition. For now, try setting 
fs.s3a.multiobjectdelete.enable to false.
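For reference, that workaround goes into core-site.xml as a standard Hadoop configuration property; a minimal fragment (my sketch, with an illustrative description):

```xml
<property>
  <name>fs.s3a.multiobjectdelete.enable</name>
  <value>false</value>
  <description>When false, s3a issues one DELETE request per object instead of
  a bulk DeleteObjects call; slower, but sidesteps the MultiObjectDeleteException
  seen in the reported stack trace.</description>
</property>
```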

Can I state that I do not recommend using S3 as a direct destination of work, 
such as {{DataFrame.write()}}. Without list consistency there's a risk of 
written work not being discovered, and so not copied. S3Guard (HADOOP-13345) 
will fix that, along with an O(1) committer, but it's not out yet.




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2017-02-18 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873349#comment-15873349
 ] 

Luke Miner commented on HADOOP-13811:
-

[~steve_l] As an update, we're starting to see another error that happens 
randomly but at increasingly frequent intervals. Happens at the end of a job, I 
think when spark is trying to delete temp files. Seems like a race condition:

{code}
Exception in thread "main" 
com.amazonaws.services.s3.model.MultiObjectDeleteException: Status Code: 0, AWS 
Service: null, AWS Request ID: null, AWS Error Code: null, AWS Error Message: 
One or more objects could not be deleted, S3 Extended Request ID: null
at 
com.amazonaws.services.s3.AmazonS3Client.deleteObjects(AmazonS3Client.java:1745)
at org.apache.hadoop.fs.s3a.S3AFileSystem.delete(S3AFileSystem.java:674)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:90)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at 
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
at Json2Pq$.main(json2pq.scala:179)
at Json2Pq.main(json2pq.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-12-08 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732908#comment-15732908
 ] 

Luke Miner commented on HADOOP-13811:
-

Sorry to be a pain about this. Would it be possible for you to share a prebuilt 
version with me? I'd love to get this fixed!




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-12-02 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716636#comment-15716636
 ] 

Luke Miner commented on HADOOP-13811:
-

I double checked and {{SPARK_HOME}} is unset and there doesn't appear to be any 
other {{spark-submit}} on the {{PATH}}. I'm stumped. Is there a prebuilt 
distribution I can get my hands on?




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-12-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714708#comment-15714708
 ] 

Steve Loughran commented on HADOOP-13811:
-

I don't see anything wrong with the build. Do a quick git pull to make sure all 
of the hadoop code is up to date, though I'm not confident that's the issue, 
just going from the lines where the stack is coming from.

One thing to consider: which process is being run here? That is, is there some 
other SPARK_HOME/bin being executed? Make sure that {{SPARK_HOME}} is unset and 
that there aren't other copies of spark-submit on the {{PATH}}.
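Those checks can be done quickly from a shell (a sketch; adjust for your environment):

```shell
# Confirm SPARK_HOME is unset, and list every spark-submit visible on the PATH
# in lookup order, to spot a stray older installation shadowing the new build.
echo "SPARK_HOME=${SPARK_HOME:-<unset>}"
type -a spark-submit 2>/dev/null || echo "no spark-submit on PATH"
```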




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-12-01 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713133#comment-15713133
 ] 

Luke Miner commented on HADOOP-13811:
-

So I printed out the classpath and it points to the snapshot build: 
{{jar:file:/foo/hadoop-aws-2.9.0-SNAPSHOT.jar!/org/apache/hadoop/fs/s3a/S3AFileSystem.class}}

Could an earlier version of hadoop somehow have crept in?

I built it from the PR that you'd indicated earlier: 
https://github.com/apache/spark/pull/12004

I used the following command on my Mac: {{dev/make-distribution.sh 
-Pyarn,hadoop-2.7,hive,cloud -Pmesos -Dhadoop.version=2.9.0-SNAPSHOT}}

Is the problem with the {{hadoop-2.7}} bit?






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-12-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711975#comment-15711975
 ] 

Steve Loughran commented on HADOOP-13811:
-

I don't see it either. Why not try logging the URL returned by 
{{this.getClass.getClassloader.getResource("/org/apache/hadoop/fs/s3a/S3AFileSystem.class")}}?


FWIW, Hadoop 2.8 has a built-in entry point, org.apache.hadoop.util.FindClass, 
whose sole purpose is to track down where things are coming from and to assert 
that resources/classes are on the CP.
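A self-contained version of that classloader probe (my sketch; substitute S3AFileSystem for the class being probed when running inside the job):

```java
import java.net.URL;

// Print the URL a class was actually loaded from, to pin down which JAR on the
// classpath is supplying it. Illustrative sketch of the probe suggested above.
public class FindOrigin {
    public static String origin(Class<?> clazz) {
        String resource = "/" + clazz.getName().replace('.', '/') + ".class";
        URL url = clazz.getResource(resource);
        return url == null ? "(not found)" : url.toString();
    }

    public static void main(String[] args) {
        // Probing this class itself; on a cluster, probe S3AFileSystem instead.
        System.out.println(origin(FindOrigin.class));
    }
}
```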




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-30 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709517#comment-15709517
 ] 

Luke Miner commented on HADOOP-13811:
-

I've been looking through the classpath but can't find any obvious culprits. 
All I can think is that there is something wrong with my spark build. Is there 
anywhere that I could get a prebuilt version of spark with hadoop, and 
hadoop-aws? I've included the classpath returned by spark submit in case 
anything jumps out to you.

{code}
System properties:
spark.hadoop.parquet.block.size -> 2147483648
spark.hadoop.fs.s3a.impl -> org.apache.hadoop.fs.s3a.S3AFileSystem
spark.local.dir -> /raid0/spark
spark.mesos.coarse -> false
spark.hadoop.parquet.enable.summary-metadata -> false
spark.hadoop.fs.s3a.access.key -> 
spark.network.timeout -> 600
spark.executor.memory -> 16G
spark.hadoop.fs.s3n.multipart.uploads.enabled -> true
spark.rpc.message.maxSize -> 500
SPARK_SUBMIT -> true
spark.hadoop.fs.s3a.secret.key -> 
spark.jars.packages -> 
com.databricks:spark-avro_2.11:3.0.1,com.amazonaws:aws-java-sdk:1.11.60
spark.mesos.constraints -> priority:1
spark.task.cpus -> 1
spark.executor.extraJavaOptions -> -XX:+UseG1GC -XX:MaxPermSize=1G 
-XX:+HeapDumpOnOutOfMemoryError
spark.speculation -> false
spark.app.name -> Json2Pq
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version -> 2
spark.jars -> 
file:/tmp/foo/hadoop-aws-2.9.0-SNAPSHOT.jar,file:/data/foo/.ivy2/jars/com.databricks_spark-avro_2.11-3.0.1.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-1.11.60.jar,file:/data/foo/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar,file:/data/foo/.ivy2/jars/org.apache.avro_avro-1.7.6.jar,file:/data/foo/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar,file:/data/foo/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar,file:/data/foo/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar,file:/data/foo/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar,file:/data/foo/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar,file:/data/foo/.ivy2/jars/org.tukaani_xz-1.0.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-support-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-simpledb-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-servicecatalog-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-servermigration-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-simpleworkflow-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-storagegateway-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-route53-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-s3-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-importexport-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-sts-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-sqs-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-rds-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-redshift-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-elasticbeanstalk-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-glacier-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-iam-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-datapipeline-1.11.60.jar,file:/data/foo/.ivy2/jars/com.a
mazonaws_aws-java-sdk-elasticloadbalancing-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-elasticloadbalancingv2-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-emr-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-elasticache-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-elastictranscoder-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-ec2-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-dynamodb-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-sns-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-budgets-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-cloudtrail-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-cloudwatch-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-logs-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-events-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-cognitoidentity-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-cognitosync-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-directconnect-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-cloudformation-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-cloudfront-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-kinesis-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazonaws_aws-java-sdk-opsworks-1.11.60.jar,file:/data/foo/.ivy2/jars/com.amazon
{code}

[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708386#comment-15708386
 ] 

Steve Loughran commented on HADOOP-13811:
-

That message telling you off for using S3FileSystem is a quirk of how services are loaded; ignore it. We have pulled that warning from 2.8, HADOOP-13323.

What it does do is warn me: you still have Hadoop 2.7.x on the classpath.

So does the stack trace; again, it's a 2.7.x stack:

{code}
maxKeys = conf.getInt(MAX_PAGING_KEYS, DEFAULT_MAX_PAGING_KEYS);
partSize = conf.getLong(MULTIPART_SIZE, DEFAULT_MULTIPART_SIZE);
multiPartThreshold = conf.getInt(MIN_MULTIPART_THRESHOLD,
  DEFAULT_MIN_MULTIPART_THRESHOLD);
{code}

By changing the property you've moved the failure down one more line, but as the old 
code is still on your classpath, you aren't getting anywhere. Assume all these errors 
are classpath-related; fix the classpath first, and then there's a chance of the 
patched code being picked up.




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-29 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706266#comment-15706266
 ] 

Luke Miner commented on HADOOP-13811:
-

Turns out I still had {{hadoop-aws:2.7.3}} in my spark conf file. I ended up 
including the {{hadoop-aws-2.9.0-SNAPSHOT.jar}} that I built using the 
instructions you gave above. I also bumped up amazon's {{aws-java-sdk}} to 
{{1.11.57}}. I'm still seeing the same error, only on a different line number 
now. Oddly, it also seems to be telling me that I should be using 
{{S3AFileSystem}} instead of {{S3FileSystem}}.

{code}
S3FileSystem is deprecated and will be removed in future releases. Use 
NativeS3FileSystem or S3AFileSystem instead.
16/11/29 19:00:19 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 177.7 KB, free 366.1 MB)
16/11/29 19:00:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 21.0 KB, free 366.1 MB)
16/11/29 19:00:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
10.0.229.45:52703 (size: 21.0 KB, free: 366.3 MB)
16/11/29 19:00:19 INFO SparkContext: Created broadcast 0 from textFile at 
json2pq.scala:130
Exception in thread "main" java.lang.NumberFormatException: For input string: 
"100M"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1320)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:234)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2904)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:101)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2941)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2923)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:265)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1957)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:928)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:927)
at Json2Pq$.main(json2pq.scala:130)
at Json2Pq.main(json2pq.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

I did also try to run it using {{hadoop-aws-3.0.0-alpha1.jar}} that is 
currently on central, but got this error instead. Perhaps because I'm running 
off a hadoop 2.9 snapshot.

{code}
Exception in thread "main" java.lang.IllegalArgumentException: Error while 
instantiating 'org.apache.spark.sql.internal.SessionState':
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:965)
at 
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at 
org.apa
{code}

[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705056#comment-15705056
 ] 

Steve Loughran commented on HADOOP-13811:
-

You are getting an exception on line 248 of S3AFileSystem, where it's calling 
{{partSize = conf.getLong(MULTIPART_SIZE, DEFAULT_MULTIPART_SIZE)}}.

Now, that line only exists on branch-2.7.x; it's not on the branch-2 line, where we 
use getLongBytes() to support byte unit measurements (K, M, G, T, P). Which means 
that somehow you are picking up the 2.7 binaries in your jobs.

You could make that error move off that line by changing the option 
{{fs.s3a.multipart.size}} to {{104857600}}. But you've still got that mix of 
binaries there, and that means your problem isn't going to go away.
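The suffix handling that 2.7.x lacks can be illustrated with a simplified stand-in. This is a sketch of the getLongBytes() behaviour under the assumption of binary (power-of-two) suffixes, not the Hadoop implementation itself:

```java
// Simplified sketch: accept a bare long, or a K/M/G/T/P binary-suffixed size.
public class SizeParser {
    static long parseBytes(String v) {
        v = v.trim().toUpperCase();
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            case 'P': multiplier = 1L << 50; break;
            default:  return Long.parseLong(v);   // plain number, no suffix
        }
        return multiplier * Long.parseLong(v.substring(0, v.length() - 1));
    }

    public static void main(String[] args) {
        // "100M" is exactly what 2.7.x's plain getLong() rejects
        // with NumberFormatException; here it expands to 104857600.
        System.out.println(parseBytes("100M"));
    }
}
```

That expansion is where the {{104857600}} workaround value comes from: 100 × 1024 × 1024.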




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-28 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15703276#comment-15703276
 ] 

Luke Miner commented on HADOOP-13811:
-

That worked great. It is running!

However, there's a new error, a NumberFormatException that I was not getting 
before:

{code}
Exception in thread "main" java.lang.NumberFormatException: For input string: 
"100M"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1320)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:248)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2904)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:101)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2941)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2923)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:265)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:236)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:322)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1957)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:928)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.collect(RDD.scala:927)
at Json2Pq$.main(json2pq.scala:130)
at Json2Pq.main(json2pq.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702045#comment-15702045
 ] 

Steve Loughran commented on HADOOP-13811:
-

Thoughts: if you are trying to do a build of all the snapshots downstream, then yes, 
everything needs to be installed. Maybe it's the make-distribution script that is 
missing things there. I use that for a distribution, but build for a local install with

{code}
mvn install -DskipTests -Pyarn,hive,hadoop-2.7,cloud -Dhadoop.version=2.9.0-SNAPSHOT
{code}

That doesn't build the spark distribution; I tend to do both separately. And yes, it 
does take time. I generally kick them off in the morning after the hadoop build, then 
go and have coffee and catch up on emails. Pro tip: on OS X you can append 
{{; say moo}} and have your laptop make a noise when the build is finished.




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701991#comment-15701991
 ] 

Steve Loughran commented on HADOOP-13811:
-

Oops. I think you might need the sql module too. But I'm surprised that spark code 
didn't make it in.

Now, we have just stripped some bits of jackson from Hadoop branch 2 (all of 
jackson 1.9). I wonder if spark is expecting it.




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-21 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684576#comment-15684576
 ] 

Luke Miner commented on HADOOP-13811:
-

Okay built off the PR [https://github.com/apache/spark/pull/12004], with the 
following command

{code}
dev/make-distribution.sh -Pyarn,hadoop-2.7,hive,cloud 
-Dhadoop.version=2.9.0-SNAPSHOT -Pmesos
{code}

When I try to build my application against this build, I'm now missing a bunch 
of dependencies:

{code}
org.apache.hadoop
org.apache.spark.sql.types
org.json4s.jackson
com.fasterxml
org.apache.spark.sql.catalyst.analysis
{code}

Here's my build.sbt. I'm using sbt-assembly

{code}
name := "json2pq"

version := "1.2.1"

scalaVersion := "2.11.8"

resolvers += Resolver.mavenLocal

// spark
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0-SNAPSHOT"  % 
"provided" excludeAll (
  ExclusionRule("org.slf4j", "slf4j-api")
)
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0-SNAPSHOT" % 
"provided"

// other libraries
libraryDependencies += "org.json4s" %% "json4s-native" % "3.5.0"
libraryDependencies += "org.rogach" %% "scallop" % "2.0.5"
libraryDependencies += "com.github.nscala-time" %% "nscala-time" % "2.14.0"
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.5.0"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.7"

// test
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "2.2.6" % "test"
libraryDependencies += "junit" % "junit" % "4.11" % "test"
libraryDependencies += "com.novocode" % "junit-interface" % "0.11" % "test"

assemblyOption in assembly := (assemblyOption in 
assembly).value.copy(includeScala = false)
net.virtualvoid.sbt.graph.Plugin.graphSettings
{code}




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15683858#comment-15683858
 ] 

Steve Loughran commented on HADOOP-13811:
-

you'll probably need a -Pmesos on the command line too




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-20 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15681899#comment-15681899
 ] 

Luke Miner commented on HADOOP-13811:
-

I built a snapshot of spark 2.1.0 with a build of hadoop 2.8.0. Specifically, I 
built with the following command:

{code}
dev/make-distribution.sh -Pyarn,hadoop-2.7,hive -Dhadoop.version=2.8.0-SNAPSHOT
{code}

However, I'm getting the following error. Do I need to do some special build to 
get this to work with mesos?

{code}
16/11/20 21:48:24 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: 
'mesos://foo:5050/mesos'
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2549)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:505)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2312)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:852)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:844)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:844)
at Json2Pq$.main(json2pq.scala:126)
at Json2Pq.main(json2pq.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}




[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15679171#comment-15679171
 ] 

Steve Loughran commented on HADOOP-13811:
-

Looking at the stack trace of mine, it's a thread interrupt -> stream abort -> 
parser failure.

Assuming it is a network problem surfacing as errors in the XML parser (i.e. the 
input stream being closed prematurely), this *may* go away on a retry. But if a 
thread has been interrupted, it should really stay interrupted rather than trigger a 
retry. That is: the error handler should walk the exception tree, look for an 
{{AbortedException}} at the bottom, and upconvert it to an {{InterruptedIOException}}.
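That unwrapping could look roughly like the sketch below. This is a hypothetical helper, not the actual S3AUtils API; {{AbortedException}} is matched by class name only so the example compiles without the AWS SDK on the classpath (the real check would be an {{instanceof}} test).

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class AbortTranslator {
    /** Walk the cause chain looking for an SDK abort at the bottom. */
    static boolean causedByAbort(Throwable t) {
        for (Throwable cause = t; cause != null; cause = cause.getCause()) {
            // Name match stands in for: cause instanceof AbortedException
            if ("AbortedException".equals(cause.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical translation: aborts become interrupts, which must not retry. */
    static IOException translate(String operation, Exception e) {
        if (causedByAbort(e)) {
            InterruptedIOException iioe =
                new InterruptedIOException(operation + " interrupted");
            iioe.initCause(e);
            return iioe;                            // caller should not retry
        }
        return new IOException(operation + " failed", e);  // retriable path
    }

    public static void main(String[] args) {
        // Stand-in for com.amazonaws.AbortedException, for illustration only:
        class AbortedException extends RuntimeException {}
        Exception parserFailure = new RuntimeException(
            "Failed to sanitize XML document", new AbortedException());
        System.out.println(translate("getFileStatus", parserFailure).getClass());
    }
}
```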

I think that's separate from Luke's cause though.

Luke, are you in a position to build hadoop & Spark yourself? Or see if you can 
replicate this on the HDP sandbox, which contains much of the S3a phase II 
work, including the ignoring of exceptions in fake directory cleanup

http://hortonworks.com/products/sandbox/

Otherwise, I can build up a snapshot of spark trunk with Hadoop 2.8-SNAPSHOT; I'll 
share it with you and we can see if we can identify what's happening, or at least 
ignore the failure better when it's non-critical.







[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15679156#comment-15679156
 ] 

Steve Loughran commented on HADOOP-13811:
-

I've seen this too. I was doing other things on the network at the time; I suspect 
this may just be a network failure manifesting as an XML parser error.

{code}
2016-11-18 19:38:28,167 [JobGenerator] WARN  dstream.FileInputDStream 
(Logging.scala:logWarning(87)) - Error finding new files
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on 
spark-cloud/S3AStreamingSuite/streaming/streaming/: 
com.amazonaws.AmazonClientException: Failed to sanitize XML document destined 
for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
 Failed to sanitize XML document destined for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:116)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1584)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1339)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1315)
at org.apache.hadoop.fs.Globber.listStatus(Globber.java:76)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:234)
at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1770)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:1975)
at 
org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:205)
at 
org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:149)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
at scala.Option.orElse(Option.scala:289)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
at 
org.apache.spark.streaming.dstream.MappedDStream.compute(MappedDStream.scala:36)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
at scala.Option.orElse(Option.scala:289)
at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
at 
org.apache.spark.streaming.dstream.FilteredDStream.compute(FilteredDStream.scala:36)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
at 
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.
{code}

[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-15 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668288#comment-15668288
 ] 

Luke Miner commented on HADOOP-13811:
-

[~steve_l] anything you'd like me to do to try to diagnose the problem? I get 
this error every time I run this job.

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-13 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15661864#comment-15661864
 ] 

Luke Miner commented on HADOOP-13811:
-

Yeah, this happens every time I run this job.

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660084#comment-15660084
 ] 

Steve Loughran commented on HADOOP-13811:
-

Luke: your stack is actually slightly different; the XML parser is failing 
because the document doesn't close the opening XML tag. Unless S3 really is 
generating incomplete XML, the likely causes are: connection close, httpclient 
reporting it as end of document, or the SAX parser failing. Not impossible, as 
it's only with a valid Content-Length header that an incomplete end of an HTTP 
1.0 request surfaces (good horror story there from the distant past: 
[http://people.apache.org/~stevel/slides/when_web_services_go_bad.pdf]). But 
HTTP/1.1 is stricter, I think.

The same error surfaced in Minecraft once: https://bugs.mojang.com/browse/MCL-1542

Luke: is this happening repeatedly?
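To illustrate how a dropped connection can surface as an XML parse error rather than a network error: a minimal, self-contained sketch that feeds a truncated listing-style document to a plain SAX parser. The `ListBucketResult` fragment here is made up for the demo, not a real S3 response, and this uses the JDK parser directly, not the AWS SDK's wrapper.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TruncatedXmlDemo {

    /**
     * Parse the document with a no-op SAX handler.
     * Returns null on success, "SAXException" if the parser rejects it.
     */
    static String parseOrError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return null;
        } catch (SAXException e) {
            // The parser reports a document-structure problem ("must start and
            // end within the same entity"), with no hint that the real cause
            // may have been a connection closing mid-response.
            return "SAXException";
        }
    }

    public static void main(String[] args) throws Exception {
        // A listing response cut off mid-stream, as a dropped connection
        // behind a valid Content-Length could produce.
        String truncated = "<?xml version=\"1.0\"?><ListBucketResult><Name>my-buc";
        System.out.println(parseOrError(truncated)); // SAXException
    }
}
```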

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660070#comment-15660070
 ] 

Steve Loughran commented on HADOOP-13811:
-

Luke's stack
{code}
   org.apache.spark.SparkException: Task failed while writing rows
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1345)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
... 8 more
Suppressed: java.lang.NullPointerException
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:147)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetFileFormat.scala:569)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$abortTask$1(WriterContainer.scala:282)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$2.apply$mcV$sp(WriterContainer.scala:258)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1354)
... 9 more
Caused by: com.amazonaws.AmazonClientException: Unable to unmarshall 
response (Failed to parse XML document with handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler).
 Response Code: 200, Response Text: OK
at 
com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
at 
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3480)
at 
com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:604)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:962)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:1147)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:1136)
at 
org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:142)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at 
org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:400)
at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:117)
at 
org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetFileFormat.scala:569)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources
{code}
[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-11 Thread Luke Miner (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657802#comment-15657802
 ] 

Luke Miner commented on HADOOP-13811:
-

Per [SPARK-18402] I am getting this stack trace. Would be happy to test out 
Hadoop 2.8 and report back results.

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657779#comment-15657779
 ] 

Steve Loughran commented on HADOOP-13811:
-

I think this and SPARK-18402 are symptoms of the connection breaking, and of 
the XML parser not handling it/reporting the issue very well.

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657744#comment-15657744
 ] 

Steve Loughran commented on HADOOP-13811:
-

I think what's happening here is that one thread is completing/shutting stuff 
down, but Spark Streaming is still scanning for new files. The thread's been 
interrupted on the read, which raises an AbortedException (which extends 
AmazonClientException). The XML parser catches it, but then throws a new 
AmazonClientException, rather than just rethrowing the one that came in:

[https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/model/transform/XmlResponsesSaxParser.java#L177]

This doesn't generate very meaningful messages, when "thread interrupted" is 
what we'd like to see.
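A toy illustration of the wrap-versus-rethrow difference described above. `ClientException` and `AbortedException` here are local stand-ins mirroring the shape of the SDK hierarchy (subclass extends the client exception), not the real aws-sdk types, and `sanitize*` are hypothetical helpers, not the SDK's actual methods.

```java
/** Stand-in for AmazonClientException: the SDK's generic client-side failure. */
class ClientException extends RuntimeException {
    ClientException(String msg) { super(msg); }
    ClientException(String msg, Throwable cause) { super(msg, cause); }
}

/** Stand-in for AbortedException, raised when a read is interrupted. */
class AbortedException extends ClientException {
    AbortedException() { super("aborted: thread interrupted"); }
}

public class RethrowDemo {

    /** Anti-pattern: wrap everything, so the caller only ever sees a parse failure. */
    static RuntimeException sanitizeWrapAll(Throwable t) {
        return new ClientException("Failed to sanitize XML document", t);
    }

    /** The fix suggested above: rethrow a client exception that came in unchanged. */
    static RuntimeException sanitizeRethrow(Throwable t) {
        if (t instanceof ClientException) {
            return (ClientException) t;  // "aborted: thread interrupted" survives
        }
        return new ClientException("Failed to sanitize XML document", t);
    }

    public static void main(String[] args) {
        Throwable in = new AbortedException();
        System.out.println(sanitizeWrapAll(in).getMessage());  // Failed to sanitize XML document
        System.out.println(sanitizeRethrow(in).getMessage());  // aborted: thread interrupted
    }
}
```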

> s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to 
> sanitize XML document destined for handler class
> -
>
> Key: HADOOP-13811
> URL: https://issues.apache.org/jira/browse/HADOOP-13811
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> Sometimes, occasionally, getFileStatus() fails with a stack trace starting 
> with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document 
> destined for handler class}}.






[jira] [Commented] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class

2016-11-11 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15657741#comment-15657741
 ] 

Steve Loughran commented on HADOOP-13811:
-

This showed up during a test run against s3 ireland in the SPARK-7481 s3a 
integration tests.
{code}
2016-08-26 21:27:11,382 INFO  scheduler.JobScheduler 
(Logging.scala:logInfo(54)) - Finished job streaming job 1472243229000 ms.0 
from job set of time 1472243229000 ms
2016-08-26 21:27:11,382 INFO  scheduler.JobScheduler 
(Logging.scala:logInfo(54)) - Total delay: 2.382 s for time 1472243229000 ms 
(execution: 0.000 s)
2016-08-26 21:27:11,923 WARN  dstream.FileInputDStream 
(Logging.scala:logWarning(87)) - Error finding new files under 
s3a://hwdev-steve-ireland-new/test/testname/streaming/sub*
org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on 
test/testname/streaming/: com.amazonaws.AmazonClientException: Failed to 
sanitize XML document destined for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
 Failed to sanitize XML document destined for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:105)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1462)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1227)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1203)
at org.apache.hadoop.fs.s3a.S3AGlobber.listStatus(S3AGlobber.java:69)
at org.apache.hadoop.fs.s3a.S3AGlobber.doGlob(S3AGlobber.java:210)
at org.apache.hadoop.fs.s3a.S3AGlobber.glob(S3AGlobber.java:125)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:1853)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:1841)
...
at 
org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:116)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:248)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:246)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:246)
at 
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
at 
org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Caused by: com.amazonaws.AmazonClientException: Failed to sanitize XML document 
destined for handler class 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
at 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:222)
at 
com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:299)
at 
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:77)
at 
com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:74)
at 
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at 
com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at 
com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:1072)
at 
com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:746)
at 
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3738)
at 
com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:653)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:881)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1435)
... 71 more
Caused by: com.amazonaws.AbortedException: 
at 
com.amazonaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:51)
at 
com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at 
com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
at sun.nio.cs.StreamDecod