Ismaël Mejía created BEAM-3681: ---------------------------------- Summary: Amazon S3 write breaks randomly Key: BEAM-3681 URL: https://issues.apache.org/jira/browse/BEAM-3681 Project: Beam Issue Type: Bug Components: sdk-java-extensions Affects Versions: 2.3.0, 2.4.0 Reporter: Ismaël Mejía
When executing a simple write on S3 with the direct runner. It seems to break when trying to write an 'empty' shard to S3. {code:java} Pipeline pipeline = Pipeline.create(options); pipeline .apply("CreateSomeData", Create.of("1", "2", "3")) .apply("WriteToFS", TextIO.write().to(options.getOutput())); pipeline.run();{code} The related exception is: {code:java} Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 402E99C2F602AD09; S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=), S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE= at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342) at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:312) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:206) at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297) at org.apache.beam.samples.ingest.amazon.IngestToS3.main(IngestToS3.java:82) Caused by: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 402E99C2F602AD09; S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=), S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE= at org.apache.beam.sdk.io.aws.s3.S3FileSystem.copy(S3FileSystem.java:563) at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$copy$4(S3FileSystem.java:495) at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$callTasks$8(S3FileSystem.java:642) at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:100) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 402E99C2F602AD09; S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE=), S3 Extended Request ID: SDdU8AqW2mfZuG1xcKUSNeHiR0IUKcRCpZ1Wjx7sAor1CdYf8f+0dDIcQpvr3GXgqwsyk5PGWVE= at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272) at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3065) at org.apache.beam.sdk.io.aws.s3.S3FileSystem.copy(S3FileSystem.java:561) at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$copy$4(S3FileSystem.java:495) at org.apache.beam.sdk.io.aws.s3.S3FileSystem.lambda$callTasks$8(S3FileSystem.java:642) at org.apache.beam.sdk.util.MoreFutures.lambda$supplyAsync$0(MoreFutures.java:100) at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)