Re: Bad Digest error while doing aws s3 put
> On 9 Feb 2016, at 07:19, lmk wrote:
>
> Hi Dhimant,
> As I had indicated in my next mail, my problem was due to the disk getting full
> with log messages (these were dumped onto the slaves) and did not have
> anything to do with the content pushed into S3. So it looks like this error
> message is very generic and is thrown for various reasons. You may have
> to do some more research to find out the cause of your problem. Please keep me
> posted once you fix this issue. Sorry, I could not be of much help to you.
>
> Regards

That's fun. s3n/s3a buffer their output until close() is called, then they do a full upload. This breaks every assumption people have about file IO:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html

-especially the bits in the spec about close() being fast and harmless; now it's O(data) and bad news if it fails.

If your close() was failing due to lack of HDD space, it means that your tmp dir and log dir were on the same disk/volume, and that ran out of capacity.

HADOOP-11183 added an output variant which buffers in memory, primarily for faster output to rack-local storage supporting the S3 protocol. This is in ASF Hadoop 2.7 and recent HDP and CDH releases. I don't know if it's in Amazon EMR, because they have their own closed-source EMR client (believed to be a modified ASF one with some special hooks to unstable S3 APIs).

Anyway: I would run, not walk, to using s3a on Hadoop 2.7+, as it's already better than s3n and getting better with every release.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
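For readers following Steve's advice: moving to s3a mostly means pointing your job at `s3a://` URLs with the s3a credential properties set. A minimal core-site.xml sketch, assuming Hadoop 2.7+ with the hadoop-aws JAR on the classpath; the key values are placeholders, and `fs.s3a.fast.upload` is the HADOOP-11183 in-memory buffering mentioned above:

```xml
<!-- Sketch only: standard s3a settings for Hadoop 2.7+; keys are placeholders -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <!-- HADOOP-11183: buffer uploads in memory rather than on local disk -->
  <name>fs.s3a.fast.upload</name>
  <value>true</value>
</property>
```

With this in place, jobs write to `s3a://bucket/path` instead of `s3n://bucket/path`; whether this applies on EMR's own closed-source client is, as noted above, unclear.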
Re: Bad Digest error while doing aws s3 put
Hi Dhimant,

As I had indicated in my next mail, my problem was due to the disk getting full with log messages (these were dumped onto the slaves) and did not have anything to do with the content pushed into S3. So it looks like this error message is very generic and is thrown for various reasons. You may have to do some more research to find out the cause of your problem. Please keep me posted once you fix this issue. Sorry, I could not be of much help to you.

Regards

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p26174.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Bad Digest error while doing aws s3 put
I had similar problems with multipart uploads. In my case the real error was something else, which was being masked by https://issues.apache.org/jira/browse/SPARK-6560; in the end the Bad Digest exception was a side effect and not the original issue. For me it was a library version conflict on EMR.

Depending on the size of the output files, you might try to just disable multipart uploads using fs.s3n.multipart.uploads.enabled.

Cheers,
Eugen

2016-02-07 15:05 GMT-08:00 Steve Loughran:

> > On 7 Feb 2016, at 07:57, Dhimant wrote:
> >
> >     at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:245)
> >     ... 15 more
> > Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: 5918216A5901FCC8), S3 Extended Request ID: QSxtYln/yXqHYpdr4BWosin/TAFsGlK1FlKfE5PcuJkNrgoblGzTNt74kEhuNcrJCRZ3mXq0oUo=
> >     at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
> >     at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
> >     at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
> >     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
> >     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3796)
> >     at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1482)
> >     at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:140)
> >     ... 22 more
>
> This is Amazon's own S3 client; nothing in the Apache Hadoop source tree.
> Normally I'd say "use s3a to make s3n problems go away", but I don't know
> what that does on Amazon's own EMR libraries.
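Eugen's workaround can be applied through the Hadoop configuration. A sketch, assuming the property name given above and that each output file stays under S3's single-PUT object size limit:

```xml
<!-- Sketch: disable s3n multipart uploads so each part file goes up as one PUT -->
<property>
  <name>fs.s3n.multipart.uploads.enabled</name>
  <value>false</value>
</property>
```

The same setting can be made from a Spark job before writing, e.g. `sc.hadoopConfiguration.set("fs.s3n.multipart.uploads.enabled", "false")`.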
Re: Bad Digest error while doing aws s3 put
> On 7 Feb 2016, at 07:57, Dhimant wrote:
>
>     at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:245)
>     ... 15 more
> Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: 5918216A5901FCC8), S3 Extended Request ID: QSxtYln/yXqHYpdr4BWosin/TAFsGlK1FlKfE5PcuJkNrgoblGzTNt74kEhuNcrJCRZ3mXq0oUo=
>     at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
>     at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
>     at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
>     at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
>     at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3796)
>     at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1482)
>     at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:140)
>     ... 22 more

This is Amazon's own S3 client; nothing in the Apache Hadoop source tree.

Normally I'd say "use s3a to make s3n problems go away", but I don't know what that does on Amazon's own EMR libraries.
Re: Bad Digest error while doing aws s3 put
Hi,

I am getting the following error while reading huge data from S3 and, after processing, writing the data to S3 again. Did you find any solution for this?

16/02/07 07:41:59 WARN scheduler.TaskSetManager: Lost task 144.2 in stage 3.0 (TID 169, ip-172-31-7-26.us-west-2.compute.internal): java.io.IOException: exception in uploadSinglePart
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:248)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.close(MultipartUploadOutputStream.java:469)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
    at org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:160)
    at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
    at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1080)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: exception in putObject
    at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:149)
    at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at com.sun.proxy.$Proxy26.storeFile(Unknown Source)
    at com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:245)
    ... 15 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received. (Service: Amazon S3; Status Code: 400; Error Code: BadDigest; Request ID: 5918216A5901FCC8), S3 Extended Request ID: QSxtYln/yXqHYpdr4BWosin/TAFsGlK1FlKfE5PcuJkNrgoblGzTNt74kEhuNcrJCRZ3mXq0oUo=
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3796)
    at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1482)
    at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:140)
    ... 22 more

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p26167.html
Re: Bad Digest error while doing aws s3 put
This was a completely misleading error message. The problem was due to a log message getting dumped to stdout. This was accumulating on the workers, and hence there was no space left on device after some time.

When I re-tested with spark-0.9.1, the saveAsTextFile API threw a "no space left on device" error after writing the same 48 files. On checking the master, all was ok; but on checking the slaves, the stdout contributed 99% of the root filesystem. On removing the particular log, it is now working fine in both versions.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p11642.html
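A quick way to confirm this failure mode on a worker is to compare free space on the volume against what the executor stdout/stderr logs are consuming. A sketch only: the work-directory path is an assumption (Spark standalone workers typically keep executor logs under `$SPARK_HOME/work`), so point `WORK_DIR` at your actual location:

```shell
# WORK_DIR is an assumed path -- set it to your actual Spark worker directory
WORK_DIR="${SPARK_WORKER_DIR:-/tmp}"

# Free space on the volume holding the worker directory
df -h "$WORK_DIR"

# Largest consumers under it (executor stdout/stderr live here on standalone)
du -sh "$WORK_DIR"/* 2>/dev/null | sort -rh | head -n 10
```

If the stdout files dominate the output of the `du` listing, you are hitting the same disk-full condition described above.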
Re: Bad Digest error while doing aws s3 put
Is it possible that the Content-MD5 changes during a multipart upload to S3? But even then, it succeeds if I increase the cluster configuration. For example:

- it throws a Bad Digest error after writing 48/100 files when the cluster has 3 m3.2xlarge slaves
- it throws a Bad Digest error after writing 64/100 files when the cluster has 4 m3.2xlarge slaves
- it throws a Bad Digest error after writing 86/100 files when the cluster has 5 m3.2xlarge slaves
- it succeeds in writing all 100 files when the cluster has 6 m3.2xlarge slaves

Please clarify.

Regards,
lmk

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p11421.html
Re: Bad Digest error while doing aws s3 put
Thanks Patrick. But why am I getting a Bad Digest error when I am saving a large amount of data to S3?

Loss was due to org.apache.hadoop.fs.s3.S3Exception:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/spark_test%2Fsmaato_one_day_phase_2%2Fsmaato_2014_05_17%2F_temporary%2F_attempt_201408041624__m_65_165%2Fpart-00065' XML Error Message:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.</Message><ExpectedDigest>lb2tDEVSSnRNM4pw6504Bg==</ExpectedDigest><CalculatedDigest>EL9UDBzFvTwJycA7Ii2KGA==</CalculatedDigest><RequestId>437F15C89D355081</RequestId><HostId>kJQI+c9edzBmT2Z9sbfAELYT/8R5ezLWeUgeIU37iPsq5KQm/qAXItunZY35wnYx</HostId></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:82)

As indicated earlier, I use the following command as an alternative to saveAsTextFile:

x.map(x => (NullWritable.get(), new Text(x.toString))).coalesce(100).saveAsHadoopFile[TextOutputFormat[NullWritable, Text]]("s3n://dest-dir/")

In the above case, it succeeds until it writes some 48 part files out of 100 (though this 48 is also inconsistent) and then starts throwing the above error. The same works well if I increase the capacity of the cluster (say from 3 m3.2xlarge slaves to 6) or reduce the data size. Is there a possibility that the data is getting corrupted when the load increases?

Please advise. I am stuck with this problem for the past couple of weeks.

Thanks,
lmk

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p11345.html
Re: Bad Digest error while doing aws s3 put
Hi,

I was using saveAsTextFile earlier and it was working fine. When we migrated to spark-1.0, I started getting the following error:

java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
    java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    java.net.URLClassLoader$1.run(URLClassLoader.java:355)

Hence I changed my code as follows:

x.map(x => (NullWritable.get(), new Text(x.toString))).saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)

After this I am facing this problem when I write very large data to S3. It also occurs while writing to only some partitions: say, while writing to 240 partitions, it might succeed for 156 files and then start throwing the Bad Digest error, and then it hangs.

Please advise.

Regards,
lmk

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p10780.html
Bad Digest error while doing aws s3 put
Hi,

I am getting the following error while trying to save a large dataset to S3 using the saveAsHadoopFile command with apache spark-1.0:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 PUT failed for '/spark_test%2Fsmaato_one_day_phase_2%2Fsmaato_2014_05_17%2F_temporary%2F_attempt_201407170658__m_36_276%2Fpart-00036' XML Error Message:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>BadDigest</Code><Message>The Content-MD5 you specified did not match what we received.</Message><ExpectedDigest>N808DtNfYiTFzI+i2HxLEw==</ExpectedDigest><CalculatedDigest>66nS+2C1QqQmmcTeFpXOjw==</CalculatedDigest><RequestId>4FB3A3D60B187CE7</RequestId><HostId>H2NznP+RvwspekVHBMvgWGYAupKuO5YceSgmiLym6rOajOh5v5GnyM0VkO+dadyG</HostId></Error>

I have used the same command to write similar content with less data to S3 without any problem. When I googled this error message, it is said to be due to an MD5 checksum mismatch. But would this happen due to load?

Regards,
lmk

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036.html