[jira] [Comment Edited] (HADOOP-15822) zstd compressor can fail with a small output buffer

2018-10-08 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642000#comment-16642000
 ] 

Peter Bacsko edited comment on HADOOP-15822 at 10/8/18 3:28 PM:


I reproduced the problem. This is what happens if the sort buffer is 2047MiB.

{noformat}
...
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling 
map output
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart 
= 1267927860; bufend = 2082571562; bufvoid = 2146435072
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 
316981960(1267927840); kvend = 91355880(365423520); length = 225626081/134152192
2018-10-08 08:15:04,126 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 
-1997752227 kvi 37170708(148682832)
2018-10-08 08:16:24,712 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: 
Finished spill 20
2018-10-08 08:16:24,712 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) 
equator -1997752227 kv 37170708(148682832) kvi 37170708(148682832)
2018-10-08 08:16:24,713 INFO [main] org.apache.hadoop.mapred.MapTask: Starting 
flush of map output
2018-10-08 08:16:24,713 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) 
equator -1997752227 kv 37170708(148682832) kvi 37170708(148682832)
2018-10-08 08:16:24,727 INFO [main] org.apache.hadoop.mapred.Merger: Merging 21 
sorted segments
2018-10-08 08:16:24,735 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,736 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,738 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,739 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,741 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,742 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,743 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,744 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,745 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,746 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,748 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,749 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,750 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,752 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,753 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,754 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,755 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,756 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,757 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,769 INFO [main] org.apache.hadoop.io.compress.CodecPool: 
Got brand-new decompressor [.zst]
2018-10-08 08:16:24,770 INFO [main] org.apache.hadoop.mapred.Merger: Down to 
the last merge-pass, with 21 segments left of total size: 35310116 bytes
2018-10-08 08:16:30,104 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1469)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1365)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
at org.apache.hadoop.io.Text.write(Text.java:330)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1163)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:727)
at 

[jira] [Comment Edited] (HADOOP-15822) zstd compressor can fail with a small output buffer

2018-10-05 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640338#comment-16640338
 ] 

Peter Bacsko edited comment on HADOOP-15822 at 10/5/18 8:58 PM:


[~jlowe] what a strange coincidence. I was also testing zstandard today and set 
{{mapreduce.task.io.sort.mb}} to 2047, which is the max I guess. Now the mapper 
was running on a 10GiB zstd compressed text file and then failed. The 
{{equator}} became a negative number and {{collect()}} threw 
{{ArrayIndexOutOfBoundsException}}. Mapper output compression was also enabled, 
probably that's what really matters here. It failed after like 40 minutes.

I'm not sure whether it's zstd or not, because I haven't had the time to try it 
with other codecs, but it's something worth keeping in mind.


was (Author: pbacsko):
[~jlowe] what strange coincidence. I was also testing zstandard today and set 
{{mapreduce.task.io.sort.mb}} to 2047, which is the max I guess. Now the mapper 
was running on a 10GiB zstd compressed text file and then failed. The 
{{equator}} became a negative number and {{collect()}} threw 
{{ArrayIndexOutOfBoundsException}}. Mapper output compression was also enabled, 
probably that's what really matters here. It failed after like 40 minutes.

I'm not sure whether it's zstd or not, because I haven't had the time to try it 
with other codecs, but it's something worth keeping in mind.

> zstd compressor can fail with a small output buffer
> ---
>
> Key: HADOOP-15822
> URL: https://issues.apache.org/jira/browse/HADOOP-15822
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: HADOOP-15822.001.patch, HADOOP-15822.002.patch
>
>
> TestZStandardCompressorDecompressor fails a couple of tests on my machine 
> with the latest zstd library (1.3.5).  Compression can fail to successfully 
> finalize the stream when a small output buffer is used resulting in a failed 
> to init error, and decompression with a direct buffer can fail with an 
> invalid src size error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org