[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly handles some zlib errors

2021-01-21 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269232#comment-17269232
 ] 

Steve Loughran commented on HADOOP-15171:
-

...if the problem is in hive's use of the class

* do we document the constraints?
* are they different for zlib compared to the rest?
* can we detect this in the codec and so at least print a warning for hive and 
other apps

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly 
> handles some zlib errors
> -
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sergey Shelukhin
>Assignee: Lokesh Jain
>Priority: Blocker
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to 
> Hadoop native ZLIB codec; when the data is copied to heap-based buffer and 
> the JDK Inflater is used, it produces correct output. Input is only 127 bytes 
> so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> Hadoop version is based on 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 
> FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly handles some zlib errors

2021-01-21 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269230#comment-17269230
 ] 

Steve Loughran commented on HADOOP-15171:
-

so the problem here is that ORC is recycling the same decompressor as it seeks 
around a file and the zlib library doesn't allow this?

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly 
> handles some zlib errors
> -
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sergey Shelukhin
>Assignee: Lokesh Jain
>Priority: Blocker
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to 
> Hadoop native ZLIB codec; when the data is copied to heap-based buffer and 
> the JDK Inflater is used, it produces correct output. Input is only 127 bytes 
> so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> Hadoop version is based on 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 
> FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly handles some zlib errors

2020-09-06 Thread Michael South (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191374#comment-17191374
 ] 

Michael South commented on HADOOP-15171:


Apology: A raw "closed, unfounded" is too brusque. [~sershe] did a really 
excellent job analyzing the bug, creating a minimal testcase, and identifying 
where the issue is. I can't imagine how many hours he spent plowing through the 
Orc and Zlib codebases and rerunning tests.

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly 
> handles some zlib errors
> -
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sergey Shelukhin
>Assignee: Lokesh Jain
>Priority: Blocker
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to 
> Hadoop native ZLIB codec; when the data is copied to heap-based buffer and 
> the JDK Inflater is used, it produces correct output. Input is only 127 bytes 
> so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> Hadoop version is based on 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 
> FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly handles some zlib errors

2020-09-06 Thread Michael South (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191286#comment-17191286
 ] 

Michael South commented on HADOOP-15171:


Issue should be closed, unfounded. The Hive Orc driver creates a decompression 
object and repeatedly calling it to deflate Orc blocks. Its treating each block 
as an entirely separate chunk (stream), completely decompressing each with one 
call to ...{{_inflateBytesDirect()}}. However, it wasn't calling 
{{inflateReset()}} or {{inflateEnd()}} / {{inflateInit()}} between the streams, 
which naturally left things in a confused state. It appears to be fixed in 
trunk Hive.

Also, returning 0 for {{Z_BUF_ERROR}} or {{Z_NEED_DICT}} is correct, and should 
not throw an error. The Java decompression object is agnostic as to whether the 
application is working in stream or all-at-once mode. The only determination of 
which mode is active is whether the application (Hive Orc driver in this case) 
is passing the entire input in one chunk and is allocating sufficient space for 
all of the output. Therefore, the application must check for a zero return. If 
no-progress (zero return) is an impossible situation then it can throw an 
exception; otherwise it needs to look at one or more of ...{{_finished()}}, 
...{{_getRemaining()}}, and/or ...{{_needDict()}} to figure out what's needed 
to make further progress. (It would be nice if JNI exposed the {{avail_out}} 
field, but if it's not an input or dictionary issue it must be a full output 
buffer.)

There *is* a very minor bug in ...{{inflateBytesDirect()}}. It's calling 
{{inflate()}} with {{Z_PARTIAL_FLUSH}}, which only applies to {{deflate()}}. It 
should be {{Z_NO_FLUSH}}. However, in the current zlib code (1.2.11) the 
{{flush}} parameter only affects the return code, and it only checks whether or 
not it is {{Z_FINISH}}.

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly 
> handles some zlib errors
> -
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sergey Shelukhin
>Assignee: Lokesh Jain
>Priority: Blocker
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to 
> Hadoop native ZLIB codec; when the data is copied to heap-based buffer and 
> the JDK Inflater is used, it produces correct output. Input is only 127 bytes 
> so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> Hadoop version is based on 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 
> FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly handles some zlib errors

2018-02-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351016#comment-16351016
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Update: turns out, end() was a red herring after all - any reuse of the same 
object without calling reset causes the issue.
Given that the object does not support the zlib library model of repeatedly 
calling inflate with more data, it basically never makes sense to call 
decompress without calling reset.
Perhaps the call should be built in? I cannot find whether zlib itself actually 
requires one to reset (at least, for the continuous decompression case, it 
doesn't look like it's the case), so perhaps cleanup could be improved too.
At any rate, error handling should be fixed to not return 0.

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrrectly 
> handles some zlib errors
> -
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sergey Shelukhin
>Assignee: Lokesh Jain
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to 
> Hadoop native ZLIB codec; when the data is copied to heap-based buffer and 
> the JDK Inflater is used, it produces correct output. Input is only 127 bytes 
> so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> Hadoop version is based on 3.1 snapshot.
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 
> FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org