[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466111#comment-17466111 ]

Akira Ajisaka commented on HDFS-14099:
--------------------------------------

Committed to trunk, branch-3.3, and branch-3.2.

> Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
> ------------------------------------------------------------------------------------
>
> Key: HDFS-14099
> URL: https://issues.apache.org/jira/browse/HDFS-14099
> Project: Hadoop HDFS
> Issue Type: Bug
> Environment: Hadoop Version: hadoop-3.0.3
>              Java Version: 1.8.0_144
> Reporter: xuzq
> Assignee: xuzq
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: HDFS-14099-trunk-001.patch, HDFS-14099-trunk-002.patch, HDFS-14099-trunk-003.patch
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> We need to use the ZSTD compression algorithm in Hadoop, so I wrote a simple
> demo like this for testing.
> {code:java}
> // code placeholder
> while ((size = fsDataInputStream.read(bufferV2)) > 0 ) {
>   countSize += size;
>   if (countSize == 65536 * 8) {
>     if(!isFinished) {
>       // finish a frame in zstd
>       cmpOut.finish();
>       isFinished = true;
>     }
>     fsDataOutputStream.flush();
>     fsDataOutputStream.hflush();
>   }
>   if(isFinished) {
>     LOG.info("Will resetState. N=" + n);
>     // reset the stream and write again
>     cmpOut.resetState();
>     isFinished = false;
>   }
>   cmpOut.write(bufferV2, 0, size);
>   bufferV2 = new byte[5 * 1024 * 1024];
>   n++;
> }
> {code}
>
> I then used "*hadoop fs -text*" to read this file, and it failed. The error is
> as below.
> {code:java}
> Exception in thread "main" java.lang.InternalError: Unknown frame descriptor
>   at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native Method)
>   at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
>   at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
>   at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
>   at java.io.InputStream.read(InputStream.java:101)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
>   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:127)
>   at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:101)
>   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
>   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
>   at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
>   at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
>   at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
> {code}
>
> So I had to look at the code, including the JNI part, and then found this bug.
> *ZSTD_initDStream(stream)* may be called twice in the same *Frame*.
> The first call is in *ZStandardDecompressor.c*.
> {code:java}
> if (size == 0) {
>     (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, JNI_TRUE);
>     size_t result = dlsym_ZSTD_initDStream(stream);
>     if (dlsym_ZSTD_isError(result)) {
>         THROW(env, "java/lang/InternalError", dlsym_ZSTD_getErrorName(result));
>         return (jint) 0;
>     }
> }
> {code}
> This call is correct, but *Finished* is never set back to false, even if
> there is still data (a new frame) in *CompressedBuffer* or *UserBuffer* that
> needs to be decompressed.
> The second call is in *org.apache.hadoop.io.compress.DecompressorStream*, via
> *decompressor.reset()*, because *Finished* is always true after a *Frame* has
> been decompressed.
> {code:java}
> if (decompressor.finished()) {
>   // First see if there was any leftover buffered input from previous
>   // stream; if not, attempt to refill buffer. If refill -> EOF, we're
>   // all done; else reset, fix up input buffer, and get ready for next
>   // concatenated substream/"member".
>   int nRemaining = decompressor.getRemaining();
>   if (nRemaining == 0) {
>     int m = getCompressedData();
>     if (m == -1) {
>       // apparently the previous end-of-stream was also end-of-file:
>       // return success, as if we had never called getCompressedData()
>       eof = true;
>       return -1;
>     }
>     decompressor.reset();
>     decompressor.setInput(buffer, 0, m);
>     lastBytesSent = m;
>   } else {
>     // looks like it's a concatenated stream: reset low-level
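
The sequence described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: the frame layout ([0xFD][length][payload]), the ToyDecoder class, and the readAll loop are hypothetical names invented for illustration and are not Hadoop or zstd code. It shows how a "finished" flag that stays true after the first frame can lead the reading loop to reset the decoder in the middle of the next frame, after which payload bytes are misread as a frame header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Toy model only: frames are [0xFD][length][payload]; this is not the zstd format. */
final class MultiFrameSketch {
  static final int MAGIC = 0xFD;

  /** Hypothetical stand-in for a streaming decompressor with a "finished" flag. */
  static final class ToyDecoder {
    private final boolean clearFinishedOnNewFrame; // true models the corrected behaviour
    private boolean finished;
    private int left = -2; // -2: expect magic, -1: expect length, >= 0: payload bytes left
    private byte[] in = new byte[0];
    private int pos, end;

    ToyDecoder(boolean clearFinishedOnNewFrame) {
      this.clearFinishedOnNewFrame = clearFinishedOnNewFrame;
    }

    boolean finished() { return finished; }
    int remaining() { return end - pos; }
    void setInput(byte[] b, int off, int len) { in = b; pos = off; end = off + len; }

    /** Models re-initialising the stream: any partially decoded frame is forgotten. */
    void reset() { finished = false; left = -2; }

    /** Decodes buffered input until the current frame ends or the input runs out. */
    void decode(ByteArrayOutputStream out) {
      while (pos < end) {
        int b = in[pos++] & 0xFF;
        if (left == -2) {
          if (b != MAGIC) throw new IllegalStateException("Unknown frame descriptor");
          if (clearFinishedOnNewFrame) finished = false; // a new frame has started
          left = -1;
        } else if (left == -1) {
          left = b;
        } else {
          out.write(b);
          left--;
        }
        if (left == 0) { // frame complete: stop here, like a decoder reporting end of frame
          finished = true;
          left = -2;
          return;
        }
      }
    }
  }

  /** Reading loop that resets the decoder on refill whenever finished() is true. */
  static String readAll(byte[] input, int chunk, boolean fixed) {
    ToyDecoder d = new ToyDecoder(fixed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int pos = 0;
    while (true) {
      if (d.remaining() == 0) {
        if (pos >= input.length) break; // end of input
        if (d.finished()) d.reset();    // harmful if a new frame is already in progress
        int n = Math.min(chunk, input.length - pos);
        d.setInput(input, pos, n);
        pos += n;
      }
      d.decode(out);
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Builds concatenated toy frames, one per payload (payloads shorter than 256 bytes). */
  static byte[] frames(String... payloads) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    for (String s : payloads) {
      byte[] p = s.getBytes(StandardCharsets.UTF_8);
      buf.write(MAGIC);
      buf.write(p.length);
      buf.write(p, 0, p.length);
    }
    return buf.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = frames("hello", "world!");
    System.out.println(readAll(data, 10, true));
    try {
      readAll(data, 10, false);
    } catch (IllegalStateException e) {
      System.out.println("sticky flag: " + e.getMessage());
    }
  }
}
```

With a chunk size of 10, the first chunk holds all of frame one plus the start of frame two, so the model with the sticky flag resets mid-frame and fails, while the variant that clears the flag at each new frame header decodes both frames. Whether the committed patch clears the flag at exactly this point is not something this sketch claims.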
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17466105#comment-17466105 ]

Akira Ajisaka commented on HDFS-14099:
--------------------------------------

[~groot] rebased the patch and I merged this. Thank you [~xuzq_zander] and [~groot] for your contribution!

Sorry, I should have moved this issue to HADOOP instead of HDFS.
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464291#comment-17464291 ]

Akira Ajisaka commented on HDFS-14099:
--------------------------------------

After our internal testing, we found that HDFS-14099 is required in addition to HADOOP-17096.

The 003 patch looks good to me. However, after applying the patch to trunk, compilation fails in the test code because of the commons-io version upgrade.

Hi [~xuzq_zander], would you rebase onto the latest trunk? Note that it is now recommended to create a PR on GitHub rather than attaching the patch to JIRA.
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369411#comment-17369411 ]

Chenren Shao commented on HDFS-14099:
-------------------------------------

I have confirmed that this issue has been resolved. Thanks, both!
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368435#comment-17368435 ]

Chenren Shao commented on HDFS-14099:
-------------------------------------

I took a deeper look at HADOOP-17096 and found that its fix only affects compression. I am not sure how it could affect the decompression issue that I am encountering here.

[~xuzq_zander], when you did your test, which patch did you use: [https://patch-diff.githubusercontent.com/raw/apache/hadoop/pull/441.patch] or [^HDFS-14099-trunk-003.patch]? In my previous test, we used the latter and still got this error:

```
java.lang.InternalError: Unknown frame descriptor
  at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.inflateBytesDirect(Native Method)
  at org.apache.hadoop.io.compress.zstd.ZStandardDecompressor.decompress(ZStandardDecompressor.java:181)
  at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:111)
```
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368231#comment-17368231 ]

Chenren Shao commented on HDFS-14099:
-------------------------------------

Thank you, [~xuzq_zander] and [~weichiu]. As you suspected, I tried the patch on top of 3.2.1, so it is very likely that HADOOP-17096 was the issue. I will apply the patch for HADOOP-17096 and try again.
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368103#comment-17368103 ]

Wei-Chiu Chuang commented on HDFS-14099:
----------------------------------------

It might have been fixed by HADOOP-17096. Can you apply the patch attached there, or use 3.2.2 or 3.3.1 instead?
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368088#comment-17368088 ]

xuzq commented on HDFS-14099:
-----------------------------

Thanks [~cshao239]. I tested output.zst with `hadoop fs -text output.zst` and it decompressed successfully; the content looks like "key": "value1496976". What problem do you see after applying the patch?
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360243#comment-17360243 ]

Hadoop QA commented on HDFS-14099:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 2m 44s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 24m 1s | | trunk passed |
| +1 | compile | 22m 32s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 18m 56s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 0s | | trunk passed |
| +1 | mvnsite | 1m 29s | | trunk passed |
| +1 | shadedclient | 17m 54s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 0s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 1m 32s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 22m 48s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 2m 21s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 54s | | the patch passed |
| +1 | compile | 21m 53s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| -1 | javac | 21m 53s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/619/artifact/out/diff-compile-javac-root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt | root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 2 new + 1998 unchanged - 0 fixed = 2000 total (was 1998) |
| +1 | compile | 19m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| -1 | javac | 19m 8s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/619/artifact/out/diff-compile-javac-root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt | root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 2 new + 1873 unchanged - 0 fixed = 1875 total (was 1873) |
| +1 | checkstyle | 1m 0s | | the patch passed |
| +1 | mvnsite | 1m 29s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360122#comment-17360122 ]

Chenren Shao commented on HDFS-14099:
-------------------------------------

Hi, all. I found that Hadoop cannot process multi-frame files. We applied this patch and were still unable to process such a file; the error message is the same as the one posted here. I will try to attach the problematic file so that the issue can be reproduced by reading it via Spark. The file was created by essentially running `cat file1.zst file2.zst > output.zst`. Running `zstd -d output.zst` decompresses it without any issue, but spark.read fails. Reading file1.zst and file2.zst individually with Spark works without problems.
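When debugging a file produced by `cat file1.zst file2.zst > output.zst`, it can help to confirm that a new frame begins exactly where file1.zst ends. The following is a minimal sketch, assuming the standard Zstandard frame magic number 0xFD2FB528 stored little-endian; the class and method names are hypothetical and are not part of Hadoop or zstd. It only inspects four bytes at a given offset and does not decode anything.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for inspecting concatenated .zst data; not a Hadoop API. */
class ZstdMagicCheck {
    /** Magic number of a standard Zstandard frame (stored little-endian on disk). */
    static final int ZSTD_MAGIC = 0xFD2FB528;

    /** Returns true if the four bytes at off are the Zstandard frame magic number. */
    static boolean isFrameStart(byte[] buf, int off) {
        if (off < 0 || buf.length - off < 4) {
            return false; // not enough bytes left to hold a magic number
        }
        int value = ByteBuffer.wrap(buf, off, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return value == ZSTD_MAGIC;
    }

    public static void main(String[] args) {
        // Two bare magic numbers separated by one filler byte; not valid frames,
        // just enough bytes to exercise the check.
        byte[] sample = {
            0x28, (byte) 0xB5, 0x2F, (byte) 0xFD,
            0x00,
            0x28, (byte) 0xB5, 0x2F, (byte) 0xFD
        };
        System.out.println(isFrameStart(sample, 0));
        System.out.println(isFrameStart(sample, 5));
        System.out.println(isFrameStart(sample, 1));
    }
}
```

For a real file, the offset to test would be the byte length of file1.zst, assuming that file holds complete frames with no trailing data. Skippable frames use a different magic range and are not covered by this sketch.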
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918438#comment-16918438 ]

Hadoop QA commented on HDFS-14099:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 24m 19s | trunk passed |
| +1 | compile | 19m 55s | trunk passed |
| +1 | checkstyle | 1m 16s | trunk passed |
| +1 | mvnsite | 1m 49s | trunk passed |
| +1 | shadedclient | 16m 55s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 46s | trunk passed |
| +1 | javadoc | 1m 4s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 17m 4s | the patch passed |
| +1 | javac | 17m 4s | the patch passed |
| +1 | checkstyle | 0m 57s | the patch passed |
| +1 | mvnsite | 1m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 12s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 1m 0s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 9m 13s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 112m 48s | |

|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14099 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978855/HDFS-14099-trunk-003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1f46bca5fa32 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 371c9eb |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27715/testReport/ |
| Max. process+thread count | 1344 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27715/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918354#comment-16918354 ]

xuzq commented on HDFS-14099:
-----------------------------

Fixed some checkstyle issues. [^HDFS-14099-trunk-003.patch]
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918346#comment-16918346 ]

Hadoop QA commented on HDFS-14099:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 40s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 4s | trunk passed |
| +1 | compile | 15m 23s | trunk passed |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 1m 15s | trunk passed |
| +1 | shadedclient | 14m 15s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 48s | trunk passed |
| +1 | javadoc | 1m 8s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 13m 51s | the patch passed |
| +1 | javac | 13m 51s | the patch passed |
| -0 | checkstyle | 0m 46s | hadoop-common-project/hadoop-common: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1) |
| +1 | mvnsite | 1m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 46s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 44s | the patch passed |
| +1 | javadoc | 1m 7s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 38s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 48s | The patch does not generate ASF License warnings. |
| | | 95m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14099 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12978845/HDFS-14099-trunk-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f52e1cd89c10 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 872cdf4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/27714/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/27714/testReport/ |
| Max. process+thread count | 1528 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27714/console |
| Powered by | Apache Yetus 0.8.0
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903502#comment-16903502 ]

xuzq commented on HDFS-14099:
-----------------------------

[~aajisaka] [~jlowe] [~churromorales] could you have a look? Thanks.
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902721#comment-16902721 ]

Hadoop QA commented on HDFS-14099:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 50s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 49s | trunk passed |
| +1 | compile | 19m 16s | trunk passed |
| +1 | checkstyle | 0m 39s | trunk passed |
| +1 | mvnsite | 1m 10s | trunk passed |
| +1 | shadedclient | 12m 5s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 59s | trunk passed |
| 0 | spotbugs | 1m 49s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 47s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 53s | the patch passed |
| +1 | compile | 17m 53s | the patch passed |
| +1 | javac | 17m 53s | the patch passed |
| -0 | checkstyle | 0m 35s | hadoop-common-project/hadoop-common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | mvnsite | 1m 7s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 15s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 54s | the patch passed |
| +1 | findbugs | 2m 2s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 18s | hadoop-common in the patch failed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 98m 34s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestDiskChecker |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-441/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/441 |
| JIRA Issue | HDFS-14099 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4878a7771852 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 00b5a27 |
| Default Java | 1.8.0_212 |
| checkstyle |
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900172#comment-16900172 ]

Wei-Chiu Chuang commented on HDFS-14099:

I am not sure I can do a quality review at this part of code. [~jlowe] and [~churromorales] you've worked on this before. Care to give a review? Thank you.
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900035#comment-16900035 ]

xuzq commented on HDFS-14099:

[~jojochuang] [~templedf] [~cxorm] I updated a new Patch. Do you have time to look at it? Thanks
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898546#comment-16898546 ]

Hadoop QA commented on HDFS-14099:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 31s | trunk passed |
| +1 | compile | 18m 2s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 1m 23s | trunk passed |
| +1 | shadedclient | 12m 57s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 2s | trunk passed |
| 0 | spotbugs | 1m 59s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 56s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 17m 8s | the patch passed |
| +1 | javac | 17m 8s | the patch passed |
| -0 | checkstyle | 0m 45s | hadoop-common-project/hadoop-common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 59s | the patch passed |
| +1 | findbugs | 2m 3s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 36s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 97m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-441/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/441 |
| JIRA Issue | HDFS-14099 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ebfaa8d64fac 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / c2d00c8 |
| Default Java | 1.8.0_212 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-441/3/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894008#comment-16894008 ]

Hadoop QA commented on HDFS-14099:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 25s | trunk passed |
| +1 | compile | 17m 12s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 1m 17s | trunk passed |
| +1 | shadedclient | 12m 33s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 3s | trunk passed |
| 0 | spotbugs | 2m 6s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 2m 4s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 44s | the patch passed |
| +1 | compile | 16m 28s | the patch passed |
| +1 | javac | 16m 28s | the patch passed |
| -0 | checkstyle | 0m 48s | hadoop-common-project/hadoop-common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | mvnsite | 1m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 40s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 10s | the patch passed |
| +1 | findbugs | 2m 5s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 27s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 47s | The patch does not generate ASF License warnings. |
| | | 96m 22s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-441/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/441 |
| JIRA Issue | HDFS-14099 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 059dd185f2bb 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / c7c7a88 |
| Default Java | 1.8.0_212 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-441/2/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889011#comment-16889011 ]

Hadoop QA commented on HDFS-14099:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 40s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 34s | trunk passed |
| +1 | compile | 17m 6s | trunk passed |
| +1 | checkstyle | 0m 37s | trunk passed |
| +1 | mvnsite | 1m 12s | trunk passed |
| +1 | shadedclient | 12m 34s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 2s | trunk passed |
| 0 | spotbugs | 1m 57s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 1m 56s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 16m 13s | the patch passed |
| +1 | javac | 16m 13s | the patch passed |
| -0 | checkstyle | 0m 45s | hadoop-common-project/hadoop-common: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) |
| +1 | mvnsite | 1m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 35s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 57s | the patch passed |
| +1 | findbugs | 2m 4s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 31s | hadoop-common in the patch passed. |
| +1 | asflicense | 0m 43s | The patch does not generate ASF License warnings. |
| | | 95m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Client=18.09.8 Server=18.09.8 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-441/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/441 |
| JIRA Issue | HDFS-14099 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux def9e7cb7b24 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / cd967c7 |
| Default Java | 1.8.0_212 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-441/1/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| Test
[jira] [Commented] (HDFS-14099) Unknown frame descriptor when decompressing multiple frames in ZStandardDecompressor
[ https://issues.apache.org/jira/browse/HDFS-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699006#comment-16699006 ] ASF GitHub Bot commented on HDFS-14099: --- GitHub user ZanderXu opened a pull request: https://github.com/apache/hadoop/pull/441 HDFS-14099 fix bug where decompressing multiple frames in ZStandardDecompressor[HDFS-14099](https://issues.apache.org/jira/browse/HDFS-14099) You can merge this pull request into a Git repository by running: $ git pull https://github.com/ZanderXu/hadoop fix-bug-when-decompress-multiple-frames-in-ZStandardDecompressor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/441.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #441 commit 4d6f2c39d063fee373335c1c278d0b0c01197907 Author: xuzq Date: 2018-11-26T13:38:26Z fix bug where decompressing multiple frames in ZStandardDecompressor > Unknown frame descriptor when decompressing multiple frames in > ZStandardDecompressor > > > Key: HDFS-14099 > URL: https://issues.apache.org/jira/browse/HDFS-14099 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.3 > Environment: Hadoop Version: hadoop-3.0.3 > Java Version: 1.8.0_144 >Reporter: xuzq >Priority: Major > > I need to use zstd compress in Hadoop. So I write a simple demo like this. > {code:java} > // code placeholder > while ((size = fsDataInputStream.read(bufferV2)) > 0 ) { > countSize += size; > if (countSize == 65536 * 8) { > if(!isFinished) { > // finish a frame in zstd > cmpOut.finish(); > isFinished = true; > } > fsDataOutputStream.flush(); > fsDataOutputStream.hflush(); > } > if(isFinished) { > LOG.info("Will resetState. 
N=" + n); > // reset the stream and write again > cmpOut.resetState(); > isFinished = false; > } > cmpOut.write(bufferV2, 0, size); > bufferV2 = new byte[5 * 1024 * 1024]; > n++; > } > {code} > > > Then I used *hadoop fs -text* to read this file, and it failed. The error is shown below. > {code:java} > java.lang.RuntimeException: native zStandard library not available: this > version of libhadoop was built without zstd support. > at > org.apache.hadoop.io.compress.ZStandardCodec.checkNativeCodeLoaded(ZStandardCodec.java:65) > at > org.apache.hadoop.io.compress.ZStandardCodec.getDecompressorType(ZStandardCodec.java:211) > at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:181) > at > org.apache.hadoop.io.compress.CompressionCodec$Util.createInputStreamWithCodecPool(CompressionCodec.java:157) > at > org.apache.hadoop.io.compress.ZStandardCodec.createInputStream(ZStandardCodec.java:182) > at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:157) > at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:96) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331) > at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) > at org.apache.hadoop.fs.shell.Command.run(Command.java:176) > at org.apache.hadoop.fs.FsShell.run(FsShell.java:328) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:391) > {code} > > So I had to read the code, including the JNI layer, and then I found this bug. > The *ZSTD_initDStream(stream)* method may be called twice in the same frame. 
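To make the multi-frame handling in this report concrete, here is a minimal, self-contained sketch. It is not the Hadoop or zstd code: it uses the JDK's java.util.zip.Deflater and Inflater as a stand-in for zstd frames, and the names MultiFrameSketch, compressFrame and decompressAll are invented for the example. What it tries to show is that "finished" only means the current frame has ended, so the reader has to check for leftover input and reset the decompressor before continuing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class MultiFrameSketch {

    /** Compresses text into one self-contained zlib stream (our stand-in for a frame). */
    static byte[] compressFrame(String text) {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses every frame in a buffer that holds several frames back to back. */
    static String decompressAll(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (true) {
            int n = inflater.inflate(buf);
            out.write(buf, 0, n);
            if (inflater.finished()) {
                // End of one frame. Read the leftover count BEFORE reset(),
                // because reset() discards the current input.
                int leftover = inflater.getRemaining();
                if (leftover == 0) {
                    break; // no more frames
                }
                // More input remains: clear the finished state and hand the
                // unread bytes back so the next frame starts cleanly.
                inflater.reset();
                inflater.setInput(data, data.length - leftover, leftover);
            } else if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated or unsupported input");
            }
        }
        inflater.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] a = compressFrame("hello ");
        byte[] b = compressFrame("world");
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        System.out.println(decompressAll(joined)); // prints "hello world"
    }
}
```

If the reset-and-refeed branch is removed, the loop stops after the first frame and the second frame's bytes are never decoded, which is the same class of problem as a finished flag that is never cleared while compressed input is still buffered.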
> The first call is in *ZStandardDecompressor.c*. > > {code:java} > if (size == 0) { > (*env)->SetBooleanField(env, this, ZStandardDecompressor_finished, > JNI_TRUE); > size_t result = dlsym_ZSTD_initDStream(stream); > if (dlsym_ZSTD_isError(result)) { > THROW(env, "java/lang/InternalError", > dlsym_ZSTD_getErrorName(result)); > return (jint) 0; > } > } > {code} > This call is correct, but *finished* is never set back to false afterwards, even if > there is still data in *compressedBuffer* that needs to be decompressed. > The second call is made from *org.apache.hadoop.io.compress.DecompressorStream* through > *decompressor.reset()*, because *finished* is always true after a frame has been > decompressed. > {code:java} > if (decompressor.finished()) { > // First see if there was any leftover buffered input from previous > // stream; if not, attempt to refill buffer. If refill -> EOF, we're > // all done; else reset, fix up input buffer, and get