[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=642714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-642714
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 27/Aug/21 06:52
Start Date: 27/Aug/21 06:52
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r697196205



##
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+      0x00, 0x00};

Review comment:
   @sunchao I found the root cause; it is here.
   
   When one JVM runs several tasks, e.g. two tasks on the same node, there are 
   multiple `BuiltInGzipCompressor` instances. If the HEADER/TRAILER arrays are 
   static, the CRC value that one instance writes into the trailer will probably 
   be overwritten by another `BuiltInGzipCompressor` instance.
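
The hazard described in this comment can be reproduced in a few lines. The following is a minimal sketch, not the code from the pull request: `SharedTrailer`, `OwnTrailer`, `fillTrailer` and `putIntLE` are hypothetical names introduced for illustration. It assumes only that the trailer holds the CRC32 and the input size in little-endian order, as the gzip format requires. No threads are needed to show the effect; interleaving two instances is enough.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

public class TrailerSharingSketch {

  /** Writes CRC32 and input size into an 8-byte little-endian gzip trailer. */
  static void fillTrailer(byte[] trailer, byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    putIntLE(trailer, 0, (int) crc.getValue());
    putIntLE(trailer, 4, data.length);
  }

  static void putIntLE(byte[] buf, int off, int v) {
    buf[off] = (byte) v;
    buf[off + 1] = (byte) (v >>> 8);
    buf[off + 2] = (byte) (v >>> 16);
    buf[off + 3] = (byte) (v >>> 24);
  }

  /** Hypothetical compressor: one trailer buffer shared by every instance. */
  static class SharedTrailer {
    private static final byte[] TRAILER = new byte[8];
    void finish(byte[] data) { fillTrailer(TRAILER, data); }
    byte[] trailer() { return TRAILER; }
  }

  /** Hypothetical compressor: each instance owns its trailer buffer. */
  static class OwnTrailer {
    private final byte[] trailer = new byte[8];
    void finish(byte[] data) { fillTrailer(trailer, data); }
    byte[] trailer() { return trailer; }
  }

  /** Returns true if instance a's trailer changes after b finishes. */
  public static boolean sharedTrailerCorrupted() {
    byte[] first = "first".getBytes(StandardCharsets.UTF_8);
    byte[] second = "second stream".getBytes(StandardCharsets.UTF_8);
    SharedTrailer a = new SharedTrailer();
    SharedTrailer b = new SharedTrailer();
    a.finish(first);
    byte[] expected = a.trailer().clone();
    b.finish(second); // overwrites the buffer that a also uses
    return !Arrays.equals(expected, a.trailer());
  }

  /** Same interleaving, but with per-instance buffers. */
  public static boolean ownTrailerCorrupted() {
    byte[] first = "first".getBytes(StandardCharsets.UTF_8);
    byte[] second = "second stream".getBytes(StandardCharsets.UTF_8);
    OwnTrailer a = new OwnTrailer();
    OwnTrailer b = new OwnTrailer();
    a.finish(first);
    byte[] expected = a.trailer().clone();
    b.finish(second); // touches only b's buffer
    return !Arrays.equals(expected, a.trailer());
  }

  public static void main(String[] args) {
    System.out.println("shared corrupted: " + sharedTrailerCorrupted());
    System.out.println("own corrupted: " + ownTrailerCorrupted());
  }
}
```

Because the two inputs differ in length, the size field alone guarantees that the shared buffer changes, so `sharedTrailerCorrupted()` returns true and `ownTrailerCorrupted()` returns false.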
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 642714)
Time Spent: 25h 20m  (was: 25h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 25h 20m
>  Remaining Estimate: 0h
>
> Currently, when native zlib is not loaded, GzipCodec supports only 
> BuiltInGzipDecompressor. So, without the Hadoop native codec installed, saving 
> a SequenceFile using GzipCodec throws an exception like "SequenceFile doesn't 
> work with GzipCodec without native-hadoop code!"
> As with the other codecs that we migrated to using prepared packages (lz4, 
> snappy), it would be better to support GzipCodec generally, without the Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use the Java 
> Deflater to implement BuiltInGzipCompressor.
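
As a rough illustration of the idea in the description, the sketch below produces a gzip stream with nothing but `java.util.zip.Deflater`. It is not the Hadoop implementation and does not implement the `Compressor` interface; the class `DeflaterGzipSketch` and its methods are hypothetical, and it compresses a whole byte array in one shot rather than streaming. It assumes the fixed ten-byte header shown in the diff, a raw deflate body, and a trailer of CRC32 plus input size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class DeflaterGzipSketch {

  // Fixed ten-byte gzip header: magic bytes, CM=8 (deflate), no flags, no mtime.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0};

  /** Compresses input into a complete gzip member (header, body, trailer). */
  public static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(HEADER, 0, HEADER.length);

    // nowrap=true emits a raw deflate stream without the zlib wrapper.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    } finally {
      deflater.end(); // release native zlib resources held by the JDK
    }

    // Trailer: CRC32 of the uncompressed data, then its size, little-endian.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  /** Decompresses with the JDK reader, which also checks the trailer. */
  public static byte[] gunzip(byte[] gz) {
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Reading the result back through `GZIPInputStream` is a convenient check, since that class rejects a stream whose trailer does not match the decompressed data.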



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=639484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-639484
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 18/Aug/21 17:02
Start Date: 18/Aug/21 17:02
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898777110








Issue Time Tracking
---

Worklog Id: (was: 639484)
Time Spent: 25h 10m  (was: 25h)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638871
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 17/Aug/21 20:26
Start Date: 17/Aug/21 20:26
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898777110








Issue Time Tracking
---

Worklog Id: (was: 638871)
Time Spent: 25h  (was: 24h 50m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638306
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 16/Aug/21 17:12
Start Date: 16/Aug/21 17:12
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-899675356


   Thank you @sunchao !




Issue Time Tracking
---

Worklog Id: (was: 638306)
Time Spent: 24h 50m  (was: 24h 40m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638304=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638304
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 16/Aug/21 17:08
Start Date: 16/Aug/21 17:08
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-899672814


   Test failures unrelated. Merged to trunk. Thanks @viirya !




Issue Time Tracking
---

Worklog Id: (was: 638304)
Time Spent: 24h 40m  (was: 24.5h)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638303
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 16/Aug/21 17:08
Start Date: 16/Aug/21 17:08
Worklog Time Spent: 10m 
  Work Description: sunchao merged pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250


   




Issue Time Tracking
---

Worklog Id: (was: 638303)
Time Spent: 24.5h  (was: 24h 20m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=638302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-638302
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 16/Aug/21 17:07
Start Date: 16/Aug/21 17:07
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898179005


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:---:|---:|---:|:---:|:---:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  0s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 12s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 38s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m  2s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 22s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 32s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m  0s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  2s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m  9s |  |  |
   
   
   | Subsystem | Report/Notes |
   |---:|:---|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux a1ce48e1d2df 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4112e047030ac8318ae5aee0bf3c5d0d104d6c1e |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/testReport/ |
   | Max. process+thread count | 1267 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637926
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 14/Aug/21 03:47
Start Date: 14/Aug/21 03:47
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898811923


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:---:|---:|---:|:---:|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 18s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 32s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 50s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 31s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 25s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 25s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/29/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 51s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  6s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m 16s |  |  |
   
   
   | Subsystem | Report/Notes |
   |---:|:---|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/29/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux c10d59b16d4a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 92d0671a3368f916bf670c9891143c94098a2c1c |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/29/testReport/ |
   | Max. process+thread count | 3152 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637915
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 14/Aug/21 01:05
Start Date: 14/Aug/21 01:05
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898785995


   Thank you @sunchao !




Issue Time Tracking
---

Worklog Id: (was: 637915)
Time Spent: 24h  (was: 23h 50m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637911=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637911
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 14/Aug/21 00:50
Start Date: 14/Aug/21 00:50
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898783160


   > @sunchao They are from testBZip2NativeCodec and testCodecPoolCompressorReinit. 
   > They are original code which this change doesn't touch. Do we want to change 
   > it here?
   
   It should be fine then. I just triggered CI again.




Issue Time Tracking
---

Worklog Id: (was: 637911)
Time Spent: 23h 50m  (was: 23h 40m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637907
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 14/Aug/21 00:17
Start Date: 14/Aug/21 00:17
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898778379


   Looks like an unrelated failure.
   
   ```
   [ERROR] Tests run: 18, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 2.128 s <<< FAILURE! - in org.apache.hadoop.metrics2.source.TestJvmMetrics
   [ERROR] testGetMetricsPerf(org.apache.hadoop.metrics2.source.TestJvmMetrics)  Time elapsed: 0.841 s  <<< ERROR!
   java.lang.OutOfMemoryError: unable to create new native thread
   ```




Issue Time Tracking
---

Worklog Id: (was: 637907)
Time Spent: 23h 40m  (was: 23.5h)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637906
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 14/Aug/21 00:10
Start Date: 14/Aug/21 00:10
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898777110


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:---:|---:|---:|:---:|:---:|
   | +0 :ok: |  reexec  |   1m 10s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 11s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 31s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  19m 26s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 48s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 49s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 27s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 11s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/28/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 
unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  17m 25s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  17m 29s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/28/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 188m  8s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.metrics2.source.TestJvmMetrics |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/28/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 9d7f770fbc29 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 92d0671a3368f916bf670c9891143c94098a2c1c |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637900
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 14/Aug/21 00:00
Start Date: 14/Aug/21 00:00
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898773785


   @sunchao They are from `testBZip2NativeCodec` and 
`testCodecPoolCompressorReinit`. Both are existing code that this change 
doesn't touch. Do we want to change them here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637900)
Time Spent: 23h 20m  (was: 23h 10m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637898
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 23:28
Start Date: 13/Aug/21 23:28
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898767347


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  3s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 35s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m 45s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  17m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m 43s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  24m 43s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 33s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  20m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  4s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/27/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  17m  7s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 51s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 194m 11s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/27/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux af0b3e2b0f23 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 87ae6bb66646c1ac6fab8896e443eb0c54500308 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/27/testReport/ |
   | Max. process+thread count | 1266 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637887
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 20:49
Start Date: 13/Aug/21 20:49
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688773637



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
##
@@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException {
 
     BufferedWriter w = null;
     Compressor gzipCompressor = CodecPool.getCompressor(codec);
-    if (null != gzipCompressor) {
-      // If it gives us back a Compressor, we should be able to use this
-      // to write files we can then read back with Java's gzip tools.
-      OutputStream os = new CompressorStream(new FileOutputStream(fileName),
-          gzipCompressor);
-      w = new BufferedWriter(new OutputStreamWriter(os));
-      w.write(msg);
-      w.close();
-      CodecPool.returnCompressor(gzipCompressor);
-
-      verifyGzipFile(fileName, msg);
-    }
-
-    // Create a gzip text file via codec.getOutputStream().

Review comment:
   Oh, got it. Yea, removed it accidentally. Restoring it with multi-write.






Issue Time Tracking
---

Worklog Id: (was: 637887)
Time Spent: 23h  (was: 22h 50m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637885
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 20:43
Start Date: 13/Aug/21 20:43
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688770891



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
##
@@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException {
 
     BufferedWriter w = null;
     Compressor gzipCompressor = CodecPool.getCompressor(codec);
-    if (null != gzipCompressor) {
-      // If it gives us back a Compressor, we should be able to use this
-      // to write files we can then read back with Java's gzip tools.
-      OutputStream os = new CompressorStream(new FileOutputStream(fileName),
-          gzipCompressor);
-      w = new BufferedWriter(new OutputStreamWriter(os));
-      w.write(msg);
-      w.close();
-      CodecPool.returnCompressor(gzipCompressor);
-
-      verifyGzipFile(fileName, msg);
-    }
-
-    // Create a gzip text file via codec.getOutputStream().

Review comment:
   Oh, I mean the original test with the comment "// Create a gzip text file 
via codec.getOutputStream()." I think we should change it to use multi-write too.
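
   As a rough illustration of what a multi-write check could look like, the sketch below writes the same message several times through one gzip stream and reads everything back with the JDK's gzip tools. It assumes the JDK's `GZIPOutputStream` as a stand-in for the stream the codec would return; `MultiWriteGzipSketch`, `writeInChunks` and `readBack` are hypothetical names and are not taken from `TestCodec`.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; GZIPOutputStream stands in for the codec's output stream.
class MultiWriteGzipSketch {

  // Writes msg `times` times through a single gzip stream, flushing the
  // writer after each write so the data reaches the stream in several calls.
  static byte[] writeInChunks(String msg, int times) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (Writer w = new BufferedWriter(new OutputStreamWriter(
        new GZIPOutputStream(sink), StandardCharsets.UTF_8))) {
      for (int i = 0; i < times; i++) {
        w.write(msg);
        w.flush();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  // Decompresses with GZIPInputStream, which also validates the trailer.
  static String readBack(byte[] gz) {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String msg = "This is the message in the file!";
    String back = readBack(writeInChunks(msg, 3));
    System.out.println(back.equals(msg + msg + msg));
  }
}
```

   The point of writing more than once is that a compressor which emitted a header or trailer per write, rather than once per stream, would fail this read-back.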






Issue Time Tracking
---

Worklog Id: (was: 637885)
Time Spent: 22h 50m  (was: 22h 40m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637884
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 20:41
Start Date: 13/Aug/21 20:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897916404


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m 51s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m 38s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 33s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 26s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 14s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 36s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 41s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 187m 47s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e46a4b76434d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d661abcbd46f7d907db31b1cd4557f9397430dab |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637880
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 20:03
Start Date: 13/Aug/21 20:03
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688751999



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+    return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    if (finished()) {
+      throw new IOException("compress called on finished compressor");
+    }
+
+    int compressedBytesWritten = 0;
+
+    if (currentBufLen <= 0) {
+      return compressedBytesWritten;
+    }
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+        state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {

Review comment:
   As we now write the header only once for all inputs, that seems okay. Let 
me change it.






Issue Time Tracking
---

Worklog Id: (was: 637880)
Time Spent: 22.5h  (was: 22h 20m)


[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637876
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 19:36
Start Date: 13/Aug/21 19:36
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688739194



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+    return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    if (finished()) {
+      throw new IOException("compress called on finished compressor");
+    }
+
+    int compressedBytesWritten = 0;
+
+    if (currentBufLen <= 0) {
+      return compressedBytesWritten;
+    }
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+        state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {

Review comment:
   Yea, sorry, I meant `state == BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC`. 
It is a stronger guarantee than the current one, isn't it?
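
   To make the comparison concrete, here is a small sketch that lists which states each form of the guard admits. The `State` enum is a hypothetical four-value stand-in, not Hadoop's `GzipStateLabel`, and the method names are invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; State is a stand-in, not BuiltInGzipDecompressor.GzipStateLabel.
class HeaderGuardSketch {

  enum State { HEADER_BASIC, INFLATE_STREAM, TRAILER_CRC, FINISHED }

  // Shape of the guard in the diff: "not in either of two states".
  static boolean negativeGuard(State s) {
    return s != State.INFLATE_STREAM && s != State.TRAILER_CRC;
  }

  // Shape of the suggested guard: "exactly the header state".
  static boolean positiveGuard(State s) {
    return s == State.HEADER_BASIC;
  }

  // Collects every state for which the guard would let the header be written.
  static Set<State> admitted(Predicate<State> guard) {
    Set<State> result = EnumSet.noneOf(State.class);
    for (State s : State.values()) {
      if (guard.test(s)) {
        result.add(s);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("negative guard admits: " + admitted(HeaderGuardSketch::negativeGuard));
    System.out.println("positive guard admits: " + admitted(HeaderGuardSketch::positiveGuard));
  }
}
```

   In this stand-in the negative form also admits `FINISHED`, and it would silently admit any state added later, whereas the equality check admits only `HEADER_BASIC`.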




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637876)
Time Spent: 22h 20m  (was: 22h 10m)


[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637875
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 19:32
Start Date: 13/Aug/21 19:32
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688737420



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+if (finished()) {
+  throw new IOException("compress called on finished compressor");
+}
+
+int compressedBytesWritten = 0;
+
+if (currentBufLen <= 0) {

Review comment:
   Let me keep it for now.






Issue Time Tracking
---

Worklog Id: (was: 637875)
Time Spent: 22h 10m  (was: 22h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 22h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637874=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637874
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 19:32
Start Date: 13/Aug/21 19:32
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688737260



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+if (finished()) {
+  throw new IOException("compress called on finished compressor");
+}
+
+int compressedBytesWritten = 0;
+
+if (currentBufLen <= 0) {
+  return compressedBytesWritten;
+}
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {

Review comment:
   Or do you mean `state == BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC`?






Issue Time Tracking
---

Worklog Id: (was: 637874)
Time Spent: 22h  (was: 21h 50m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637873
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 19:31
Start Date: 13/Aug/21 19:31
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688736893



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+if (finished()) {
+  throw new IOException("compress called on finished compressor");
+}
+
+int compressedBytesWritten = 0;
+
+if (currentBufLen <= 0) {

Review comment:
   Oh, I remember seeing this guard in other compressors.






Issue Time Tracking
---

Worklog Id: (was: 637873)
Time Spent: 21h 50m  (was: 21h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 21h 50m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637872=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637872
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 19:30
Start Date: 13/Aug/21 19:30
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688736308



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+if (finished()) {
+  throw new IOException("compress called on finished compressor");
+}
+
+int compressedBytesWritten = 0;
+
+if (currentBufLen <= 0) {
+  return compressedBytesWritten;
+}
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {
+  int outputHeaderSize = writeHeader(b, off, len);
+  numExtraBytesWritten += outputHeaderSize;
+
+  compressedBytesWritten += outputHeaderSize;
+
+  if (outputHeaderSize == len) {
+return compressedBytesWritten;
+  }
+
+  off += outputHeaderSize;
+  len -= outputHeaderSize;
+}
+
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  // now compress it into b[]
+  int deflated = deflater.deflate(b, off, len);
+
+  compressedBytesWritten += deflated;
+  off += deflated;
+  len -= deflated;
+
+  // All current input are processed. And `finished` is called. Going to output trailer.
+  if (deflater.finished()) {
+state = BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+fillTrailer();
+  } else {
+return compressedBytesWritten;
+  }
+}
+
+int outputTrailerSize = writeTrailer(b, off, len);

Review comment:
   okay





[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637871
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 19:29
Start Date: 13/Aug/21 19:29
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688736051



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+if (finished()) {
+  throw new IOException("compress called on finished compressor");
+}
+
+int compressedBytesWritten = 0;
+
+if (currentBufLen <= 0) {
+  return compressedBytesWritten;
+}
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {

Review comment:
   This block writes the header. We only write the header when in 
`HEADER_BASIC` state.
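
As a rough illustration of writing the header only in the `HEADER_BASIC` state, the sketch below copies the ten header bytes into caller-supplied buffers that may be smaller than the header. The body of `writeHeader` is not shown in this thread, so the version here is a hypothetical stand-in, and the `State` enum is a local placeholder rather than the real `GzipStateLabel`.

```java
class HeaderStateSketch {
  // Hypothetical stand-in for GzipStateLabel; only labels mentioned in the thread.
  enum State { HEADER_BASIC, INFLATE_STREAM, TRAILER_CRC, FINISHED }

  // Same ten bytes as GZIP_HEADER in the patch.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  private State state = State.HEADER_BASIC;
  private int headerOff = 0;

  // Assumed behavior: copy as much of the remaining header as fits into b,
  // and move on to the deflate phase once all header bytes are out.
  int writeHeader(byte[] b, int off, int len) {
    if (state != State.HEADER_BASIC) {
      return 0; // the header is only ever written in HEADER_BASIC
    }
    int n = Math.min(len, HEADER.length - headerOff);
    System.arraycopy(HEADER, headerOff, b, off, n);
    headerOff += n;
    if (headerOff == HEADER.length) {
      state = State.INFLATE_STREAM;
    }
    return n;
  }

  State state() {
    return state;
  }

  // Counts how many calls with a buffer of bufSize bytes (bufSize > 0)
  // are needed before the header is completely written.
  static int callsToFinishHeader(int bufSize) {
    HeaderStateSketch s = new HeaderStateSketch();
    byte[] buf = new byte[bufSize];
    int calls = 0;
    while (s.state() == State.HEADER_BASIC) {
      s.writeHeader(buf, 0, buf.length);
      calls++;
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(callsToFinishHeader(4));
  }
}
```

With a 4-byte buffer the header takes three calls (4 + 4 + 2 bytes). Because the state only leaves `HEADER_BASIC` after the last header byte is copied, a guard on `state == HEADER_BASIC` covers the partial-write case as well.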






Issue Time Tracking
---

Worklog Id: (was: 637871)
Time Spent: 21.5h  (was: 21h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 21.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637838
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 16:36
Start Date: 13/Aug/21 16:36
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688637566



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED && deflater.finished();
+  }
+
+  @Override
+  public boolean needsInput() {
+return deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+if (finished()) {
+  throw new IOException("compress called on finished compressor");
+}
+
+int compressedBytesWritten = 0;
+
+if (currentBufLen <= 0) {
+  return compressedBytesWritten;
+}
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {

Review comment:
   Sorry to raise this again, but I think it's safe to use `state != BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC` now?

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637828
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 16:21
Start Date: 13/Aug/21 16:21
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688633641



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
##
@@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException {
 
 BufferedWriter w = null;
 Compressor gzipCompressor = CodecPool.getCompressor(codec);
-if (null != gzipCompressor) {
-  // If it gives us back a Compressor, we should be able to use this
-  // to write files we can then read back with Java's gzip tools.
-  OutputStream os = new CompressorStream(new FileOutputStream(fileName),
-  gzipCompressor);
-  w = new BufferedWriter(new OutputStreamWriter(os));
-  w.write(msg);
-  w.close();
-  CodecPool.returnCompressor(gzipCompressor);
-
-  verifyGzipFile(fileName, msg);
-}
-
-// Create a gzip text file via codec.getOutputStream().

Review comment:
   Hmm? You mean keeping the original single write?






Issue Time Tracking
---

Worklog Id: (was: 637828)
Time Spent: 21h 10m  (was: 21h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 21h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor if native zlib is 
> not loaded. So, without the Hadoop native codec installed, saving a 
> SequenceFile using GzipCodec throws an exception like "SequenceFile doesn't 
> work with GzipCodec without native-hadoop code!"
> As with the other codecs that we migrated to using prepared packages (lz4, 
> snappy), it would be better to support GzipCodec generally without the Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java's 
> Deflater to implement BuiltInGzipCompressor.
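
The idea in the description can be sketched with the JDK alone: a fixed gzip header, a raw deflate stream from `java.util.zip.Deflater`, and a trailer holding the CRC-32 and the input size. This is a minimal one-shot illustration, not the streaming implementation in the pull request; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

class GzipWithDeflaterSketch {
  // Same ten bytes as GZIP_HEADER in the patch.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(HEADER, 0, HEADER.length);

    // nowrap=true yields a raw deflate stream without the zlib wrapper,
    // which is what the gzip format embeds.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buf = new byte[512];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();

    // Trailer: CRC-32 of the uncompressed data, then its length, little-endian.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  // Decompresses with the JDK's own gzip reader to check compatibility.
  static String roundTrip(String text) {
    byte[] gz = gzip(text.getBytes(StandardCharsets.UTF_8));
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello gzip"));
  }
}
```

Reading the result back through `GZIPInputStream` is the same kind of check the test in this pull request performs when it verifies files with Java's gzip tools.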



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637675
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 04:15
Start Date: 13/Aug/21 04:15
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-898179005


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  0s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 12s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 38s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m  2s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 22s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 32s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 
unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m  0s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  2s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 178m  9s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux a1ce48e1d2df 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4112e047030ac8318ae5aee0bf3c5d0d104d6c1e |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/26/testReport/ |
   | Max. process+thread count | 1267 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637616=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637616
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 01:16
Start Date: 13/Aug/21 01:16
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688181189



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   We should return `deflater.needsInput()` in most cases. But if we are 
writing the trailer, we should return false even if the deflater needs input.
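
The rule in this comment can be sketched as a small state check. This is a minimal illustration under assumptions, not the code from the pull request: `NeedsInputSketch`, its `State` enum, and `setState` are hypothetical names introduced here, and only `java.util.zip.Deflater` is a real API.

```java
import java.util.zip.Deflater;

// Illustrative only: hypothetical class and enum, not the patch itself.
public class NeedsInputSketch {

  public enum State { HEADER_BASIC, DEFLATE_STREAM, TRAILER_CRC, FINISHED }

  private final Deflater deflater =
      new Deflater(Deflater.DEFAULT_COMPRESSION, true);
  private State state = State.HEADER_BASIC;

  public void setState(State s) {
    state = s;
  }

  public void setInput(byte[] b, int off, int len) {
    deflater.setInput(b, off, len);
  }

  public boolean needsInput() {
    // Defer to the deflater in most cases, but refuse new input while the
    // trailer is still pending, even if the deflater has consumed all input.
    return deflater.needsInput() && state != State.TRAILER_CRC;
  }

  public static void main(String[] args) {
    NeedsInputSketch c = new NeedsInputSketch();
    System.out.println(c.needsInput()); // no trailer pending
    c.setState(State.TRAILER_CRC);
    System.out.println(c.needsInput()); // trailer pending
  }
}
```

Gating on the trailer state keeps a caller from supplying the next block of input before the CRC and length of the current one have been written out.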




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637616)
Time Spent: 20h 50m  (was: 20h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637614
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 01:14
Start Date: 13/Aug/21 01:14
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688180734



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   After rethinking this and testing locally, I think it should be:
   
   ```java
   deflater.needsInput() && state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
   ```
   

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637604
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:53
Start Date: 13/Aug/21 00:53
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688174639



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   Yes, you're right. However, the usage pattern for this seems to be:
   ```java
   compressor.setInput(b, off, len);
   while (!compressor.needsInput()) {
     compress();
   }
   ```
   so `setInput` is always called first. Still, it might be safer to also 
add a check for `HEADER_BASIC` there.
   
   With the code snippet you have above, it seems `needsInput` will return 
true the first time `compressor.setInput` is called (since the state is 
`HEADER_BASIC`), so `compress` will not be called afterwards?
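
To make the loop above concrete, here is a hedged, runnable sketch of the same feed pattern. It uses a plain `java.util.zip.Deflater` as a stand-in for the Hadoop `Compressor`, so `FeedLoopSketch`, `compressChunks`, and `decompress` are names invented for this example. It shows why a `needsInput()` that wrongly returns true right after `setInput` would skip the inner loop for that chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: a plain Deflater stands in for the Hadoop Compressor.
public class FeedLoopSketch {

  public static byte[] compressChunks(byte[][] chunks) {
    Deflater deflater = new Deflater();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    for (byte[] chunk : chunks) {
      // Input is always supplied first ...
      deflater.setInput(chunk, 0, chunk.length);
      // ... and then drained until the compressor asks for more.
      while (!deflater.needsInput()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    }
    // Flush whatever is still buffered once all input has been given.
    deflater.finish();
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public static byte[] decompress(byte[] data) throws DataFormatException {
    Inflater inflater = new Inflater();
    inflater.setInput(data);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    while (!inflater.finished()) {
      int n = inflater.inflate(buf);
      if (n == 0 && inflater.needsInput()) {
        break; // truncated input; avoid spinning forever
      }
      out.write(buf, 0, n);
    }
    inflater.end();
    return out.toByteArray();
  }

  public static void main(String[] args) throws DataFormatException {
    byte[][] chunks = {
        "first chunk, ".getBytes(StandardCharsets.UTF_8),
        "second chunk".getBytes(StandardCharsets.UTF_8)};
    byte[] restored = decompress(compressChunks(chunks));
    System.out.println(new String(restored, StandardCharsets.UTF_8));
  }
}
```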






Issue Time Tracking
---

Worklog Id: (was: 637604)
Time Spent: 20.5h  (was: 20h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637603=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637603
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:53
Start Date: 13/Aug/21 00:53
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688174639



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   Yes, that's right. The usage pattern for this seems to be:
   ```java
   compressor.setInput(b, off, len);
   while (!compressor.needsInput()) {
     compress();
   }
   ```
   so `setInput` is always called first. Still, it might be safer to also 
add a check for `HEADER_BASIC` there.
   
   With the code snippet you have above, it seems `needsInput` will return 
true the first time `compressor.setInput` is called (since the state is 
`HEADER_BASIC`), so `compress` will not be called afterwards?






Issue Time Tracking
---

Worklog Id: (was: 637603)
Time Spent: 20h 20m  (was: 20h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637596=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637596
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:10
Start Date: 13/Aug/21 00:10
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688162145



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   Hm, but `deflater.needsInput()` actually returns true right after we 
create the Deflater instance (before `setInput` has been called).






Issue Time Tracking
---

Worklog Id: (was: 637596)
Time Spent: 20h  (was: 19h 50m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637597
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:10
Start Date: 13/Aug/21 00:10
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688162145



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   Hm, but `deflater.needsInput()` actually returns true right after we 
create the Deflater instance (`setInput` has not been called yet and the 
state is `HEADER_BASIC`).
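
The behaviour mentioned in this comment is easy to check against the JDK directly. The small sketch below uses only `java.util.zip.Deflater`; the class name `FreshDeflaterCheck` and its two helper methods are hypothetical and exist just for this illustration.

```java
import java.util.zip.Deflater;

// Illustrative only: checks what Deflater.needsInput() reports before and
// after input has been supplied.
public class FreshDeflaterCheck {

  // A deflater that has never been given input reports that it needs some.
  public static boolean freshNeedsInput() {
    Deflater d = new Deflater();
    try {
      return d.needsInput();
    } finally {
      d.end();
    }
  }

  // Once unconsumed input is present, needsInput() turns false.
  public static boolean needsInputAfterSetInput() {
    Deflater d = new Deflater();
    try {
      d.setInput(new byte[] {1, 2, 3});
      return d.needsInput();
    } finally {
      d.end();
    }
  }

  public static void main(String[] args) {
    System.out.println(freshNeedsInput());         // prints true
    System.out.println(needsInputAfterSetInput()); // prints false
  }
}
```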






Issue Time Tracking
---

Worklog Id: (was: 637597)
Time Spent: 20h 10m  (was: 20h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637594
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:06
Start Date: 13/Aug/21 00:06
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688160670



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   In `HEADER_BASIC` it'll return false, since `deflater.needsInput` will
return false (it hasn't started processing the input yet). I think this
logic is correct, since we need it to go into the `compress` method?






Issue Time Tracking
---

Worklog Id: (was: 637594)
Time Spent: 19h 50m  (was: 19h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637592
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:03
Start Date: 13/Aug/21 00:03
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688160063



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   Seems to be:
   
   ```java
   if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
     return deflater.needsInput();
   }

   return state == BuiltInGzipDecompressor.GzipStateLabel.HEADER_BASIC;
   ```






Issue Time Tracking
---

Worklog Id: (was: 637592)
Time Spent: 19h 40m  (was: 19.5h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637591
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 13/Aug/21 00:01
Start Date: 13/Aug/21 00:01
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688159282



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   Then `needsInput` will return false for the `HEADER_BASIC` state? That
doesn't seem correct.
   
   






Issue Time Tracking
---

Worklog Id: (was: 637591)
Time Spent: 19.5h  (was: 19h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637590
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 23:57
Start Date: 12/Aug/21 23:57
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688158036



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;

Review comment:
   lol yeah, we've come full circle :)






Issue Time Tracking
---

Worklog Id: (was: 637590)
Time Spent: 19h 20m  (was: 19h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637584
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 23:52
Start Date: 12/Aug/21 23:52
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688156120



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;

Review comment:
   I remember that I originally used `deflater.finished()` to check whether
it was time to output the trailer. So it seems it already outputs the trailer
once, after all the inputs...
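
To make the header / deflate body / trailer ordering concrete, here is a rough,
self-contained sketch of writing a single gzip member by hand. It is not the
code under review: it uses the JDK's `java.util.zip.CRC32` where the patch uses
`DataChecksum.newCrc32()`, it compresses one buffer in one shot, and the names
`ManualGzipSketch`, `gzip`, `gunzip` and `writeIntLE` are hypothetical. The
trailer is written only after `deflater.finished()` returns true:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration class, not part of the pull request.
public class ManualGzipSketch {

  // Same fixed ten-byte header as in the patch: magic, CM=8 (deflate), no flags.
  private static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  /** Writes header, raw deflate body, then the 8-byte trailer. */
  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(GZIP_HEADER, 0, GZIP_HEADER.length);

    // nowrap = true: no zlib wrapper, because gzip supplies its own framing.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[512];
      // finished() turns true only once the whole deflate stream is emitted.
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    } finally {
      deflater.end();
    }

    // Trailer: CRC-32 of the uncompressed data, then its length, little-endian.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  /** Reads the bytes back with the JDK's own gzip reader as a sanity check. */
  static byte[] gunzip(byte[] gz) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] original = "hello gzip".getBytes(StandardCharsets.UTF_8);
    byte[] roundTrip = gunzip(gzip(original));
    System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
  }
}
```

`GZIPInputStream` validates both the CRC and the length in the trailer, so a
successful round trip suggests the framing is right.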






Issue Time Tracking
---

Worklog Id: (was: 637584)
Time Spent: 19h 10m  (was: 19h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.






[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637583=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637583
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 23:50
Start Date: 12/Aug/21 23:50
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688155475



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;

Review comment:
   Oh, you're right. I may have misunderstood it. It seems the `finished`
field is not necessary; we can just check `deflater.finished()`.






Issue Time Tracking
---

Worklog Id: (was: 637583)
Time Spent: 19h  (was: 18h 50m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 19h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.






[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637575
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 22:56
Start Date: 12/Aug/21 22:56
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688136794



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;

Review comment:
   > but just we output the trailer for each input
   
   Hmm, really? When I tested it, it still wrote the trailer after all the
inputs. My understanding is that `deflater.finished()` only returns true once
`finish` has been called.
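
That understanding of `Deflater.finished()` can be checked with a small
stand-alone sketch. It assumes nothing beyond `java.util.zip.Deflater`; the
class name `DeflaterFinishedDemo` and its method are hypothetical and exist only
for this illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical illustration class, not part of the pull request.
public class DeflaterFinishedDemo {

  /**
   * Returns {finished() after all input was consumed but before finish(),
   *          finished() after finish() and draining the output}.
   */
  static boolean[] finishedBeforeAndAfterFinish() {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      byte[] out = new byte[256];

      // Feed one input buffer and let the deflater consume it completely.
      deflater.setInput("some input".getBytes(StandardCharsets.UTF_8));
      while (!deflater.needsInput()) {
        deflater.deflate(out);
      }
      boolean before = deflater.finished();

      // Signal end of input, then drain until the stream is complete.
      deflater.finish();
      while (!deflater.finished()) {
        deflater.deflate(out);
      }
      boolean after = deflater.finished();

      return new boolean[]{before, after};
    } finally {
      deflater.end();
    }
  }

  public static void main(String[] args) {
    boolean[] result = finishedBeforeAndAfterFinish();
    System.out.println("before finish(): " + result[0]);
    System.out.println("after finish():  " + result[1]);
  }
}
```

Consuming an input buffer does not by itself make `finished()` true; that only
happens after `finish()` has been called and the remaining output has been
drained, which is consistent with the trailer being written once after all
inputs.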






Issue Time Tracking
---

Worklog Id: (was: 637575)
Time Spent: 18h 50m  (was: 18h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637571
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 22:45
Start Date: 12/Aug/21 22:45
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688132453



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;

Review comment:
   Currently we only write the trailer after all inputs are compressed (the 
caller calls `finish()`).
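
To make that ordering concrete, here is a small standalone sketch, not the patch's code. It writes the same fixed ten-byte header, deflates several inputs with a raw `Deflater`, and appends the CRC32 and size trailer only once, after `finish()`. `GzipMemberSketch`, `compress` and `decompress` are illustrative names; the result is checked by reading it back with `GZIPInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

public class GzipMemberSketch {
  // Same fixed header bytes as GZIP_HEADER in the diff above.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  /** Compresses all chunks into ONE gzip member: header, deflate data, trailer. */
  public static byte[] compress(byte[][] chunks) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // raw deflate
    CRC32 crc = new CRC32();
    long total = 0;
    byte[] buf = new byte[512];
    try {
      out.write(HEADER, 0, HEADER.length);
      for (byte[] chunk : chunks) {
        crc.update(chunk, 0, chunk.length);
        total += chunk.length;
        deflater.setInput(chunk);
        while (!deflater.needsInput()) {
          int n = deflater.deflate(buf);
          out.write(buf, 0, n);
        }
      }
      // Only now, after every input has been handed over, finish the stream.
      deflater.finish();
      while (!deflater.finished()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
      // Trailer: CRC32 of the uncompressed data, then its size mod 2^32,
      // both little-endian.
      writeIntLE(out, (int) crc.getValue());
      writeIntLE(out, (int) total);
    } finally {
      deflater.end();
    }
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  /** Reads the bytes back with the JDK's gzip reader. */
  public static byte[] decompress(byte[] gz) throws IOException {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }
}
```

The sketch keeps everything in memory for brevity; the real compressor has to do the same thing incrementally through the `Compressor` interface, which is where the state tracking comes in.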






Issue Time Tracking
---

Worklog Id: (was: 637571)
Time Spent: 18h 40m  (was: 18.5h)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637570
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 22:44
Start Date: 12/Aug/21 22:44
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688132212



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;

Review comment:
   `finished` is used to make sure we write the trailer only after `finish()` is 
called. In other words, removing it would be okay (that is actually how I 
implemented it previously), but we would then output the trailer for each input.
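
As background on why per-input output could still be readable: the gzip format allows several complete members back to back, and the JDK's `GZIPInputStream` reads them as one stream. The sketch below assumes each input is written as a full member (header, data, trailer), which may not be exactly what the earlier implementation did. `MultiMemberGzipSketch` and its methods are illustrative names only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MultiMemberGzipSketch {

  /** Compresses one string into its own complete gzip member. */
  static byte[] member(String s) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(s.getBytes(StandardCharsets.UTF_8));
    }
    return bos.toByteArray();
  }

  /** Concatenates one member per input, then reads everything back. */
  public static String roundTrip(String... inputs) throws IOException {
    ByteArrayOutputStream all = new ByteArrayOutputStream();
    for (String s : inputs) {
      byte[] m = member(s);
      all.write(m, 0, m.length);
    }
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(all.toByteArray()))) {
      byte[] buf = new byte[256];
      int n;
      while ((n = in.read(buf)) != -1) {
        plain.write(buf, 0, n);
      }
    }
    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

The cost of that layout is an extra header and trailer per input, which is one reason to prefer a single trailer written after `finish()`.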






Issue Time Tracking
---

Worklog Id: (was: 637570)
Time Spent: 18.5h  (was: 18h 20m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637563=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637563
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 22:17
Start Date: 12/Aug/21 22:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894594938








Issue Time Tracking
---

Worklog Id: (was: 637563)
Time Spent: 18h 20m  (was: 18h 10m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637562
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 22:16
Start Date: 12/Aug/21 22:16
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r688118961



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,271 @@
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{
+  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private boolean finished;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+  private int accuButLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return finished && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    // After we output the trailer for the current input, we can take another input.

Review comment:
   This comment is outdated. Also, I wonder if we can change it to something 
like:
   ```java
   if (!deflater.needsInput()) {
     return false;
   }

   // After we output the trailer for the current input, we can take another input.
   return state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
   ```
   
   It seems strange that we'd need more input when state is `FINISHED`.
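
To compare the two shapes side by side, here is a small standalone sketch. The `State` enum is a hypothetical stand-in for a few `GzipStateLabel` values, and the fall-through `return state == State.FINISHED` in `asInPatch` is an assumption, since the quoted hunk ends before that line. Neither method is the actual Hadoop code.

```java
public class NeedsInputSketch {

  /** Hypothetical stand-in for a few of the compressor's state labels. */
  public enum State { HEADER_BASIC, INFLATE_STREAM, TRAILER_CRC, FINISHED }

  /**
   * Shape of the check in the diff above. The final return is assumed,
   * because the quoted hunk is cut off before it.
   */
  public static boolean asInPatch(State state, boolean deflaterNeedsInput) {
    if (state == State.INFLATE_STREAM) {
      return deflaterNeedsInput;
    }
    return state == State.FINISHED;
  }

  /** Shape of the check suggested in this review comment. */
  public static boolean asSuggested(State state, boolean deflaterNeedsInput) {
    if (!deflaterNeedsInput) {
      return false;
    }
    return state == State.INFLATE_STREAM;
  }

  public static void main(String[] args) {
    // Print the full truth table for both variants.
    for (State s : State.values()) {
      for (boolean b : new boolean[] {false, true}) {
        System.out.printf("%-14s deflaterNeedsInput=%-5b patch=%-5b suggested=%b%n",
            s, b, asInPatch(s, b), asSuggested(s, b));
      }
    }
  }
}
```

Under that assumption the two variants agree while the state is `INFLATE_STREAM` and differ only in `FINISHED`, where the suggested form never asks for more input.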

##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/TestCodec.java
##
@@ -882,26 +908,28 @@ private void testGzipCodecWrite(boolean useNative) throws IOException {
 
 BufferedWriter w = null;
 Compressor gzipCompressor = CodecPool.getCompressor(codec);
-if (null != gzipCompressor) {
-  // If it gives us back a Compressor, we should be able to use this
-  // to write files we can then read back with Java's gzip tools.
-  OutputStream os = new CompressorStream(new FileOutputStream(fileName),
-  gzipCompressor);
-  w = new BufferedWriter(new OutputStreamWriter(os));
-  w.write(msg);
-  w.close();
-  CodecPool.returnCompressor(gzipCompressor);
-
-  verifyGzipFile(fileName, msg);
-}
-
-// Create a gzip text file via codec.getOutputStream().

Review comment:
   Maybe we should still keep this test.

##
File path: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637524
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 19:40
Start Date: 12/Aug/21 19:40
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897916404


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m 51s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m 38s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 33s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 26s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 14s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 36s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 41s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 187m 47s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e46a4b76434d 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d661abcbd46f7d907db31b1cd4557f9397430dab |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/25/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637410
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 16:31
Start Date: 12/Aug/21 16:31
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897787276


   Sure. Just re-triggered.




Issue Time Tracking
---

Worklog Id: (was: 637410)
Time Spent: 17h 50m  (was: 17h 40m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637403
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 16:21
Start Date: 12/Aug/21 16:21
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897775006


   @sunchao CI failed again because of the following issue:
   
   ```
   [ERROR] Failed to execute goal on project hadoop-client-integration-tests: 
Could not resolve dependencies for project 
org.apache.hadoop:hadoop-client-integration-tests:jar:3.4.0-SNAPSHOT: Failed to 
collect dependencies at javax.activation:activation:jar:1.1.1: Failed to read 
artifact descriptor for javax.activation:activation:jar:1.1.1: Could not 
transfer artifact javax.activation:activation:pom:1.1.1 from/to central 
(https://repo.maven.apache.org/maven2): Transfer failed for 
https://repo.maven.apache.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1.pom:
 Connection reset -> [Help 1]
   ```
   
   Could you help re-trigger the CI? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 637403)
Time Spent: 17h 40m  (was: 17.5h)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637303=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637303
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 12/Aug/21 10:49
Start Date: 12/Aug/21 10:49
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897537496


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  2s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  34m 13s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/24/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |  23m 31s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m 54s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 17s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 26s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  23m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 50s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  20m 50s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 15s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/24/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 49s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 46s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 20s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 50s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 198m 26s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/24/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 3fac6fb8a31e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d661abcbd46f7d907db31b1cd4557f9397430dab |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=637141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637141
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 11/Aug/21 21:14
Start Date: 11/Aug/21 21:14
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-897158917


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 12s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m  3s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 58s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 27s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  22m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 52s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 52s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/23/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 2 new + 332 unchanged - 0 fixed = 334 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 44s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 16s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 186m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/23/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e22538cca47a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1a1df5e5726beaf90c3112e6a9e4c83ae695658a |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/23/testReport/ |
   | Max. process+thread count | 1262 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=636718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636718
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 11/Aug/21 03:55
Start Date: 11/Aug/21 03:55
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-896479273


   @sunchao Please let me know if the new change looks good to you. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 636718)
Time Spent: 17h 10m  (was: 17h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> Currently, if native zlib is not loaded, GzipCodec only provides 
> BuiltInGzipDecompressor and has no built-in compressor. So, without the 
> Hadoop native codec installed, saving a SequenceFile using GzipCodec throws 
> an exception like "SequenceFile doesn't work with GzipCodec without 
> native-hadoop code!"
> As with the other codecs that we migrated to prepared packages (lz4, 
> snappy), it would be better to support GzipCodec generally, without the 
> Hadoop native codec installed. Similar to BuiltInGzipDecompressor, we can use 
> the Java Deflater to implement BuiltInGzipCompressor.
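
The idea in the description above can be sketched roughly as follows: a gzip member is a fixed ten-byte header, a raw deflate stream, and an eight-byte trailer holding the CRC-32 and the uncompressed length (both little-endian). This is only an illustration under stated assumptions, not the patch itself. The class `GzipFramingSketch` and its helpers are hypothetical names, it compresses a whole array in one call instead of implementing Hadoop's streaming `Compressor` interface, and it uses `java.util.zip.CRC32` where the patch uses `DataChecksum.newCrc32()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration class; not part of Hadoop or of the patch.
final class GzipFramingSketch {

  // Fixed ten-byte header: magic 0x1f 0x8b, method 8 (deflate),
  // no flags, zero mtime, no extra flags, OS byte 0.
  private static final byte[] HEADER =
      {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0};

  /** Wraps raw deflate output in a gzip header and trailer. */
  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(HEADER, 0, HEADER.length);

    // nowrap=true asks Deflater for raw deflate data without the zlib wrapper,
    // which is what the gzip container expects.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf, 0, buf.length);
      out.write(buf, 0, n);
    }
    deflater.end();

    // Trailer: CRC-32 of the uncompressed data, then its length, little-endian.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  /** Decodes with the JDK's own gzip reader to check the framing. */
  static byte[] gunzip(byte[] gz) {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello gzip".getBytes(StandardCharsets.UTF_8);
    byte[] roundTripped = gunzip(gzip(data));
    System.out.println(new String(roundTripped, StandardCharsets.UTF_8));
  }
}
```

Reading the result back through `GZIPInputStream` is a cheap way to confirm that the header and trailer are well-formed without any native code.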



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=636714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-636714
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 11/Aug/21 03:26
Start Date: 11/Aug/21 03:26
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-896471265


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 15s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 41s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 42s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  2s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 12s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  22m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 55s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 55s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  17m  3s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 35s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 183m 27s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/22/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 73f41b858049 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2e04fa60c3d5ae2373dc79f0037b7926a201cfc3 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/22/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/22/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635974
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 09/Aug/21 17:32
Start Date: 09/Aug/21 17:32
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893970892








Issue Time Tracking
---

Worklog Id: (was: 635974)
Time Spent: 16h 50m  (was: 16h 40m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635936
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 09/Aug/21 16:29
Start Date: 09/Aug/21 16:29
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-895365571


   Thanks @steveloughran. I also don't think there's a risk here. The compressor 
here basically writes into the buffers supplied by the caller instead of 
allocating its own.
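
To illustrate the point about caller-supplied buffers, here is a minimal sketch assuming only the JDK's `java.util.zip.Deflater`. The class `CallerBufferSketch` and its methods are hypothetical and are not code from the patch. `Deflater.deflate(b, off, len)` writes at most `len` bytes starting at `off`, so the output side is bounded by whatever window the caller hands in, regardless of the input size.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of Hadoop or of the patch.
final class CallerBufferSketch {

  /**
   * Compresses input by repeatedly filling only the window
   * callerBuf[off, off + len) and collecting what was written there.
   */
  static byte[] compressInto(byte[] input, byte[] callerBuf, int off, int len) {
    if (len <= 0) {
      throw new IllegalArgumentException("len must be positive");
    }
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream collected = new ByteArrayOutputStream();
    while (!deflater.finished()) {
      // deflate() never writes more than len bytes per call.
      int n = deflater.deflate(callerBuf, off, len);
      collected.write(callerBuf, off, n);
    }
    deflater.end();
    return collected.toByteArray();
  }

  /** Inflates zlib-format data whose original length is known. */
  static byte[] inflate(byte[] compressed, int originalLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] out = new byte[originalLength];
    try {
      int pos = 0;
      while (pos < out.length && !inflater.finished()) {
        int n = inflater.inflate(out, pos, out.length - pos);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated or unsupported input; stop instead of spinning
        }
        pos += n;
      }
      return Arrays.copyOf(out, pos);
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
  }

  public static void main(String[] args) {
    byte[] input = new byte[10_000];
    Arrays.fill(input, (byte) 'a');
    byte[] buf = new byte[32];
    byte[] compressed = compressInto(input, buf, 8, 16);
    System.out.println(Arrays.equals(input, inflate(compressed, input.length)));
  }
}
```

Bytes outside the `[off, off + len)` window are left untouched, which is the property the reply above relies on.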
   
   




Issue Time Tracking
---

Worklog Id: (was: 635936)
Time Spent: 16h 40m  (was: 16.5h)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635864
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 09/Aug/21 13:21
Start Date: 09/Aug/21 13:21
Worklog Time Spent: 10m 
  Work Description: steveloughran commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-895217443


   One concern here: is there anything we have to worry about from a security 
perspective? That is, if someone sends in something with an invalid range, does 
that trigger allocation of massive buffers, etc.? commons-compress has had 
security issues over time with things like `..` in paths. I don't think there's 
a risk here, but it's worth considering: do we have to worry about malicious gz 
files?




Issue Time Tracking
---

Worklog Id: (was: 635864)
Time Spent: 16.5h  (was: 16h 20m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635540
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 07/Aug/21 02:54
Start Date: 07/Aug/21 02:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894594938


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 15s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 25s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 46s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 57s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 57s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 36s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 44s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 36s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 53s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 179m 47s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/21/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 1a509ee1f77e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e0bec4a7b2a5df7a1148e8b1023893dd6ff50ec6 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/21/testReport/ |
   | Max. process+thread count | 3150 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/21/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635513
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 23:34
Start Date: 06/Aug/21 23:34
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894566377


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m  4s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 53s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 31s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 27s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 50s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 28s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 39s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 39s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 36s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m  6s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 20s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 180m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/20/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 5f58db499a75 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 939b349d899203203998e0ed6ba8125366ec0a5a |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/20/testReport/ |
   | Max. process+thread count | 1262 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/20/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635477
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 20:22
Start Date: 06/Aug/21 20:22
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894499414


   > @viirya there are a few style issues in [here](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) - could you fix them?
   
   Oh, ok. Let me fix them.




Issue Time Tracking
---

Worklog Id: (was: 635477)
Time Spent: 16h  (was: 15h 50m)




[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635476
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 20:22
Start Date: 06/Aug/21 20:22
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r684484896



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private static final byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private static final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private static final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numExtraBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return false;
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    if (currentBufLen <= 0) {
+      return compressedBytesWritten;
+    }
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&
+        state != BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC) {
+      int outputHeaderSize = writeHeader(b, off, len);
+      numExtraBytesWritten += outputHeaderSize;
+
+      compressedBytesWritten += outputHeaderSize;
+
+      if (outputHeaderSize == len) {
+        return compressedBytesWritten;
+      }
+
+      off += outputHeaderSize;
+      len -= outputHeaderSize;
+    }
+
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      // now compress it into b[]
+      int deflated = deflater.deflate(b, off, len);
+
+      compressedBytesWritten += deflated;
+      off += deflated;
+      len -= deflated;
+
+      // All current input are processed. Going to output trailer.
+      if (deflater.finished()) {
+        state = BuiltInGzipDecompressor.GzipStateLabel.TRAILER_CRC;
+        fillTrailer();
+      } else {
+        return compressedBytesWritten;
+      }
+    }
+
+    int outputTrailerSize = writeTrailer(b, off, len);
+    numExtraBytesWritten += outputTrailerSize;
+
+    compressedBytesWritten += outputTrailerSize;
+
+    return compressedBytesWritten;
+  }
+
+  @Override
+  public long getBytesRead() {
+    return deflater.getTotalIn();
+  }
+
+  
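
The quoted compress() emits the header, the deflated body, and the trailer in stages, so a caller with a small output buffer may need several calls before the stream is complete. The following is a minimal JDK-only sketch of that staging idea, not the code from the pull request: the class name ChunkedGzipWriterSketch, its State enum, and the gzipInChunks helper are all hypothetical, and unlike a Hadoop Compressor it takes the whole input up front.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Hypothetical sketch, not the PR's class: emits one gzip member in stages.
final class ChunkedGzipWriterSketch {
  private enum State { HEADER, DEFLATE, TRAILER, FINISHED }

  // Ten-byte header: magic 0x1f 0x8b, CM=8 (deflate), remaining fields zero.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  private final byte[] trailer = new byte[8];
  // nowrap=true: raw deflate data without the zlib header and checksum.
  private final Deflater deflater =
      new Deflater(Deflater.DEFAULT_COMPRESSION, true);
  private final CRC32 crc = new CRC32();
  private State state = State.HEADER;
  private int headerOff = 0;
  private int trailerOff = 0;

  ChunkedGzipWriterSketch(byte[] input) {
    crc.update(input, 0, input.length);
    deflater.setInput(input);
    deflater.finish(); // assumption of this sketch: all input is known up front
  }

  boolean finished() {
    return state == State.FINISHED;
  }

  // Writes up to len bytes into b starting at off; returns the count written.
  int compress(byte[] b, int off, int len) {
    int written = 0;
    if (state == State.HEADER) {
      int n = Math.min(len, HEADER.length - headerOff);
      System.arraycopy(HEADER, headerOff, b, off, n);
      headerOff += n;
      written += n;
      off += n;
      len -= n;
      if (headerOff < HEADER.length) {
        return written; // buffer full; resume the header on the next call
      }
      state = State.DEFLATE;
    }
    if (state == State.DEFLATE) {
      if (len == 0) {
        return written; // no room left for compressed data in this call
      }
      int n = deflater.deflate(b, off, len);
      written += n;
      off += n;
      len -= n;
      if (!deflater.finished()) {
        return written;
      }
      fillTrailer();
      deflater.end();
      state = State.TRAILER;
    }
    if (state == State.TRAILER) {
      int n = Math.min(len, trailer.length - trailerOff);
      System.arraycopy(trailer, trailerOff, b, off, n);
      trailerOff += n;
      written += n;
      if (trailerOff == trailer.length) {
        state = State.FINISHED;
      }
    }
    return written;
  }

  // Trailer: CRC-32 of the input, then input size mod 2^32, little-endian.
  private void fillTrailer() {
    putIntLE(trailer, 0, (int) crc.getValue());
    putIntLE(trailer, 4, deflater.getTotalIn());
  }

  private static void putIntLE(byte[] buf, int pos, int v) {
    for (int i = 0; i < 4; i++) {
      buf[pos + i] = (byte) (v >>> (8 * i));
    }
  }

  // Drives compress() with a deliberately small buffer; chunk must be > 0.
  static byte[] gzipInChunks(byte[] input, int chunk) {
    ChunkedGzipWriterSketch w = new ChunkedGzipWriterSketch(input);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[chunk];
    while (!w.finished()) {
      int n = w.compress(buf, 0, buf.length);
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }
}
```

Calling gzipInChunks(data, 3) forces the ten-byte header and the eight-byte trailer to be split across several compress() calls, which is the case the headerOff and trailerOff fields in the quoted patch appear to be tracking.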

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635474
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 20:18
Start Date: 06/Aug/21 20:18
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r684483080



##
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635434
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 18:24
Start Date: 06/Aug/21 18:24
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-894439227


   @viirya there are a few style issues in [here](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) - could you fix them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 635434)
Time Spent: 15.5h  (was: 15h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor if native zlib is 
> not loaded. So, without the Hadoop native codec installed, saving a 
> SequenceFile using GzipCodec throws an exception like "SequenceFile doesn't 
> work with GzipCodec without native-hadoop code!"
> As with the other codecs that we migrated to prepared packages (lz4, 
> snappy), it would be better to support GzipCodec generally, without the 
> Hadoop native codec installed. Similar to BuiltInGzipDecompressor, we can use 
> Java's Deflater to implement BuiltInGzipCompressor.
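
The approach described in the issue, building gzip output from the JDK's Deflater, can be illustrated with a short one-shot sketch. This is an assumption-laden illustration and not the pull request's implementation: the class GzipWithDeflaterSketch and its gzip and gunzip methods are hypothetical names, and it uses only java.util.zip classes. It wraps a raw deflate stream with the fixed ten-byte header and an eight-byte trailer (CRC-32 plus input length, both little-endian), then checks the result with GZIPInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch: gzip framing around java.util.zip.Deflater.
final class GzipWithDeflaterSketch {
  // Magic bytes, CM=8 (deflate), no flags, zero mtime, no extra flags, OS=0.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(HEADER, 0, HEADER.length);

    // nowrap=true produces raw deflate data, as the gzip container expects.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[256];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf, 0, buf.length);
        out.write(buf, 0, n);
      }
    } finally {
      deflater.end(); // release native resources
    }

    // Trailer: CRC-32 of the uncompressed data, then its length mod 2^32.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  // Decompresses with the JDK's reader, which validates the CRC and length.
  static byte[] gunzip(byte[] gz) {
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[256];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because GZIPInputStream rejects a stream whose trailer does not match the decompressed data, a successful gunzip(gzip(data)) round trip is a reasonable sanity check that the header and trailer are framed correctly.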



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635432
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 18:22
Start Date: 06/Aug/21 18:22
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r684424851



##
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635423
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 18:09
Start Date: 06/Aug/21 18:09
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892989941








Issue Time Tracking
---

Worklog Id: (was: 635423)
Time Spent: 15h 10m  (was: 15h)







[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635420
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 18:08
Start Date: 06/Aug/21 18:08
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892363546


   :confetti_ball: **+1 overall**


   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 13s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 34s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 47s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 34s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 28s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/12/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 15 new + 332 unchanged - 0 fixed = 347 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 56s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  3s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 177m 32s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/12/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 205b08ba081f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0a1bb194f29612863d8ad31971a737b30be4d982 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/12/testReport/ |
   | Max. process+thread count | 1267 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=635419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-635419
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 18:08
Start Date: 06/Aug/21 18:08
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus removed a comment on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-891453787


   :confetti_ball: **+1 overall**


   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 29s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 43s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 57s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 28s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 29s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 10s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/8/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 156 new + 332 unchanged - 0 fixed = 488 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 18s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 39s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m 53s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 0f0b4b015c68 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 49619e3f1ebdc62c89bcb74fd5cbf75a80a0601c |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/8/testReport/ |
   | Max. process+thread count | 2952 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634936
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 03:16
Start Date: 06/Aug/21 03:16
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893970892


   :confetti_ball: **+1 overall**


   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 30s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m  3s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 22s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  22m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 48s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  20m 48s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 51s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 34s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 54s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 182m 50s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 05889202c053 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d5ac82e295d0dacc890c74786d15758c4b8e51e3 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/19/testReport/ |
   | Max. process+thread count | 3149 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634933
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 02:52
Start Date: 06/Aug/21 02:52
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683910228



##
File path: hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   I think calling `finish` sets a flag that tells the deflater to flush its remaining data when it finishes.
   
   Yeah, the current approach also looks good to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634933)
Time Spent: 14.5h  (was: 14h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634926
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 02:29
Start Date: 06/Aug/21 02:29
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683902983



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, I'm not sure. :)
   
   I looked at `finish()` in `Deflater`. It just sets a `finish` variable to true, but the variable is not used at all. So technically, I guess you can still call its `setInput` to set new input and call `deflate` again?
   
   Because of that, I took the more conservative approach here, just in case.
   
   






Issue Time Tracking
---

Worklog Id: (was: 634926)
Time Spent: 14h 20m  (was: 14h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634924
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 02:17
Start Date: 06/Aug/21 02:17
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683899349



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   I see. Yeah, I meant that `CompressorStream` calls its own `compress` method, which calls `compressor.compress` indirectly.
   
   > Calling finish() on this compressor won't set state to FINISHED.
   
   Oh sorry, I was looking at an old version of this PR, which still set the state to `FINISHED` in `finish()`. Never mind.
   
   So this looks good then :) Although I think the following:
   
   > What I thought is, the caller might set input and compress until it doesn't need input. The state is in FINISHED and the caller might call set input and compress again. At the moment this check isn't effective to write the header.
   
   will also not happen, since once the state is `FINISHED` the caller should not call `setInput` before it calls `reset` on the compressor.
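
The reset-before-reuse point can be illustrated with the JDK class alone. This is a minimal sketch that assumes plain `java.util.zip.Deflater` (zlib-wrapped output), not the `BuiltInGzipCompressor` from this PR; the class and method names (`DeflaterResetSketch`, `deflateAll`, `demo`) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: a Deflater that has finished is reused only after reset().
final class DeflaterResetSketch {

  // Feed one input, mark end of input, and drain until the deflater reports finished.
  static byte[] deflateAll(Deflater deflater, byte[] input) {
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[32];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    return out.toByteArray();
  }

  // Inflate a complete zlib stream back into a string.
  static String inflate(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[32];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && inflater.needsInput()) {
          break; // truncated input; stop instead of spinning
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  // Returns {finished after first stream, finished after reset(), second stream round-trips}.
  static boolean[] demo() {
    Deflater deflater = new Deflater();
    deflateAll(deflater, "first".getBytes(StandardCharsets.UTF_8));
    boolean finishedAfterFirst = deflater.finished();
    deflater.reset(); // clears the finish/finished flags so new input starts a new stream
    boolean finishedAfterReset = deflater.finished();
    byte[] second = deflateAll(deflater, "second".getBytes(StandardCharsets.UTF_8));
    deflater.end();
    return new boolean[] {finishedAfterFirst, finishedAfterReset, "second".equals(inflate(second))};
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println(r[0] + " " + r[1] + " " + r[2]);
  }
}
```

Whether the compressor in this PR should require `reset` or start a new header on its own is the design question under discussion; the sketch only shows the JDK-level behavior.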
   






Issue Time Tracking
---

Worklog Id: (was: 634924)
Time Spent: 14h 10m  (was: 14h)

> Add BuiltInGzipCompressor
> 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634919
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 01:54
Start Date: 06/Aug/21 01:54
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683892398



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Oh, I mean `Compressor` doesn't have a no-argument `compress()` method. When calling `compress`, the caller must provide a buffer for the compressed output.
   
   Calling `finish()` on this compressor won't set the state to `FINISHED`. We set the state to `FINISHED` only after we output the trailer. In that state, the caller can set new input and call `compress(buf, off, len)` again. We will then output a new header and start another compressed output (section).
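
To make the "new header, another compressed section" idea concrete, here is a minimal sketch that writes two gzip members back to back using only `java.util.zip` and reads them with `GZIPInputStream`. It is not the code from this PR; `GzipMemberSketch`, `writeMember`, and `roundTrip` are hypothetical names, and the ten-byte header is the same fixed header quoted in the diff above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration: each "section" is a full gzip member (header, deflate data, trailer).
final class GzipMemberSketch {
  private static final byte[] HEADER = {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0};

  // gzip stores the trailer fields as 32-bit little-endian values.
  static void writeIntLE(ByteArrayOutputStream out, long v) {
    for (int i = 0; i < 4; i++) {
      out.write((int) ((v >>> (8 * i)) & 0xff));
    }
  }

  static void writeMember(ByteArrayOutputStream out, byte[] data) {
    out.write(HEADER, 0, HEADER.length);
    // nowrap = true: raw deflate data, because the gzip framing is written by hand.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(data);
    deflater.finish();
    byte[] buf = new byte[64];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    writeIntLE(out, crc.getValue()); // CRC32 of the uncompressed data
    writeIntLE(out, data.length);    // uncompressed size modulo 2^32
  }

  // Writes two members and decodes the concatenation as one gzip stream.
  static String roundTrip(String first, String second) {
    ByteArrayOutputStream gz = new ByteArrayOutputStream();
    writeMember(gz, first.getBytes(StandardCharsets.UTF_8));
    writeMember(gz, second.getBytes(StandardCharsets.UTF_8));
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz.toByteArray()))) {
      for (int n = in.read(buf); n != -1; n = in.read(buf)) {
        plain.write(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello ", "world"));
  }
}
```

The sketch uses a fresh `Deflater` per member for simplicity; a long-lived compressor would instead reset its deflater, CRC, and offsets before emitting the next header.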
   






Issue Time Tracking
---

Worklog Id: (was: 634919)
Time Spent: 14h  (was: 13h 50m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634918=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634918
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 01:49
Start Date: 06/Aug/21 01:49
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683890692



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   > So seems we cannot do like CompressorStream.
   
   Hmm, what do you mean here? Sorry, I don't quite understand it.
   
   > Once the caller calls finish on this Compressor, we only call finish on the deflater. The caller then will call finished to verify if it reaches finished state. If not, it should call compress with buffer to get more compressed output.
   
   Yes. So if the input is large but the buffer in a `CompressorStream` is small, it will potentially need to call `compress` multiple times after `finish()` is invoked before it can reach the `finished` state (i.e., `deflater.finished()` returns true).
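
A small experiment with the JDK `Deflater` shows the same effect. This is a sketch under stated assumptions: it uses `java.util.zip.Deflater` directly rather than `CompressorStream`, the input is pseudo-random bytes from a fixed seed so that it barely compresses, and `DrainAfterFinishSketch`, `callsAfterFinish`, and `randomBytes` are hypothetical names.

```java
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical illustration: count the deflate() calls needed after finish().
final class DrainAfterFinishSketch {

  static int callsAfterFinish(byte[] input, int bufSize) {
    Deflater deflater = new Deflater();
    deflater.setInput(input);
    deflater.finish(); // marks end of input; produces no output by itself
    if (deflater.finished()) {
      throw new IllegalStateException("finish() alone should not complete the stream");
    }
    byte[] buf = new byte[bufSize];
    int calls = 0;
    while (!deflater.finished()) {
      deflater.deflate(buf); // each call yields at most bufSize bytes
      calls++;
    }
    deflater.end();
    return calls;
  }

  // Fixed seed keeps the input reproducible.
  static byte[] randomBytes(int n) {
    byte[] b = new byte[n];
    new Random(42).nextBytes(b);
    return b;
  }

  public static void main(String[] args) {
    byte[] input = randomBytes(4096);
    System.out.println("16-byte buffer: " + callsAfterFinish(input, 16) + " calls");
  }
}
```

With a 16-byte output buffer and about 4 KB of poorly compressible input, the loop necessarily runs many times before `finished()` turns true.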
   






Issue Time Tracking
---

Worklog Id: (was: 634918)
Time Spent: 13h 50m  (was: 13h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634913
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 01:37
Start Date: 06/Aug/21 01:37
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683887010



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   For `Compressor` here, `compress` requires a buffer to write the compressed output to, so it seems we cannot do what `CompressorStream` does.
   
   Once the caller calls `finish` on this `Compressor`, we only call `finish` on the deflater. The caller will then call `finished` to check whether it has reached the finished state. If not, it should call `compress` with a buffer to get more compressed output.
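
The caller protocol described above can be sketched as a small state machine. This is a simplified, hypothetical model and not the `BuiltInGzipCompressor` from this PR: the class `TinyGzipCompressor`, its `State` enum, and `roundTrip` are invented for illustration, and it handles a single gzip member only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical, simplified model: output is pulled through a caller-supplied buffer.
final class TinyGzipCompressor {
  private enum State { HEADER, DEFLATE, TRAILER, FINISHED }

  private static final byte[] HEADER = {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0, 0, 0};

  private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
  private final CRC32 crc = new CRC32();
  private final byte[] trailer = new byte[8];
  private State state = State.HEADER;
  private int headerOff;
  private int trailerOff;
  private int bytesIn;

  void setInput(byte[] b, int off, int len) {
    deflater.setInput(b, off, len);
    crc.update(b, off, len);
    bytesIn += len;
  }

  // Only marks the end of input; the output is still pulled through compress().
  void finish() {
    deflater.finish();
  }

  // True only once the whole trailer has been handed to the caller.
  boolean finished() {
    return state == State.FINISHED;
  }

  int compress(byte[] b, int off, int len) {
    int written = 0;
    if (state == State.HEADER) {
      int n = Math.min(len, HEADER.length - headerOff);
      System.arraycopy(HEADER, headerOff, b, off, n);
      headerOff += n;
      written += n;
      if (headerOff == HEADER.length) {
        state = State.DEFLATE;
      }
    }
    if (state == State.DEFLATE && written < len) {
      written += deflater.deflate(b, off + written, len - written);
      if (deflater.finished()) {
        fillTrailer();
        state = State.TRAILER;
      }
    }
    if (state == State.TRAILER && written < len) {
      int n = Math.min(len - written, trailer.length - trailerOff);
      System.arraycopy(trailer, trailerOff, b, off + written, n);
      trailerOff += n;
      written += n;
      if (trailerOff == trailer.length) {
        state = State.FINISHED;
      }
    }
    return written;
  }

  // gzip trailer: CRC32 of the input, then the input length, both little-endian.
  private void fillTrailer() {
    long c = crc.getValue();
    for (int i = 0; i < 4; i++) {
      trailer[i] = (byte) (c >>> (8 * i));
      trailer[4 + i] = (byte) (bytesIn >>> (8 * i));
    }
  }

  // Caller side: finish(), then poll finished() and pull output bufSize bytes at a time.
  static String roundTrip(String text, int bufSize) {
    byte[] input = text.getBytes(StandardCharsets.UTF_8);
    TinyGzipCompressor c = new TinyGzipCompressor();
    c.setInput(input, 0, input.length);
    c.finish();
    ByteArrayOutputStream gz = new ByteArrayOutputStream();
    byte[] buf = new byte[bufSize];
    while (!c.finished()) {
      gz.write(buf, 0, c.compress(buf, 0, buf.length));
    }
    c.deflater.end();
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz.toByteArray()))) {
      for (int n = in.read(buf); n != -1; n = in.read(buf)) {
        plain.write(buf, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new String(plain.toByteArray(), StandardCharsets.UTF_8);
  }
}
```

Calling `TinyGzipCompressor.roundTrip(text, 3)` forces the header, the deflate data, and the trailer to be split across many `compress` calls, which is why the model tracks `headerOff` and `trailerOff` the same way the fields in the diff above do.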






Issue Time Tracking
---

Worklog Id: (was: 634913)
Time Spent: 13h 40m  (was: 13.5h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634899
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 00:42
Start Date: 06/Aug/21 00:42
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683870968



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int compressedBytesWritten = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Ah OK, so the compression can happen on multiple inputs. Then I'm curious whether we should handle the `FINISHED` state too in this `if` clause. For instance, in the following situation:
   
   ```java
     @Override
     public void finish() throws IOException {
       if (!compressor.finished()) {
         compressor.finish();
         while (!compressor.finished()) {
           compress();
         }
       }
     }
   ```
   
   the `CompressorStream` will first set the state to `FINISHED` and then keep calling `compress` until it is finished.






Issue Time Tracking
---

Worklog Id: (was: 634899)
Time Spent: 13.5h  (was: 13h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
> 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634887
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 00:11
Start Date: 06/Aug/21 00:11
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683862198



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Let me revert to the original condition on `INFLATE_STREAM` and 
`TRAILER_CRC`. It looks more reliable.






Issue Time Tracking
---

Worklog Id: (was: 634887)
Time Spent: 13h 20m  (was: 13h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634877
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 00:03
Start Date: 06/Aug/21 00:03
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683859384



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, oh, no, I think we cannot do it. `setInput` can be called multiple 
times before we reach the `FINISHED` state. If we set the state to 
`HEADER_BASIC` there, it will re-output the header even though the current 
trailer has not been output yet.
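
The point about `setInput` can be illustrated with plain `java.util.zip.Deflater`, independent of the Hadoop classes. This is a minimal sketch with a hypothetical helper class `MultiInputSketch`: several `setInput` calls feed one raw deflate stream, and only `finish()` ends it. A gzip wrapper around that stream would therefore need exactly one header before the first chunk and one trailer after the last, which is why resetting to a header state on every `setInput` would corrupt the output.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: many setInput calls, one deflate stream. */
class MultiInputSketch {

  static byte[] deflateChunks(byte[][] chunks) {
    // nowrap = true gives the raw deflate data that sits inside a gzip member.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    for (byte[] chunk : chunks) {
      deflater.setInput(chunk);
      // Drain whatever the deflater is willing to emit for this chunk.
      while (!deflater.needsInput()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    }
    // Only now is the stream ended; no per-chunk framing was written.
    deflater.finish();
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  static byte[] inflate(byte[] raw) throws DataFormatException {
    Inflater inflater = new Inflater(true);
    inflater.setInput(raw);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    while (!inflater.finished()) {
      int n = inflater.inflate(buf);
      if (n == 0 && inflater.needsInput()) {
        break; // truncated input: stop instead of spinning
      }
      out.write(buf, 0, n);
    }
    inflater.end();
    return out.toByteArray();
  }
}
```

Inflating the result yields the concatenation of all chunks, showing that the chunk boundaries leave no trace in the compressed stream.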






Issue Time Tracking
---

Worklog Id: (was: 634877)
Time Spent: 13h 10m  (was: 13h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634875
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 06/Aug/21 00:02
Start Date: 06/Aug/21 00:02
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683859384



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, oh, no, I think we cannot do it. `setInput` can be called multiple 
times before we reach the finished state. If we set the state to 
`HEADER_BASIC` there, it will re-output the header even though the current 
trailer has not been output yet.






Issue Time Tracking
---

Worklog Id: (was: 634875)
Time Spent: 13h  (was: 12h 50m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634858
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 23:07
Start Date: 05/Aug/21 23:07
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683840945



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Seems so. Let me update it.






Issue Time Tracking
---

Worklog Id: (was: 634858)
Time Spent: 12h 50m  (was: 12h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634855
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 22:58
Start Date: 05/Aug/21 22:58
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893873105


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 15s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 34s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 47s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 29s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 36s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/18/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 58s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 13s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m  1s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/18/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 53a5467acfb9 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 204709d1bc0ebb41521219367c39d1badc698ab1 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/18/testReport/ |
   | Max. process+thread count | 1266 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634831
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 22:04
Start Date: 05/Aug/21 22:04
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683816220



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, you are right. Should we change the state to `HEADER_BASIC` in 
`setInput`? It seems we should do so.






Issue Time Tracking
---

Worklog Id: (was: 634831)
Time Spent: 12.5h  (was: 12h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634786
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 20:08
Start Date: 05/Aug/21 20:08
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683753560



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   My concern is that the caller might set input and call `compress` until the 
compressor no longer needs input. At that point the state is `FINISHED`, and 
the caller might then set input and call `compress` again. In that case this 
check does not cause the header to be written.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634786)
Time Spent: 12h 20m  (was: 12h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634779
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 20:01
Start Date: 05/Aug/21 20:01
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683749490



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, okay. I've set it to `HEADER_BASIC`; let's see whether CI passes.






Issue Time Tracking
---

Worklog Id: (was: 634779)
Time Spent: 12h 10m  (was: 12h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634763
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 19:31
Start Date: 05/Aug/21 19:31
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893728072


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  9s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 24s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 42s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 24s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 36s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 34s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/17/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 52s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 17s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m 55s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/17/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e238c79df9e4 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f1db328eb900b4ffe96f4150ecb4359d389c67de |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/17/testReport/ |
   | Max. process+thread count | 3158 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634658
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:31
Start Date: 05/Aug/21 16:31
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893597879


   Re-triggered the CI.




Issue Time Tracking
---

Worklog Id: (was: 634658)
Time Spent: 11h 50m  (was: 11h 40m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.
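
As a rough illustration of the approach described above (not the code from pull request #3250), a gzip member can be assembled in pure Java by writing the fixed ten-byte header, a raw deflate stream produced by `java.util.zip.Deflater` in `nowrap` mode, and an eight-byte trailer holding the CRC-32 and the uncompressed length in little-endian order. The class and method names below are hypothetical, and the sketch buffers everything in memory instead of implementing Hadoop's `Compressor` interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch: header + raw deflate data + trailer, all in memory.
class GzipFramingSketch {

  // The same ten bytes as the GZIP_HEADER constant quoted in the review.
  private static final byte[] HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  public static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(HEADER, 0, HEADER.length);

    // nowrap = true: emit a raw deflate stream without the zlib wrapper.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();

    // Trailer: CRC-32 of the uncompressed data, then its length mod 2^32.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }

  // Decodes with the JDK's own gzip reader to check the framing.
  public static byte[] gunzip(byte[] gz) {
    try (GZIPInputStream in =
        new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "hello gzip".getBytes(StandardCharsets.UTF_8);
    byte[] roundTrip = gunzip(gzip(data));
    System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
  }
}
```

Reading the result back through `GZIPInputStream` is a cheap way to confirm that the header, deflate data and trailer line up, since that reader verifies both the CRC-32 and the length.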



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634656
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:30
Start Date: 05/Aug/21 16:30
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683610965



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   I think `compress` is always called in a loop like this:
   ```java
   while (!compressor.needsInput()) {
     compressor.compress(b, off, len);
   }
   ```
   so if the state transitioned to `FINISHED`, we'd come out of the loop and 
ask for more input to compress.
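
For reference, the same calling pattern can be tried against the JDK's `java.util.zip.Deflater`, which exposes `setInput`, `needsInput`, `finish` and `finished` much like a Hadoop `Compressor`. This is only a sketch under that assumption: it uses `Deflater` as a stand-in rather than `BuiltInGzipCompressor`, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of the "drain until needsInput()" calling pattern.
class CompressLoopSketch {

  public static byte[] compressChunks(byte[][] chunks) {
    Deflater deflater = new Deflater();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];

    for (byte[] chunk : chunks) {
      deflater.setInput(chunk);
      // Drain output until the compressor asks for more input.
      while (!deflater.needsInput()) {
        int n = deflater.deflate(buf);
        out.write(buf, 0, n);
      }
    }

    // No more input: flush whatever is still buffered.
    deflater.finish();
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  public static byte[] inflate(byte[] compressed) {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[64];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated or unsupported input; stop instead of spinning
        }
        out.write(buf, 0, n);
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[][] chunks = {
        "first chunk, ".getBytes(StandardCharsets.UTF_8),
        "second chunk".getBytes(StandardCharsets.UTF_8)};
    byte[] restored = inflate(compressChunks(chunks));
    System.out.println(new String(restored, StandardCharsets.UTF_8));
  }
}
```

The inner loop exits as soon as the current chunk has been consumed, which is the point at which a caller would typically supply the next chunk or call `finish()`.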






Issue Time Tracking
---

Worklog Id: (was: 634656)
Time Spent: 11h 40m  (was: 11.5h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634652
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:25
Start Date: 05/Aug/21 16:25
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893593265


   @sunchao Could you trigger the CI again? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 634652)
Time Spent: 11.5h  (was: 11h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.






[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634650=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634650
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 16:24
Start Date: 05/Aug/21 16:24
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893592758


   Last CI failure looks unrelated:
   
   ```
   [ERROR] Failed to execute goal on project hadoop-yarn-common: Could not resolve dependencies for project org.apache.hadoop:hadoop-yarn-common:jar:3.4.0-SNAPSHOT: Failed to collect dependencies at com.sun.jersey:jersey-client:jar:1.19: Failed to read artifact descriptor for com.sun.jersey:jersey-client:jar:1.19: Could not transfer artifact com.sun.jersey:jersey-client:pom:1.19 from/to central (https://repo.maven.apache.org/maven2): Transfer failed for https://repo.maven.apache.org/maven2/com/sun/jersey/jersey-client/1.19/jersey-client-1.19.pom: Connection reset -> [Help 1]
   ```




Issue Time Tracking
---

Worklog Id: (was: 634650)
Time Spent: 11h 20m  (was: 11h 10m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we migrated to using prepared packages (lz4, 
> snappy), it will be better if we support GzipCodec generally without Hadoop 
> native codec installed. Similar to BuiltInGzipDecompressor, we can use Java 
> Deflater to support BuiltInGzipCompressor.






[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634498
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:59
Start Date: 05/Aug/21 11:59
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893374924


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |   9m 54s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/16/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |  26m 33s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 45s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m  5s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 18s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m  3s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m  3s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/16/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 48s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  5s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 163m 52s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/16/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 0f0bd9b33214 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f1db328eb900b4ffe96f4150ecb4359d389c67de |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634363
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:42
Start Date: 05/Aug/21 11:42
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r682782504



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;

Review comment:
   nit: it may be better to use `GZIP_HEADER_LEN = GZIP_HEADER.length`.

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;
+  private final int GZIP_TRAILER_LEN = 8;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private byte[] userBuf = null;
+  private int userBufOff = 0;
+  private int userBufLen = 0;
+
+  private int headerBytesWritten = 0;
+  private int trailerBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634303
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:36
Start Date: 05/Aug/21 11:36
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r682842698



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;
+  private final int GZIP_TRAILER_LEN = 8;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private byte[] userBuf = null;
+  private int userBufOff = 0;
+  private int userBufLen = 0;
+
+  private int headerBytesWritten = 0;
+  private int trailerBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int numAvailBytes = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      if (userBufLen <= 0) {
+        return numAvailBytes;
+      }
+
+      int outputHeaderSize = writeHeader(b, off, len);
+      headerBytesWritten += outputHeaderSize;
+
+      // Completes header output.
+      if (headerOff == GZIP_HEADER_LEN) {
+        state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
+      }
+
+      numAvailBytes += outputHeaderSize;
+
+      if (outputHeaderSize == len) {
+        return numAvailBytes;
+      }
+
+      off += outputHeaderSize;
+      len -= outputHeaderSize;
+    }
+
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      // hand off user data (or what's left of it) to Deflater--but note that
+      // Deflater may not have consumed all of previous bufferload, in which case
+      // userBufLen will be zero
+      if (userBufLen > 0) {
+        deflater.setInput(userBuf, userBufOff, userBufLen);
+
+        crc.update(userBuf, userBufOff, userBufLen);  // CRC-32 is on uncompressed data
+
+        currentBufLen = userBufLen;
+        userBufOff += userBufLen;
+        userBufLen = 0;
+      }
+
+
+      // now compress it into b[]
+      int deflated = deflater.deflate(b, off, len);
+
+      numAvailBytes += deflated;
+      off += 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634117
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:15
Start Date: 05/Aug/21 11:15
Worklog Time Spent: 10m 
  Work Description: viirya commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892860619






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 634117)
Time Spent: 10h 40m  (was: 10.5h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Currently, when native zlib is not loaded, GzipCodec supports only 
> BuiltInGzipDecompressor. So, without the Hadoop native codec installed, 
> saving a SequenceFile using GzipCodec throws an exception like 
> "SequenceFile doesn't work with GzipCodec without native-hadoop code!"
> As with the other codecs that we migrated to prepared packages (lz4, 
> snappy), it would be better to support GzipCodec generally, without the 
> Hadoop native codec installed. Similar to BuiltInGzipDecompressor, we can 
> use the Java Deflater to implement BuiltInGzipCompressor.
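
As a rough illustration of the approach described in the issue, the sketch below builds a gzip member with only `java.util.zip.Deflater` and `java.util.zip.CRC32`: the fixed ten-byte header quoted in the diff, a raw deflate stream, and an eight-byte little-endian trailer (CRC-32, then uncompressed size). The class and method names (`RawDeflateGzipSketch`, `gzip`, `gunzip`, `writeIntLE`) are hypothetical, and the sketch is a one-shot, in-memory version, not the streaming `Compressor` implementation from the pull request.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

final class RawDeflateGzipSketch {
  // Fixed ten-byte header: magic, CM=8 (deflate), no flags, mtime 0.
  private static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  /** Compresses input into a single gzip member, entirely in memory. */
  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(GZIP_HEADER, 0, GZIP_HEADER.length);

    // nowrap=true yields a raw deflate stream without the zlib wrapper,
    // which is what the gzip container expects.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    try {
      deflater.setInput(input);
      deflater.finish();
      byte[] buf = new byte[4096];
      while (!deflater.finished()) {
        int n = deflater.deflate(buf, 0, buf.length);
        out.write(buf, 0, n);
      }
    } finally {
      deflater.end();
    }

    // Trailer: CRC-32 of the uncompressed data, then its length mod 2^32,
    // both little-endian.
    CRC32 crc = new CRC32();
    crc.update(input, 0, input.length);
    writeIntLE(out, (int) crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  /** Decompresses with the JDK's own reader, used here only as a check. */
  static byte[] gunzip(byte[] gz) {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void writeIntLE(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }
}
```

Round-tripping through `gunzip` is a quick way to confirm that the header and trailer are accepted by a standard gzip reader without any native library.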



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634095
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:11
Start Date: 05/Aug/21 11:11
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892989941








Issue Time Tracking
---

Worklog Id: (was: 634095)
Time Spent: 10.5h  (was: 10h 20m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Currently, when native zlib is not loaded, GzipCodec supports only 
> BuiltInGzipDecompressor. So, without the Hadoop native codec installed, 
> saving a SequenceFile using GzipCodec throws an exception like 
> "SequenceFile doesn't work with GzipCodec without native-hadoop code!"
> As with the other codecs that we migrated to prepared packages (lz4, 
> snappy), it would be better to support GzipCodec generally, without the 
> Hadoop native codec installed. Similar to BuiltInGzipDecompressor, we can 
> use the Java Deflater to implement BuiltInGzipCompressor.






[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=634059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-634059
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 11:00
Start Date: 05/Aug/21 11:00
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893365445


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 26s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 34s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 54s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 27s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 33s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/15/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 7 new + 332 unchanged - 0 fixed = 339 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m  1s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  5s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m 12s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/15/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 8d5a0ff00386 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d7f052a65c680122d111e15428139a5e2fdf43e2 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/15/testReport/ |
   | Max. process+thread count | 1267 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633979
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:52
Start Date: 05/Aug/21 07:52
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683215249



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, if we change this check to `HEADER_BASIC`, we may also need to check 
whether the state is `FINISHED`. Otherwise, after we output the trailer, we 
cannot call `compress` again to compress another input.
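
One way to read this concern is as a small state machine. The sketch below is hypothetical: the class `GzipStateSketch`, its `State` enum, and its method names are invented for illustration (the patch reuses `BuiltInGzipDecompressor.GzipStateLabel` instead), and it models only the transitions, not any compression. It shows that if a header is emitted only in the `HEADER_BASIC` state, a compressor that has reached `FINISHED` must be moved back to `HEADER_BASIC` before it can start another gzip member.

```java
final class GzipStateSketch {
  /** Hypothetical states, named loosely after the labels in the review. */
  enum State { HEADER_BASIC, DEFLATE_STREAM, TRAILER, FINISHED }

  private State state = State.HEADER_BASIC;

  State state() {
    return state;
  }

  /** Advances one step; FINISHED is terminal until reset() is called. */
  void advance() {
    switch (state) {
      case HEADER_BASIC:
        state = State.DEFLATE_STREAM;
        break;
      case DEFLATE_STREAM:
        state = State.TRAILER;
        break;
      case TRAILER:
        state = State.FINISHED;
        break;
      default:
        break; // FINISHED: stay put
    }
  }

  /** A header is due only at the start of a member, never after the trailer. */
  boolean shouldWriteHeader() {
    return state == State.HEADER_BASIC;
  }

  /** Makes the instance usable for another input by starting a new member. */
  void reset() {
    state = State.HEADER_BASIC;
  }
}
```

In this model, calling `advance()` three times reaches `FINISHED`, where `shouldWriteHeader()` is false; only `reset()` makes it true again.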






Issue Time Tracking
---

Worklog Id: (was: 633979)
Time Spent: 10h 10m  (was: 10h)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633977
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 07:51
Start Date: 05/Aug/21 07:51
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683215249



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = GZIP_HEADER.length;
+  private final int GZIP_TRAILER_LEN = GZIP_TRAILER.length;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private int numBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+    // Only if the trailer is also written, it is thought as finished.
+    return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+    if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+      return deflater.needsInput();
+    }
+
+    return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+    int compressedBytesWritten = 0;
+
+    // If we are not within uncompressed data yet, output the header.
+    if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM &&

Review comment:
   Hmm, if we change this check to `HEADER_BASIC`, we may also need to check 
whether the state is `FINISHED`.






Issue Time Tracking
---

Worklog Id: (was: 633977)
Time Spent: 10h  (was: 9h 50m)

> Add BuiltInGzipCompressor
> -
>
> Key: HADOOP-17825
> URL: https://issues.apache.org/jira/browse/HADOOP-17825
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: L. C. Hsieh
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Currently, GzipCodec only supports BuiltInGzipDecompressor, if native zlib is 
> not loaded. So, without Hadoop native codec installed, saving SequenceFile 
> using GzipCodec will throw exception like "SequenceFile doesn't work with 
> GzipCodec without native-hadoop code!"
> Same as other codecs which we 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633959
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 05/Aug/21 06:20
Start Date: 05/Aug/21 06:20
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r683008770



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

Review comment:
   nit: this could be `private static final`

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SequenceFile.java
##
@@ -1180,14 +1180,6 @@ public static Option syncInterval(int value) {
   new Metadata() : metadataOption.getValue();
   this.compress = compressionTypeOption.getValue();
   final CompressionCodec codec = compressionTypeOption.getCodec();
-  if (codec != null &&

Review comment:
   There are a few unused imports in this file.

##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  // The trailer will be overwritten based on crc and output size.
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633854
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 04/Aug/21 21:54
Start Date: 04/Aug/21 21:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-893000270


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 18s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 25s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 28s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 49s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 27s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 24s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 24s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 10s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/14/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 12 new + 332 unchanged - 0 fixed = 344 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 48s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  5s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  1s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 178m  6s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/14/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux dc221d00cbe1 4.15.0-151-generic #157-Ubuntu SMP Fri Jul 9 23:07:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6b823c2a3df28393c89a954ecc0e3ad34c49c3ed |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/14/testReport/ |
   | Max. process+thread count | 3153 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633843
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 04/Aug/21 21:33
Start Date: 04/Aug/21 21:33
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#issuecomment-892989941


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 29s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 36s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 27s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 53s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 26s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  20m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 32s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/13/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common: The patch generated 14 new + 332 unchanged - 0 fixed = 346 total (was 332)  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 50s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m  7s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 180m  5s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/13/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3250 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 1a928499ddd8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8b752957cb7d7612dc1849244b6064cdd854e20b |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3250/13/testReport/ |
   | Max. process+thread count | 2226 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633762
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 04/Aug/21 18:54
Start Date: 04/Aug/21 18:54
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r682876971



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;
+  private final int GZIP_TRAILER_LEN = 8;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private byte[] userBuf = null;
+  private int userBufOff = 0;
+  private int userBufLen = 0;
+
+  private int headerBytesWritten = 0;
+  private int trailerBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int numAvailBytes = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  if (userBufLen <= 0) {
+return numAvailBytes;
+  }
+
+  int outputHeaderSize = writeHeader(b, off, len);
+  headerBytesWritten += outputHeaderSize;
+
+  // Completes header output.
+  if (headerOff == GZIP_HEADER_LEN) {
+state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
+  }
+
+  numAvailBytes += outputHeaderSize;
+
+  if (outputHeaderSize == len) {
+return numAvailBytes;
+  }
+
+  off += outputHeaderSize;
+  len -= outputHeaderSize;
+}
+
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  // hand off user data (or what's left of it) to Deflater--but note that
+  // Deflater may not have consumed all of previous bufferload, in which case
+  // userBufLen will be zero
+  if (userBufLen > 0) {
+deflater.setInput(userBuf, userBufOff, userBufLen);
+
+crc.update(userBuf, userBufOff, userBufLen);  // CRC-32 is on uncompressed data
+
+currentBufLen = userBufLen;
+userBufOff += userBufLen;
+userBufLen = 0;
+  }
+
+
+  // now compress it into b[]
+  int deflated = deflater.deflate(b, off, len);
+
+  numAvailBytes += deflated;
+  off += 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633761
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 04/Aug/21 18:53
Start Date: 04/Aug/21 18:53
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r682876207



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;
+  private final int GZIP_TRAILER_LEN = 8;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private byte[] userBuf = null;
+  private int userBufOff = 0;
+  private int userBufLen = 0;
+
+  private int headerBytesWritten = 0;
+  private int trailerBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int numAvailBytes = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  if (userBufLen <= 0) {
+return numAvailBytes;
+  }
+
+  int outputHeaderSize = writeHeader(b, off, len);
+  headerBytesWritten += outputHeaderSize;
+
+  // Completes header output.
+  if (headerOff == GZIP_HEADER_LEN) {
+state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
+  }
+
+  numAvailBytes += outputHeaderSize;
+
+  if (outputHeaderSize == len) {
+return numAvailBytes;
+  }
+
+  off += outputHeaderSize;
+  len -= outputHeaderSize;
+}
+
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  // hand off user data (or what's left of it) to Deflater--but note that
+  // Deflater may not have consumed all of previous bufferload, in which case
+  // userBufLen will be zero
+  if (userBufLen > 0) {

Review comment:
   Oh, let me see. Yes, that seems to work. The logic I used came from the 
decompressor. The decompressor has to parse the header from the user input 
before the inflater can consume that input, so it cannot hand the input 
straight to the inflater. The compressor has no such constraint, so we can 
pass the input buffer directly to the deflater in `setInput`.
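
   To illustrate the point about handing input straight to the deflater, here 
is a minimal, self-contained sketch. It is not the code from this pull 
request: the class `GzipSketch` and its methods `gzip`, `gunzip` and 
`writeIntLE` are hypothetical names made up for illustration, and it 
compresses a whole buffer in one call instead of implementing the 
`Compressor` state machine. It assumes only `java.util.zip` from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

/** Hypothetical one-shot gzip writer; a sketch, not the Hadoop class. */
final class GzipSketch {

  // Same fixed ten-byte header as in the quoted diff:
  // magic 0x1f 0x8b, method 8 (deflate), then flags, mtime, xfl and OS all 0.
  private static final byte[] GZIP_HEADER = {
      0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};

  static byte[] gzip(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(GZIP_HEADER, 0, GZIP_HEADER.length);

    // nowrap = true produces raw deflate data, which is what gzip wraps.
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    CRC32 crc = new CRC32();

    // The input goes directly to the deflater; no header parsing is needed
    // first. The CRC-32 is computed over the uncompressed bytes.
    deflater.setInput(input, 0, input.length);
    crc.update(input, 0, input.length);
    deflater.finish();

    byte[] buf = new byte[512];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf, 0, buf.length);
      out.write(buf, 0, n);
    }
    deflater.end();

    // Eight-byte trailer: CRC-32, then input size mod 2^32, little-endian.
    writeIntLE(out, crc.getValue());
    writeIntLE(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLE(ByteArrayOutputStream out, long v) {
    for (int i = 0; i < 4; i++) {
      out.write((int) (v >>> (8 * i)) & 0xff);
    }
  }

  /** Decompresses with the JDK reader, used here only to check the output. */
  static byte[] gunzip(byte[] gz) {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[512];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] input = "hello hello hello gzip".getBytes(StandardCharsets.UTF_8);
    byte[] gz = gzip(input);
    System.out.println(input.length + " bytes in, " + gz.length + " bytes out");
    System.out.println("round trip ok: " + Arrays.equals(input, gunzip(gz)));
  }
}
```

   The sketch drains the deflater completely after a single `setInput` call. 
A streaming compressor would presumably need to wait for `needsInput()` 
before accepting another buffer, since a second `setInput` replaces input 
that has not been consumed yet.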




-- 
This is an automated message from 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633752
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 04/Aug/21 18:45
Start Date: 04/Aug/21 18:45
Worklog Time Spent: 10m 
  Work Description: viirya commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r682870963



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;
+  private final int GZIP_TRAILER_LEN = 8;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private byte[] userBuf = null;
+  private int userBufOff = 0;
+  private int userBufLen = 0;
+
+  private int headerBytesWritten = 0;
+  private int trailerBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int numAvailBytes = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  if (userBufLen <= 0) {
+return numAvailBytes;
+  }
+
+  int outputHeaderSize = writeHeader(b, off, len);
+  headerBytesWritten += outputHeaderSize;
+
+  // Completes header output.
+  if (headerOff == GZIP_HEADER_LEN) {
+state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
+  }
+
+  numAvailBytes += outputHeaderSize;
+
+  if (outputHeaderSize == len) {
+return numAvailBytes;
+  }
+
+  off += outputHeaderSize;
+  len -= outputHeaderSize;
+}
+
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  // hand off user data (or what's left of it) to Deflater--but note that
+  // Deflater may not have consumed all of previous bufferload, in which case
+  // userBufLen will be zero
+  if (userBufLen > 0) {
+deflater.setInput(userBuf, userBufOff, userBufLen);
+
+crc.update(userBuf, userBufOff, userBufLen);  // CRC-32 is on uncompressed data
+
+currentBufLen = userBufLen;
+userBufOff += userBufLen;
+userBufLen = 0;
+  }
+
+
+  // now compress it into b[]
+  int deflated = deflater.deflate(b, off, len);
+
+  numAvailBytes += deflated;
+  off += 

[jira] [Work logged] (HADOOP-17825) Add BuiltInGzipCompressor

2021-08-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17825?focusedWorklogId=633745=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633745
 ]

ASF GitHub Bot logged work on HADOOP-17825:
---

Author: ASF GitHub Bot
Created on: 04/Aug/21 18:41
Start Date: 04/Aug/21 18:41
Worklog Time Spent: 10m 
  Work Description: sunchao commented on a change in pull request #3250:
URL: https://github.com/apache/hadoop/pull/3250#discussion_r682868747



##
File path: 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipCompressor.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.io.compress.zlib;
+
+import java.io.IOException;
+import java.util.zip.Checksum;
+import java.util.zip.Deflater;
+import java.util.zip.GZIPOutputStream;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.compress.Compressor;
+import org.apache.hadoop.io.compress.DoNotPool;
+import org.apache.hadoop.util.DataChecksum;
+
+/**
+ * A {@link Compressor} based on the popular gzip compressed file format.
+ * http://www.gzip.org/
+ */
+@DoNotPool
+public class BuiltInGzipCompressor implements Compressor {
+
+  /**
+   * Fixed ten-byte gzip header. See {@link GZIPOutputStream}'s source for
+   * details.
+   */
+  private static final byte[] GZIP_HEADER = new byte[]{
+  0x1f, (byte) 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private byte[] GZIP_TRAILER = new byte[]{0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+
+  private final int GZIP_HEADER_LEN = 10;
+  private final int GZIP_TRAILER_LEN = 8;
+
+  private Deflater deflater;
+
+  private int headerOff = 0;
+  private int trailerOff = 0;
+
+  private byte[] userBuf = null;
+  private int userBufOff = 0;
+  private int userBufLen = 0;
+
+  private int headerBytesWritten = 0;
+  private int trailerBytesWritten = 0;
+
+  private int currentBufLen = 0;
+
+  private final Checksum crc = DataChecksum.newCrc32();
+
+  private BuiltInGzipDecompressor.GzipStateLabel state;
+
+  public BuiltInGzipCompressor(Configuration conf) { init(conf); }
+
+  @Override
+  public boolean finished() {
+// Only if the trailer is also written, it is thought as finished.
+return deflater.finished() && state == BuiltInGzipDecompressor.GzipStateLabel.FINISHED;
+  }
+
+  @Override
+  public boolean needsInput() {
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  return deflater.needsInput();
+}
+
+return (state != BuiltInGzipDecompressor.GzipStateLabel.FINISHED);
+  }
+
+  @Override
+  public int compress(byte[] b, int off, int len) throws IOException {
+int numAvailBytes = 0;
+
+// If we are not within uncompressed data yet, output the header.
+if (state != BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  if (userBufLen <= 0) {
+return numAvailBytes;
+  }
+
+  int outputHeaderSize = writeHeader(b, off, len);
+  headerBytesWritten += outputHeaderSize;
+
+  // Completes header output.
+  if (headerOff == GZIP_HEADER_LEN) {
+state = BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM;
+  }
+
+  numAvailBytes += outputHeaderSize;
+
+  if (outputHeaderSize == len) {
+return numAvailBytes;
+  }
+
+  off += outputHeaderSize;
+  len -= outputHeaderSize;
+}
+
+if (state == BuiltInGzipDecompressor.GzipStateLabel.INFLATE_STREAM) {
+  // hand off user data (or what's left of it) to Deflater--but note that
+  // Deflater may not have consumed all of previous bufferload, in which case
+  // userBufLen will be zero
+  if (userBufLen > 0) {
+deflater.setInput(userBuf, userBufOff, userBufLen);
+
+crc.update(userBuf, userBufOff, userBufLen);  // CRC-32 is on uncompressed data
+
+currentBufLen = userBufLen;
+userBufOff += userBufLen;
+userBufLen = 0;
+  }
+
+
+  // now compress it into b[]
+  int deflated = deflater.deflate(b, off, len);
+
+  numAvailBytes += deflated;
+  off += 
