[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626432#comment-17626432 ]

ASF GitHub Bot commented on HADOOP-13126:
-----------------------------------------

ibobak commented on PR #2723:
URL: https://github.com/apache/hadoop/pull/2723#issuecomment-1296629761

   The update is big. I am now testing my version of the codec in my organization. Until I am sure that it works fine and without memory leaks, I won't post a PR. I need a little more time.

> Add Brotli compression codec
> ----------------------------
>
>                 Key: HADOOP-13126
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13126
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 2.7.2
>            Reporter: Ryan Blue
>            Assignee: Ryan Blue
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch,
> HADOOP-13126.3.patch, HADOOP-13126.4.patch, HADOOP-13126.5.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new
> compression library based on LZ77 from Google. Google's [brotli
> benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
> look really good and we're also seeing a significant improvement in
> compression size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite
> real    1m17.106s
> user    1m30.804s
> sys     0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite
> real    1m16.640s
> user    1m24.244s
> sys     0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite
> real    3m39.496s
> user    3m48.736s
> sys     0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9.
> Another test resulted in a slightly larger Brotli file than gzip produced,
> but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT
> license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI
> library jbrotli is
> ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
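The file sizes and wall-clock times in the preliminary test results quoted above can be turned into ratios to make the comparison easier to read. The following is only an illustrative sketch: the numbers are copied from the `ls -l` and `time` output in the issue description, while the dictionary and function names (`sizes_bytes`, `wall_seconds`, `relative_size`, `speedup`) are made up for this example. The size of the uncompressed input is not given in the issue, so only relative figures can be derived.

```python
# Figures copied from the `ls -l` and `time` output in the issue description.
sizes_bytes = {
    "snappy": 2265950833,
    "gzip": 1421601880,
    "brotli": 1068821936,
}

# "real" (wall-clock) times, converted from XmY.YYYs to seconds.
wall_seconds = {
    "snappy": 1 * 60 + 17.106,
    "gzip": 3 * 60 + 39.496,
    "brotli": 1 * 60 + 16.640,
}


def relative_size(codec: str, baseline: str) -> float:
    """Size of `codec` output as a fraction of `baseline` output."""
    return sizes_bytes[codec] / sizes_bytes[baseline]


def speedup(codec: str, baseline: str) -> float:
    """How many times faster `codec` ran than `baseline` (wall-clock)."""
    return wall_seconds[baseline] / wall_seconds[codec]


if __name__ == "__main__":
    for baseline in ("snappy", "gzip"):
        print(
            f"brotli vs {baseline}: "
            f"{relative_size('brotli', baseline):.1%} of the size, "
            f"{speedup('brotli', baseline):.2f}x the speed"
        )
```

On these figures the Brotli file is roughly three quarters the size of the gzip file and under half the size of the snappy file, which matches the summary in the description that Brotli is about as fast as snappy and smaller than gzip-9. This is a single run on one file, so the ratios should be read as indicative only.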
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625553#comment-17625553 ]

ASF GitHub Bot commented on HADOOP-13126:
-----------------------------------------

martin-g commented on PR #2723:
URL: https://github.com/apache/hadoop/pull/2723#issuecomment-1294761827

   @ibobak If the change is small, you can also tell me what to change and I can update this PR. But it seems there is not much interest in having BrotliCodec. This PR has been open for almost 2 years ...
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625537#comment-17625537 ]

ASF GitHub Bot commented on HADOOP-13126:
-----------------------------------------

ibobak commented on PR #2723:
URL: https://github.com/apache/hadoop/pull/2723#issuecomment-1294738611

   Colleagues,

   I took the source code from this commit https://github.com/apache/hadoop/pull/2723/commits/47f05930c2f5c576a6c25238c187bdf3409b8f23, built a jar from it, plugged it into my Spark cluster, and launched a huge job with many transformations and actions. I found a serious memory leak: executors consume more and more RAM (despite a limit of 20GB, they consumed 40GB).

   I've made my own version of the Brotli codec (also based on brotli4j) by looking at how Snappy and the others are implemented, and it works with no memory leaks. I'll post my PR soon.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558037#comment-17558037 ]

Martin Tzvetanov Grigorov commented on HADOOP-13126:
----------------------------------------------------

[~ste...@apache.org] Rebased my branch to latest trunk. Hopefully Hadoop QA will like it this time.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554031#comment-17554031 ]

Steve Loughran commented on HADOOP-13126:
-----------------------------------------

It checks PRs... make sure yours is the latest.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291446#comment-17291446 ]

Martin Tzvetanov Grigorov commented on HADOOP-13126:
----------------------------------------------------

I am not sure why Hadoop QA says that the Pull Request does not apply to trunk.

GitHub UI says:
{code:java}
This branch has no conflicts with the base branch
Only those with write access to this repository can merge pull requests.
{code}
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290980#comment-17290980 ]

Hadoop QA commented on HADOOP-13126:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 43s{color} | {color:red}{color} | {color:red} HADOOP-13126 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-HADOOP-Build/157/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389593#comment-16389593 ]

genericqa commented on HADOOP-13126:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HADOOP-13126 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14270/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389588#comment-16389588 ]

Steve Loughran commented on HADOOP-13126:
-----------------------------------------

Not noticed this before. It would seem OK for Hadoop 3.2/2.10+; too late for 3.1.

It would probably need some more tests, maybe even a test resource with a Brotli-compressed file as the reference. All a round trip does is verify that you can round-trip "something", not that the compressor is working.
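The difference between a round-trip test and a reference-file test can be sketched as follows. This is not code from any of the attached patches: Brotli is not in the Python standard library, so `zlib` stands in for the real codec, and the classes `IdentityCodec` and `ZlibCodec`, the helpers `round_trip_ok` and `reference_ok`, and the `REFERENCE` bytes (which stand in for a checked-in compressed test resource) are all hypothetical names invented for this illustration.

```python
import zlib


class IdentityCodec:
    """Deliberately broken 'codec': it never compresses anything."""

    def compress(self, data: bytes) -> bytes:
        return data

    def decompress(self, data: bytes) -> bytes:
        return data


class ZlibCodec:
    """Working codec, used here as a stand-in for a real Brotli codec."""

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data, 9)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


def round_trip_ok(codec, payload: bytes) -> bool:
    """Round-trip check: only proves compress and decompress are inverses."""
    return codec.decompress(codec.compress(payload)) == payload


def reference_ok(codec, reference: bytes, expected: bytes) -> bool:
    """Reference check: decode bytes produced by an independent implementation."""
    try:
        return codec.decompress(reference) == expected
    except zlib.error:
        return False


PAYLOAD = b"hadoop brotli codec test " * 100
# Stand-in for a compressed file committed as a test resource.
REFERENCE = zlib.compress(PAYLOAD)

if __name__ == "__main__":
    for codec in (IdentityCodec(), ZlibCodec()):
        print(
            type(codec).__name__,
            "round trip:", round_trip_ok(codec, PAYLOAD),
            "reference:", reference_ok(codec, REFERENCE, PAYLOAD),
        )
```

In this sketch the broken codec passes the round-trip check but fails the reference check, which is the gap the comment above points at. A real test for the Hadoop codec would presumably read a Brotli file produced by an independent tool from the test resources instead of generating the reference in the test itself.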
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295420#comment-16295420 ]

Lee Blum commented on HADOOP-13126:
-----------------------------------

[~rdblue] that's great! Thanks!
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
    [ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295256#comment-16295256 ]

Ryan Blue commented on HADOOP-13126:
------------------------------------

Support for brotli, zstd, and lz4 is in Parquet master. It will be out in the next release.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295142#comment-16295142 ]

Lee Blum commented on HADOOP-13126:

[~rdblue] We are also interested in this feature. Which Parquet version can we expect it to be available in? Brotli demonstrates excellent compression ratios with low CPU consumption, which can benefit a lot of users. We know that it will benefit our use case as well.
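To put the compression and CPU claims in numbers, here is a small sketch that only redoes the arithmetic on the figures quoted in the issue description ("Brotli preliminary test results"). The byte counts and wall-clock times are copied from that listing; the helper name `relative_to` is made up for this illustration, and a single run on one file is of course not a general benchmark.

```python
# Figures copied from the "Brotli preliminary test results" listing in the issue description.
sizes_bytes = {"brotli": 1068821936, "gzip": 1421601880, "snappy": 2265950833}
real_seconds = {
    "snappy": 1 * 60 + 17.106,  # real 1m17.106s
    "brotli": 1 * 60 + 16.640,  # real 1m16.640s
    "gzip": 3 * 60 + 39.496,    # real 3m39.496s
}


def relative_to(baseline, values):
    """Express every value as a multiple of the baseline codec's value."""
    return {name: value / values[baseline] for name, value in values.items()}


size_vs_brotli = relative_to("brotli", sizes_bytes)
time_vs_brotli = relative_to("brotli", real_seconds)

for codec in ("brotli", "gzip", "snappy"):
    print(f"{codec:7s} size x{size_vs_brotli[codec]:.2f}  wall time x{time_vs_brotli[codec]:.2f}")
```

On these figures the gzip output is roughly a third larger than the Brotli output and took close to three times as long to write, while the snappy output is a bit more than twice the Brotli size at about the same wall time.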
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281358#comment-16281358 ]

Carlo Alberto Ferraris commented on HADOOP-13126:

[~ajisakaa] or [~steve_l] maybe?
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280469#comment-16280469 ]

Ryan Blue commented on HADOOP-13126:

I'm happy to if there is interest from reviewers.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279641#comment-16279641 ]

genericqa commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HADOOP-13126 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13788/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279637#comment-16279637 ]

Carlo Alberto Ferraris commented on HADOOP-13126:

[~rdblue] Do you have any plans to continue working on this? We have workloads that would benefit from brotli, especially if the codec supported and exposed the higher compression levels.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160735#comment-16160735 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HADOOP-13126 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13234/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946171#comment-15946171 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 6s | HADOOP-13126 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11966/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550733#comment-15550733 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 15s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 9s | trunk passed |
| +1 | compile | 7m 42s | trunk passed |
| +1 | checkstyle | 1m 40s | trunk passed |
| +1 | mvnsite | 10m 43s | trunk passed |
| +1 | mvneclipse | 1m 45s | trunk passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project. |
| +1 | findbugs | 1m 31s | trunk passed |
| +1 | javadoc | 5m 17s | trunk passed |
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 44s | the patch passed |
| +1 | compile | 8m 2s | the patch passed |
| +1 | javac | 8m 2s | the patch passed |
| -0 | checkstyle | 1m 31s | root: The patch generated 6 new + 0 unchanged - 0 fixed = 6 total (was 0) |
| +1 | mvnsite | 10m 10s | the patch passed |
| +1 | mvneclipse | 1m 16s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 5s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project. |
| +1 | findbugs | 1m 54s | the patch passed |
| +1 | javadoc | 5m 42s | the patch passed |
| -1 | unit | 108m 12s | root in the patch failed. |
| -1 | asflicense | 0m 29s | The patch generated 2 ASF License warnings. |
| | | 204m 39s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNestedEncryptionZones |
| | hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer |
| | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12831838/HADOOP-13126.5.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux 3b1df26d9d01 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550286#comment-15550286 ]

Ryan Blue commented on HADOOP-13126:

Brotli compression isn't splittable, but can be used with Hadoop-friendly container formats like Parquet. Using those formats is a best practice anyway, so it shouldn't matter that you can't easily split files when you use Brotli as an outer wrapper.
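The point about container formats can be illustrated with a toy sketch. It assumes a deliberately simplified layout (independently compressed blocks plus an offset index), which is not Parquet's actual file format, and it uses `zlib` from the Python standard library as a stand-in for Brotli, since the standard library has no Brotli binding. The function names `write_container` and `read_block` are invented for this example.

```python
import zlib


def write_container(blocks):
    """Compress each block on its own and record (offset, length) per block."""
    payload = bytearray()
    index = []
    for block in blocks:
        compressed = zlib.compress(block)  # stand-in for a Brotli-compressed block
        index.append((len(payload), len(compressed)))
        payload += compressed
    return bytes(payload), index


def read_block(payload, index, i):
    """Decompress block i using only its own byte range."""
    offset, length = index[i]
    return zlib.decompress(payload[offset:offset + length])


# Four hypothetical "row groups" of data.
blocks = [f"row group {n}: ".encode() + bytes(1000) for n in range(4)]
payload, index = write_container(blocks)

# A task assigned only block 2 never reads the bytes of blocks 0, 1, or 3.
third = read_block(payload, index, 2)
print(third[:12])
```

Because every block is a complete compressed stream of its own, work can be divided at block boundaries even though no single stream can be entered in the middle. Wrapping an entire file in one compressed stream gives up that property.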
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525169#comment-15525169 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 1m 35s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 52s | trunk passed |
| +1 | compile | 7m 53s | trunk passed |
| +1 | checkstyle | 1m 36s | trunk passed |
| +1 | mvnsite | 10m 43s | trunk passed |
| +1 | mvneclipse | 1m 11s | trunk passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project. |
| +1 | findbugs | 1m 22s | trunk passed |
| +1 | javadoc | 4m 53s | trunk passed |
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 40s | the patch passed |
| +1 | compile | 7m 13s | the patch passed |
| +1 | javac | 7m 13s | the patch passed |
| -0 | checkstyle | 1m 30s | root: The patch generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
| +1 | mvnsite | 9m 48s | the patch passed |
| +1 | mvneclipse | 1m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 4s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project. |
| +1 | findbugs | 1m 32s | the patch passed |
| +1 | javadoc | 5m 1s | the patch passed |
| -1 | unit | 93m 13s | root in the patch failed. |
| -1 | asflicense | 0m 29s | The patch generated 2 ASF License warnings. |
| | | 187m 51s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.TestSafeModeWithStripedFile |
| | hadoop.hdfs.server.namenode.TestDeleteRace |
| | hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices |
| | hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HADOOP-13126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12810425/HADOOP-13126.4.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux a2779a84f2de 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GN
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524717#comment-15524717 ]

Andre commented on HADOOP-13126:

[~rdblue] would you know by any chance if the format is splittable when used in hadoop?

> Add Brotli compression codec
>
> Key: HADOOP-13126
> URL: https://issues.apache.org/jira/browse/HADOOP-13126
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Affects Versions: 2.7.2
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Attachments: HADOOP-13126.1.patch, HADOOP-13126.2.patch, HADOOP-13126.3.patch, HADOOP-13126.4.patch
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new
> compression library based on LZ77 from Google. Google's [brotli
> benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf]
> look really good and we're also seeing a significant improvement in
> compression size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite
> real    1m17.106s
> user    1m30.804s
> sys     0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite
> real    1m16.640s
> user    1m24.244s
> sys     0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite
> real    3m39.496s
> user    3m48.736s
> sys     0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9.
> Another test resulted in a slightly larger Brotli file than gzip produced,
> but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT
> license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI
> library jbrotli is
> ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
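For readers who want the numbers behind the preliminary test results quoted in the issue description, the following is a minimal Python sketch that works out the relative file sizes and wall-clock times. The byte counts and `real` times are copied from the quoted `ls -l` and `time` output; the dictionary and function names are illustrative assumptions and are not part of any patch.

```python
# File sizes in bytes, copied from the `ls -l` output in the issue description.
SIZES = {
    "brotli": 1068821936,
    "gzip": 1421601880,
    "snappy": 2265950833,
}

# Wall-clock ("real") times converted to seconds, from the `time` output:
# 1m16.640s, 3m39.496s and 1m17.106s respectively.
REAL_SECONDS = {
    "brotli": 76.640,
    "gzip": 219.496,
    "snappy": 77.106,
}


def percent_smaller(codec: str, baseline: str) -> float:
    """How much smaller (in percent) the codec's file is than the baseline's."""
    return 100.0 * (1.0 - SIZES[codec] / SIZES[baseline])


def times_slower(codec: str, baseline: str) -> float:
    """How many times longer the codec's run took than the baseline's."""
    return REAL_SECONDS[codec] / REAL_SECONDS[baseline]


if __name__ == "__main__":
    print(f"brotli vs gzip:   {percent_smaller('brotli', 'gzip'):.1f}% smaller")
    print(f"brotli vs snappy: {percent_smaller('brotli', 'snappy'):.1f}% smaller")
    print(f"gzip took {times_slower('gzip', 'brotli'):.2f}x as long as brotli")
    print(f"snappy took {times_slower('snappy', 'brotli'):.2f}x as long as brotli")
```

On this single test file, the Brotli output is roughly a quarter smaller than the gzip output and a little over half the size of the Snappy output, while the Brotli and Snappy runs take about the same time. These figures describe one file on one machine and should not be read as a general benchmark.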
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331322#comment-15331322 ]

Akira AJISAKA commented on HADOOP-13126:

I applied the patch and built a binary distribution. It includes jbrotli-native-linux-x86-amd64-0.5.0.jar, and libbrotli.so is bundled in that jar file, so I think we should add the following to NOTICE.txt.
{noformat}
This product optionally depends on 'brotli', a compression and decompression
library, which can be obtained at:

  * LICENSE:
    * license/LICENSE.brotli.txt (MIT License)
  * HOMEPAGE:
    * https://github.com/google/brotli
{noformat}
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330074#comment-15330074 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 29s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 27s | trunk passed |
| +1 | compile | 6m 33s | trunk passed |
| +1 | checkstyle | 1m 21s | trunk passed |
| +1 | mvnsite | 8m 31s | trunk passed |
| +1 | mvneclipse | 0m 37s | trunk passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 1m 18s | trunk passed |
| +1 | javadoc | 4m 38s | trunk passed |
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 5s | the patch passed |
| +1 | compile | 6m 35s | the patch passed |
| +1 | javac | 6m 35s | the patch passed |
| -1 | checkstyle | 1m 20s | root: The patch generated 5 new + 0 unchanged - 0 fixed = 5 total (was 0) |
| +1 | mvnsite | 8m 25s | the patch passed |
| +1 | mvneclipse | 0m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 4s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 1m 26s | the patch passed |
| +1 | javadoc | 4m 40s | the patch passed |
| -1 | unit | 89m 20s | root in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 151m 0s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.TestAsyncHDFSWithHA |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12810425/HADOOP-13126.4.patch |
| JIRA Issue | HADOOP-13126 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux f2515757a106 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8e8cb4c |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle |
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328821#comment-15328821 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 30s | trunk passed |
| +1 | compile | 7m 45s | trunk passed |
| +1 | checkstyle | 1m 30s | trunk passed |
| +1 | mvnsite | 10m 0s | trunk passed |
| +1 | mvneclipse | 0m 52s | trunk passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 1m 35s | trunk passed |
| +1 | javadoc | 5m 39s | trunk passed |
| 0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 8m 16s | the patch passed |
| +1 | compile | 7m 59s | the patch passed |
| +1 | javac | 7m 59s | the patch passed |
| -1 | checkstyle | 1m 30s | root: The patch generated 20 new + 0 unchanged - 0 fixed = 20 total (was 0) |
| +1 | mvnsite | 9m 39s | the patch passed |
| +1 | mvneclipse | 0m 52s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 20 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | xml | 0m 4s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 1m 47s | the patch passed |
| +1 | javadoc | 5m 31s | the patch passed |
| -1 | unit | 95m 38s | root in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 168m 15s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
| | hadoop.hdfs.server.datanode.TestFsDatasetCache |
| | hadoop.yarn.server.nodemanager.containermanager.queuing.TestQueuingContainerManager |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12810161/HADOOP-13126.3.patch |
| JIRA Issue | HADOOP-13126 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux 83ff71125c75 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/p
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328634#comment-15328634 ]

Ryan Blue commented on HADOOP-13126:

I'm attaching a new patch that depends on jbrotli 0.5.0. That version fixes the issue I noted above where Brotli doesn't consume all of its input. The new patch also adds BrotliCodec to the codec service loader so it is available automatically when needed to read .br files. We've been testing this code for a couple weeks and it seems to be working and stable.
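The comment above says that registering BrotliCodec with the codec service loader makes it available automatically when `.br` files are read. As a rough illustration of that idea only, here is a simplified Python model of extension-based codec lookup. It is not Hadoop's Java API and not code from the patch: the registry contents, the `codec_for` helper, and the extension-to-codec pairings other than `.br` are assumptions made for this sketch.

```python
from typing import Optional

# Hypothetical registry mapping a default file extension to a codec name.
# In Hadoop the equivalent table is built from the codecs discovered at
# startup; here it is hard-coded purely for illustration.
CODECS_BY_EXTENSION = {
    ".gz": "GzipCodec",
    ".bz2": "BZip2Codec",
    ".snappy": "SnappyCodec",
    ".br": "BrotliCodec",  # the entry the new patch would make discoverable
}


def codec_for(filename: str) -> Optional[str]:
    """Return the codec registered for the file's extension, or None.

    Longer extensions are checked first so that a more specific suffix
    wins over a shorter one that happens to match as well.
    """
    for extension in sorted(CODECS_BY_EXTENSION, key=len, reverse=True):
        if filename.endswith(extension):
            return CODECS_BY_EXTENSION[extension]
    return None


if __name__ == "__main__":
    for name in ("part-00000.br", "part-00000.gz", "part-00000.txt"):
        print(name, "->", codec_for(name))
```

The point of the sketch is that callers never name the codec explicitly: once a codec is registered, a reader can pick it from the file name alone, and an unregistered extension simply yields no codec.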
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328619#comment-15328619 ]

Ryan Blue commented on HADOOP-13126:

I'm attaching a new version of this patch that depends on [~marki]'s 0.5.0 release. That fixes the bug I noted above where Brotli doesn't consume all of the input buffer. This also adds BrotliCodec to the codec service loader and tests that it is loaded correctly. We've been running tests on this code for a few weeks and it appears to be stable.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287510#comment-15287510 ]

Martin W. Kirst commented on HADOOP-13126:

[~rdblue] Great that you are taking care of adopting brotli into Hadoop. A small side note: I filed one minor issue against brotli itself, which may concern you when thinking about a release. See https://github.com/google/brotli/issues/346

The good news is that the comments imply a fix will be available soon. I will take care of that and adopt it ASAP.

Let's rock this :-)
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287490#comment-15287490 ]

Martin W. Kirst commented on HADOOP-13126:

Sure, I will. I plan to do this by the end of this week.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280995#comment-15280995 ]

Tsuyoshi Ozawa commented on HADOOP-13126:

[~rdblue] thank you for the response. The result of the benchmark is interesting to me. Let me review it.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280956#comment-15280956 ]

Hadoop QA commented on HADOOP-13126:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 16s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 12s | trunk passed |
| +1 | compile | 7m 14s | trunk passed with JDK v1.8.0_91 |
| +1 | compile | 7m 4s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 1m 22s | trunk passed |
| +1 | mvnsite | 9m 22s | trunk passed |
| +1 | mvneclipse | 0m 46s | trunk passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 1m 33s | trunk passed |
| +1 | javadoc | 5m 43s | trunk passed with JDK v1.8.0_91 |
| +1 | javadoc | 9m 36s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 7m 29s | the patch passed |
| +1 | compile | 6m 21s | the patch passed with JDK v1.8.0_91 |
| +1 | javac | 6m 21s | the patch passed |
| +1 | compile | 6m 53s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 6m 53s | the patch passed |
| -1 | checkstyle | 1m 26s | root: The patch generated 28 new + 0 unchanged - 0 fixed = 28 total (was 0) |
| +1 | mvnsite | 9m 5s | the patch passed |
| +1 | mvneclipse | 0m 41s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project . |
| +1 | findbugs | 1m 45s | the patch passed |
| +1 | javadoc | 5m 36s | the patch passed with JDK v1.8.0_91 |
| +1 | javadoc | 9m 32s | the patch passed with JDK v1.7.0_95 |
| -1 | unit | 20m 58s | root in the patch failed with JDK v1.8.0_91. |
| -1 | unit | 12m 25s | root in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 22s | The patch does not g
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278998#comment-15278998 ]

Hadoop QA commented on HADOOP-13126:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 2s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 40s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-project . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 26s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 39s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 23s {color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 45s {color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 33s {color} | {color:red} root in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 33s {color} | {color:red} root in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 38s {color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 38s {color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 21s {color} | {color:red} root: The patch generated 28 new + 0 unchanged - 0 fixed = 28 total (was 0) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 1m 47s {color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patched modules with no Java source: hadoop-project . {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 12s {color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 29s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 22s {color} | {color:red} root in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 4m 46s {color}
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278673#comment-15278673 ]

Ryan Blue commented on HADOOP-13126:

The results above show the comparison with Snappy. The file is less than half the size and compression took about the same amount of time.

Comparing to LZ4 would be interesting. It isn't supported by Parquet so it's a bit harder for me to drop into my test case.

> Add Brotli compression codec
>
> Key: HADOOP-13126
> URL: https://issues.apache.org/jira/browse/HADOOP-13126
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Reporter: Ryan Blue
> Assignee: Ryan Blue
> Attachments: HADOOP-13126.1.patch
>
> I've been testing [Brotli|https://github.com/google/brotli/], a new compression library based on LZ77 from Google. Google's [brotli benchmarks|https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf] look really good and we're also seeing a significant improvement in compression size, compression speed, or both.
> {code:title=Brotli preliminary test results}
> [blue@work Downloads]$ time parquet from test.parquet -o test.snappy.parquet --compression-codec snappy --overwrite
> real 1m17.106s
> user 1m30.804s
> sys 0m4.404s
> [blue@work Downloads]$ time parquet from test.parquet -o test.br.parquet --compression-codec brotli --overwrite
> real 1m16.640s
> user 1m24.244s
> sys 0m6.412s
> [blue@work Downloads]$ time parquet from test.parquet -o test.gz.parquet --compression-codec gzip --overwrite
> real 3m39.496s
> user 3m48.736s
> sys 0m3.880s
> [blue@work Downloads]$ ls -l
> -rw-r--r-- 1 blue blue 1068821936 May 10 11:06 test.br.parquet
> -rw-r--r-- 1 blue blue 1421601880 May 10 11:10 test.gz.parquet
> -rw-r--r-- 1 blue blue 2265950833 May 10 10:30 test.snappy.parquet
> {code}
> Brotli, at quality 1, is as fast as snappy and ends up smaller than gzip-9. Another test resulted in a slightly larger Brotli file than gzip produced, but Brotli was 4x faster. I'd like to get this compression codec into Hadoop.
> [Brotli is licensed with the MIT license|https://github.com/google/brotli/blob/master/LICENSE], and the [JNI library jbrotli is ALv2|https://github.com/MeteoGroup/jbrotli/blob/master/LICENSE].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
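As a rough check of the "less than half the size, about the same time" claim, the file sizes and wall-clock times quoted in the issue's preliminary test results can be compared directly. The sketch below is not part of the patch: the `results` table and the helpers `to_seconds` and `relative_to` are hypothetical names introduced only for illustration, and the figures are copied from the `ls -l` and `time` output in the issue description.

```python
# Sizes (bytes) and wall-clock "real" times copied from the issue's
# preliminary test results; nothing here is measured by this script.
results = {
    "snappy": {"bytes": 2265950833, "real": "1m17.106s"},
    "brotli": {"bytes": 1068821936, "real": "1m16.640s"},
    "gzip": {"bytes": 1421601880, "real": "3m39.496s"},
}


def to_seconds(real):
    """Convert a `time`-style duration such as '1m17.106s' to seconds."""
    minutes, seconds = real.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def relative_to(baseline, codec):
    """Return (size ratio, time ratio) of `codec` relative to `baseline`."""
    base, other = results[baseline], results[codec]
    size_ratio = other["bytes"] / base["bytes"]
    time_ratio = to_seconds(other["real"]) / to_seconds(base["real"])
    return size_ratio, time_ratio


if __name__ == "__main__":
    for codec in ("brotli", "gzip"):
        size_ratio, time_ratio = relative_to("snappy", codec)
        print(f"{codec}: {size_ratio:.2f}x the Snappy size, "
              f"{time_ratio:.2f}x the Snappy time")
```

Using Snappy as the baseline mirrors the comparison made in the comment; the same helper can take gzip as the baseline to compare Brotli against it instead.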
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278666#comment-15278666 ]

Tsuyoshi Ozawa commented on HADOOP-13126:

[~b...@cloudera.com] Thank you for the suggestion. Should we compare against the Snappy or LZ4 codecs instead of gzip, since those codecs are the de facto standard in the Hadoop stack?
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278659#comment-15278659 ]

Ryan Blue commented on HADOOP-13126:

[~andrew.wang], you guys are probably interested in this.
[jira] [Commented] (HADOOP-13126) Add Brotli compression codec
[ https://issues.apache.org/jira/browse/HADOOP-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278653#comment-15278653 ]

Ryan Blue commented on HADOOP-13126:

[~marki], could you review this patch also?