[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386009283 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386008175 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386007506 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386008180 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386009385 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386008731 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-02-28 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386007699 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java

[jira] [Commented] (HUDI-432) Benchmark HFile for scan vs seek

2020-02-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048212#comment-17048212 ] Vinoth Chandar commented on HUDI-432: - [~shivnarayan] even for 100K entries why is scan faster than

[GitHub] [incubator-hudi] vinothchandar commented on issue #1359: [SUPPORT] handle partition value containing colon ?

2020-02-28 Thread GitBox
vinothchandar commented on issue #1359: [SUPPORT] handle partition value containing colon ? URL: https://github.com/apache/incubator-hudi/issues/1359#issuecomment-592893616 @tooptoop4 can you please provide a snippet to reproduce this? I can look into this then

[GitHub] [incubator-hudi] lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592880451 Also, the compressed byte[] data seems bigger than the original. ``` test random keys

[GitHub] [incubator-hudi] nsivabalan commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
nsivabalan commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592868794 I also played with testing the sizes. Looks like the encoding is the culprit. test

[GitHub] [incubator-hudi] nsivabalan commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
nsivabalan commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592861991 Good point @lamber-ken on avoiding conversions.

[GitHub] [incubator-hudi] nsivabalan commented on issue #1363: [HUDI-647] Change community.html page with new PPMC/committers

2020-02-28 Thread GitBox
nsivabalan commented on issue #1363: [HUDI-647] Change community.html page with new PPMC/committers URL: https://github.com/apache/incubator-hudi/pull/1363#issuecomment-592857747 thanks!

[GitHub] [incubator-hudi] lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592855749 Hi @bvaradar, I tested the PR; the compressed output seems to be bigger than the original.

[GitHub] [incubator-hudi] nsivabalan commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-28 Thread GitBox
nsivabalan commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-592854339 One question about using nested schema. Can you remind me what happens if someone passes in a nested schema for

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-28 Thread GitBox
nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r385998155 ## File path:

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-28 Thread GitBox
nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r385998334 ## File path:

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-28 Thread GitBox
nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r385997582 ## File path:

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-28 Thread GitBox
nsivabalan commented on a change in pull request #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer URL: https://github.com/apache/incubator-hudi/pull/1165#discussion_r385997241 ## File path:

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #202

2020-02-28 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.34 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[jira] [Commented] (HUDI-649) Address code style issues in hudi-client package

2020-02-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048127#comment-17048127 ] Vinoth Chandar commented on HUDI-649: - Let's make this more specific? I don't believe in sweeping

[jira] [Updated] (HUDI-649) Address code style issues in hudi-client package

2020-02-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-649: --- Summary: Address code style issues in hudi-client package (was: Code cleanup in hudi-client package) >

[jira] [Updated] (HUDI-649) Code cleanup in hudi-client package

2020-02-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-649: Status: Open (was: New) > Code cleanup in hudi-client package > ---

[jira] [Commented] (HUDI-649) Code cleanup in hudi-client package

2020-02-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048123#comment-17048123 ] Ethan Guo commented on HUDI-649: [~vinoth] ^ > Code cleanup in hudi-client package >

[jira] [Created] (HUDI-649) Code cleanup in hudi-client package

2020-02-28 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-649: -- Summary: Code cleanup in hudi-client package Key: HUDI-649 URL: https://issues.apache.org/jira/browse/HUDI-649 Project: Apache Hudi (incubating) Issue Type: Improvement

[jira] [Commented] (HUDI-635) MergeHandle's DiskBasedMap entries can be thinner

2020-02-28 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048052#comment-17048052 ] lamber-ken commented on HUDI-635: - I got it thank you. > MergeHandle's DiskBasedMap entries can be thinner

[GitHub] [incubator-hudi] smarthi commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-02-28 Thread GitBox
smarthi commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files URL: https://github.com/apache/incubator-hudi/pull/1354#issuecomment-592773113 > Mostly, what you need for your notice should be: > > ```

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592649734 Hi @vinothchandar > we can only place strings inside the parquet footers

[GitHub] [incubator-hudi] yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-02-28 Thread GitBox
yihua commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r385909221 ## File path:

[GitHub] [incubator-hudi] tooptoop4 commented on issue #1359: [SUPPORT] handle partition value containing colon ?

2020-02-28 Thread GitBox
tooptoop4 commented on issue #1359: [SUPPORT] handle partition value containing colon ? URL: https://github.com/apache/incubator-hudi/issues/1359#issuecomment-592671073 It is timestamp data.

[GitHub] [incubator-hudi] lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592649734 Hi @vinothchandar > we can only place strings inside the parquet footers Right, I

[jira] [Commented] (HUDI-635) MergeHandle's DiskBasedMap entries can be thinner

2020-02-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047814#comment-17047814 ] Vinoth Chandar commented on HUDI-635: - It's RFC-13! > MergeHandle's DiskBasedMap entries can be

[GitHub] [incubator-hudi] vinothchandar commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
vinothchandar commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592610993 @lamber-ken we can only place strings inside the parquet footers

[jira] [Created] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2020-02-28 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-648: --- Summary: Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes Key: HUDI-648 URL: https://issues.apache.org/jira/browse/HUDI-648

[GitHub] [incubator-hudi] vinothchandar commented on issue #1359: [SUPPORT] handle partition value containing colon ?

2020-02-28 Thread GitBox
vinothchandar commented on issue #1359: [SUPPORT] handle partition value containing colon ? URL: https://github.com/apache/incubator-hudi/issues/1359#issuecomment-592609201 HDFS paths cannot contain `:`, I think. Is your partition path correct? It seems too granular.
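
If the colon in the partition value is indeed the problem, as the reply above suspects, one option is to derive a coarser, colon-free partition path from the timestamp. The sketch below is only an illustration: `date_partition_path` is a hypothetical helper, not a Hudi API, and the `YYYY-MM-DD HH:MM:SS` input format is an assumption, since the issue does not show the actual data.

```python
from datetime import datetime


def date_partition_path(ts: str) -> str:
    """Map an assumed 'YYYY-MM-DD HH:MM:SS' timestamp to a day-level path.

    Hypothetical helper for illustration only; not part of Hudi.
    """
    parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # Day granularity drops the time-of-day part, and with it every colon.
    return parsed.strftime("%Y/%m/%d")


path = date_partition_path("2020-02-28 13:45:10")
print(path)
```

Partitioning by day rather than by full timestamp also speaks to the "too granular" concern, since all records from the same day land in one partition.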

[GitHub] [incubator-hudi] vinothchandar commented on issue #1242: [HUDI-544] Archived commits command code cleanup

2020-02-28 Thread GitBox
vinothchandar commented on issue #1242: [HUDI-544] Archived commits command code cleanup URL: https://github.com/apache/incubator-hudi/pull/1242#issuecomment-592608211 @n3nash is shepherding this.

[GitHub] [incubator-hudi] vinothchandar commented on issue #1350: [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java

2020-02-28 Thread GitBox
vinothchandar commented on issue #1350: [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java URL: https://github.com/apache/incubator-hudi/pull/1350#issuecomment-592587223 Okay, let's watch more. This is so good to get this integrated in. :) Can you also

[GitHub] [incubator-hudi] lresende commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-02-28 Thread GitBox
lresende commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files URL: https://github.com/apache/incubator-hudi/pull/1354#issuecomment-592581418 Mostly, what you need for your notice should be: ``` Apache

[jira] [Commented] (HUDI-553) Building/Running Hudi on higher java versions

2020-02-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047764#comment-17047764 ] Vinoth Chandar commented on HUDI-553: - You got it! > Building/Running Hudi on higher java versions >

[jira] [Assigned] (HUDI-553) Building/Running Hudi on higher java versions

2020-02-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-553: --- Assignee: lamber-ken > Building/Running Hudi on higher java versions >

[jira] [Commented] (HUDI-553) Building/Running Hudi on higher java versions

2020-02-28 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047681#comment-17047681 ] lamber-ken commented on HUDI-553: - Hi [~rxu], [~vinoth] willing to drive it ;) > Building/Running Hudi on

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592397157 Hi @bvaradar, the idea of compressing strings is great, just considering: Call

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592397157 Hi @bvaradar, the idea of compressing strings is great, just thinking: Call

[GitHub] [incubator-hudi] codecov-io commented on issue #1330: [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns

2020-02-28 Thread GitBox
codecov-io commented on issue #1330: [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns URL: https://github.com/apache/incubator-hudi/pull/1330#issuecomment-592472336 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1330?src=pr=h1) Report

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592397157 Hi @bvaradar, The call timeline will be: `byte[]` -> `base64 String` -> `gzip stream`

[GitHub] [incubator-hudi] lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet

2020-02-28 Thread GitBox
lamber-ken commented on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592397157 Hi @bvaradar, The call timeline will be: `byte[]` -> `base64 String` -> `gzip stream` ->
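
Several comments on #1253 report that the compressed bloom filter came out larger than the original and point at the encoding step. The sketch below is not the PR's code; it only measures how base64 and gzip interact in different orders on a deterministic pseudo-random payload. The payload is an assumption standing in for a densely populated filter, and real bloom filters may compress quite differently.

```python
import base64
import gzip
import random


def size_report(raw: bytes) -> dict:
    """Return the byte size of `raw` under several encoding orders."""
    b64 = base64.b64encode(raw)
    gz = gzip.compress(raw)
    return {
        "raw": len(raw),
        "base64(raw)": len(b64),
        "gzip(raw)": len(gz),
        "gzip(base64(raw))": len(gzip.compress(b64)),
        "base64(gzip(raw))": len(base64.b64encode(gz)),
    }


# Assumption: seeded pseudo-random bytes as an incompressible stand-in payload.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(4096))

report = size_report(payload)
for name, size in report.items():
    print(f"{name:>18}: {size}")
```

Base64 always expands its input by roughly a third, and gzip cannot shrink data that is already close to random, so on a payload like this any path that ends in a string is larger than the raw bytes. Since only strings can be placed in the parquet footer, the order of the two steps decides how much of that overhead is paid.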