[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1449: [WIP][HUDI-698]Add unit test for CleansCommand

2020-04-07 Thread GitBox
yanghua commented on a change in pull request #1449: [WIP][HUDI-698]Add unit test for CleansCommand URL: https://github.com/apache/incubator-hudi/pull/1449#discussion_r405274332 ## File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestCleansCommand.java ##

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster and code clean for SparkUtil

2020-04-07 Thread GitBox
yanghua commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster and code clean for SparkUtil URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r405272257 ## File path:

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1495: [HUDI-770] Organize upsert/insert API implementation under a single package

2020-04-07 Thread GitBox
codecov-io edited a comment on issue #1495: [HUDI-770] Organize upsert/insert API implementation under a single package URL: https://github.com/apache/incubator-hudi/pull/1495#issuecomment-610761048 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1495?src=pr=h1) Report >

[GitHub] [incubator-hudi] codecov-io commented on issue #1495: [HUDI-770] Organize upsert/insert API implementation under a single package

2020-04-07 Thread GitBox
codecov-io commented on issue #1495: [HUDI-770] Organize upsert/insert API implementation under a single package URL: https://github.com/apache/incubator-hudi/pull/1495#issuecomment-610761048 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1495?src=pr=h1) Report > Merging

[jira] [Assigned] (HUDI-684) Introduce abstraction for writing and reading and compacting from FileGroups

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-684: Assignee: (was: Vinoth Chandar) > Introduce abstraction for writing and reading and

[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2020-04-07 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-767: Status: Open (was: New) > Support transformation when export to Hudi

[jira] [Assigned] (HUDI-767) Support transformation when export to Hudi

2020-04-07 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-767: Assignee: Raymond Xu > Support transformation when export to Hudi

[GitHub] [incubator-hudi] xushiyan commented on issue #1480: [SUPPORT] Backwards Incompatible Schema Evolution

2020-04-07 Thread GitBox
xushiyan commented on issue #1480: [SUPPORT] Backwards Incompatible Schema Evolution URL: https://github.com/apache/incubator-hudi/issues/1480#issuecomment-610745060 @bvaradar Yes, I marked 767 for 0.6.0. I'll put 768 on waiting list at the moment 

[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2020-04-07 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-767: Fix Version/s: 0.6.0 > Support transformation when export to Hudi

[jira] [Updated] (HUDI-425) Implement support for bootstrapping in HoodieDeltaStreamer

2020-04-07 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-425: Labels: help-wanted (was: ) > Implement support for bootstrapping in HoodieDeltaStreamer

[jira] [Assigned] (HUDI-558) Introduce ability to compress bloom filters while storing in parquet

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-558: Assignee: (was: Balaji Varadarajan) > Introduce ability to compress bloom filters while

[jira] [Updated] (HUDI-289) Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-289: Priority: Blocker (was: Major) > Implement a test suite to support long running test for Hudi

[jira] [Resolved] (HUDI-132) Automate doc update/deploy process

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-132. Resolution: Duplicate > Automate doc update/deploy process

[jira] [Updated] (HUDI-651) Incremental Query on Hive via Spark SQL does not return expected results

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-651: Priority: Blocker (was: Major) > Incremental Query on Hive via Spark SQL does not return expected

[jira] [Updated] (HUDI-407) Implement a join-based index

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-407: Priority: Blocker (was: Major) > Implement a join-based index

[jira] [Updated] (HUDI-686) Implement BloomIndexV2 that does not depend on memory caching

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-686: Priority: Blocker (was: Major) > Implement BloomIndexV2 that does not depend on memory caching

[jira] [Updated] (HUDI-408) [Umbrella] Refactor/Code clean up hoodie write client

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-408: Priority: Blocker (was: Critical) > [Umbrella] Refactor/Code clean up hoodie write client

[jira] [Updated] (HUDI-558) Introduce ability to compress bloom filters while storing in parquet

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-558: Priority: Blocker (was: Major) > Introduce ability to compress bloom filters while storing in

[jira] [Assigned] (HUDI-686) Implement BloomIndexV2 that does not depend on memory caching

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-686: Assignee: lamber-ken (was: Vinoth Chandar) > Implement BloomIndexV2 that does not depend on

[jira] [Resolved] (HUDI-288) Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-288. Resolution: Fixed > Add support for ingesting multiple kafka streams in a single DeltaStreamer

[jira] [Updated] (HUDI-242) Support Efficient bootstrap of large parquet datasets to Hudi

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-242: Priority: Blocker (was: Major) > Support Efficient bootstrap of large parquet datasets to Hudi

[jira] [Updated] (HUDI-408) [Umbrella] Refactor/Code clean up hoodie write client

2020-04-07 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-408: Priority: Critical (was: Major) > [Umbrella] Refactor/Code clean up hoodie write client

[jira] [Updated] (HUDI-770) Organize ingest API implementation under a single package

2020-04-07 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-770: Labels: pull-request-available (was: ) > Organize ingest API implementation under a single package

[GitHub] [incubator-hudi] bvaradar opened a new pull request #1495: [HUDI-770] Organize upsert/insert API implementation under a single package

2020-04-07 Thread GitBox
bvaradar opened a new pull request #1495: [HUDI-770] Organize upsert/insert API implementation under a single package URL: https://github.com/apache/incubator-hudi/pull/1495 [HUDI-770] Organize upsert/insert API implementation under a single package

[jira] [Created] (HUDI-770) Organize ingest API implementation under a single package

2020-04-07 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-770: Summary: Organize ingest API implementation under a single package Key: HUDI-770 URL: https://issues.apache.org/jira/browse/HUDI-770 Project: Apache Hudi

[jira] [Assigned] (HUDI-770) Organize ingest API implementation under a single package

2020-04-07 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-770: Assignee: Balaji Varadarajan > Organize ingest API implementation under a single

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #241

2020-04-07 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.41 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [incubator-hudi] garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer

2020-04-07 Thread GitBox
garyli1019 commented on issue #1486: [HUDI-759] Integrate checkpoint privoder with delta streamer URL: https://github.com/apache/incubator-hudi/pull/1486#issuecomment-610701882 Add https://github.com/apache/incubator-hudi/pull/1493 into this PR.

[GitHub] [incubator-hudi] garyli1019 closed pull request #1493: [MINOR] remove Hive dependency from delta streamer

2020-04-07 Thread GitBox
garyli1019 closed pull request #1493: [MINOR] remove Hive dependency from delta streamer URL: https://github.com/apache/incubator-hudi/pull/1493 This is an automated message from the Apache Git Service. To respond to the

[GitHub] [incubator-hudi] garyli1019 commented on issue #1493: [MINOR] remove Hive dependency from delta streamer

2020-04-07 Thread GitBox
garyli1019 commented on issue #1493: [MINOR] remove Hive dependency from delta streamer URL: https://github.com/apache/incubator-hudi/pull/1493#issuecomment-610700940 combine with https://github.com/apache/incubator-hudi/pull/1486

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1488: [SUPPORT] Hudi table has only five rows when record key is binary

2020-04-07 Thread GitBox
lamber-ken edited a comment on issue #1488: [SUPPORT] Hudi table has only five rows when record key is binary URL: https://github.com/apache/incubator-hudi/issues/1488#issuecomment-610681461 hi @jvaesteves > the partition name is /18228, is this the expected behaviour? it's

[GitHub] [incubator-hudi] lamber-ken closed issue #1375: [SUPPORT] HoodieDeltaStreamer offset not handled correctly

2020-04-07 Thread GitBox
lamber-ken closed issue #1375: [SUPPORT] HoodieDeltaStreamer offset not handled correctly URL: https://github.com/apache/incubator-hudi/issues/1375 This is an automated message from the Apache Git Service. To respond to the

[GitHub] [incubator-hudi] lamber-ken commented on issue #1488: [SUPPORT] Hudi table has only five rows when record key is binary

2020-04-07 Thread GitBox
lamber-ken commented on issue #1488: [SUPPORT] Hudi table has only five rows when record key is binary URL: https://github.com/apache/incubator-hudi/issues/1488#issuecomment-610681802

[GitHub] [incubator-hudi] lamber-ken commented on issue #1488: [SUPPORT] Hudi table has only five rows when record key is binary

2020-04-07 Thread GitBox
lamber-ken commented on issue #1488: [SUPPORT] Hudi table has only five rows when record key is binary URL: https://github.com/apache/incubator-hudi/issues/1488#issuecomment-610681461 hi @jvaesteves > the partition name is /18228, is this the expected behaviour? it's not the

[jira] [Updated] (HUDI-769) Write blog about HoodieMultiTableDeltaStreamer in cwiki

2020-04-07 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-769: Description: (was: Relevant Section : 

[jira] [Created] (HUDI-769) Write blog about HoodieMultiTableDeltaStreamer in cwiki

2020-04-07 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-769: Summary: Write blog about HoodieMultiTableDeltaStreamer in cwiki Key: HUDI-769 URL: https://issues.apache.org/jira/browse/HUDI-769 Project: Apache Hudi

[jira] [Updated] (HUDI-766) Update Apache Hudi website with usage info about HoodieMultiTableDeltaStreamer

2020-04-07 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-766: Status: Open (was: New) > Update Apache Hudi website with usage info about

[incubator-hudi] branch master updated: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment (#1150)

2020-04-07 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/master by this push: new d610252 [HUDI-288]: Add support for

[GitHub] [incubator-hudi] bvaradar merged pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread GitBox
bvaradar merged pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment URL: https://github.com/apache/incubator-hudi/pull/1150 This is an automated message from

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread GitBox
codecov-io edited a comment on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment URL: https://github.com/apache/incubator-hudi/pull/1150#issuecomment-605617268 #

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-04-07 Thread GitBox
satishkotha commented on a change in pull request #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction URL: https://github.com/apache/incubator-hudi/pull/1396#discussion_r405144082 ## File path:

[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-04-07 Thread GitBox
satishkotha commented on a change in pull request #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction URL: https://github.com/apache/incubator-hudi/pull/1396#discussion_r405144082 ## File path:

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
lamber-ken edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610635678 > Can you give this a shot on a cluster? btw, what @vinothchandar wants to say is that run your snippet

[GitHub] [incubator-hudi] lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610635678 > Can you give this a shot on a cluster? more, what @vinothchandar wants to say is that run your snippet code on

[GitHub] [incubator-hudi] lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610630535 ![image](https://user-images.githubusercontent.com/20113411/78721369-fc4a4f00-7959-11ea-8fa2-340717c3a233.png)

[GitHub] [incubator-hudi] lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
lamber-ken commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610626104 hi @tverdokhlebd, thanks your detailed spark log, from your description and dataset, key information - run on local

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-07 Thread GitBox
vinothchandar commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r405076523 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieSimpleIndex.java ## @@

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-07 Thread GitBox
nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r405068106 ## File path: hudi-client/src/test/java/org/apache/hudi/TestHoodieClientOnCopyOnWriteStorage.java ##

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-07 Thread GitBox
nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r405067795 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieSimpleIndex.java ## @@ -0,0

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610580236 > Can you give this a shot on a cluster? Do you mean access to the cluster? Those steps also were reproducing on

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610580236 > Can you give this a shot on a cluster? Do you mean access to the cluster? Those steps also were

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-07 Thread GitBox
nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r405053500 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieSimpleIndex.java ## @@ -0,0

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-07 Thread GitBox
nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r405053129 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieSimpleIndex.java ## @@ -0,0

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread GitBox
pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r405053101 ## File path:

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-07 Thread GitBox
nsivabalan commented on a change in pull request #1402: [WIP][HUDI-407] Adding Simple Index URL: https://github.com/apache/incubator-hudi/pull/1402#discussion_r405052805 ## File path: hudi-client/src/main/java/org/apache/hudi/index/HoodieIndex.java ## @@ -77,15 +80,15 @@

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: ` sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610564674 Code: sparkSession .read .jdbc( url = jdbcConfig.url, table = table,

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread GitBox
pratyakshsharma commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r405005385 ## File path:

[jira] [Commented] (HUDI-105) DeltaStreamer Kafka Ingestion does not handle invalid offsets

2020-04-07 Thread Hoang Ngo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077475#comment-17077475 ] Hoang Ngo commented on HUDI-105: Hi [~vinoth], Do you know if this fix is applied in hudi 0.5.0? I have

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread GitBox
bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r404990360 ## File path:

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-04-07 Thread GitBox
bvaradar commented on a change in pull request #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment URL: https://github.com/apache/incubator-hudi/pull/1150#discussion_r404992549 ## File path:

[GitHub] [incubator-hudi] vinothchandar commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
vinothchandar commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610504319 Is this real data or can you share a reproducible snippet of code? Especially with these local microbenchmarks, its

[GitHub] [incubator-hudi] hddong commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster and code clean for SparkUtil

2020-04-07 Thread GitBox
hddong commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster and code clean for SparkUtil URL: https://github.com/apache/incubator-hudi/pull/1452#issuecomment-610457107 > again, is this PR not only for `clean` command? Sorry for change it late :)

[jira] [Updated] (HUDI-740) Fix can not specify the sparkMaster and code clean for SparkUtil

2020-04-07 Thread hong dongdong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated HUDI-740: Summary: Fix can not specify the sparkMaster and code clean for SparkUtil (was: [HUDI-740]Fix can not

[GitHub] [incubator-hudi] hddong commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of clean and compact commands

2020-04-07 Thread GitBox
hddong commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of clean and compact commands URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r404900670 ## File path:

[jira] [Commented] (HUDI-69) Support realtime view in Spark datasource #136

2020-04-07 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077314#comment-17077314 ] Bhavani Sudha commented on HUDI-69: [~garyli1019] Yes the InputPathHandler will be able to provide MOR

[GitHub] [incubator-hudi] vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-04-07 Thread GitBox
vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi. URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-610432304 TestMergeOnReadTable or TestClientCopyOnWriteStorage etc that will do a full upsert dag for cow and mor

[jira] [Updated] (HUDI-740) [HUDI-740]Fix can not specify the sparkMaster of clean and compact commands

2020-04-07 Thread hong dongdong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated HUDI-740: Summary: [HUDI-740]Fix can not specify the sparkMaster of clean and compact commands (was: Fix can

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610307143 So, the process took 2h 40m (local[4] and driver memory 10gb) and thrown "java.lang.OutOfMemoryError: GC

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 53M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610307143 So, the process took 2h 40m and thrown "java.lang.OutOfMemoryError: GC overhead limit exceeded". Log

[GitHub] [incubator-hudi] yanghua commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-04-07 Thread GitBox
yanghua commented on issue #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command URL: https://github.com/apache/incubator-hudi/pull/1452#issuecomment-610298785 again, is this PR not only for `clean` command?

[GitHub] [incubator-hudi] hddong commented on issue #1490: [HUDI-700]Add unit test for FileSystemViewCommand

2020-04-07 Thread GitBox
hddong commented on issue #1490: [HUDI-700]Add unit test for FileSystemViewCommand URL: https://github.com/apache/incubator-hudi/pull/1490#issuecomment-610298123 @yanghua @vinothchandar please have a review. This is an

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-04-07 Thread GitBox
yanghua commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r404690620 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/utils/SparkUtil.java

[GitHub] [incubator-hudi] hddong commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command

2020-04-07 Thread GitBox
hddong commented on a change in pull request #1452: [HUDI-740]Fix can not specify the sparkMaster of cleans run command URL: https://github.com/apache/incubator-hudi/pull/1452#discussion_r404685892 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/utils/SparkUtil.java

[GitHub] [incubator-hudi] lamber-ken commented on issue #143: Tracking ticket for folks to be added to slack group

2020-04-07 Thread GitBox
lamber-ken commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-610257073 hi @malanb5 @tverdokhlebd, done and welcome : ) This is

[GitHub] [incubator-hudi] malanb5 commented on issue #143: Tracking ticket for folks to be added to slack group

2020-04-07 Thread GitBox
malanb5 commented on issue #143: Tracking ticket for folks to be added to slack group URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-610243459 Please add me too mala...@gmail.com This is an automated

[GitHub] [incubator-hudi] tverdokhlebd removed a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records

2020-04-07 Thread GitBox
tverdokhlebd removed a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610221356 Tried to set this config: - local[4] - driver memory 12GB - driver memoryOverhead 2048

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1493: [MINOR] remove Hive dependency from delta streamer

2020-04-07 Thread GitBox
codecov-io edited a comment on issue #1493: [MINOR] remove Hive dependency from delta streamer URL: https://github.com/apache/incubator-hudi/pull/1493#issuecomment-610227776 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1493?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] codecov-io commented on issue #1493: [MINOR] remove Hive dependency from delta streamer

2020-04-07 Thread GitBox
codecov-io commented on issue #1493: [MINOR] remove Hive dependency from delta streamer URL: https://github.com/apache/incubator-hudi/pull/1493#issuecomment-610227776 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1493?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records

2020-04-07 Thread GitBox
tverdokhlebd edited a comment on issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610221356 Tried to set this config: - local[4] - driver memory 12GB - driver memoryOverhead 2048 And

[GitHub] [incubator-hudi] tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records

2020-04-07 Thread GitBox
tverdokhlebd commented on issue #1491: [SUPPORT] OutOfMemoryError during upsert 30M records URL: https://github.com/apache/incubator-hudi/issues/1491#issuecomment-610221356 Tried to set this config: - local[4] - driver memory 12GB - driver memoryOverhead 2048 And

[incubator-hudi] branch hudi_test_suite_refactor updated (29b4fdf -> 3e2e710)

2020-04-07 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. from 29b4fdf [HUDI-394] Provide a basic implementation of test suite add 3e2e710 Fix

[GitHub] [incubator-hudi] n3nash merged pull request #1494: Fix Compilation Issues + Port Bug Fixes

2020-04-07 Thread GitBox
n3nash merged pull request #1494: Fix Compilation Issues + Port Bug Fixes URL: https://github.com/apache/incubator-hudi/pull/1494 This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [incubator-hudi] modi95 opened a new pull request #1494: Fix Compilation Issues + Port Bug Fixes

2020-04-07 Thread GitBox
modi95 opened a new pull request #1494: Fix Compilation Issues + Port Bug Fixes URL: https://github.com/apache/incubator-hudi/pull/1494 ## What is the purpose of the pull request - Port bug fixes to test suite - Additional features to test suite will be added in a separate PR -

[GitHub] [incubator-hudi] loagosad commented on issue #1438: How to get the file name corresponding to HoodieKey through the GlobalBloomIndex

2020-04-07 Thread GitBox
loagosad commented on issue #1438: How to get the file name corresponding to HoodieKey through the GlobalBloomIndex URL: https://github.com/apache/incubator-hudi/issues/1438#issuecomment-610196060 @nsivabalan I have committed the records before read. In the test code, i just using