[GitHub] [hudi] hughfdjackson commented on issue #2265: Arrays with nulls in them result in broken parquet files

2021-01-05 Thread GitBox
hughfdjackson commented on issue #2265: URL: https://github.com/apache/hudi/issues/2265#issuecomment-754502750 @umehrot2 Thanks for looking into this - I'm taking a bit of hope from the error message of the code you linked ;)

[jira] [Updated] (HUDI-913) Update docs about KeyGenerator

2021-01-05 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-913: -- Status: Open (was: New) > Update docs about KeyGenerator > -- > >

[jira] [Updated] (HUDI-913) Update docs about KeyGenerator

2021-01-05 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-913: -- Fix Version/s: 0.7.0 > Update docs about KeyGenerator > -- > > Key:

[jira] [Closed] (HUDI-913) Update docs about KeyGenerator

2021-01-05 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-913. - Resolution: Done > Update docs about KeyGenerator > -- > > Key:

[GitHub] [hudi] codecov-io edited a comment on pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2374: URL: https://github.com/apache/hudi/pull/2374#issuecomment-750782300 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=h1) Report > Merging [#2374](https://codecov.io/gh/apache/hudi/pull/2374?src=pr=desc) (21792c6) into

[GitHub] [hudi] wangxianghu opened a new pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
wangxianghu opened a new pull request #2405: URL: https://github.com/apache/hudi/pull/2405 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[jira] [Updated] (HUDI-1506) Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1506: -- Description: {code:java} // Caused by: org.apache.spark.SparkException: Job aborted due to stage

[GitHub] [hudi] wangxianghu commented on pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
wangxianghu commented on pull request #2405: URL: https://github.com/apache/hudi/pull/2405#issuecomment-754644106 @yanghua please take a look when free This is an automated message from the Apache Git Service. To respond to

[GitHub] [hudi] wangxianghu commented on pull request #2404: [MINOR] Add Jira URL and Mailing List

2021-01-05 Thread GitBox
wangxianghu commented on pull request #2404: URL: https://github.com/apache/hudi/pull/2404#issuecomment-754643863 @yanghua please take a look when free

[GitHub] [hudi] SureshK-T2S opened a new issue #2406: [SUPPORT] Deltastreamer - Property hoodie.datasource.write.partitionpath.field not found

2021-01-05 Thread GitBox
SureshK-T2S opened a new issue #2406: URL: https://github.com/apache/hudi/issues/2406 I am attempting to create a hudi table using a parquet file on S3. The motivation for this approach is based on this Hudi blog:

[GitHub] [hudi] codecov-io commented on pull request #2404: [MINOR] Add Jira URL and Mailing List

2021-01-05 Thread GitBox
codecov-io commented on pull request #2404: URL: https://github.com/apache/hudi/pull/2404#issuecomment-754638146 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2404?src=pr=h1) Report > Merging [#2404](https://codecov.io/gh/apache/hudi/pull/2404?src=pr=desc) (fdeb851) into

[GitHub] [hudi] codecov-io edited a comment on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=h1) Report > Merging [#2379](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=desc) (70ffbba) into

[jira] [Created] (HUDI-1506) Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread wangxianghu (Jira)
wangxianghu created HUDI-1506: - Summary: Fix wrong exception thrown in HoodieAvroUtils Key: HUDI-1506 URL: https://issues.apache.org/jira/browse/HUDI-1506 Project: Apache Hudi Issue Type: Bug

[jira] [Updated] (HUDI-1506) Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1506: - Labels: pull-request-available (was: ) > Fix wrong exception thrown in HoodieAvroUtils >

[GitHub] [hudi] wangxianghu opened a new pull request #2404: [MINOR] Add Jira URL and Mailing List

2021-01-05 Thread GitBox
wangxianghu opened a new pull request #2404: URL: https://github.com/apache/hudi/pull/2404 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] yanghua merged pull request #2403: [HUDI-913] Update docs about KeyGenerator

2021-01-05 Thread GitBox
yanghua merged pull request #2403: URL: https://github.com/apache/hudi/pull/2403

[hudi] branch asf-site updated: [HUDI-913] Update docs about KeyGenerator (#2403)

2021-01-05 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new ee00bd6 [HUDI-913] Update docs about

[GitHub] [hudi] liujinhui1994 closed pull request #2386: [HUDI-1160] Support update partial fields for CoW table

2021-01-05 Thread GitBox
liujinhui1994 closed pull request #2386: URL: https://github.com/apache/hudi/pull/2386

[hudi] branch asf-site updated: Travis CI build asf-site

2021-01-05 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f7ca68a Travis CI build asf-site f7ca68a is

[GitHub] [hudi] codecov-io commented on pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
codecov-io commented on pull request #2405: URL: https://github.com/apache/hudi/pull/2405#issuecomment-754665459 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=h1) Report > Merging [#2405](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=desc) (b51e61e) into

[GitHub] [hudi] vinothchandar commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-01-05 Thread GitBox
vinothchandar commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r551618729 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -232,17 +250,18 @@ void

[jira] [Updated] (HUDI-1479) Replace FSUtils.getAllPartitionPaths() with HoodieTableMetadata#getAllPartitionPaths()

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1479: - Description: *Change #1* {code:java} public static List getAllPartitionPaths(FileSystem fs,

[jira] [Updated] (HUDI-1479) Replace FSUtils.getAllPartitionPaths() with HoodieTableMetadata#getAllPartitionPaths()

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1479: - Attachment: image-2021-01-05-10-00-35-187.png > Replace FSUtils.getAllPartitionPaths() with >

[GitHub] [hudi] nsivabalan commented on a change in pull request #2400: [WIP] Some fixes to test suite framework. Adding clustering node

2021-01-05 Thread GitBox
nsivabalan commented on a change in pull request #2400: URL: https://github.com/apache/hudi/pull/2400#discussion_r552072132 ## File path: docker/demo/config/test-suite/complex-dag-cow.yaml ## @@ -14,41 +14,47 @@ # See the License for the specific language governing

[GitHub] [hudi] nsivabalan commented on pull request #2400: Some fixes and enhancements to test suite framework

2021-01-05 Thread GitBox
nsivabalan commented on pull request #2400: URL: https://github.com/apache/hudi/pull/2400#issuecomment-754812328 @n3nash : Patch is ready for review. @satishkotha : I have added clustering node. Do check it out.

[jira] [Commented] (HUDI-1459) Support for handling of REPLACE instants

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259074#comment-17259074 ] Vinoth Chandar commented on HUDI-1459: -- [~pwason] [~satishkotha] several users reporting this when

[jira] [Updated] (HUDI-1479) Replace FSUtils.getAllPartitionPaths() with HoodieTableMetadata#getAllPartitionPaths()

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1479: - Description: *Change #1* {code:java} public static List getAllPartitionPaths(FileSystem fs,

[jira] [Commented] (HUDI-1308) Issues found during testing RFC-15

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259072#comment-17259072 ] Vinoth Chandar commented on HUDI-1308: -- More testing on S3 from [~vbalaji] {code} Caused by:

[jira] [Commented] (HUDI-1479) Replace FSUtils.getAllPartitionPaths() with HoodieTableMetadata#getAllPartitionPaths()

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259090#comment-17259090 ] Vinoth Chandar commented on HUDI-1479: -- [~uditme] I have updated the description with detailed steps

[GitHub] [hudi] codecov-io edited a comment on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130

[GitHub] [hudi] yanghua commented on a change in pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
yanghua commented on a change in pull request #2405: URL: https://github.com/apache/hudi/pull/2405#discussion_r551988234 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -428,10 +429,14 @@ public static Object

[GitHub] [hudi] codecov-io edited a comment on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=h1) Report > Merging [#2379](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=desc) (4cc4b35) into

[GitHub] [hudi] afilipchik commented on a change in pull request #2380: [Hudi 73] Adding support for vanilla AvroKafkaSource

2021-01-05 Thread GitBox
afilipchik commented on a change in pull request #2380: URL: https://github.com/apache/hudi/pull/2380#discussion_r552037619 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,130 @@ +/* + *

[jira] [Created] (HUDI-1507) Hive sync having issues w/ Clustering

2021-01-05 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1507: - Summary: Hive sync having issues w/ Clustering Key: HUDI-1507 URL: https://issues.apache.org/jira/browse/HUDI-1507 Project: Apache Hudi Issue

[GitHub] [hudi] afilipchik commented on a change in pull request #2380: [Hudi 73] Adding support for vanilla AvroKafkaSource

2021-01-05 Thread GitBox
afilipchik commented on a change in pull request #2380: URL: https://github.com/apache/hudi/pull/2380#discussion_r552040233 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/serde/AbstractHoodieKafkaAvroDeserializer.java ## @@ -0,0 +1,130 @@ +/* + *

[GitHub] [hudi] nsivabalan commented on a change in pull request #2402: [HUDI-1383] Fixing sorting of partition vals for hive sync computation

2021-01-05 Thread GitBox
nsivabalan commented on a change in pull request #2402: URL: https://github.com/apache/hudi/pull/2402#discussion_r552046190 ## File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java ## @@ -21,10 +21,10 @@ import

[jira] [Commented] (HUDI-1507) Hive sync having issues w/ Clustering

2021-01-05 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259040#comment-17259040 ] sivabalan narayanan commented on HUDI-1507: --- CC : [~satish]  > Hive sync having issues w/

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r551984532 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r551984722 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r551984380 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] yanghua commented on pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
yanghua commented on pull request #2405: URL: https://github.com/apache/hudi/pull/2405#issuecomment-754694548 @wangxianghu And Travis failed, please check what's wrong...

[GitHub] [hudi] nsivabalan commented on a change in pull request #2402: [HUDI-1383] Fixing sorting of partition vals for hive sync computation

2021-01-05 Thread GitBox
nsivabalan commented on a change in pull request #2402: URL: https://github.com/apache/hudi/pull/2402#discussion_r55204 ## File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java ## @@ -56,7 +56,7 @@ } private static Iterable

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r551986565 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -682,6 +693,58 @@ public void

[GitHub] [hudi] yanghua commented on pull request #2404: [MINOR] Add Jira URL and Mailing List

2021-01-05 Thread GitBox
yanghua commented on pull request #2404: URL: https://github.com/apache/hudi/pull/2404#issuecomment-754682021 @vinothchandar Do you agree with this change?

[GitHub] [hudi] afilipchik commented on pull request #2380: [Hudi 73] Adding support for vanilla AvroKafkaSource

2021-01-05 Thread GitBox
afilipchik commented on pull request #2380: URL: https://github.com/apache/hudi/pull/2380#issuecomment-754744538 On making AbstractHoodieKafkaAvroDeserializer abstract - it looks like a modified Confluent deserializer, so I believe it should be called like that. If we want to support

[jira] [Created] (HUDI-1508) Partition update with global index in MOR tables resulting in duplicate values during read optimized queries

2021-01-05 Thread Ryan Pifer (Jira)
Ryan Pifer created HUDI-1508: Summary: Partition update with global index in MOR tables resulting in duplicate values during read optimized queries Key: HUDI-1508 URL: https://issues.apache.org/jira/browse/HUDI-1508

[jira] [Assigned] (HUDI-1507) Hive sync having issues w/ Clustering

2021-01-05 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish reassigned HUDI-1507: Assignee: satish > Hive sync having issues w/ Clustering > - > >

[jira] [Updated] (HUDI-1399) support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1399: - Status: Patch Available (was: In Progress) > support a independent clustering spark job to

[GitHub] [hudi] codecov-io edited a comment on pull request #2400: Some fixes and enhancements to test suite framework

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2400: URL: https://github.com/apache/hudi/pull/2400#issuecomment-753557036 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2400?src=pr=h1) Report > Merging [#2400](https://codecov.io/gh/apache/hudi/pull/2400?src=pr=desc) (ab40bd6) into

[jira] [Updated] (HUDI-1507) Hive sync having issues w/ Clustering

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1507: - Labels: pull-request-available (was: ) > Hive sync having issues w/ Clustering >

[GitHub] [hudi] satishkotha opened a new pull request #2407: [HUDI-1507] Change timeline utils to support reading replacecommit

2021-01-05 Thread GitBox
satishkotha opened a new pull request #2407: URL: https://github.com/apache/hudi/pull/2407 ## What is the purpose of the pull request Change timeline utils to support reading replacecommit metadata ## Brief change log HiveSync uses TimelineUtils to get modified

[GitHub] [hudi] WTa-hash commented on issue #2229: [SUPPORT] UpsertPartitioner performance

2021-01-05 Thread GitBox
WTa-hash commented on issue #2229: URL: https://github.com/apache/hudi/issues/2229#issuecomment-754894794 @bvaradar - I would like to understand a little bit more about what's going on here with the spark stage "Getting small files from partitions" from the screenshot.

[GitHub] [hudi] WTa-hash edited a comment on issue #2229: [SUPPORT] UpsertPartitioner performance

2021-01-05 Thread GitBox
WTa-hash edited a comment on issue #2229: URL: https://github.com/apache/hudi/issues/2229#issuecomment-754894794 @bvaradar - I would like to understand a little bit more about what's going on here with the spark stage "Getting small files from partitions" from the screenshot.

[GitHub] [hudi] satishkotha commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
satishkotha commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r552133598 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -109,6 +111,9 @@ public static void main(String[]

[GitHub] [hudi] WTa-hash edited a comment on issue #2229: [SUPPORT] UpsertPartitioner performance

2021-01-05 Thread GitBox
WTa-hash edited a comment on issue #2229: URL: https://github.com/apache/hudi/issues/2229#issuecomment-754894794 @bvaradar - I would like to understand a little bit more about what's going on here with the spark stage "Getting small files from partitions" from the screenshot.

[jira] [Updated] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1509: - Description: During the in-house testing for 0.5x to 0.6x release upgrade, I have detected a

[jira] [Commented] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259323#comment-17259323 ] Prashant Wason commented on HUDI-1509: -- I timed the various code fragments involved in the above

[jira] [Created] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
Prashant Wason created HUDI-1509: Summary: Major performance degradation due to rewriting records with default values Key: HUDI-1509 URL: https://issues.apache.org/jira/browse/HUDI-1509 Project:

[jira] [Comment Edited] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259323#comment-17259323 ] Prashant Wason edited comment on HUDI-1509 at 1/6/21, 12:52 AM: I timed

[jira] [Comment Edited] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259323#comment-17259323 ] Prashant Wason edited comment on HUDI-1509 at 1/6/21, 12:52 AM: I timed

[jira] [Comment Edited] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259323#comment-17259323 ] Prashant Wason edited comment on HUDI-1509 at 1/6/21, 12:52 AM: I timed

[GitHub] [hudi] codecov-io commented on pull request #2407: [HUDI-1507] Change timeline utils to support reading replacecommit

2021-01-05 Thread GitBox
codecov-io commented on pull request #2407: URL: https://github.com/apache/hudi/pull/2407#issuecomment-754918776 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2407?src=pr=h1) Report > Merging [#2407](https://codecov.io/gh/apache/hudi/pull/2407?src=pr=desc) (88ff431) into

[jira] [Updated] (HUDI-1507) Hive sync having issues w/ Clustering

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1507: - Labels: pull-request-available release-blocker (was: release-blocker) > Hive sync having issues

[jira] [Updated] (HUDI-1507) Hive sync having issues w/ Clustering

2021-01-05 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1507: -- Labels: release-blocker (was: pull-request-available) > Hive sync having issues w/

[GitHub] [hudi] jtmzheng opened a new issue #2408: [SUPPORT] OutOfMemory on upserting into MOR dataset

2021-01-05 Thread GitBox
jtmzheng opened a new issue #2408: URL: https://github.com/apache/hudi/issues/2408 **Describe the problem you faced** We have a Spark Streaming application running on EMR 5.31.0 that reads from a Kinesis stream (batch interval of 30 minutes) and upserts to a MOR dataset that is

[jira] [Updated] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1509: - Fix Version/s: 0.7.0 > Major performance degradation due to rewriting records with default values

[jira] [Commented] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259324#comment-17259324 ] Prashant Wason commented on HUDI-1509: -- So calling getCombinedFieldsToWrite() is adding 275usec for

[jira] [Updated] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-1509: - Affects Version/s: 0.7.0 0.6.1 0.6.0 > Major

[jira] [Commented] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-05 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259332#comment-17259332 ] Nishith Agarwal commented on HUDI-1509: --- [~pwason] Thanks for digging into this and instrumenting

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r552329734 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -153,7 +161,12 @@ private int

[GitHub] [hudi] wosow opened a new issue #2409: [SUPPORT] Spark structured Streaming writes to Hudi and synchronizes Hive to create only read-optimized tables without creating real-time tables

2021-01-05 Thread GitBox
wosow opened a new issue #2409: URL: https://github.com/apache/hudi/issues/2409 Spark structured Streaming writes to Hudi and synchronizes Hive to create only read-optimized tables without creating real-time tables, no errors happening **Environment Description**

[GitHub] [hudi] yanghua commented on pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
yanghua commented on pull request #2405: URL: https://github.com/apache/hudi/pull/2405#issuecomment-754997547 @wangxianghu Please check Travis again.

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r552329605 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -109,6 +111,9 @@ public static void main(String[]

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r552318635 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -109,6 +111,9 @@ public static void main(String[]

[GitHub] [hudi] yanghua commented on pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client

2021-01-05 Thread GitBox
yanghua commented on pull request #2375: URL: https://github.com/apache/hudi/pull/2375#issuecomment-755056830 > > > Hi @garyli1019. Maybe I think the current implementation is OK. Because even in streaming job, we need to accumulate batch records in memory during the check-point cycle

[GitHub] [hudi] ivorzhou commented on pull request #2091: HUDI-1283 Fill missing columns with default value when spark dataframe save to hudi table

2021-01-05 Thread GitBox
ivorzhou commented on pull request #2091: URL: https://github.com/apache/hudi/pull/2091#issuecomment-755121964 > @ivorzhou : is the requirement to set default value or value from previous version of the record? if previous version of the record, then guess we already have another PR for

[jira] [Updated] (HUDI-1510) Move HoodieEngineContext to hudi-common module

2021-01-05 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-1510: Component/s: (was: Writer Core) (was: Common Core)

[GitHub] [hudi] umehrot2 opened a new pull request #2410: [HUDI-1510] Move HoodieEngineContext and its dependencies to hudi-common

2021-01-05 Thread GitBox
umehrot2 opened a new pull request #2410: URL: https://github.com/apache/hudi/pull/2410 ## What is the purpose of the pull request Moves HoodieEngineContext class and its dependencies to hudi-common, so that we can parallelize fetching of files and partitions in

[jira] [Updated] (HUDI-1510) Move HoodieEngineContext to hudi-common module

2021-01-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1510: - Labels: pull-request-available (was: ) > Move HoodieEngineContext to hudi-common module >

[GitHub] [hudi] codecov-io edited a comment on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=h1) Report > Merging [#2379](https://codecov.io/gh/apache/hudi/pull/2379?src=pr=desc) (d53595e) into

[jira] [Created] (HUDI-1510) Move HoodieEngineContext to hudi-common module

2021-01-05 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1510: --- Summary: Move HoodieEngineContext to hudi-common module Key: HUDI-1510 URL: https://issues.apache.org/jira/browse/HUDI-1510 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-1510) Move HoodieEngineContext to hudi-common module

2021-01-05 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-1510: Issue Type: Improvement (was: Bug) > Move HoodieEngineContext to hudi-common module >

[GitHub] [hudi] codecov-io edited a comment on pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2405: URL: https://github.com/apache/hudi/pull/2405#issuecomment-754665459 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=h1) Report > Merging [#2405](https://codecov.io/gh/apache/hudi/pull/2405?src=pr=desc) (9bded33) into

[GitHub] [hudi] Nieal-Yang commented on pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client

2021-01-05 Thread GitBox
Nieal-Yang commented on pull request #2375: URL: https://github.com/apache/hudi/pull/2375#issuecomment-755103932 > > > > Hi @garyli1019. Maybe I think the current implementation is OK. Because even in streaming job, we need to accumulate batch records in memory during the check-point

[GitHub] [hudi] codecov-io edited a comment on pull request #2405: [HUDI-1506] Fix wrong exception thrown in HoodieAvroUtils

2021-01-05 Thread GitBox
codecov-io edited a comment on pull request #2405: URL: https://github.com/apache/hudi/pull/2405#issuecomment-754665459