[GitHub] [hudi] n3nash commented on a change in pull request #2440: [HUDI-1532] Fixed suboptimal implementation of a magic sequence search

2021-01-15 Thread GitBox
n3nash commented on a change in pull request #2440: URL: https://github.com/apache/hudi/pull/2440#discussion_r558822909 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -274,19 +275,27 @@ private boolean

[GitHub] [hudi] satishkotha commented on pull request #2453: [HUDI-1533] Make SerializableSchema work for large schemas

2021-01-15 Thread GitBox
satishkotha commented on pull request #2453: URL: https://github.com/apache/hudi/pull/2453#issuecomment-761429551 @n3nash @vinothchandar This change is contained to clustering and is helpful for some use cases. So I'm hoping we can merge before releasing 0.7. PTAL.

[jira] [Updated] (HUDI-1533) SerializableSchema doesnt work for some schemas

2021-01-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1533: - Labels: pull-request-available (was: ) > SerializableSchema doesnt work for some schemas >

[GitHub] [hudi] satishkotha opened a new pull request #2453: [HUDI-1533] Make SerializableSchema work for large schemas

2021-01-15 Thread GitBox
satishkotha opened a new pull request #2453: URL: https://github.com/apache/hudi/pull/2453 ## What is the purpose of the pull request - Make SerializableSchema work for large schemas - Add ability to sortBy numeric values ## Brief change log * writeUTF cannot support
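
For context on the `writeUTF` remark in the truncated change log above: `java.io.DataOutputStream.writeUTF` rejects any string whose modified UTF-8 encoding exceeds 65535 bytes and throws `UTFDataFormatException`. The sketch below reproduces that limit and shows one possible workaround, a length-prefixed UTF-8 byte array. The class and method names (`LargeStringIo`, `writeUtfFails`, `roundTrip`) are hypothetical and are not taken from PR #2453; the approach the PR actually uses is not visible in this listing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Hudi.
final class LargeStringIo {

  // Returns true when writeUTF rejects the string because it encodes to more than 65535 bytes.
  static boolean writeUtfFails(String s) {
    try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
      out.writeUTF(s);
      return false;
    } catch (UTFDataFormatException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Assumed workaround: write a 4-byte length followed by the raw UTF-8 bytes, then read it back.
  static String roundTrip(String s) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(sink)) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // 70000 ASCII characters encode to 70000 bytes, which is above the writeUTF limit.
    String big = new String(new char[70_000]).replace('\0', 'x');
    System.out.println("writeUTF fails: " + writeUtfFails(big));
    System.out.println("round trip ok: " + big.equals(roundTrip(big)));
  }
}
```

The int length prefix trades two extra bytes per string for a limit of roughly 2 GB instead of 64 KB.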

[GitHub] [hudi] wangxianghu commented on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-15 Thread GitBox
wangxianghu commented on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-761408789 @yanghua @lw309637554 please take a look when free. This is an automated message from the Apache Git Service.

[GitHub] [hudi] xushiyan edited a comment on pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-15 Thread GitBox
xushiyan edited a comment on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-760076561 @vinothchandar @leesf @yanghua The style can be sync'ed by - using google-java-format in spotless config and `spotless:apply` enforces the style that is also compatible

[GitHub] [hudi] codecov-io edited a comment on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-758857465

[GitHub] [hudi] codecov-io edited a comment on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-758857465 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2434?src=pr=h1) Report > Merging [#2434](https://codecov.io/gh/apache/hudi/pull/2434?src=pr=desc) (cd1909f) into

[GitHub] [hudi] Trevor-zhang commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-15 Thread GitBox
Trevor-zhang commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-761323239 > @wangxianghu please help to review thanks. PTAL @yanghua.

[GitHub] [hudi] vburenin commented on a change in pull request #2440: [HUDI-1532] Fixed suboptimal implementation of a magic sequence search

2021-01-15 Thread GitBox
vburenin commented on a change in pull request #2440: URL: https://github.com/apache/hudi/pull/2440#discussion_r558774265 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -274,19 +275,27 @@ private boolean

[GitHub] [hudi] n3nash commented on a change in pull request #2440: [HUDI-1532] Fixed suboptimal implementation of a magic sequence search

2021-01-15 Thread GitBox
n3nash commented on a change in pull request #2440: URL: https://github.com/apache/hudi/pull/2440#discussion_r558771039 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -274,19 +275,27 @@ private boolean

[jira] [Updated] (HUDI-1532) Super slow magic sequence search within the log files on GCS

2021-01-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1532: - Labels: pull-request-available (was: ) > Super slow magic sequence search within the log files

[GitHub] [hudi] n3nash commented on a change in pull request #2451: [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits

2021-01-15 Thread GitBox
n3nash commented on a change in pull request #2451: URL: https://github.com/apache/hudi/pull/2451#discussion_r558762513 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java ## @@ -177,10 +179,12 @@ public HoodieMetadataPayload

[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2430?src=pr=h1) Report > Merging [#2430](https://codecov.io/gh/apache/hudi/pull/2430?src=pr=desc) (7c96c0c) into

[GitHub] [hudi] codecov-io edited a comment on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726

[GitHub] [hudi] codecov-io edited a comment on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=h1) Report > Merging [#2452](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=desc) (dc8dbbc) into

[GitHub] [hudi] codecov-io commented on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-15 Thread GitBox
codecov-io commented on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-761259726 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=h1) Report > Merging [#2452](https://codecov.io/gh/apache/hudi/pull/2452?src=pr=desc) (dc8dbbc) into

[GitHub] [hudi] nsivabalan closed pull request #2262: [HUDI-1383] Modify hive partition synchronization

2021-01-15 Thread GitBox
nsivabalan closed pull request #2262: URL: https://github.com/apache/hudi/pull/2262

[GitHub] [hudi] vinothchandar commented on a change in pull request #2441: [HUDI 1308] [WIP] rfc15 perf prod testing

2021-01-15 Thread GitBox
vinothchandar commented on a change in pull request #2441: URL: https://github.com/apache/hudi/pull/2441#discussion_r558645032 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java ## @@ -260,9 +260,12 @@ public static void processFiles(FileSystem fs,

[GitHub] [hudi] vinothchandar commented on a change in pull request #2441: [HUDI 1308] [WIP] rfc15 perf prod testing

2021-01-15 Thread GitBox
vinothchandar commented on a change in pull request #2441: URL: https://github.com/apache/hudi/pull/2441#discussion_r558642403 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -371,10 +370,10 @@

[GitHub] [hudi] vinothchandar commented on a change in pull request #2441: [HUDI 1308] [WIP] rfc15 perf prod testing

2021-01-15 Thread GitBox
vinothchandar commented on a change in pull request #2441: URL: https://github.com/apache/hudi/pull/2441#discussion_r558638311 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java ## @@ -260,9 +260,12 @@ public static void processFiles(FileSystem fs,

[GitHub] [hudi] vinothchandar edited a comment on pull request #2441: [HUDI 1308] [WIP] rfc15 perf prod testing

2021-01-15 Thread GitBox
vinothchandar edited a comment on pull request #2441: URL: https://github.com/apache/hudi/pull/2441#issuecomment-761240594 >But getting complicated now and difficult to trace where it is opened and closed. @prashantwason IMO we should have always opened and closed with each fetch

[GitHub] [hudi] vinothchandar commented on pull request #2441: [HUDI 1308] [WIP] rfc15 perf prod testing

2021-01-15 Thread GitBox
vinothchandar commented on pull request #2441: URL: https://github.com/apache/hudi/pull/2441#issuecomment-761240594 >But getting complicated now and difficult to trace where it is opened and closed. @prashantwason IMO we should have always opened and closed with each fetch call - that

[GitHub] [hudi] nsivabalan commented on pull request #2442: Adding new configurations in 0.7.0

2021-01-15 Thread GitBox
nsivabalan commented on pull request #2442: URL: https://github.com/apache/hudi/pull/2442#issuecomment-761231689 done.

[GitHub] [hudi] satishkotha commented on issue #2439: [SUPPORT] Unable to sync with external hive metastore via metastore uris in the thrift protocol

2021-01-15 Thread GitBox
satishkotha commented on issue #2439: URL: https://github.com/apache/hudi/issues/2439#issuecomment-761225598 @rakeshramakrishnan From logs, I do see table **`default`.`hive_hudi_sync`** is created correctly and available in catalog 25064 [Thread-5] INFO

[GitHub] [hudi] vinothchandar commented on pull request #2451: [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits

2021-01-15 Thread GitBox
vinothchandar commented on pull request #2451: URL: https://github.com/apache/hudi/pull/2451#issuecomment-761183691 @umehrot2 the CI seems to be failing?

[GitHub] [hudi] codecov-io edited a comment on pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2325: URL: https://github.com/apache/hudi/pull/2325#issuecomment-742860619 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2325?src=pr=h1) Report > Merging [#2325](https://codecov.io/gh/apache/hudi/pull/2325?src=pr=desc) (5bd9036) into

[GitHub] [hudi] nsivabalan commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2021-01-15 Thread GitBox
nsivabalan commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-761115228 my bad. had to fix the config naming. do not do line by line review for now.

[jira] [Created] (HUDI-1533) SerializableSchema doesnt work for some schemas

2021-01-15 Thread satish (Jira)
satish created HUDI-1533: Summary: SerializableSchema doesnt work for some schemas Key: HUDI-1533 URL: https://issues.apache.org/jira/browse/HUDI-1533 Project: Apache Hudi Issue Type: Sub-task

[GitHub] [hudi] nsivabalan commented on a change in pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2021-01-15 Thread GitBox
nsivabalan commented on a change in pull request #2111: URL: https://github.com/apache/hudi/pull/2111#discussion_r558486472 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieConcatHandle.java ## @@ -0,0 +1,72 @@ +/* + * Licensed to

[jira] [Created] (HUDI-1532) Super slow magic sequence search within the log files on GCS

2021-01-15 Thread Volodymyr Burenin (Jira)
Volodymyr Burenin created HUDI-1532: --- Summary: Super slow magic sequence search within the log files on GCS Key: HUDI-1532 URL: https://issues.apache.org/jira/browse/HUDI-1532 Project: Apache Hudi
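
To illustrate the kind of problem this issue title describes: scanning a remote file for a byte marker one byte per read can be slow when every read is a network round trip. The sketch below shows one generic alternative, reading in chunks and carrying the last `marker.length - 1` bytes across chunk boundaries so a marker split between two reads is still found. It is an assumption-laden illustration, not the change made in PR #2440; the class `MarkerScan`, its methods, and the marker bytes used in `main` are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Hudi.
final class MarkerScan {

  // Returns the stream offset of the first occurrence of marker, or -1 if it is absent.
  static long indexOf(InputStream in, byte[] marker, int bufSize) throws IOException {
    if (marker.length == 0) {
      throw new IllegalArgumentException("marker must not be empty");
    }
    // The buffer must hold the carried-over tail plus at least one new byte.
    byte[] buf = new byte[Math.max(bufSize, marker.length * 2)];
    int carried = 0;    // bytes kept from the previous chunk
    long bufStart = 0;  // stream offset of buf[0]
    while (true) {
      int n = in.read(buf, carried, buf.length - carried);
      if (n < 0) {
        return -1;
      }
      int len = carried + n;
      for (int i = 0; i + marker.length <= len; i++) {
        if (matchesAt(buf, i, marker)) {
          return bufStart + i;
        }
      }
      // Keep the unchecked tail so a marker spanning two chunks is not missed.
      int keep = Math.min(marker.length - 1, len);
      System.arraycopy(buf, len - keep, buf, 0, keep);
      bufStart += len - keep;
      carried = keep;
    }
  }

  // Convenience overload for in-memory data.
  static long indexOf(byte[] data, byte[] marker, int bufSize) {
    try {
      return indexOf(new ByteArrayInputStream(data), marker, bufSize);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static boolean matchesAt(byte[] buf, int offset, byte[] marker) {
    for (int j = 0; j < marker.length; j++) {
      if (buf[offset + j] != marker[j]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] marker = "#HUDI#".getBytes(StandardCharsets.US_ASCII); // placeholder marker
    byte[] data = "aaaaaaaaaa#HUDI#bb".getBytes(StandardCharsets.US_ASCII);
    System.out.println(indexOf(data, marker, 4));
  }
}
```

With a larger `bufSize`, the number of underlying reads drops roughly in proportion to the chunk size, which is the point of buffering on high-latency storage.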

[GitHub] [hudi] codecov-io edited a comment on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-15 Thread GitBox
codecov-io edited a comment on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-15 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r558324652 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -102,65 +105,76 @@ public void open() throws Exception

[GitHub] [hudi] quitozang commented on issue #2446: [SUPPORT] The parameter "hoodie.bloom.index.filter.type" does not take effect in deltaStreamer

2021-01-15 Thread GitBox
quitozang commented on issue #2446: URL: https://github.com/apache/hudi/issues/2446#issuecomment-760954403 I modified part of the source code of DeltaStreamer like this ![image](https://user-images.githubusercontent.com/12569046/104734923-e769da00-577b-11eb-8cbd-637c359281ed.png)

[GitHub] [hudi] codecov-io commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-15 Thread GitBox
codecov-io commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760950650 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2449?src=pr=h1) Report > Merging [#2449](https://codecov.io/gh/apache/hudi/pull/2449?src=pr=desc) (ab4973d) into

[GitHub] [hudi] codecov-io commented on pull request #2447: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-15 Thread GitBox
codecov-io commented on pull request #2447: URL: https://github.com/apache/hudi/pull/2447#issuecomment-760949326 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=h1) Report > Merging [#2447](https://codecov.io/gh/apache/hudi/pull/2447?src=pr=desc) (78dd27b) into

[GitHub] [hudi] codecov-io commented on pull request #2441: [HUDI 1308] [WIP] rfc15 perf prod testing

2021-01-15 Thread GitBox
codecov-io commented on pull request #2441: URL: https://github.com/apache/hudi/pull/2441#issuecomment-760947767 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2441?src=pr=h1) Report > Merging [#2441](https://codecov.io/gh/apache/hudi/pull/2441?src=pr=desc) (d985a60) into

[jira] [Updated] (HUDI-1531) Introduce HoodiePartitionCleaner to delete specific partition

2021-01-15 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1531: - Labels: pull-request-available (was: ) > Introduce HoodiePartitionCleaner to delete specific

[GitHub] [hudi] wangxianghu opened a new pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific parti…

2021-01-15 Thread GitBox
wangxianghu opened a new pull request #2452: URL: https://github.com/apache/hudi/pull/2452 …tion ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the

[jira] [Created] (HUDI-1531) Introduce HoodiePartitionCleaner to delete specific partition

2021-01-15 Thread wangxianghu (Jira)
wangxianghu created HUDI-1531: - Summary: Introduce HoodiePartitionCleaner to delete specific partition Key: HUDI-1531 URL: https://issues.apache.org/jira/browse/HUDI-1531 Project: Apache Hudi

[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-15 Thread GitBox
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r558106065 ## File path: hudi-flink/pom.xml ## @@ -124,28 +124,77 @@ kafka-clients ${kafka.version} + + org.apache.flink +

[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-15 Thread GitBox
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r558065700 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java ## @@ -79,6 +79,11 @@ public

[GitHub] [hudi] danny0405 commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-15 Thread GitBox
danny0405 commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r558063116 ## File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java ## @@ -81,16 +103,50 @@ public static DFSPropertiesConfiguration