[GitHub] [hudi] afilipchik commented on pull request #1565: [HUDI-73]: implemented vanilla AvroKafkaSource

2020-11-19 Thread GitBox
afilipchik commented on pull request #1565: URL: https://github.com/apache/hudi/pull/1565#issuecomment-730524203 Hey, any way to push it over the line? We are upgrading to 0.6, and in its current state, schema evolution on top of the Confluent Schema Registry is borked.

[GitHub] [hudi] satishkotha opened a new pull request #2263: [HUDI-1075] [WIP] Implement simple clustering strategies to create and run ClusteringPlan

2020-11-19 Thread GitBox
satishkotha opened a new pull request #2263: URL: https://github.com/apache/hudi/pull/2263 ## What is the purpose of the pull request Implement simple clustering strategies to create and run ClusteringPlan ## Brief change log - Add simple strategy to schedule clustering
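
PR #2263 describes a simple strategy for scheduling clustering. As a rough illustration only, and assuming "simple" means greedily packing small files into size-capped groups (the excerpt does not say how the strategy works), a planning step might look like the Python sketch below. `plan_clustering_groups`, `small_file_limit`, and `max_group_bytes` are hypothetical names, not Hudi's `ClusteringPlan` API.

```python
from typing import Dict, List, Tuple


def plan_clustering_groups(
    file_sizes: Dict[str, int],
    small_file_limit: int,
    max_group_bytes: int,
) -> List[List[str]]:
    """Hypothetical sketch: greedily pack small files into size-capped groups."""
    # Only files below the small-file limit are treated as rewrite candidates.
    candidates: List[Tuple[str, int]] = sorted(
        ((name, size) for name, size in file_sizes.items() if size < small_file_limit),
        key=lambda item: item[1],
        reverse=True,
    )
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in candidates:
        # Close the current group once this file would push it past the cap.
        if current and current_bytes + size > max_group_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        groups.append(current)
    # A single-file group would just rewrite one file, so drop those.
    return [group for group in groups if len(group) > 1]
```

For example, with files of 10, 20, 30 and 500 bytes, a 100-byte small-file limit and a 40-byte cap, the 500-byte file is ignored, the 30-byte file ends up alone (and is dropped), and the 20- and 10-byte files form one group.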

[GitHub] [hudi] satishkotha commented on pull request #2196: [HUDI-1349]spark sql support overwrite use replace action

2020-11-19 Thread GitBox
satishkotha commented on pull request #2196: URL: https://github.com/apache/hudi/pull/2196#issuecomment-730596478 LGTM. @n3nash Can you take a pass and merge this?

[GitHub] [hudi] codecov-io edited a comment on pull request #2136: [HUDI-37] Persist the HoodieIndex type in the hoodie.properties file

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2136: URL: https://github.com/apache/hudi/pull/2136#issuecomment-729377388 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2136?src=pr=h1) Report > Merging [#2136](https://codecov.io/gh/apache/hudi/pull/2136?src=pr=desc) (1d50501) into

[GitHub] [hudi] bhasudha opened a new pull request #2264: [HUDI-1406] Add date partition based source input selector for Delta …

2020-11-19 Thread GitBox
bhasudha opened a new pull request #2264: URL: https://github.com/apache/hudi/pull/2264 …streamer - Adds ability to list only recent date based partitions from source data. ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review

[jira] [Updated] (HUDI-1406) Add new DFS path sector implementation for listing date based partitions

2020-11-19 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1406: Description: Deltastreamer DFS source lists files from table path and determines files changed
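
HUDI-1406 and PR #2264 propose listing only recent date-based partitions from the source data rather than the whole path. A minimal Python sketch of that selection step follows, assuming partition paths end in a `yyyy/mm/dd` layout and a lookback window measured in days. `select_recent_date_partitions` and `lookback_days` are hypothetical names, not the selector class added in the PR.

```python
from datetime import date, timedelta
from typing import Iterable, List


def select_recent_date_partitions(
    partition_paths: Iterable[str],
    today: date,
    lookback_days: int,
) -> List[str]:
    """Hypothetical sketch: keep yyyy/mm/dd partitions inside the lookback window."""
    cutoff = today - timedelta(days=lookback_days)
    selected: List[str] = []
    for path in partition_paths:
        try:
            # Assumes the last three path components are year/month/day.
            year, month, day = (int(p) for p in path.strip("/").split("/")[-3:])
            partition_date = date(year, month, day)
        except ValueError:
            # Skip paths that do not end in a valid yyyy/mm/dd layout.
            continue
        if cutoff <= partition_date <= today:
            selected.append(path)
    return sorted(selected)
```

Listing only the partitions this returns, instead of every file under the table path, is what keeps the listing cost bounded as the source grows.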

[jira] [Updated] (HUDI-1406) Add new DFS path sector implementation for listing date based partitions

2020-11-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1406: - Labels: pull-request-available (was: ) > Add new DFS path sector implementation for listing date

[jira] [Updated] (HUDI-1075) Implement a simple merge clustering strategy

2020-11-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1075: - Labels: pull-request-available (was: ) > Implement a simple merge clustering strategy >

[GitHub] [hudi] spyzzz edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-11-19 Thread GitBox
spyzzz edited a comment on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-730456220 Since I deployed the hudi jar in HIVE auxlib, to be able to query Hive tables without adding the JAR in each session, we have some clients who now systematically get this

[GitHub] [hudi] codecov-io commented on pull request #2254: [HUDI-1350] Support Partition level delete API in HUDI

2020-11-19 Thread GitBox
codecov-io commented on pull request #2254: URL: https://github.com/apache/hudi/pull/2254#issuecomment-730513524 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2254?src=pr=h1) Report > Merging [#2254](https://codecov.io/gh/apache/hudi/pull/2254?src=pr=desc) (7cdd2db) into

[jira] [Created] (HUDI-1406) Add new DFS path sector implementation for listing date based partitions

2020-11-19 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1406: --- Summary: Add new DFS path sector implementation for listing date based partitions Key: HUDI-1406 URL: https://issues.apache.org/jira/browse/HUDI-1406 Project: Apache

[GitHub] [hudi] codecov-io edited a comment on pull request #2264: [HUDI-1406] Add date partition based source input selector for Delta …

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2264: URL: https://github.com/apache/hudi/pull/2264#issuecomment-730736010

[GitHub] [hudi] codecov-io edited a comment on pull request #2254: [HUDI-1350] Support Partition level delete API in HUDI

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2254: URL: https://github.com/apache/hudi/pull/2254#issuecomment-730513524 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2254?src=pr=h1) Report > Merging [#2254](https://codecov.io/gh/apache/hudi/pull/2254?src=pr=desc) (a279a39) into

[GitHub] [hudi] konradwudkowski opened a new issue #2265: Arrays with nulls in them result in broken parquet files

2020-11-19 Thread GitBox
konradwudkowski opened a new issue #2265: URL: https://github.com/apache/hudi/issues/2265 Writing a dataframe with an array column when an array contains a `null` causes hudi to write broken parquet. **To Reproduce** Steps to reproduce (using pyspark here): 1. Create a

[GitHub] [hudi] codecov-io commented on pull request #2264: [HUDI-1406] Add date partition based source input selector for Delta …

2020-11-19 Thread GitBox
codecov-io commented on pull request #2264: URL: https://github.com/apache/hudi/pull/2264#issuecomment-730736010 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2264?src=pr=h1) Report > Merging [#2264](https://codecov.io/gh/apache/hudi/pull/2264?src=pr=desc) (12cf95c) into

[GitHub] [hudi] helianthuslulu commented on issue #2238: [SUPPORT] _hoodie_is_deleted support for Spark Datasource API in hudi 0.5.2-incubating

2020-11-19 Thread GitBox
helianthuslulu commented on issue #2238: URL: https://github.com/apache/hudi/issues/2238#issuecomment-730760611 I met a problem with **"_hoodie_is_deleted"** too, and the Hudi version is 0.5.2: **1) Kafka data sample:** "rowkey1","value1","value2",true

[GitHub] [hudi] nsivabalan commented on issue #2238: [SUPPORT] _hoodie_is_deleted support for Spark Datasource API in hudi 0.5.2-incubating

2020-11-19 Thread GitBox
nsivabalan commented on issue #2238: URL: https://github.com/apache/hudi/issues/2238#issuecomment-730837343 @RajasekarSribalan : sorry, missed this. Yes, you are right. if you use "UPSERT" operation, "_hoodie_is_deleted" value will be used to distinguish records to be deleted vs upserted.
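
To make the UPSERT behavior described in the comment concrete, here is a small in-memory simulation in Python. It is not Hudi code: the table is a plain dict keyed by a hypothetical `key` field, records are applied in batch order (Hudi may first combine duplicate keys in a batch using the precombine field), and only the `_hoodie_is_deleted` flag decides between delete and upsert.

```python
from typing import Dict, List


def apply_upsert_batch(
    table: Dict[str, dict],
    batch: List[dict],
    key_field: str = "key",
) -> Dict[str, dict]:
    """Simulated UPSERT where a true _hoodie_is_deleted flag means delete."""
    result = dict(table)  # leave the input table untouched
    for record in batch:
        key = record[key_field]
        if record.get("_hoodie_is_deleted", False):
            # Flagged records remove the existing row (a no-op if it is absent).
            result.pop(key, None)
        else:
            # Unflagged records insert a new row or replace the existing one.
            result[key] = record
    return result
```

A batch mixing `_hoodie_is_deleted: true` and `false` rows can therefore delete some keys and update others in a single upsert.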

[GitHub] [hudi] nsivabalan commented on issue #2238: [SUPPORT] _hoodie_is_deleted support for Spark Datasource API in hudi 0.5.2-incubating

2020-11-19 Thread GitBox
nsivabalan commented on issue #2238: URL: https://github.com/apache/hudi/issues/2238#issuecomment-730837495 @helianthuslulu: sorry, I don't quite get your question. Would you mind explaining it once again?

[GitHub] [hudi] n3nash commented on a change in pull request #2249: [HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable

2020-11-19 Thread GitBox
n3nash commented on a change in pull request #2249: URL: https://github.com/apache/hudi/pull/2249#discussion_r527392458 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/collection/LazyFileIterable.java ## @@ -38,6 +38,8 @@ // Stores the key and

[GitHub] [hudi] nsivabalan commented on issue #2255: [SUPPORT] Global Bloom and partition update not working correctly in MOR table

2020-11-19 Thread GitBox
nsivabalan commented on issue #2255: URL: https://github.com/apache/hudi/issues/2255#issuecomment-730845323 If I can give you a patch w/ some log statements, would you be able to apply it and give us the logs?

[GitHub] [hudi] codecov-io commented on pull request #2263: [HUDI-1075] [WIP] Implement simple clustering strategies to create and run ClusteringPlan

2020-11-19 Thread GitBox
codecov-io commented on pull request #2263: URL: https://github.com/apache/hudi/pull/2263#issuecomment-730664726 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2263?src=pr=h1) Report > Merging [#2263](https://codecov.io/gh/apache/hudi/pull/2263?src=pr=desc) (dc6d321) into

[GitHub] [hudi] lw309637554 commented on pull request #2196: [HUDI-1349]spark sql support overwrite use replace action

2020-11-19 Thread GitBox
lw309637554 commented on pull request #2196: URL: https://github.com/apache/hudi/pull/2196#issuecomment-730723941 > LGTM. @n3nash Can you take a pass and merge this? @satishkotha Thanks for your suggestion.

[GitHub] [hudi] codecov-io edited a comment on pull request #2254: [HUDI-1350] Support Partition level delete API in HUDI

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2254: URL: https://github.com/apache/hudi/pull/2254#issuecomment-730513524 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2254?src=pr=h1) Report > Merging [#2254](https://codecov.io/gh/apache/hudi/pull/2254?src=pr=desc) (3af20d8) into

[GitHub] [hudi] lw309637554 commented on a change in pull request #2136: [HUDI-37] Persist the HoodieIndex type in the hoodie.properties file

2020-11-19 Thread GitBox
lw309637554 commented on a change in pull request #2136: URL: https://github.com/apache/hudi/pull/2136#discussion_r527339351 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -206,7 +206,7 @@ private void doCheck() throws

[GitHub] [hudi] nsivabalan commented on issue #2255: [SUPPORT] Global Bloom and partition update not working correctly in MOR table

2020-11-19 Thread GitBox
nsivabalan commented on issue #2255: URL: https://github.com/apache/hudi/issues/2255#issuecomment-730844186 hmmm, interesting. reason I asked is that, we have a test w/ write client here

[GitHub] [hudi] codecov-io edited a comment on pull request #2254: [HUDI-1350] Support Partition level delete API in HUDI

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2254: URL: https://github.com/apache/hudi/pull/2254#issuecomment-730513524

[GitHub] [hudi] codecov-io edited a comment on pull request #2136: [HUDI-37] Persist the HoodieIndex type in the hoodie.properties file

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2136: URL: https://github.com/apache/hudi/pull/2136#issuecomment-729377388 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2136?src=pr=h1) Report > Merging [#2136](https://codecov.io/gh/apache/hudi/pull/2136?src=pr=desc) (953c4b9) into

[GitHub] [hudi] bhasudha commented on issue #2258: [SUPPORT] Unable to query hudi tables in Presto

2020-11-19 Thread GitBox
bhasudha commented on issue #2258: URL: https://github.com/apache/hudi/issues/2258#issuecomment-730205400 @bhushanamk which version of Presto are you using? And if you are using your own version, how did you do the setup?

[GitHub] [hudi] codecov-io edited a comment on pull request #2262: [HUDI-1383] Modify hive partition synchronization

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2262: URL: https://github.com/apache/hudi/pull/2262#issuecomment-730222148 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2262?src=pr=h1) Report > Merging [#2262](https://codecov.io/gh/apache/hudi/pull/2262?src=pr=desc) (c9917ff) into

[GitHub] [hudi] codecov-io commented on pull request #2242: [HUDI-1366] Make deltasteamer support exporting data from hdfs to hudi

2020-11-19 Thread GitBox
codecov-io commented on pull request #2242: URL: https://github.com/apache/hudi/pull/2242#issuecomment-730253264 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2242?src=pr=h1) Report > Merging [#2242](https://codecov.io/gh/apache/hudi/pull/2242?src=pr=desc) (8259263) into

[GitHub] [hudi] codecov-io commented on pull request #2262: [HUDI-1383] Modify hive partition synchronization

2020-11-19 Thread GitBox
codecov-io commented on pull request #2262: URL: https://github.com/apache/hudi/pull/2262#issuecomment-730222148 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2262?src=pr=h1) Report > Merging [#2262](https://codecov.io/gh/apache/hudi/pull/2262?src=pr=desc) (c9917ff) into

[GitHub] [hudi] hj2016 commented on pull request #2249: [HUDI-1358] Fix leaks in DiskBasedMap and LazyFileIterable

2020-11-19 Thread GitBox
hj2016 commented on pull request #2249: URL: https://github.com/apache/hudi/pull/2249#issuecomment-730199130 @bvaradar I will verify for you later

[GitHub] [hudi] garyli1019 commented on pull request #2242: [HUDI-1366] Make deltasteamer support exporting data from hdfs to hudi

2020-11-19 Thread GitBox
garyli1019 commented on pull request #2242: URL: https://github.com/apache/hudi/pull/2242#issuecomment-730216182 @liujinhui1994 thanks for your contribution, would you please add more descriptions about this PR? I am not quite sure if I understand the purpose.

[GitHub] [hudi] shenh062326 commented on pull request #2222: [HUDI-1364] Add HoodieJavaEngineContext to hudi-java-client

2020-11-19 Thread GitBox
shenh062326 commented on pull request #: URL: https://github.com/apache/hudi/pull/#issuecomment-730306470 @vinothchandar Can you take a look at this PR?

[GitHub] [hudi] Karl-WangSK edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2020-11-19 Thread GitBox
Karl-WangSK edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-730111282 cc @bvaradar @yanghua
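
PR #2260 proposes scheduling compaction based on time elapsed. A hypothetical Python sketch of such a trigger follows. `should_schedule_compaction` and `max_elapsed` are invented for illustration and are not Hudi configuration keys, and treating "no previous compaction" as immediately due is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_schedule_compaction(
    last_compaction: Optional[datetime],
    now: datetime,
    max_elapsed: timedelta,
) -> bool:
    """Hypothetical time-based trigger: due once max_elapsed has passed."""
    if last_compaction is None:
        # Assumption: with no prior compaction, schedule one right away.
        return True
    return now - last_compaction >= max_elapsed
```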

[GitHub] [hudi] spyzzz commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-11-19 Thread GitBox
spyzzz commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-730456220 Since I deployed the hudi jar in HIVE auxlib, to be able to query Hive tables without adding the JAR in each session, we have some clients who now systematically get this token hbase

[jira] [Updated] (HUDI-1403) Decouple HoodieiFlinkStreamer from Kafka to support more sources

2020-11-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1403: -- Summary: Decouple HoodieiFlinkStreamer from Kafka to support more sources (was: Decouple

[jira] [Updated] (HUDI-1403) Decouple HoodieFlinkStreamer from Kafka to support more sources

2020-11-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1403: -- Summary: Decouple HoodieFlinkStreamer from Kafka to support more sources (was: Decouple

[jira] [Updated] (HUDI-1405) Make HoodieFlinkStreamer support read props from local fileSystem

2020-11-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1405: -- Summary: Make HoodieFlinkStreamer support read props from local fileSystem (was: Make

[jira] [Updated] (HUDI-1403) Decouple HoodieFlinkStreamer from Kafka to support more sources

2020-11-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1403: -- Description: Currently, `HoodieFlinkStreamer` supports reading from Kafka only. We should enhance it to

[GitHub] [hudi] codecov-io edited a comment on pull request #2196: [HUDI-1349]spark sql support overwrite use replace action

2020-11-19 Thread GitBox
codecov-io edited a comment on pull request #2196: URL: https://github.com/apache/hudi/pull/2196#issuecomment-728653013 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2196?src=pr=h1) Report > Merging [#2196](https://codecov.io/gh/apache/hudi/pull/2196?src=pr=desc) (0da0ca7) into