[jira] [Closed] (HUDI-1313) Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu closed HUDI-1313. - Resolution: Invalid > Rename hudi-spark-client module to hudi-client-spark >

[jira] [Updated] (HUDI-1313) Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1313: -- Status: Open (was: New) > Rename hudi-spark-client module to hudi-client-spark >

[GitHub] [hudi] wangxianghu closed pull request #2139: [HUDI-1313] Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread GitBox
wangxianghu closed pull request #2139: URL: https://github.com/apache/hudi/pull/2139 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] wangxianghu commented on pull request #2139: [HUDI-1313] Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread GitBox
wangxianghu commented on pull request #2139: URL: https://github.com/apache/hudi/pull/2139#issuecomment-703021700 > @wangxianghu I did not end up doing this in my PR because hudi-sync is already named in the same way. We can leave this also as-is? Yes, I named it as

[GitHub] [hudi] satishkotha commented on a change in pull request #2129: [HUDI-1302] Add support for timestamp field in HiveSync

2020-10-02 Thread GitBox
satishkotha commented on a change in pull request #2129: URL: https://github.com/apache/hudi/pull/2129#discussion_r499101046 ## File path: hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/DLASyncConfig.java ## @@ -68,6 +68,9 @@ @Parameter(names = {"--help", "-h"},

[GitHub] [hudi] prashantwason edited a comment on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-10-02 Thread GitBox
prashantwason edited a comment on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-686688968 Remaining work items: - [x] 1. Support for rollbacks in MOR Table - [x] 2. Rollback of metadata if commit eventually fails on dataset - [x] 3. HUDI-CLI

[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1310: - Parent Issue: HUDI-1308 (was: HUDI-1292) > Corruption Block Handling too slow in S3 >

[jira] [Updated] (HUDI-1311) Writes creating/updating large number of files seeing errors when deleting marker files in S3

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1311: - Parent Issue: HUDI-1308 (was: HUDI-1292) > Writes creating/updating large number of files seeing

[jira] [Updated] (HUDI-1309) Listing Metadata unreadable in S3 as the log block is deemed corrupted

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1309: - Parent Issue: HUDI-1292 (was: HUDI-1308) > Listing Metadata unreadable in S3 as the log block is

[jira] [Updated] (HUDI-1310) Corruption Block Handling too slow in S3

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1310: - Parent Issue: HUDI-1292 (was: HUDI-1308) > Corruption Block Handling too slow in S3 >

[jira] [Updated] (HUDI-1312) Query side use of Metadata Table

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1312: - Parent: HUDI-1292 Issue Type: Sub-task (was: New Feature) > Query side use of Metadata

[jira] [Updated] (HUDI-1308) Issues found during testing RFC-15

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1308: - Fix Version/s: 0.7.0 > Issues found during testing RFC-15 > -- >

[jira] [Commented] (HUDI-1307) spark datasource load path format is confused for snapshot and increment read mode

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206532#comment-17206532 ] Vinoth Chandar commented on HUDI-1307: -- [~309637554] makes sense to make them consistent. go for it!

[jira] [Updated] (HUDI-1307) spark datasource load path format is confused for snapshot and increment read mode

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1307: - Description: as spark datasource read hudi table 1. snapshot mode {code:java} val readHudi =

[jira] [Updated] (HUDI-1307) spark datasource load path format is confused for snapshot and increment read mode

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1307: - Description: as spark datasource read hudi table 1. snapshot mode {code:java} val readHudi =

[jira] [Updated] (HUDI-1307) spark datasource load path format is confused for snapshot and increment read mode

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1307: - Description: as spark datasource read hudi table 1. snapshot mode {code:java} val readHudi =

[jira] [Commented] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2020-10-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206529#comment-17206529 ] Vinoth Chandar commented on HUDI-1297: -- Yes, that's happening as we speak. support for 3.0 >

[GitHub] [hudi] xushiyan commented on issue #2121: [SUPPORT] How to define schema for data in jsonArray format when using Deltastreamer

2020-10-02 Thread GitBox
xushiyan commented on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-702912344 @pratyakshsharma Do you happen to know about supporting array type schema?

[GitHub] [hudi] vinothchandar commented on pull request #2139: [HUDI-1313] Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread GitBox
vinothchandar commented on pull request #2139: URL: https://github.com/apache/hudi/pull/2139#issuecomment-702871683 @wangxianghu I did not end up doing this in my PR because hudi-sync is already named in the same way. We can leave this also as-is?

[GitHub] [hudi] vinothchandar commented on issue #2135: [SUPPORT] GDPR safe deletes is complex

2020-10-02 Thread GitBox
vinothchandar commented on issue #2135: URL: https://github.com/apache/hudi/issues/2135#issuecomment-702868560 @andaag, we would be happy to take a feature request. Do you mind raising a Jira with more detailed requirements

[GitHub] [hudi] bvaradar commented on issue #2131: [SUPPORT] HUDI with Mongo Oplogs (Debezium)

2020-10-02 Thread GitBox
bvaradar commented on issue #2131: URL: https://github.com/apache/hudi/issues/2131#issuecomment-702846845 @tandonraghavs : Thanks for the explanation. I have created a PR to expose Schema in preCombine : https://github.com/apache/hudi/pull/2141 There is one option for you which is

[GitHub] [hudi] bvaradar commented on a change in pull request #2141: [HUDI-898] Add new backwards compatible API to expose schema in preCombine

2020-10-02 Thread GitBox
bvaradar commented on a change in pull request #2141: URL: https://github.com/apache/hudi/pull/2141#discussion_r498938639 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/SparkWriteHelper.java ## @@ -49,21 +52,48 @@ public static

[jira] [Updated] (HUDI-898) Need to add Schema parameter to HoodieRecordPayload::preCombine

2020-10-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-898: Labels: pull-request-available (was: ) > Need to add Schema parameter to

[GitHub] [hudi] hotienvu commented on pull request #2140: fixed multi-partition sync failure when PartitionKeyType is used

2020-10-02 Thread GitBox
hotienvu commented on pull request #2140: URL: https://github.com/apache/hudi/pull/2140#issuecomment-702843403 Closing this. See https://github.com/apache/hudi/issues/2138 for solution

[GitHub] [hudi] bvaradar opened a new pull request #2141: [HUDI-898] Add new backwards compatible API to expose schema in preCombine

2020-10-02 Thread GitBox
bvaradar opened a new pull request #2141: URL: https://github.com/apache/hudi/pull/2141 This is to let users implement custom deduping-merge logic. ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of

[GitHub] [hudi] hotienvu closed pull request #2140: fixed multi-partition sync failure when PartitionKeyType is used

2020-10-02 Thread GitBox
hotienvu closed pull request #2140: URL: https://github.com/apache/hudi/pull/2140

[GitHub] [hudi] hotienvu closed issue #2138: [SUPPORT] Failed to sync to hive multi-partition table

2020-10-02 Thread GitBox
hotienvu closed issue #2138: URL: https://github.com/apache/hudi/issues/2138

[GitHub] [hudi] hotienvu commented on issue #2138: [SUPPORT] Failed to sync to hive multi-partition table

2020-10-02 Thread GitBox
hotienvu commented on issue #2138: URL: https://github.com/apache/hudi/issues/2138#issuecomment-702842659 Thanks @bvaradar, it is working now. Closing this and the PR. IMHO, it would be great if the doc [here](https://hudi.apache.org/docs/writing_data.html) had an example for writing

[jira] [Updated] (HUDI-898) Need to add Schema parameter to HoodieRecordPayload::preCombine

2020-10-02 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-898: Status: Open (was: New) > Need to add Schema parameter to HoodieRecordPayload::preCombine >

[jira] [Assigned] (HUDI-898) Need to add Schema parameter to HoodieRecordPayload::preCombine

2020-10-02 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-898: --- Assignee: Balaji Varadarajan > Need to add Schema parameter to

[GitHub] [hudi] andaag commented on issue #2135: [SUPPORT] GDPR safe deletes is complex

2020-10-02 Thread GitBox
andaag commented on issue #2135: URL: https://github.com/apache/hudi/issues/2135#issuecomment-702821679 Yes, but it'd be quite critical to be able to access this from the APIs/programmatically. If we have a job running the GDPR deletion, we'd need that job to know the times involved

[GitHub] [hudi] bvaradar commented on issue #2135: [SUPPORT] GDPR safe deletes is complex

2020-10-02 Thread GitBox
bvaradar commented on issue #2135: URL: https://github.com/apache/hudi/issues/2135#issuecomment-702799057 @andaag : I think you are referring to the "cleaner" here. We expose cleaner metadata through the CLI: "cleans show" (source code: CleansCommand.java). Cleaner tracks what is the

[GitHub] [hudi] bvaradar commented on issue #2138: [SUPPORT] Failed to sync to hive multi-partition table

2020-10-02 Thread GitBox
bvaradar commented on issue #2138: URL: https://github.com/apache/hudi/issues/2138#issuecomment-702772462 @hotienvu : In your code, the hive-sync config is set incorrectly: DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY() should be set to "year,month,day" instead of

[GitHub] [hudi] bvaradar edited a comment on issue #2138: [SUPPORT] Failed to sync to hive multi-partition table

2020-10-02 Thread GitBox
bvaradar edited a comment on issue #2138: URL: https://github.com/apache/hudi/issues/2138#issuecomment-702772462 @hotienvu : In your code, the hive-sync config is set incorrectly: DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY() should be set to "year,month,day" instead of
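
For readers following the fix suggested in #2138, here is a minimal sketch of Hive sync write options for a table partitioned by year, month, and day. It is an illustration, not the reporter's actual code. The only value taken from the comment above is the "year,month,day" partition fields setting. The class name `HiveSyncOptionsSketch`, the helper `multiPartitionSyncOptions`, the string form of the option keys, and the choice of `MultiPartKeysValueExtractor` as the partition extractor are assumptions added for the example; check them against the configuration reference for your Hudi version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of Hudi.
public class HiveSyncOptionsSketch {

    // Builds write options for syncing a multi-partition table to Hive.
    // Keys are written as plain strings so the sketch runs without Hudi on the classpath.
    public static Map<String, String> multiPartitionSyncOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.hive_sync.enable", "true");
        // From the comment above: list every partition column, comma-separated.
        // Assumed to be the key behind DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY().
        opts.put("hoodie.datasource.hive_sync.partition_fields", "year,month,day");
        // Assumption: an extractor that splits a path such as 2020/10/02 into one value per field.
        opts.put("hoodie.datasource.hive_sync.partition_extractor_class",
                "org.apache.hudi.hive.MultiPartKeysValueExtractor");
        return opts;
    }

    public static void main(String[] args) {
        // Print the options in insertion order for inspection.
        multiPartitionSyncOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a Spark job these entries would typically be passed to the DataFrame writer's `options(...)` call alongside the usual record key and partition path settings.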

[GitHub] [hudi] bvaradar commented on a change in pull request #2093: [HUDI-1200]: fixed NPE in CustomKeyGenerator

2020-10-02 Thread GitBox
bvaradar commented on a change in pull request #2093: URL: https://github.com/apache/hudi/pull/2093#discussion_r498859641 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java ## @@ -58,6 +59,7 @@ public CustomKeyGenerator(TypedProperties

[GitHub] [hudi] nsivabalan commented on pull request #2092: [HUDI-1285] Fix merge on read DAG to make docker demo pass

2020-10-02 Thread GitBox
nsivabalan commented on pull request #2092: URL: https://github.com/apache/hudi/pull/2092#issuecomment-702767045 Cool, sounds good. Will wait for you to update the patch then.

[GitHub] [hudi] hotienvu opened a new pull request #2140: fixed multi-partition sync failure when PartitionKeyType is used

2020-10-02 Thread GitBox
hotienvu opened a new pull request #2140: URL: https://github.com/apache/hudi/pull/2140 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[jira] [Resolved] (HUDI-1089) Refactor hudi-client to support multi-engine

2020-10-02 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu resolved HUDI-1089. --- Resolution: Fixed > Refactor hudi-client to support multi-engine >

[jira] [Commented] (HUDI-1089) Refactor hudi-client to support multi-engine

2020-10-02 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17206149#comment-17206149 ] wangxianghu commented on HUDI-1089: --- done via master branch : 1f7add92916c37b05be270d9c75a9042134ec506

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-10-02 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r498806514 ## File path: hudi-cli/pom.xml ## @@ -148,7 +148,14 @@ org.apache.hudi - hudi-client + hudi-client-common +

[jira] [Updated] (HUDI-1313) Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1313: - Labels: pull-request-available (was: ) > Rename hudi-spark-client module to hudi-client-spark >

[GitHub] [hudi] wangxianghu opened a new pull request #2139: [HUDI-1313] Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread GitBox
wangxianghu opened a new pull request #2139: URL: https://github.com/apache/hudi/pull/2139 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[jira] [Assigned] (HUDI-1313) Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu reassigned HUDI-1313: - Assignee: wangxianghu > Rename hudi-spark-client module to hudi-client-spark >

[jira] [Created] (HUDI-1313) Rename hudi-spark-client module to hudi-client-spark

2020-10-02 Thread wangxianghu (Jira)
wangxianghu created HUDI-1313: - Summary: Rename hudi-spark-client module to hudi-client-spark Key: HUDI-1313 URL: https://issues.apache.org/jira/browse/HUDI-1313 Project: Apache Hudi Issue Type:

[GitHub] [hudi] SteNicholas removed a comment on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-10-02 Thread GitBox
SteNicholas removed a comment on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-702550693 @linshan-ma You could use the current commit to check your test case again. IMO, the current commit has already resolved your problem.

[GitHub] [hudi] hotienvu opened a new issue #2138: [SUPPORT] Failed to sync to hive using nested partition columns

2020-10-02 Thread GitBox
hotienvu opened a new issue #2138: URL: https://github.com/apache/hudi/issues/2138 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? - Join the mailing list to engage in conversations and get faster

[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2020-10-02 Thread GitBox
sassai commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-702568499 @bvaradar: Sorry for the late reply. I was not able to investigate this issue further until now. In the meantime I updated Hudi to 0.6.0 to check if the issue still occurs.

[GitHub] [hudi] SteNicholas removed a comment on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-10-02 Thread GitBox
SteNicholas removed a comment on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-700524082 > According to this (https://github.com/apache/hudi/issues/2051) test. I can't get the results I want. When we set

[GitHub] [hudi] SteNicholas commented on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-10-02 Thread GitBox
SteNicholas commented on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-702550693 @linshan-ma You could use the current commit to check your test case again. IMO, the current commit has already resolved your problem.

[GitHub] [hudi] SteNicholas removed a comment on pull request #2111: [HUDI-1234] Insert new records regardless of small file when using insert operation

2020-10-02 Thread GitBox
SteNicholas removed a comment on pull request #2111: URL: https://github.com/apache/hudi/pull/2111#issuecomment-701890658 > > > According to this (https://github.com/apache/hudi/issues/2051) test. I can't get the results I want. When we