[GitHub] [hudi] zherenyu831 commented on issue #2041: [SUPPORT] hudi-spark-bundle 0.6.0 is not able to query data, two problems

2020-08-25 Thread GitBox
zherenyu831 commented on issue #2041: URL: https://github.com/apache/hudi/issues/2041#issuecomment-680663016 @bvaradar Thank you for the quick response > I have seen this kind of issue (first problem) if you use prebuilt version of spark. Spark 2.x is prebuilt with scala 2.11 and

[GitHub] [hudi] bvaradar commented on issue #2041: [SUPPORT] hudi-spark-bundle 0.6.0 is not able to query data, two problems

2020-08-25 Thread GitBox
bvaradar commented on issue #2041: URL: https://github.com/apache/hudi/issues/2041#issuecomment-680661912 Regarding the second problem, can you use a Hive 2.x version (e.g., 2.3.3)? This is what Hudi is compiled with. This is an

[GitHub] [hudi] bvaradar commented on issue #2041: [SUPPORT] hudi-spark-bundle 0.6.0 is not able to query data, two problems

2020-08-25 Thread GitBox
bvaradar commented on issue #2041: URL: https://github.com/apache/hudi/issues/2041#issuecomment-680656042 @zherenyu831 : I have seen this kind of issue (the first problem) when using a prebuilt version of Spark. Spark 2.x is prebuilt with Scala 2.11, and you are using Hudi built with Scala 2.12.
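The mismatch described above can often be spotted from the artifact name alone, since Scala libraries conventionally carry their Scala binary version as a `_2.11` or `_2.12` suffix. The following is a minimal sketch of such a check, not part of Hudi or Spark; the helper names `scala_binary_version` and `is_compatible` are hypothetical, and the Spark-side Scala version is assumed to be known already.

```python
import re


def scala_binary_version(artifact):
    """Return the Scala suffix of an artifact name (e.g. '2.12'), or None if absent."""
    match = re.search(r"_(\d+\.\d+)$", artifact)
    return match.group(1) if match else None


def is_compatible(artifact, spark_scala_version):
    """True when the artifact has no Scala suffix or its suffix matches Spark's Scala version."""
    suffix = scala_binary_version(artifact)
    return suffix is None or suffix == spark_scala_version


# Hypothetical check: a Scala 2.12 bundle against a Spark build that uses Scala 2.11.
print(is_compatible("hudi-spark-bundle_2.12", "2.11"))
print(is_compatible("hudi-spark-bundle_2.11", "2.11"))
```

A name-based check like this only catches the obvious case; jars without a suffix still have to be verified some other way.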

[GitHub] [hudi] zherenyu831 edited a comment on issue #2041: [SUPPORT] hudi-spark-bundle 0.6.0 is not able to query data, two problems

2020-08-25 Thread GitBox
zherenyu831 edited a comment on issue #2041: URL: https://github.com/apache/hudi/issues/2041#issuecomment-680653572 Fixed second problem by adding hive-exec into pom ``` "org.apache.hive" % "hive-exec" % "3.1.2" % Provided, ```

[GitHub] [hudi] zherenyu831 commented on issue #2041: [SUPPORT] hudi-spark-bundle 0.6.0 is not able to query data, two problems

2020-08-25 Thread GitBox
zherenyu831 commented on issue #2041: URL: https://github.com/apache/hudi/issues/2041#issuecomment-680653572 Fixed second problem by adding hive-exec on local ``` "org.apache.hive" % "hive-exec" % "3.1.2" % Provided, ```

[GitHub] [hudi] zherenyu831 opened a new issue #2041: [SUPPORT] hudi-spark-bundle 0.6.0 is not able to query data, two problems

2020-08-25 Thread GitBox
zherenyu831 opened a new issue #2041: URL: https://github.com/apache/hudi/issues/2041 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? - Join the mailing list to engage in conversations and get faster

[jira] [Updated] (HUDI-1042) [Umbrella] Support clustering on filegroups

2020-08-25 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1042: - Fix Version/s: 0.7.0 > [Umbrella] Support clustering on filegroups >

[GitHub] [hudi] n3nash merged pull request #2037: [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables

2020-08-25 Thread GitBox
n3nash merged pull request #2037: URL: https://github.com/apache/hudi/pull/2037 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[hudi] branch master updated: [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables

2020-08-25 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new f468c20 [HUDI-1226] Fix ComplexKeyGenerator

[GitHub] [hudi] sathyaprakashg commented on pull request #2012: HUDI-1129 Deltastreamer Add support for schema evaluation

2020-08-25 Thread GitBox
sathyaprakashg commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-680502038 Thanks @sbernauer for the code example. I fixed it now

[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

2020-08-25 Thread GitBox
bvaradar commented on issue #2020: URL: https://github.com/apache/hudi/issues/2020#issuecomment-680450091 @dm-tran : No, that should be fine. Hudi logic takes care of Spark retries. So, that should not be the issue. Given, that you are able to reproduce very easily and I have not seen

[GitHub] [hudi] bvaradar commented on issue #2017: multi-level partition

2020-08-25 Thread GitBox
bvaradar commented on issue #2017: URL: https://github.com/apache/hudi/issues/2017#issuecomment-680443395 @Yogashri12 : hoodie.datasource.write.keygenerator.class should be set to org.apache.hudi.keygen.ComplexKeyGenerator and not org.apache.hudi.ComplexKeyGenerator
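For PySpark users, the corrected class name goes into the same options dictionary as the other write settings. The sketch below only builds that dictionary; the helper `build_hudi_options`, the table name, and the column names are hypothetical, and joining multiple fields with commas is an assumption about how ComplexKeyGenerator is usually configured.

```python
KEYGEN_OPTION = "hoodie.datasource.write.keygenerator.class"
# Note the `keygen` package segment, which the incorrect class name was missing.
KEYGEN_CLASS = "org.apache.hudi.keygen.ComplexKeyGenerator"


def build_hudi_options(table_name, record_key_fields, partition_fields):
    """Build a Hudi write-options dict that uses ComplexKeyGenerator."""
    return {
        "hoodie.table.name": table_name,
        # Assumption: multiple fields are passed as one comma-separated string.
        "hoodie.datasource.write.recordkey.field": ",".join(record_key_fields),
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        KEYGEN_OPTION: KEYGEN_CLASS,
    }


# Hypothetical table with a single record key and a two-level partition.
hudi_options = build_hudi_options("my_table", ["ID"], ["year", "month"])
for key, value in hudi_options.items():
    print(f"{key} = {value}")
```

The resulting dictionary would then typically be passed to the DataFrame writer, for example via `.options(**hudi_options)`; that step is omitted here because it needs a running Spark session.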

[GitHub] [hudi] bvaradar closed issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-25 Thread GitBox
bvaradar closed issue #2005: URL: https://github.com/apache/hudi/issues/2005

[GitHub] [hudi] bvaradar commented on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-25 Thread GitBox
bvaradar commented on issue #2005: URL: https://github.com/apache/hudi/issues/2005#issuecomment-680432778 Thanks @cdmikechen for clarifying. Agree on not having to instantiate the input format. @garyli1019 has a PR for this : https://github.com/apache/hudi/pull/2008 Closing this

[GitHub] [hudi] bvaradar closed issue #2038: Required hive version at least 2.3.1, Right?

2020-08-25 Thread GitBox
bvaradar closed issue #2038: URL: https://github.com/apache/hudi/issues/2038

[GitHub] [hudi] bvaradar commented on issue #2038: Required hive version at least 2.3.1, Right?

2020-08-25 Thread GitBox
bvaradar commented on issue #2038: URL: https://github.com/apache/hudi/issues/2038#issuecomment-680424479 Yes, Hudi is compiled with Hive 2.x.

[jira] [Closed] (HUDI-1224) Fix HoodieIOException: No content to map due to end-of-input

2020-08-25 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu closed HUDI-1224. - Resolution: Duplicate > Fix HoodieIOException: No content to map due to end-of-input >

[jira] [Updated] (HUDI-1224) Fix HoodieIOException: No content to map due to end-of-input

2020-08-25 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1224: -- Status: Open (was: New) > Fix HoodieIOException: No content to map due to end-of-input >

[jira] [Commented] (HUDI-1224) Fix HoodieIOException: No content to map due to end-of-input

2020-08-25 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184858#comment-17184858 ] wangxianghu commented on HUDI-1224: --- I found an

[jira] [Updated] (HUDI-830) Fix issues related to running the test suite in docker due to Hive 2.x

2020-08-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-830: Labels: pull-request-available (was: ) > Fix issues related to running the test suite in docker due

[GitHub] [hudi] xushiyan opened a new pull request #2040: [HUDI-781] Add HoodieWrittableTestTable

2020-08-25 Thread GitBox
xushiyan opened a new pull request #2040: URL: https://github.com/apache/hudi/pull/2040 ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc

[GitHub] [hudi] modi95 opened a new pull request #2039: [HUDI-830][WIP] Test Suite Fixes

2020-08-25 Thread GitBox
modi95 opened a new pull request #2039: URL: https://github.com/apache/hudi/pull/2039 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[GitHub] [hudi] yanghua commented on pull request #1772: [HUDI-978]Specify version information for each component separately

2020-08-25 Thread GitBox
yanghua commented on pull request #1772: URL: https://github.com/apache/hudi/pull/1772#issuecomment-680390778 @Trevor-zhang Can you help to review this PR? Thanks.

[GitHub] [hudi] vinothchandar commented on pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-08-25 Thread GitBox
vinothchandar commented on pull request #1804: URL: https://github.com/apache/hudi/pull/1804#issuecomment-680381474 actually compaction is failing. ``` INFO: 20/08/25 01:10:42 ERROR HoodieCompactor: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in

[jira] [Updated] (HUDI-532) Add java doc for hudi test suite test classes

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-532: -- Priority: Minor (was: Major) > Add java doc for hudi test suite test classes >

[jira] [Closed] (HUDI-532) Add java doc for hudi test suite test classes

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-532. - Fix Version/s: 0.6.1 Resolution: Done Done via asf-site branch: df8f099c999379038bf189491ff71dfb958806dd >

[GitHub] [hudi] yanghua merged pull request #1901: [HUDI-532] Add java doc for the test classes of hudi test suite

2020-08-25 Thread GitBox
yanghua merged pull request #1901: URL: https://github.com/apache/hudi/pull/1901

[hudi] branch master updated: [HUDI-532] Add java doc for the test classes of hudi test suite (#1901)

2020-08-25 Thread vinoyang
vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new df8f099 [HUDI-532] Add java doc for the test

[jira] [Updated] (HUDI-1223) Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1223: --- Priority: Trivial (was: Major) > Remove unused UpdateHandler class in HoodieCopyOnWriteTable >

[jira] [Closed] (HUDI-1223) Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1223. -- Resolution: Done Done via master branch: 7e68c42eb19bfe955217f0e8e25acbeb4a6974b5 > Remove unused

[jira] [Updated] (HUDI-1223) Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1223: --- Issue Type: Improvement (was: Task) > Remove unused UpdateHandler class in HoodieCopyOnWriteTable >

[jira] [Updated] (HUDI-1223) Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1223: --- Status: Open (was: New) > Remove unused UpdateHandler class in HoodieCopyOnWriteTable >

[GitHub] [hudi] yanghua merged pull request #2032: [HUDI-1223] Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread GitBox
yanghua merged pull request #2032: URL: https://github.com/apache/hudi/pull/2032

[hudi] branch master updated: [HUDI-1223] Remove unused UpdateHandler class in HoodieCopyOnWriteTable (#2032)

2020-08-25 Thread vinoyang
vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 7e68c42 [HUDI-1223] Remove unused

[GitHub] [hudi] yanghua commented on pull request #2032: [HUDI-1223] Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread GitBox
yanghua commented on pull request #2032: URL: https://github.com/apache/hudi/pull/2032#issuecomment-680376811 > ## _Tips_ > * _Thank you very much for contributing to Apache Hudi._ > * _Please review https://hudi.apache.org/contributing.html before opening a pull request._ > >

[jira] [Assigned] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row

2020-08-25 Thread cdmikechen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cdmikechen reassigned HUDI-1225: Assignee: cdmikechen > Avro Date logical type not handled correctly when converting to Spark Row >

[GitHub] [hudi] cdmikechen commented on issue #2034: [SUPPORT] DateType can't be transformed to right data by kafka avro

2020-08-25 Thread GitBox
cdmikechen commented on issue #2034: URL: https://github.com/apache/hudi/issues/2034#issuecomment-680364028 @bvaradar Yes, of course. I will deal with it soon. I've tested a case in Hive and found that Hive may also parse the date type as an int and display it as `yyyy-mm-dd`, according to day

[GitHub] [hudi] cdmikechen edited a comment on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-25 Thread GitBox
cdmikechen edited a comment on issue #2005: URL: https://github.com/apache/hudi/issues/2005#issuecomment-680351956 @bvaradar Thanks for your reminder, I finally found my mistake: I use hudi in a maven project with spark dependencies. I noticed that hudi remove

[GitHub] [hudi] cdmikechen commented on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-25 Thread GitBox
cdmikechen commented on issue #2005: URL: https://github.com/apache/hudi/issues/2005#issuecomment-680351956 @bvaradar Thanks for your reminder, I finally found my mistake: I use hudi in a maven project with spark dependencies. I noticed that hudi remove

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1953: [HUDI-1181] Fix decimal type display issue for record key field

2020-08-25 Thread GitBox
zhedoubushishi commented on a change in pull request #1953: URL: https://github.com/apache/hudi/pull/1953#discussion_r476901023 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -433,23 +434,46 @@ private static Object

[GitHub] [hudi] umehrot2 commented on a change in pull request #1953: [HUDI-1181] Fix decimal type display issue for record key field

2020-08-25 Thread GitBox
umehrot2 commented on a change in pull request #1953: URL: https://github.com/apache/hudi/pull/1953#discussion_r476840436 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -433,23 +434,46 @@ private static Object

[GitHub] [hudi] jiegzhan commented on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-25 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-680303027 Thanks for your explanation, @bvaradar. Closed this ticket.

[GitHub] [hudi] jiegzhan closed issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-25 Thread GitBox
jiegzhan closed issue #1980: URL: https://github.com/apache/hudi/issues/1980

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1975: [HUDI-1194][WIP] Reorganize HoodieHiveClient based on the way to call Hive API

2020-08-25 Thread GitBox
zhedoubushishi commented on a change in pull request #1975: URL: https://github.com/apache/hudi/pull/1975#discussion_r476794521 ## File path: pom.xml ## @@ -94,7 +94,7 @@ 2.9.9 2.7.3 org.apache.hive -2.3.1 +2.3.6 Review comment: When running hive

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1975: [HUDI-1194][WIP] Reorganize HoodieHiveClient based on the way to call Hive API

2020-08-25 Thread GitBox
zhedoubushishi commented on a change in pull request #1975: URL: https://github.com/apache/hudi/pull/1975#discussion_r476780680 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -303,8 +305,28 @@ object DataSourceWriteOptions { val

[GitHub] [hudi] satishkotha commented on pull request #2037: [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables

2020-08-25 Thread GitBox
satishkotha commented on pull request #2037: URL: https://github.com/apache/hudi/pull/2037#issuecomment-680287507 > > @satishkotha was it throwing an exception before this change ? > > From https://issues.apache.org/jira/browse/HUDI-1226, > > 1. If we pass empty

[GitHub] [hudi] rubenssoto commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-25 Thread GitBox
rubenssoto commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-680284508 @umehrot2

[GitHub] [hudi] rubenssoto commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-25 Thread GitBox
rubenssoto commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-680284122 @bvaradar Was this problem solved in 0.6? I read that RFC-15 is still experimental. And does Athena already support it?

[GitHub] [hudi] satishkotha commented on pull request #2037: [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables

2020-08-25 Thread GitBox
satishkotha commented on pull request #2037: URL: https://github.com/apache/hudi/pull/2037#issuecomment-680279264 > @satishkotha was it throwing an exception before this change ? From https://issues.apache.org/jira/browse/HUDI-1226, 1) If we pass empty string(-hoodie-conf

[hudi] branch master updated (492ddcb -> cc555ba)

2020-08-25 Thread nagarwal
nagarwal pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 492ddcb [HUDI-1191] Add incremental meta client API to query partitions modified in a time window add cc555ba

[GitHub] [hudi] n3nash merged pull request #2036: [HUDI-1133] Tune buffer sizes for the diskbased external spillable map

2020-08-25 Thread GitBox
n3nash merged pull request #2036: URL: https://github.com/apache/hudi/pull/2036

[GitHub] [hudi] satishkotha commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-08-25 Thread GitBox
satishkotha commented on issue #1954: URL: https://github.com/apache/hudi/issues/1954#issuecomment-680262762 If a single column as key works for you, you can also try hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator

[GitHub] [hudi] satishkotha commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-08-25 Thread GitBox
satishkotha commented on issue #1954: URL: https://github.com/apache/hudi/issues/1954#issuecomment-680259178 @tooptoop4 For non-partitioned tables, data is typically stored in base directory (s3://redact/my2/multpk7/). Looks like partitionpath field you specified is getting

[jira] [Updated] (HUDI-1226) ComplexKeyGenerator doesnt work for non partitioned tables

2020-08-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1226: - Labels: pull-request-available (was: ) > ComplexKeyGenerator doesnt work for non partitioned

[GitHub] [hudi] satishkotha opened a new pull request #2037: [HUDI-1226] Fix ComplexKeyGenerator for non-partitioned tables

2020-08-25 Thread GitBox
satishkotha opened a new pull request #2037: URL: https://github.com/apache/hudi/pull/2037 ## What is the purpose of the pull request ComplexKeyGenerator getPartitionPath doesn't seem to work well with non-partitioned tables. Fix it and add a test case ## Brief change log -

[jira] [Created] (HUDI-1226) ComplexKeyGenerator doesnt work for non partitioned tables

2020-08-25 Thread satish (Jira)
satish created HUDI-1226: Summary: ComplexKeyGenerator doesnt work for non partitioned tables Key: HUDI-1226 URL: https://issues.apache.org/jira/browse/HUDI-1226 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-1137) [Test Suite] Add option to configure different path selector

2020-08-25 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1137: - Status: Closed (was: Patch Available) > [Test Suite] Add option to configure different path selector >

[jira] [Updated] (HUDI-1137) [Test Suite] Add option to configure different path selector

2020-08-25 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1137: - Status: Patch Available (was: In Progress) > [Test Suite] Add option to configure different path selector >

[jira] [Updated] (HUDI-1137) [Test Suite] Add option to configure different path selector

2020-08-25 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1137: - Status: Open (was: New) > [Test Suite] Add option to configure different path selector >

[jira] [Updated] (HUDI-1137) [Test Suite] Add option to configure different path selector

2020-08-25 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1137: - Status: In Progress (was: Open) > [Test Suite] Add option to configure different path selector >

[GitHub] [hudi] ashishmgofficial edited a comment on issue #2007: [SUPPORT] Is Timeline metadata queryable ?

2020-08-25 Thread GitBox
ashishmgofficial edited a comment on issue #2007: URL: https://github.com/apache/hudi/issues/2007#issuecomment-680233990 @bvaradar Yes, we want to keep the metadata details and if possible store it somewhere for other analytical purposes and for audit

[GitHub] [hudi] ashishmgofficial commented on issue #2007: [SUPPORT] Is Timeline metadata queryable ?

2020-08-25 Thread GitBox
ashishmgofficial commented on issue #2007: URL: https://github.com/apache/hudi/issues/2007#issuecomment-680233990 Yes, we want to keep the metadata details and if possible store it somewhere for other analytical purposes and for audit

[GitHub] [hudi] bvaradar commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-08-25 Thread GitBox
bvaradar commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-680231795 @nsivabalan : Can you reply when you get a chance ? Thanks, Balaji.V

[hudi] branch master updated: [HUDI-1191] Add incremental meta client API to query partitions modified in a time window

2020-08-25 Thread nagarwal
nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 492ddcb [HUDI-1191] Add incremental meta

[GitHub] [hudi] n3nash merged pull request #1964: [HUDI-1191] Add incremental meta client API to query partitions changed

2020-08-25 Thread GitBox
n3nash merged pull request #1964: URL: https://github.com/apache/hudi/pull/1964

[GitHub] [hudi] bvaradar commented on issue #2034: [SUPPORT] DateType can't be transformed to right data by kafka avro

2020-08-25 Thread GitBox
bvaradar commented on issue #2034: URL: https://github.com/apache/hudi/issues/2034#issuecomment-680230551 Looking at the constructor of java.sql.Date, Date(long date) :Constructs a Date object using the given milliseconds time value. It expects time resolution in milliseconds.
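The Avro `date` logical type stores an int counting days since 1970-01-01, so handing that value to an API that expects milliseconds lands within the first minute of the epoch. The sketch below illustrates that arithmetic in Python rather than Java; the helper names are hypothetical, and it is one plausible reading of the `1970-01-01` symptom reported in the issue, not a confirmed diagnosis of the Hudi code path.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def from_epoch_days(days):
    """Interpret the value as days since the epoch (how Avro encodes a date)."""
    return EPOCH + timedelta(days=days)


def from_epoch_millis(millis):
    """Interpret the value as milliseconds since the epoch (what a millisecond-based API expects)."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).date()


avro_days = 18498  # days since epoch for 2020-08-24

print(from_epoch_days(avro_days))                     # 2020-08-24
print(from_epoch_millis(avro_days))                   # 1970-01-01 (days misread as milliseconds)
print(from_epoch_millis(avro_days * MILLIS_PER_DAY))  # 2020-08-24 (after scaling to milliseconds)
```

Scaling by the number of milliseconds per day before calling the millisecond-based constructor is one way to get the intended date; converting directly from days avoids the intermediate step.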

[jira] [Updated] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row

2020-08-25 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1225: - Status: Open (was: New) > Avro Date logical type not handled correctly when converting

[jira] [Created] (HUDI-1225) Avro Date logical type not handled correctly when converting to Spark Row

2020-08-25 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1225: Summary: Avro Date logical type not handled correctly when converting to Spark Row Key: HUDI-1225 URL: https://issues.apache.org/jira/browse/HUDI-1225

[GitHub] [hudi] shashwatsrivastava94 commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-08-25 Thread GitBox
shashwatsrivastava94 commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-680216583 Was wondering if there is an update here! Running a PoC and would love to use Hudi + Spark 3 if possible. Thanks!

[GitHub] [hudi] bvaradar commented on issue #2005: [SUPPORT] hudi hive-sync in master branch (0.6.1) can not run by spark

2020-08-25 Thread GitBox
bvaradar commented on issue #2005: URL: https://github.com/apache/hudi/issues/2005#issuecomment-680174568 @cdmikechen : The integration test actually brings up a dockerized environment and runs spark-submit command. So, the dependencies specified in hudi-integ-test/pom.xml should not be

[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

2020-08-25 Thread GitBox
bvaradar commented on issue #1954: URL: https://github.com/apache/hudi/issues/1954#issuecomment-680168375 @satishkotha : Would you be able to help reproduce this ?

[GitHub] [hudi] kpurella commented on issue #2001: NPE While writing data to same partition on S3

2020-08-25 Thread GitBox
kpurella commented on issue #2001: URL: https://github.com/apache/hudi/issues/2001#issuecomment-680144616 Resolved after addressing partitionpath issue.

[GitHub] [hudi] kpurella closed issue #2001: NPE While writing data to same partition on S3

2020-08-25 Thread GitBox
kpurella closed issue #2001: URL: https://github.com/apache/hudi/issues/2001

[GitHub] [hudi] kpurella commented on issue #2001: NPE While writing data to same partition on S3

2020-08-25 Thread GitBox
kpurella commented on issue #2001: URL: https://github.com/apache/hudi/issues/2001#issuecomment-680144291 @bvaradar Thank you for your response. I was able to resolve this issue. I was building an invalid partitionpath, which was causing the issue. - Thank you.

[GitHub] [hudi] bvaradar commented on issue #2031: [SUPPORT] java.lang.NoSuchMethodError: ExpressionEncoder.fromRow

2020-08-25 Thread GitBox
bvaradar commented on issue #2031: URL: https://github.com/apache/hudi/issues/2031#issuecomment-680118462 @vinothsiva1989 : This is likely due to scala compiler version. I see that you are using 2.12 but most/all of spark 2.x versions comes prepackaged with 2.11 only. Can you check if you

[GitHub] [hudi] bvaradar commented on issue #2029: Records seen with _hoodie_is_deleted set to true on non-existing record

2020-08-25 Thread GitBox
bvaradar commented on issue #2029: URL: https://github.com/apache/hudi/issues/2029#issuecomment-680115497 @nsivabalan : Please take a look.

[jira] [Updated] (HUDI-1133) Tune buffer sizes for the diskbased external spillable map

2020-08-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1133: - Labels: pull-request-available (was: ) > Tune buffer sizes for the diskbased external spillable

[GitHub] [hudi] nbalajee opened a new pull request #2036: [HUDI-1133] Tune buffer sizes for the diskbased external spillable map

2020-08-25 Thread GitBox
nbalajee opened a new pull request #2036: URL: https://github.com/apache/hudi/pull/2036 ## What is the purpose of the

[GitHub] [hudi] wangxianghu commented on pull request #2035: [HUDI-1216] Create chinese version of pyspark quickstart example

2020-08-25 Thread GitBox
wangxianghu commented on pull request #2035: URL: https://github.com/apache/hudi/pull/2035#issuecomment-680083731 @yanghu please take a look when free

[jira] [Updated] (HUDI-1216) Create chinese version of pyspark quickstart example

2020-08-25 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1216: - Labels: pull-request-available (was: ) > Create chinese version of pyspark quickstart example >

[GitHub] [hudi] wangxianghu opened a new pull request #2035: [HUDI-1216] Create chinese version of pyspark quickstart example

2020-08-25 Thread GitBox
wangxianghu opened a new pull request #2035: URL: https://github.com/apache/hudi/pull/2035 ## What is the purpose of

[GitHub] [hudi] vinothchandar commented on pull request #1804: [HUDI-960] Implementation of the HFile base and log file format.

2020-08-25 Thread GitBox
vinothchandar commented on pull request #1804: URL: https://github.com/apache/hudi/pull/1804#issuecomment-680017703 ``` [ERROR] Failures: [ERROR]

[GitHub] [hudi] cdmikechen opened a new issue #2034: [SUPPORT] DateType can't be transformed to right data by kafka avro

2020-08-25 Thread GitBox
cdmikechen opened a new issue #2034: URL: https://github.com/apache/hudi/issues/2034 **Describe the problem you faced** When using DeltaStreamer to load Kafka Avro data into Hudi, DateType values are not converted to the correct date (such as `2020-8-24`). DateType always shows `1970-01-01`.

[GitHub] [hudi] wangxianghu commented on pull request #1946: [HUDI-1176]Support log4j2 config

2020-08-25 Thread GitBox
wangxianghu commented on pull request #1946: URL: https://github.com/apache/hudi/pull/1946#issuecomment-679987859 > @wangxianghu Can you help to verify and review this PR? sure, will do

[GitHub] [hudi] wangxianghu commented on pull request #2033: [HUDI-1222] Introduce MergeHelper.UpdateHandler as independent class

2020-08-25 Thread GitBox
wangxianghu commented on pull request #2033: URL: https://github.com/apache/hudi/pull/2033#issuecomment-679987475 > for such a helper struct like class, it makes sense to be inline right? can you please help me understand the reason behind this refactor. @vinothchandar thanks for

[GitHub] [hudi] Yogashri12 commented on issue #2017: multi-level partition

2020-08-25 Thread GitBox
Yogashri12 commented on issue #2017: URL: https://github.com/apache/hudi/issues/2017#issuecomment-679967819 How to use ComplexKeyGenerator in PySpark? hudi_options = { 'hoodie.table.name': tableName, 'hoodie.datasource.write.recordkey.field': 'ID',
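A minimal sketch of how the question above could be approached, assuming the usual Hudi datasource write options. The key generator is selected with `hoodie.datasource.write.keygenerator.class`, and `ComplexKeyGenerator` accepts comma-separated field lists for both the record key and the partition path. The helper `build_hudi_options`, the partition columns `year` and `month`, and the precombine column `ts` are hypothetical and do not come from the thread.

```python
def build_hudi_options(table_name, record_keys, partition_fields, precombine_field):
    """Build write options that select ComplexKeyGenerator.

    record_keys and partition_fields are lists of column names; they are
    joined with commas, the form ComplexKeyGenerator expects.
    """
    return {
        'hoodie.table.name': table_name,
        'hoodie.datasource.write.recordkey.field': ','.join(record_keys),
        'hoodie.datasource.write.partitionpath.field': ','.join(partition_fields),
        'hoodie.datasource.write.keygenerator.class':
            'org.apache.hudi.keygen.ComplexKeyGenerator',
        'hoodie.datasource.write.precombine.field': precombine_field,
    }


# Hypothetical table and columns, for illustration only.
hudi_options = build_hudi_options(
    table_name='my_table',
    record_keys=['ID'],
    partition_fields=['year', 'month'],
    precombine_field='ts',
)

# With a Spark DataFrame `df` and a target `base_path` (both assumed), the
# options would be passed like this:
# df.write.format('hudi').options(**hudi_options).mode('append').save(base_path)
```

With two partition fields, the resulting partition path has one directory level per field, which is the multi-level layout the issue title asks about.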

[GitHub] [hudi] vinothchandar commented on pull request #2033: [HUDI-1222] Introduce MergeHelper.UpdateHandler as independent class

2020-08-25 Thread GitBox
vinothchandar commented on pull request #2033: URL: https://github.com/apache/hudi/pull/2033#issuecomment-679965694 for such a helper struct like class, it makes sense to be inline right? can you please help me understand the reason behind this refactor.

[jira] [Assigned] (HUDI-1221) Ensure docker demo page reflects the latest support on all query engines

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned HUDI-1221: -- Assignee: wangxianghu (was: vinoyang) > Ensure docker demo page reflects the latest support on all

[GitHub] [hudi] yanghua commented on pull request #2022: Code optimization on hudi-common moudle

2020-08-25 Thread GitBox
yanghua commented on pull request #2022: URL: https://github.com/apache/hudi/pull/2022#issuecomment-679963589 @Trevor-zhang Please follow the contributing guidelines, e.g. the title of the PR.

[GitHub] [hudi] yanghua commented on pull request #1946: [HUDI-1176]Support log4j2 config

2020-08-25 Thread GitBox
yanghua commented on pull request #1946: URL: https://github.com/apache/hudi/pull/1946#issuecomment-679962745 @wangxianghu Can you help to verify and review this PR?

[jira] [Closed] (HUDI-1218) Introduce BulkInsertSortMode as Independent class

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1218. -- Resolution: Done Done via master branch: 6a4dc7384c8f7efbbce729b8e57b3eaaf5cab104 > Introduce

[jira] [Updated] (HUDI-1218) Introduce BulkInsertSortMode as Independent class

2020-08-25 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1218: --- Status: Open (was: New) > Introduce BulkInsertSortMode as Independent class >

[GitHub] [hudi] yanghua merged pull request #2021: [HUDI-1218] Introduce BulkInsertSortMode as Independent class

2020-08-25 Thread GitBox
yanghua merged pull request #2021: URL: https://github.com/apache/hudi/pull/2021

[hudi] branch master updated: [HUDI-1218] Introduce BulkInsertSortMode as Independent class (#2021)

2020-08-25 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6a4dc73 [HUDI-1218] Introduce

[jira] [Created] (HUDI-1224) Fix HoodieIOException: No content to map due to end-of-input

2020-08-25 Thread wangxianghu (Jira)
wangxianghu created HUDI-1224: - Summary: Fix HoodieIOException: No content to map due to end-of-input Key: HUDI-1224 URL: https://issues.apache.org/jira/browse/HUDI-1224 Project: Apache Hudi

[GitHub] [hudi] wangxianghu commented on pull request #2032: [HUDI-1223] Remove unused UpdateHandler class in HoodieCopyOnWriteTable

2020-08-25 Thread GitBox
wangxianghu commented on pull request #2032: URL: https://github.com/apache/hudi/pull/2032#issuecomment-679927681 @yanghua please take a look when free

[GitHub] [hudi] wangxianghu commented on pull request #2033: [HUDI-1222] Introduce MergeHelper.UpdateHandler as independent class

2020-08-25 Thread GitBox
wangxianghu commented on pull request #2033: URL: https://github.com/apache/hudi/pull/2033#issuecomment-679927856 @yanghua please take a look when free

[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

2020-08-25 Thread GitBox
dm-tran commented on issue #2020: URL: https://github.com/apache/hudi/issues/2020#issuecomment-679911709 Thank you @bvaradar > Can you check if more than 1 writers are concurrently happening. Only the structured streaming application writes to the Hudi table, so there is

[GitHub] [hudi] sbernauer commented on pull request #2012: HUDI-1129 Deltastreamer Add support for schema evaluation

2020-08-25 Thread GitBox
sbernauer commented on pull request #2012: URL: https://github.com/apache/hudi/pull/2012#issuecomment-679910464 Hi @sathyaprakashg, thanks for your work! When I move the new field `evoluted_optional_union_field` to a place not at the end of the schema (somewhere in the middle) I get

[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

2020-08-25 Thread GitBox
bvaradar commented on issue #2020: URL: https://github.com/apache/hudi/issues/2020#issuecomment-679905218 @dm-tran : Thanks for the details. The only explanation I can think of is that more than one writer is running concurrently, which can cause this. Can you check if more than 1
