[GitHub] [hudi] prashantwason commented on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-23 Thread GitBox
prashantwason commented on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-698099305 @umehrot2 Directly using hudi datasource or delta streamer for testing should work too. I haven't tested this yet so please let me know if that doesn't work.

[GitHub] [hudi] JiaDe-Wu opened a new issue #2108: [SUPPORT]Submit rollback -->Pending job --> kill YARN --> lost data

2020-09-23 Thread GitBox
JiaDe-Wu opened a new issue #2108: URL: https://github.com/apache/hudi/issues/2108 Hi, I used Hudi-CLI to roll back and found that YARN had no resources, so the task kept waiting to start. I then manually killed the queued YARN job and returned to Hudi to find that the data to

[GitHub] [hudi] bvaradar commented on a change in pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-23 Thread GitBox
bvaradar commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r493957561 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java ## @@ -554,14 +608,16 @@ protected

[GitHub] [hudi] wangxianghu edited a comment on pull request #2105: [MINOR] Fix ClassCastException when use QuickstartUtils generate data

2020-09-23 Thread GitBox
wangxianghu edited a comment on pull request #2105: URL: https://github.com/apache/hudi/pull/2105#issuecomment-698074779 @bhasudha This exception occurs because the methods to generate data in `QuickstartUtils` treat `ts` field as `long` type, while the schema provided by

[GitHub] [hudi] wangxianghu edited a comment on pull request #2105: [MINOR] Fix ClassCastException when use QuickstartUtils generate data

2020-09-23 Thread GitBox
wangxianghu edited a comment on pull request #2105: URL: https://github.com/apache/hudi/pull/2105#issuecomment-698074779 @bhasudha This exception occurs because the methods to generate data in `QuickstartUtils` treat `ts` field as `long` type, while the schema provided by

[GitHub] [hudi] wangxianghu commented on pull request #2105: [MINOR] Fix ClassCastException when use QuickstartUtils generate data

2020-09-23 Thread GitBox
wangxianghu commented on pull request #2105: URL: https://github.com/apache/hudi/pull/2105#issuecomment-698074779 @bhasudha This exception occurs because the methods to generate data in `QuickstartUtils` treated `ts` field as `long` type, while the schema provided by `QuickstartUtils`

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r494000687 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/SparkAppendHandleFactory.java ## @@ -0,0 +1,45 @@ +/* + * Licensed to the

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r493999629 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AsyncCleanerService.java ## @@ -52,19 +52,6 @@ protected

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r493999114 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java ## @@ -161,11 +108,11 @@ private

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r493998442 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/common/HoodieSparkEngineContext.java ## @@ -0,0 +1,56 @@ +/* + * Licensed to

[GitHub] [hudi] wangxianghu commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-698069429 > @wangxianghu on the checkstyle change to bump up the line count to 500, I think we should revert to 200 as it is now. > I checked out a few of the issues. they can be

[GitHub] [hudi] umehrot2 commented on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-23 Thread GitBox
umehrot2 commented on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-698068512 > @umehrot2 I have updated the RFC doc with [details on how to test

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r493996642 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java ## @@ -134,7 +138,7 @@ private void

[GitHub] [hudi] yanghua commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
yanghua commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r493978853 ## File path: hudi-cli/pom.xml ## @@ -148,7 +148,14 @@ org.apache.hudi - hudi-client + hudi-client-common +

[hudi] branch asf-site updated: Travis CI build asf-site

2020-09-23 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 5e612df Travis CI build asf-site 5e612df is

[GitHub] [hudi] vinothchandar commented on issue #1961: [SUPPORT] Jetty Not able to find method java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V on Databri

2020-09-23 Thread GitBox
vinothchandar commented on issue #1961: URL: https://github.com/apache/hudi/issues/1961#issuecomment-698020199 @saumyasuhagiya interesting that you needed both the bundle and the `hudi-spark` jar. did it not work with just `spark.executor.extraClassPath

[jira] [Updated] (HUDI-1289) Using hbase index in spark hangs in Hudi 0.6.0

2020-09-23 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1289: - Fix Version/s: 0.6.1 > Using hbase index in spark hangs in Hudi 0.6.0 >

[jira] [Updated] (HUDI-1289) Using hbase index in spark hangs in Hudi 0.6.0

2020-09-23 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1289: - Status: Open (was: New) > Using hbase index in spark hangs in Hudi 0.6.0 >

[GitHub] [hudi] vinothchandar commented on a change in pull request #2085: [HUDI-1209] Properties File must be optional when running deltastreamer

2020-09-23 Thread GitBox
vinothchandar commented on a change in pull request #2085: URL: https://github.com/apache/hudi/pull/2085#discussion_r493940237 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java ## @@ -112,9 +112,14 @@ public

[hudi] branch asf-site updated: [DOC] Adding youtube link for DC_THURS chat on Hudi (#2107)

2020-09-23 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 4db7bf4 [DOC] Adding youtube link for

[jira] [Updated] (HUDI-1299) Add youtube link for DC_THURS chat

2020-09-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1299: - Labels: pull-request-available (was: ) > Add youtube link for DC_THURS chat >

[GitHub] [hudi] vinothchandar merged pull request #2107: [HUDI-1299] : Adding youtube link for DC_THURS chat on Hudi

2020-09-23 Thread GitBox
vinothchandar merged pull request #2107: URL: https://github.com/apache/hudi/pull/2107 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[jira] [Created] (HUDI-1299) Add youtube link for DC_THURS chat

2020-09-23 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1299: - Summary: Add youtube link for DC_THURS chat Key: HUDI-1299 URL: https://issues.apache.org/jira/browse/HUDI-1299 Project: Apache Hudi Issue Type:

[GitHub] [hudi] n3nash opened a new pull request #2107: Adding youtube link for DC_THURS chat on Hudi

2020-09-23 Thread GitBox
n3nash opened a new pull request #2107: URL: https://github.com/apache/hudi/pull/2107 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[GitHub] [hudi] satishkotha commented on a change in pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-23 Thread GitBox
satishkotha commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r493882291 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java ## @@ -251,6 +262,28 @@ private

[GitHub] [hudi] satishkotha commented on a change in pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-23 Thread GitBox
satishkotha commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r493881151 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java ## @@ -738,7 +799,9 @@ private String

[GitHub] [hudi] ashishmgofficial edited a comment on issue #2104: [SUPPORT] MOR Hive sync - _rt table read issue

2020-09-23 Thread GitBox
ashishmgofficial edited a comment on issue #2104: URL: https://github.com/apache/hudi/issues/2104#issuecomment-697736256 @n3nash It's not complicated data. I was just trying out Hudi MOR on some sample data which I also used for testing COW. It's just 6 lines of JSON in multiple

[GitHub] [hudi] ashishmgofficial commented on issue #2104: [SUPPORT] MOR Hive sync - _rt table read issue

2020-09-23 Thread GitBox
ashishmgofficial commented on issue #2104: URL: https://github.com/apache/hudi/issues/2104#issuecomment-697736256 @n3nash It's not complicated data. I was just trying out Hudi MOR on some sample data which I also used for testing COW. It's just 6 lines of JSON in multiple batches.

[GitHub] [hudi] n3nash commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-23 Thread GitBox
n3nash commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-697709940 @eigakow Can you provide the stack trace that shows the originating class? If the Hive-Sync is affected in 0.6.0, we should see this exception being raised from a hoodie hive-sync code

[GitHub] [hudi] n3nash commented on issue #2098: [SUPPORT] File does not exisit(parquet) while reading Hudi Table from Spark

2020-09-23 Thread GitBox
n3nash commented on issue #2098: URL: https://github.com/apache/hudi/issues/2098#issuecomment-697705248 @ShortFinger For COW -> The number of versions to keep is a function of a) how frequently you run the ingestion job which may have updates b) how long running is the consumer of this

[GitHub] [hudi] n3nash commented on issue #2101: [SUPPORT]Unable to interpret Child JSON fields value as a separate columns rather it is loaded as one single field value. Any way to interpret that.

2020-09-23 Thread GitBox
n3nash commented on issue #2101: URL: https://github.com/apache/hudi/issues/2101#issuecomment-697675730 @getniz Your JSON is a nested data type which will end up being inferred as a nested AVRO [record](http://avro.apache.org/docs/current/spec.html#schema_record) type. My understanding

[GitHub] [hudi] n3nash commented on issue #2103: [SUPPORT] NullPointerException when using ComplexKeyGenerator

2020-09-23 Thread GitBox
n3nash commented on issue #2103: URL: https://github.com/apache/hudi/issues/2103#issuecomment-697658390 @sbernauer I believe this is being fixed by https://github.com/apache/hudi/pull/2093. Could you please help review and validate the PR?

[GitHub] [hudi] n3nash commented on issue #2104: [SUPPORT] MOR Hive sync - _rt table read issue

2020-09-23 Thread GitBox
n3nash commented on issue #2104: URL: https://github.com/apache/hudi/issues/2104#issuecomment-697656772 @ashishmgofficial It looks like there was an exception reading the log file but there are no details of what actually caused it. It's hard to say what happened, I've created an issue

[jira] [Created] (HUDI-1298) Add better error messages when IOException occurs during log file reading

2020-09-23 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1298: - Summary: Add better error messages when IOException occurs during log file reading Key: HUDI-1298 URL: https://issues.apache.org/jira/browse/HUDI-1298 Project:

[GitHub] [hudi] bvaradar commented on pull request #1566: [HUDI-603]: DeltaStreamer can now fetch schema before every run in continuous mode

2020-09-23 Thread GitBox
bvaradar commented on pull request #1566: URL: https://github.com/apache/hudi/pull/1566#issuecomment-697570338 @pratyakshsharma : Pinging to see if you can take a look :)

[GitHub] [hudi] bvaradar commented on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-09-23 Thread GitBox
bvaradar commented on pull request #1760: URL: https://github.com/apache/hudi/pull/1760#issuecomment-697565174 @bschell : Did you try the change I mentioned above? Let me know.

[GitHub] [hudi] Karl-WangSK commented on pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
Karl-WangSK commented on pull request #2106: URL: https://github.com/apache/hudi/pull/2106#issuecomment-697391812 @leesf Sorry. Can you review this PR again when you are free?

[GitHub] [hudi] Karl-WangSK opened a new pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
Karl-WangSK opened a new pull request #2106: URL: https://github.com/apache/hudi/pull/2106

[GitHub] [hudi] Karl-WangSK closed pull request #2096: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
Karl-WangSK closed pull request #2096: URL: https://github.com/apache/hudi/pull/2096

[GitHub] [hudi] wangxianghu opened a new pull request #2105: [MINOR] Fix ClassCastException when use QuickstartUtils generate data

2020-09-23 Thread GitBox
wangxianghu opened a new pull request #2105: URL: https://github.com/apache/hudi/pull/2105

[GitHub] [hudi] wangxianghu commented on pull request #2105: [MINOR] Fix ClassCastException when use QuickstartUtils generate data

2020-09-23 Thread GitBox
wangxianghu commented on pull request #2105: URL: https://github.com/apache/hudi/pull/2105#issuecomment-697367167 @yanghua please take a look when free

[GitHub] [hudi] ivorzhou edited a comment on pull request #2091: HUDI-1283 Fill missing columns with default value when spark dataframe save to hudi table

2020-09-23 Thread GitBox
ivorzhou edited a comment on pull request #2091: URL: https://github.com/apache/hudi/pull/2091#issuecomment-697058418 > Thank you for creating this PR. At this point, I am not fully convinced if we really need this logic. A missing column in the DataFrame could also mean that column has

[GitHub] [hudi] wangxianghu commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-23 Thread GitBox
wangxianghu commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-697358552 @vinothchandar @yanghua @leesf I have refactored this PR with the `parallelDo` function and fixed the checkstyle (lineLength > 200) issue. Please take a look when free.

[GitHub] [hudi] leesf commented on pull request #2096: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
leesf commented on pull request #2096: URL: https://github.com/apache/hudi/pull/2096#issuecomment-697323825 @Karl-WangSK Thanks for your contribution. I see there are lots of unrelated changes; please rebase onto the latest master branch. Thanks.

[GitHub] [hudi] leesf commented on a change in pull request #2096: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
leesf commented on a change in pull request #2096: URL: https://github.com/apache/hudi/pull/2096#discussion_r493515328 ## File path: hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java ## @@ -190,15 +190,27 @@ protected void rollBackInflightBootstrap() {

[GitHub] [hudi] leesf commented on a change in pull request #2096: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
leesf commented on a change in pull request #2096: URL: https://github.com/apache/hudi/pull/2096#discussion_r493514951 ## File path: hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java ## @@ -190,15 +190,27 @@ protected void rollBackInflightBootstrap() {

[GitHub] [hudi] ashishmgofficial opened a new issue #2104: [SUPPORT] Hive sync - _rt table read issue

2020-09-23 Thread GitBox
ashishmgofficial opened a new issue #2104: URL: https://github.com/apache/hudi/issues/2104 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? Yes - Join the mailing list to engage in conversations and

[GitHub] [hudi] prashantwason commented on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-23 Thread GitBox
prashantwason commented on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-697235280 >> Can all compaction strategies work off of metadata table itself? Does it have all the data Yes. Compaction Strategies are based on the sizes of the base and log

[GitHub] [hudi] prashantwason edited a comment on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-23 Thread GitBox
prashantwason edited a comment on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-686688968 Remaining work items: - [x] 1. Support for rollbacks in MOR Table - [ ] 2. Rollback of metadata if commit eventually fails on dataset - [x] 3. HUDI-CLI

[GitHub] [hudi] sbernauer opened a new issue #2103: [SUPPORT] NullPointerException when using ComplexKeyGenerator

2020-09-23 Thread GitBox
sbernauer opened a new issue #2103: URL: https://github.com/apache/hudi/issues/2103 **Describe the problem you faced** When using the following configuration the deltastreamer crashes with ``` hoodie.datasource.write.recordkey.field=header.happenedTimestamp,header.eventId

[GitHub] [hudi] eigakow commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-23 Thread GitBox
eigakow commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-697174627 I have added the properties file contents to the issue description.

[GitHub] [hudi] Karl-WangSK commented on pull request #2096: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-23 Thread GitBox
Karl-WangSK commented on pull request #2096: URL: https://github.com/apache/hudi/pull/2096#issuecomment-697155627 @leesf