[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * 5a724c6c859d67980473db571c9a90b8babcf710 UNKNOWN
   * 22664f2385572b82dbfc4f1316a063308e647735 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2492)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2493)
 
   * 5f444fa98c3f1dbeac6fa6f9c1af98adc81f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


xushiyan commented on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932869164


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * 5a724c6c859d67980473db571c9a90b8babcf710 UNKNOWN
   * 22664f2385572b82dbfc4f1316a063308e647735 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2492)
 
   * 5f444fa98c3f1dbeac6fa6f9c1af98adc81f UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * 5a724c6c859d67980473db571c9a90b8babcf710 UNKNOWN
   * 22664f2385572b82dbfc4f1316a063308e647735 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2492)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * be214ea66c7cdb5a5a0aee320db05bea336c39d0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2491)
 
   * 5a724c6c859d67980473db571c9a90b8babcf710 UNKNOWN
   * 22664f2385572b82dbfc4f1316a063308e647735 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2492)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * be214ea66c7cdb5a5a0aee320db05bea336c39d0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2491)
 
   * 5a724c6c859d67980473db571c9a90b8babcf710 UNKNOWN
   * 22664f2385572b82dbfc4f1316a063308e647735 UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * be214ea66c7cdb5a5a0aee320db05bea336c39d0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2491)
 
   * 5a724c6c859d67980473db571c9a90b8babcf710 UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * be214ea66c7cdb5a5a0aee320db05bea336c39d0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2491)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3745:
URL: https://github.com/apache/hudi/pull/3745#issuecomment-932858096


   
   ## CI report:
   
   * b84020e26d7990ce07fb7f6d821801806084e833 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2490)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * f4d1c821d7f4540c52e457734143b320455af802 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2489)
 
   * be214ea66c7cdb5a5a0aee320db05bea336c39d0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2491)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * f4d1c821d7f4540c52e457734143b320455af802 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2489)
 
   * be214ea66c7cdb5a5a0aee320db05bea336c39d0 UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3745:
URL: https://github.com/apache/hudi/pull/3745#issuecomment-932858096


   
   ## CI report:
   
   * b84020e26d7990ce07fb7f6d821801806084e833 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2490)
 
   
   
   






[GitHub] [hudi] hudi-bot commented on pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2021-10-02 Thread GitBox


hudi-bot commented on pull request #3745:
URL: https://github.com/apache/hudi/pull/3745#issuecomment-932858096


   
   ## CI report:
   
   * b84020e26d7990ce07fb7f6d821801806084e833 UNKNOWN
   
   
   






[jira] [Updated] (HUDI-2514) Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2514:
-
Labels: pull-request-available  (was: )

> Add default hiveTableSerdeProperties for Spark SQL when sync Hive
> -
>
> Key: HUDI-2514
> URL: https://issues.apache.org/jira/browse/HUDI-2514
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] dongkelun opened a new pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2021-10-02 Thread GitBox


dongkelun opened a new pull request #3745:
URL: https://github.com/apache/hudi/pull/3745


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *Add default hiveTableSerdeProperties for Spark SQL when sync Hive*
   
   ## Brief change log
   
   *(for example:)*
 - *Add default hiveTableSerdeProperties for Spark SQL when sync Hive*
 - *Code optimization/code formatting*
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Created] (HUDI-2514) Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2021-10-02 Thread Jira
董可伦 created HUDI-2514:
-

 Summary: Add default hiveTableSerdeProperties for Spark SQL when 
sync Hive
 Key: HUDI-2514
 URL: https://issues.apache.org/jira/browse/HUDI-2514
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Spark Integration
Reporter: 董可伦
Assignee: 董可伦
 Fix For: 0.10.0








[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * f4d1c821d7f4540c52e457734143b320455af802 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2489)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot edited a comment on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * f4d1c821d7f4540c52e457734143b320455af802 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2489)
 
   
   
   






[GitHub] [hudi] hudi-bot commented on pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


hudi-bot commented on pull request #3744:
URL: https://github.com/apache/hudi/pull/3744#issuecomment-932845780


   
   ## CI report:
   
   * f4d1c821d7f4540c52e457734143b320455af802 UNKNOWN
   
   
   






[jira] [Updated] (HUDI-2108) Flaky test: TestHoodieBackedMetadata.testOnlyValidPartitionsAdded:210

2021-10-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2108:
-
Labels: pull-request-available  (was: )

> Flaky test: TestHoodieBackedMetadata.testOnlyValidPartitionsAdded:210
> -
>
> Key: HUDI-2108
> URL: https://issues.apache.org/jira/browse/HUDI-2108
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=357=logs=864947d5-8fca-5138-8394-999ccb212a1e=552b4d2f-26d5-5f2f-1d5d-e8229058b632





[GitHub] [hudi] xushiyan opened a new pull request #3744: [HUDI-2108] Fix flakiness in TestHoodieBackedMetadata

2021-10-02 Thread GitBox


xushiyan opened a new pull request #3744:
URL: https://github.com/apache/hudi/pull/3744


   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2108) Flaky test: TestHoodieBackedMetadata.testOnlyValidPartitionsAdded:210

2021-10-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2108:
-
Status: In Progress  (was: Open)

> Flaky test: TestHoodieBackedMetadata.testOnlyValidPartitionsAdded:210
> -
>
> Key: HUDI-2108
> URL: https://issues.apache.org/jira/browse/HUDI-2108
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Major
>
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=357=logs=864947d5-8fca-5138-8394-999ccb212a1e=552b4d2f-26d5-5f2f-1d5d-e8229058b632





[jira] [Assigned] (HUDI-2108) Flaky test: TestHoodieBackedMetadata.testOnlyValidPartitionsAdded:210

2021-10-02 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-2108:


Assignee: Raymond Xu  (was: Vinoth Chandar)

> Flaky test: TestHoodieBackedMetadata.testOnlyValidPartitionsAdded:210
> -
>
> Key: HUDI-2108
> URL: https://issues.apache.org/jira/browse/HUDI-2108
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vinoth Chandar
>Assignee: Raymond Xu
>Priority: Major
>
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=357=logs=864947d5-8fca-5138-8394-999ccb212a1e=552b4d2f-26d5-5f2f-1d5d-e8229058b632





[GitHub] [hudi] vingov removed a comment on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-10-02 Thread GitBox


vingov removed a comment on issue #2934:
URL: https://github.com/apache/hudi/issues/2934#issuecomment-932829205


   @t0il3ts0ap - sure, I will confirm in 2 days after checking with @jsbali.
   
   He already has the changes in our internal codebase; he just needs to 
upstream them.
   






[GitHub] [hudi] vingov commented on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-10-02 Thread GitBox


vingov commented on issue #2934:
URL: https://github.com/apache/hudi/issues/2934#issuecomment-932829205


   @t0il3ts0ap - sure, I will confirm in 2 days after checking with @jsbali.
   
   He already has the changes in our internal codebase; he just needs to 
upstream them.
   






[jira] [Resolved] (HUDI-1362) Make deltastreamer support insert_overwrite

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-1362.
--
Resolution: Implemented

Closing since this is already fixed.

> Make deltastreamer support insert_overwrite 
> 
>
> Key: HUDI-1362
> URL: https://issues.apache.org/jira/browse/HUDI-1362
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: liujinhui
>Priority: Major
>






[jira] [Updated] (HUDI-1362) Make deltastreamer support insert_overwrite

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1362:
-
Status: Open  (was: New)

> Make deltastreamer support insert_overwrite 
> 
>
> Key: HUDI-1362
> URL: https://issues.apache.org/jira/browse/HUDI-1362
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: liujinhui
>Priority: Major
>






[jira] [Updated] (HUDI-1355) Allowing multipleSourceOrdering fields for doing the preCombine on payload

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1355:
-
Status: Open  (was: New)

> Allowing multipleSourceOrdering fields for doing the preCombine on payload
> --
>
> Key: HUDI-1355
> URL: https://issues.apache.org/jira/browse/HUDI-1355
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Utilities
>Affects Versions: 0.9.0
>Reporter: Bala Mahesh Jampani
>Priority: Major
>  Labels: new-to-hudi, patch, starter
> Fix For: 0.10.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi,
> I have come across a use case where some of the incoming events have the same 
> timestamp for the insert and update events. In this case I want to depend on 
> another field for ordering. In simple terms, if the primary sort ties, I want 
> to do a secondary sort on another field; if that also ties, move on to the 
> next field, and so on. It would be good if Hudi had this functionality.
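The multi-field tie-breaking described above can be sketched with lexicographic tuple comparison. This is a minimal illustration only; the dict-based record shape and the `pre_combine` helper are hypothetical and do not reflect Hudi's actual payload API:

```python
def pre_combine(existing, incoming, ordering_fields):
    """Pick the winning record by comparing ordering fields in sequence.

    Python compares tuples element by element, so a tie on the first
    field automatically falls through to the second, and so on -- the
    secondary/tertiary sort behaviour the issue asks for.
    """
    key = lambda rec: tuple(rec[f] for f in ordering_fields)
    return incoming if key(incoming) >= key(existing) else existing

# Same timestamp on insert and update: the "seq" field breaks the tie.
old = {"ts": 100, "seq": 1, "value": "insert"}
new = {"ts": 100, "seq": 2, "value": "update"}
winner = pre_combine(old, new, ["ts", "seq"])
```

Here `winner` is the update record, because the primary ordering field `ts` ties and `seq` decides.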





[jira] [Updated] (HUDI-1079) Cannot upsert on schema with Array of Record with single field

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1079:
-
Status: Open  (was: New)

> Cannot upsert on schema with Array of Record with single field
> --
>
> Key: HUDI-1079
> URL: https://issues.apache.org/jira/browse/HUDI-1079
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.9.0
> Environment: spark 2.4.4, local 
>Reporter: Adrian Tanase
>Priority: Critical
>  Labels: schema, sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> I am trying to trigger upserts on a table that has an array field with 
> records of just one field.
>  Here is the code to reproduce:
> {code:scala}
>   val spark = SparkSession.builder()
>   .master("local[1]")
>   .appName("SparkByExamples.com")
>   .config("spark.serializer", 
> "org.apache.spark.serializer.KryoSerializer")
>   .getOrCreate();
>   // https://sparkbyexamples.com/spark/spark-dataframe-array-of-struct/
>   val arrayStructData = Seq(
> Row("James",List(Row("Java","XX",120),Row("Scala","XA",300))),
> Row("Michael",List(Row("Java","XY",200),Row("Scala","XB",500))),
> Row("Robert",List(Row("Java","XZ",400),Row("Scala","XC",250))),
> Row("Washington",null)
>   )
>   val arrayStructSchema = new StructType()
>   .add("name",StringType)
>   .add("booksIntersted",ArrayType(
> new StructType()
>   .add("bookName",StringType)
> //  .add("author",StringType)
> //  .add("pages",IntegerType)
>   ))
> val df = 
> spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData),arrayStructSchema)
> {code}
> Running insert following by upsert will fail:
> {code:scala}
>   df.write
>   .format("hudi")
>   .options(getQuickstartWriteConfigs)
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "name")
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "name")
>   .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, "COPY_ON_WRITE")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .mode(Overwrite)
>   .save(basePath)
>   df.write
>   .format("hudi")
>   .options(getQuickstartWriteConfigs)
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "name")
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "name")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .mode(Append)
>   .save(basePath)
> {code}
> If I create the books record with all the fields (at least 2), it works as 
> expected.
> The relevant part of the exception is this:
> {noformat}
> Caused by: java.lang.ClassCastException: required binary bookName (UTF8) is 
> not a groupCaused by: java.lang.ClassCastException: required binary bookName 
> (UTF8) is not a group at 
> org.apache.parquet.schema.Type.asGroupType(Type.java:207) at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:232)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.access$100(AvroRecordConverter.java:78)
>  at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter$ElementConverter.<init>(AvroRecordConverter.java:536)
>  at 
> org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:486)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>  at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>  at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
>  at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156) at 
> org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135) at 
> org.apache.hudi.client.utils.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ... 4 
> more{noformat}
> Another way to test is by changing the generated data in the tips to just the 
> amount by dropping the currency on the tips_history field; tests will start 
[jira] [Updated] (HUDI-893) Add spark datasource V2 reader support for Hudi tables

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-893:

Status: Open  (was: New)

> Add spark datasource V2 reader support for Hudi tables
> --
>
> Key: HUDI-893
> URL: https://issues.apache.org/jira/browse/HUDI-893
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Nishith Agarwal
>Assignee: Nan Zhu
>Priority: Major
>






[jira] [Updated] (HUDI-1341) hudi cli command such as rollback 、bootstrap support spark sql implement

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1341:
-
Status: Open  (was: New)

> hudi cli commands such as rollback, bootstrap: support spark sql implement
> -
>
> Key: HUDI-1341
> URL: https://issues.apache.org/jira/browse/HUDI-1341
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>
> Right now, rollback, bootstrap and similar commands have to go through the 
> Hudi CLI. Some users would prefer to use Spark SQL or the Spark code API. 



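The request in HUDI-1341 is essentially a front-end question: route admin actions like rollback and bootstrap through a SQL-like statement instead of the interactive CLI. As a purely hypothetical sketch (the `CALL` syntax and procedure names here are invented for illustration and are not Hudi's actual SQL surface), the dispatch side could start as simple as:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only -- not Hudi's actual SQL support. It illustrates
// the idea in HUDI-1341: mapping a SQL-like CALL statement onto the admin
// actions that today require the interactive Hudi CLI.
public class AdminCallParser {
    private static final Pattern CALL =
        Pattern.compile("CALL\\s+(\\w+)\\s*\\(\\s*'([^']*)'\\s*\\)", Pattern.CASE_INSENSITIVE);

    /** Returns "procedure:argument" for a CALL statement, or null if it does not match. */
    public static String parse(String sql) {
        Matcher m = CALL.matcher(sql.trim());
        return m.matches() ? m.group(1).toLowerCase() + ":" + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("CALL rollback_to_instant('20211002182001')"));
        // -> rollback_to_instant:20211002182001
        System.out.println(parse("SELECT 1"));  // -> null: not an admin call
    }
}
```

A real implementation would plug such procedures into the Spark SQL parser rather than regex-matching strings, but the shape of the user-facing command is the point here.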


[jira] [Updated] (HUDI-1237) [UMBRELLA] Checkstyle, formatting, warnings, spotless

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1237:
-
Priority: Major  (was: Blocker)

> [UMBRELLA] Checkstyle, formatting, warnings, spotless
> -
>
> Key: HUDI-1237
> URL: https://issues.apache.org/jira/browse/HUDI-1237
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: sivabalan narayanan
>Assignee: leesf
>Priority: Major
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> Umbrella ticket to track all tickets related to checkstyle, spotless, 
> warnings etc.





[jira] [Updated] (HUDI-1237) [UMBRELLA] Checkstyle, formatting, warnings, spotless

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1237:
-
Fix Version/s: (was: 0.10.0)

> [UMBRELLA] Checkstyle, formatting, warnings, spotless
> -
>
> Key: HUDI-1237
> URL: https://issues.apache.org/jira/browse/HUDI-1237
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: sivabalan narayanan
>Assignee: leesf
>Priority: Blocker
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
>
> Umbrella ticket to track all tickets related to checkstyle, spotless, 
> warnings etc.





[jira] [Updated] (HUDI-1237) [UMBRELLA] Checkstyle, formatting, warnings, spotless

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1237:
-
Status: Open  (was: New)

> [UMBRELLA] Checkstyle, formatting, warnings, spotless
> -
>
> Key: HUDI-1237
> URL: https://issues.apache.org/jira/browse/HUDI-1237
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup
>Reporter: sivabalan narayanan
>Assignee: leesf
>Priority: Blocker
>  Labels: gsoc, gsoc2021, hudi-umbrellas, mentor
> Fix For: 0.10.0
>
>
> Umbrella ticket to track all tickets related to checkstyle, spotless, 
> warnings etc.





[jira] [Updated] (HUDI-1500) Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1500:
-
Status: Open  (was: New)

> Support incrementally reading clustering  commit via Spark 
> Datasource/DeltaStreamer
> ---
>
> Key: HUDI-1500
> URL: https://issues.apache.org/jira/browse/HUDI-1500
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Spark Integration
>Reporter: liwei
>Assignee: satish
>Priority: Blocker
> Fix For: 0.10.0
>
>
> Currently, DeltaSync.readFromSource() cannot read the last instant when it is 
> a replace commit, such as one written by clustering. 





[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-864:

Status: Open  (was: New)

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core, Spark Integration
>Affects Versions: 0.5.2, 0.6.0, 0.5.3, 0.7.0, 0.8.0, 0.9.0
>Reporter: Roland Johann
>Priority: Blocker
>  Labels: sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 

[jira] [Updated] (HUDI-1015) Audit all getAllPartitionPaths() calls and keep em out of fast path

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1015:
-
Status: Open  (was: New)

> Audit all getAllPartitionPaths() calls and keep em out of fast path
> ---
>
> Key: HUDI-1015
> URL: https://issues.apache.org/jira/browse/HUDI-1015
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Writer Core
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.10.0
>
>




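The title of HUDI-1015 carries the whole argument: getAllPartitionPaths() implies a full storage listing, which is expensive on object stores, so it must stay out of hot paths. A generic memoization sketch (not Hudi's implementation; the class and method names are illustrative) shows how repeated callers within one operation can share a single listing:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Generic sketch, not Hudi code: wrap an expensive listing so it runs at
// most once, keeping subsequent callers off the slow path.
public class MemoizedListing {
    private final Supplier<List<String>> loader;
    private volatile List<String> cached;

    public MemoizedListing(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    public List<String> getAllPartitionPaths() {
        List<String> local = cached;            // one volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                if (cached == null) {
                    cached = loader.get();      // expensive full listing, done once
                }
                local = cached;
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger listings = new AtomicInteger();
        MemoizedListing m = new MemoizedListing(() -> {
            listings.incrementAndGet();         // stands in for a storage scan
            return List.of("2021/10/01", "2021/10/02");
        });
        m.getAllPartitionPaths();
        m.getAllPartitionPaths();
        System.out.println(listings.get());     // prints 1: the listing ran once
    }
}
```

Auditing call sites then reduces to asking whether each one can tolerate a cached view or genuinely needs a fresh listing.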


[jira] [Updated] (HUDI-1492) Handle DeltaWriteStat correctly for storage schemes that support appends

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1492:
-
Status: Open  (was: New)

> Handle DeltaWriteStat correctly for storage schemes that support appends
> 
>
> Key: HUDI-1492
> URL: https://issues.apache.org/jira/browse/HUDI-1492
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.10.0
>
>
> The current implementation simply uses
> {code:java}
> String pathWithPartition = hoodieWriteStat.getPath(); {code}
> to write the metadata table. This is problematic if the delta write was 
> merely an append, and can technically add duplicate files into the metadata 
> table 
> (not sure if this is a problem per se, but filing a Jira to track and either 
> close/fix). 
>  



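The duplicate-file concern in HUDI-1492 can be sketched generically. Assuming, hypothetically, that each write stat carries a path plus the file size after the write (the names below are illustrative, not Hudi's actual API), keying metadata entries by path means an append to an existing log file updates one entry instead of registering the file a second time:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: dedupe write-stat paths so an append to an existing
// log file updates a single metadata entry rather than adding a duplicate.
public class AppendAwareIndex {
    /** Each entry: {path, sizeAfterWrite}. Later stats for the same path win. */
    public static Map<String, Long> index(List<String[]> writeStats) {
        Map<String, Long> filesBySize = new LinkedHashMap<>();
        for (String[] stat : writeStats) {
            filesBySize.put(stat[0], Long.parseLong(stat[1]));
        }
        return filesBySize;
    }

    public static void main(String[] args) {
        Map<String, Long> idx = index(List.of(
            new String[]{"2021/10/02/file1.log", "100"},
            new String[]{"2021/10/02/file1.log", "250"},     // append, same file
            new String[]{"2021/10/02/base.parquet", "4096"}));
        System.out.println(idx.size());                       // prints 2
        System.out.println(idx.get("2021/10/02/file1.log"));  // prints 250
    }
}
```

Whether last-write-wins on size is the right merge rule is exactly the open question the ticket raises; the sketch only shows why keying by path avoids duplicate file entries.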


[jira] [Updated] (HUDI-1180) Upgrade HBase to 2.3.3

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1180:
-
Labels: sev:critical  (was: )

> Upgrade HBase to 2.3.3
> --
>
> Key: HUDI-1180
> URL: https://issues.apache.org/jira/browse/HUDI-1180
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Wenning Ding
>Priority: Blocker
>  Labels: sev:critical
> Fix For: 0.10.0
>
>
> I tried to upgrade HBase to 2.3.3 but ran into several issues.
> According to the Hadoop version support matrix 
> ([http://hbase.apache.org/book.html#hadoop]), we also need to upgrade Hadoop 
> to 2.8.5+.
>  
> There are several API conflicts between HBase 2.2.3 and HBase 1.2.3; we need 
> to resolve those first. After resolving the conflicts, I was able to compile, 
> but then I ran into a tricky Jetty version issue during testing:
> {code:java}
> [ERROR] TestHBaseIndex.testDelete()  Time elapsed: 4.705 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdate()  Time elapsed: 0.174 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdateWithRollback()  Time 
> elapsed: 0.076 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSmallBatchSize()  Time elapsed: 0.122 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTagLocationAndDuplicateUpdate()  Time elapsed: 
> 0.16 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalGetsBatching()  Time elapsed: 1.771 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalPutsBatching()  Time elapsed: 0.082 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> 34206 [Thread-260] WARN  
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner  - DirectoryScanner: 
> shutdown has been called
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager  - 
> IncrementalBlockReportManager interrupted
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.DataNode  - Ending block pool service 
> for: Block pool BP-1058834949-10.0.0.2-1597189606506 (Datanode Uuid 
> cb7bd8aa-5d79-4955-b1ec-bdaf7f1b6431) service to localhost/127.0.0.1:55924
> 34246 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data1/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 34247 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data2/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 37192 [HBase-Metrics2-1] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  
> - Cannot locate configuration: tried 
> hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
> 43904 
> [master/iad1-ws-cor-r12:0:becomeActiveMaster-SendThread(localhost:58768)] 
> WARN  org.apache.zookeeper.ClientCnxn  - Session 0x173dfeb0c8b0004 for server 
> null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   
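The repeated NoSuchMethodError on SessionHandler.setHttpOnly(Z)V is the classic signature of two incompatible Jetty versions meeting on the test classpath (the HBase minicluster drags in its own). One common mitigation, sketched here purely as an illustration — the coordinates are standard Jetty ones, but the version property is a placeholder, not what the Hudi build settled on — is to force a single Jetty version via Maven dependencyManagement:

```xml
<!-- Illustrative pom.xml fragment: align the Jetty artifacts that the
     embedded timeline server and the HBase minicluster pull in transitively. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>${jetty.version}</version> <!-- placeholder: one version everywhere -->
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-servlet</artifactId>
      <version>${jetty.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=org.eclipse.jetty` shows which module still pulls in the odd version out.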

[jira] [Updated] (HUDI-1180) Upgrade HBase to 2.3.3

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1180:
-
Status: Open  (was: New)

> Upgrade HBase to 2.3.3
> --
>
> Key: HUDI-1180
> URL: https://issues.apache.org/jira/browse/HUDI-1180
> Project: Apache Hudi
>  Issue Type: Improvement
>Affects Versions: 0.9.0
>Reporter: Wenning Ding
>Priority: Blocker
> Fix For: 0.10.0
>
>
> I tried to upgrade HBase to 2.3.3 but ran into several issues.
> According to the Hadoop version support matrix 
> ([http://hbase.apache.org/book.html#hadoop]), we also need to upgrade Hadoop 
> to 2.8.5+.
>  
> There are several API conflicts between HBase 2.2.3 and HBase 1.2.3; we need 
> to resolve those first. After resolving the conflicts, I was able to compile, 
> but then I ran into a tricky Jetty version issue during testing:
> {code:java}
> [ERROR] TestHBaseIndex.testDelete()  Time elapsed: 4.705 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdate()  Time elapsed: 0.174 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdateWithRollback()  Time 
> elapsed: 0.076 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSmallBatchSize()  Time elapsed: 0.122 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTagLocationAndDuplicateUpdate()  Time elapsed: 
> 0.16 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalGetsBatching()  Time elapsed: 1.771 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalPutsBatching()  Time elapsed: 0.082 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> 34206 [Thread-260] WARN  
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner  - DirectoryScanner: 
> shutdown has been called
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager  - 
> IncrementalBlockReportManager interrupted
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.DataNode  - Ending block pool service 
> for: Block pool BP-1058834949-10.0.0.2-1597189606506 (Datanode Uuid 
> cb7bd8aa-5d79-4955-b1ec-bdaf7f1b6431) service to localhost/127.0.0.1:55924
> 34246 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data1/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 34247 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data2/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 37192 [HBase-Metrics2-1] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  
> - Cannot locate configuration: tried 
> hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
> 43904 
> [master/iad1-ws-cor-r12:0:becomeActiveMaster-SendThread(localhost:58768)] 
> WARN  org.apache.zookeeper.ClientCnxn  - Session 0x173dfeb0c8b0004 for server 
> null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   

[jira] [Updated] (HUDI-1180) Upgrade HBase to 2.3.3

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1180:
-
Component/s: Writer Core

> Upgrade HBase to 2.3.3
> --
>
> Key: HUDI-1180
> URL: https://issues.apache.org/jira/browse/HUDI-1180
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Wenning Ding
>Priority: Blocker
> Fix For: 0.10.0
>
>
> I tried to upgrade HBase to 2.3.3 but ran into several issues.
> According to the Hadoop version support matrix 
> ([http://hbase.apache.org/book.html#hadoop]), we also need to upgrade Hadoop 
> to 2.8.5+.
>  
> There are several API conflicts between HBase 2.2.3 and HBase 1.2.3; we need 
> to resolve those first. After resolving the conflicts, I was able to compile, 
> but then I ran into a tricky Jetty version issue during testing:
> {code:java}
> [ERROR] TestHBaseIndex.testDelete()  Time elapsed: 4.705 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdate()  Time elapsed: 0.174 
> s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSimpleTagLocationAndUpdateWithRollback()  Time 
> elapsed: 0.076 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testSmallBatchSize()  Time elapsed: 0.122 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTagLocationAndDuplicateUpdate()  Time elapsed: 
> 0.16 s  <<< ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalGetsBatching()  Time elapsed: 1.771 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR] TestHBaseIndex.testTotalPutsBatching()  Time elapsed: 0.082 s  <<< 
> ERROR!
> java.lang.NoSuchMethodError: 
> org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> 34206 [Thread-260] WARN  
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner  - DirectoryScanner: 
> shutdown has been called
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager  - 
> IncrementalBlockReportManager interrupted
> 34240 [BP-1058834949-10.0.0.2-1597189606506 heartbeating to 
> localhost/127.0.0.1:55924] WARN  
> org.apache.hadoop.hdfs.server.datanode.DataNode  - Ending block pool service 
> for: Block pool BP-1058834949-10.0.0.2-1597189606506 (Datanode Uuid 
> cb7bd8aa-5d79-4955-b1ec-bdaf7f1b6431) service to localhost/127.0.0.1:55924
> 34246 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data1/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 34247 
> [refreshUsed-/private/var/folders/98/mxq3vc_n6l5728rf1wmcwrqs52lpwg/T/temp1791820148926982977/dfs/data/data2/current/BP-1058834949-10.0.0.2-1597189606506]
>  WARN  org.apache.hadoop.fs.CachingGetSpaceUsed  - Thread Interrupted waiting 
> to refresh disk information: sleep interrupted
> 37192 [HBase-Metrics2-1] WARN  org.apache.hadoop.metrics2.impl.MetricsConfig  
> - Cannot locate configuration: tried 
> hadoop-metrics2-datanode.properties,hadoop-metrics2.properties
> 43904 
> [master/iad1-ws-cor-r12:0:becomeActiveMaster-SendThread(localhost:58768)] 
> WARN  org.apache.zookeeper.ClientCnxn  - Session 0x173dfeb0c8b0004 for server 
> null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>   at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> [INFO] 
> [INFO] Results:
> [INFO] 
> [ERROR] Errors: 
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V
> [ERROR]   

[jira] [Assigned] (HUDI-1015) Audit all getAllPartitionPaths() calls and keep em out of fast path

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-1015:


Assignee: sivabalan narayanan  (was: Vinoth Chandar)

> Audit all getAllPartitionPaths() calls and keep em out of fast path
> ---
>
> Key: HUDI-1015
> URL: https://issues.apache.org/jira/browse/HUDI-1015
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Writer Core
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.10.0
>
>






[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-864:

Fix Version/s: 0.10.0

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core, Spark Integration
>Affects Versions: 0.5.2, 0.6.0, 0.5.3, 0.7.0, 0.8.0, 0.9.0
>Reporter: Roland Johann
>Priority: Blocker
>  Labels: sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 

[jira] [Updated] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2021-10-02 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1204:
-
Status: Open  (was: New)

> NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob
> ---
>
> Key: HUDI-1204
> URL: https://issues.apache.org/jira/browse/HUDI-1204
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Testing
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Assignee: Nishith Agarwal
>Priority: Major
> Attachments: complex-dag-cow-2.yaml
>
>
> I was trying to run HoodieTestSuiteJob in my local Docker setup and ran into 
> a dependency issue.
>  
> spark-submit --master local --class 
> org.apache.hudi.integ.testsuite.HoodieTestSuiteJob --packages 
> com.databricks:spark-avro_2.11:4.0.0 
> /opt/hudi-integ-test-bundle-0.6.0-rc1.jar  --source-ordering-field timestamp  
>   --target-base-path /user/hive/warehouse/hudi-test-suite/output    
> --input-base-path /user/hive/warehouse/hudi-test-suite/input    
> --target-table test_table    --props [file:///opt/test-source.properties]    
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.FilebasedSchemaProvider    --source-class 
> org.apache.hudi.utilities.sources.AvroDFSSource    --input-file-size 12582912 
>  --workload-yaml-path 
> /var/hoodie/ws/docker/demo/config/test-suite/complex-dag-cow.yaml 
> --table-type COPY_ON_WRITE    --workload-generator-classname yaml
>  
> {code:java}
> 20/08/19 21:42:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hudi/sync/common/AbstractSyncTool
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$Config.<init>(HoodieDeltaStreamer.java:279)
> at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob$HoodieTestSuiteConfig.<init>(HoodieTestSuiteJob.java:153)
> at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:114)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.hudi.sync.common.AbstractSyncTool
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 26 more
>  {code}
> I tried adding hudi-sync-common as a dependency to hudi-utilities, but that 
> didn't fix the issue.
>  
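The NoClassDefFoundError above means org.apache.hudi.sync.common.AbstractSyncTool was never packaged into the bundle jar that was passed to spark-submit. A quick way to confirm that before re-running the job is to check the jar's entries directly. This is a minimal sketch (a hypothetical helper, not part of Hudi):

```python
import zipfile

def class_in_jar(jar_path, class_name):
    """Return True if the fully-qualified class is packaged in the jar.

    A jar is a zip archive, so the class 'a.b.C' must appear as the
    entry 'a/b/C.class' in the archive's file list.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, `class_in_jar("/opt/hudi-integ-test-bundle-0.6.0-rc1.jar", "org.apache.hudi.sync.common.AbstractSyncTool")` should return False on the broken bundle; `jar tf <bundle>.jar | grep AbstractSyncTool` is the shell equivalent.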

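The reporter notes that adding hudi-sync-common as a dependency of hudi-utilities did not help. One plausible explanation: bundle jars built with the maven-shade-plugin only package the artifacts named in their include list, so declaring the dependency on a module is not enough; the artifact must also be added to the bundle's shade includes. A sketch, assuming the integ-test bundle's pom follows that pattern (the coordinates shown are illustrative, not verified against the actual pom):

```xml
<!-- Sketch only: add the missing module to the bundle's shade include list. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <!-- ...existing includes... -->
        <include>org.apache.hudi:hudi-sync-common</include>
      </includes>
    </artifactSet>
  </configuration>
</plugin>
```

After rebuilding, listing the bundle jar's entries should show the AbstractSyncTool class.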


--
This message was sent by Atlassian Jira
(v8.3.4#803005)