[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390300#comment-17390300
 ] 

ASF GitHub Bot commented on HUDI-2254:
--

hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
 
   * f044b142e6833a40681b84f0380a8a5af5ad7d33 UNKNOWN
   * a565942ea36395c0b67c7c7827495a9ef5e6c0af UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Builtin sort operator for flink bulk insert
> ---
>
> Key: HUDI-2254
> URL: https://issues.apache.org/jira/browse/HUDI-2254
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3372: [HUDI-2254] Builtin sort operator for flink bulk insert

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
 
   * f044b142e6833a40681b84f0380a8a5af5ad7d33 UNKNOWN
   * a565942ea36395c0b67c7c7827495a9ef5e6c0af UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390299#comment-17390299
 ] 

ASF GitHub Bot commented on HUDI-2254:
--

hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
 
   * f044b142e6833a40681b84f0380a8a5af5ad7d33 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Builtin sort operator for flink bulk insert
> ---
>
> Key: HUDI-2254
> URL: https://issues.apache.org/jira/browse/HUDI-2254
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390298#comment-17390298
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   * 0bb6768327f3a54bb25d4504043acfb94ecfa311 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1256)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Today, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, via the --schedule or 
> --instant-time config.
> To trigger a clustering run, a user has to:
>  # Submit a HoodieClusteringJob with --schedule to build a clustering plan.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again with --instant-time to execute the 
> created plan.
> The pain point is that this takes too many steps and requires manually copying 
> and pasting the instant time from the log file, so the flow cannot be 
> automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, so --instant-time is 
> required here. This is the default value.|
> |schedule|Build a clustering plan.|
> |*scheduleAndExecute*|Build a clustering plan and execute it immediately.|
> With --mode scheduleAndExecute, users can now build a clustering plan and 
> execute it at once with a single HoodieClusteringJob run (an illustrative 
> invocation follows this description).
>  
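
A minimal illustrative invocation of the proposed mode (hedged sketch: the bundle jar name, properties file, table path, and the --base-path/--table-name/--props flags are assumed from the usual hudi-utilities job layout and should be checked against the actual PR; only --schedule, --instant-time, and the new --mode/-m come from the description above):

    spark-submit \
      --class org.apache.hudi.utilities.HoodieClusteringJob \
      hudi-utilities-bundle.jar \
      --props /path/to/clusteringjob.properties \
      --base-path /path/to/hudi_table \
      --table-name hudi_table \
      --mode scheduleAndExecute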



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3372: [HUDI-2254] Builtin sort operator for flink bulk insert

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
 
   * f044b142e6833a40681b84f0380a8a5af5ad7d33 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   * 0bb6768327f3a54bb25d4504043acfb94ecfa311 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1256)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390297#comment-17390297
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   * 0bb6768327f3a54bb25d4504043acfb94ecfa311 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Today, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, via the --schedule or 
> --instant-time config.
> To trigger a clustering run, a user has to:
>  # Submit a HoodieClusteringJob with --schedule to build a clustering plan.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again with --instant-time to execute the 
> created plan.
> The pain point is that this takes too many steps and requires manually copying 
> and pasting the instant time from the log file, so the flow cannot be 
> automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, so --instant-time is 
> required here. This is the default value.|
> |schedule|Build a clustering plan.|
> |*scheduleAndExecute*|Build a clustering plan and execute it immediately.|
> With --mode scheduleAndExecute, users can now build a clustering plan and 
> execute it at once with a single HoodieClusteringJob run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   * 0bb6768327f3a54bb25d4504043acfb94ecfa311 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390296#comment-17390296
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Today, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, via the --schedule or 
> --instant-time config.
> To trigger a clustering run, a user has to:
>  # Submit a HoodieClusteringJob with --schedule to build a clustering plan.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again with --instant-time to execute the 
> created plan.
> The pain point is that this takes too many steps and requires manually copying 
> and pasting the instant time from the log file, so the flow cannot be 
> automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, so --instant-time is 
> required here. This is the default value.|
> |schedule|Build a clustering plan.|
> |*scheduleAndExecute*|Build a clustering plan and execute it immediately.|
> With --mode scheduleAndExecute, users can now build a clustering plan and 
> execute it at once with a single HoodieClusteringJob run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390295#comment-17390295
 ] 

ASF GitHub Bot commented on HUDI-2254:
--

hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Builtin sort operator for flink bulk insert
> ---
>
> Key: HUDI-2254
> URL: https://issues.apache.org/jira/browse/HUDI-2254
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3372: [HUDI-2254] Builtin sort operator for flink bulk insert

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390294#comment-17390294
 ] 

ASF GitHub Bot commented on HUDI-2254:
--

hudi-bot commented on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Builtin sort operator for flink bulk insert
> ---
>
> Key: HUDI-2254
> URL: https://issues.apache.org/jira/browse/HUDI-2254
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot commented on pull request #3372: [HUDI-2254] Builtin sort operator for flink bulk insert

2021-07-29 Thread GitBox


hudi-bot commented on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791


   
   ## CI report:
   
   * 23687df6305830a9382181d0e795c33f4c7d9f98 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-2255) Refactor DataSourceOptions

2021-07-29 Thread Wenning Ding (Jira)
Wenning Ding created HUDI-2255:
--

 Summary: Refactor DataSourceOptions
 Key: HUDI-2255
 URL: https://issues.apache.org/jira/browse/HUDI-2255
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Wenning Ding


As discussed with Vinoth, we can rename the DataSourceOptions constants from 
xxx_OPT_KEY to xxx_OPT, as sketched below.
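
A hedged sketch of what the rename would look like at a call site. RECORDKEY_FIELD_OPT_KEY and the .key accessor exist today (they also appear in the TestTimeTravelQuery code quoted elsewhere in this digest); RECORDKEY_FIELD_OPT is a hypothetical name that simply follows the stated pattern:

```scala
// before: current constant name
.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key, "id")
// after the proposed refactoring (assumed new name, xxx_OPT_KEY -> xxx_OPT)
.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT.key, "id")
```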



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2254:
-
Labels: pull-request-available  (was: )

> Builtin sort operator for flink bulk insert
> ---
>
> Key: HUDI-2254
> URL: https://issues.apache.org/jira/browse/HUDI-2254
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390291#comment-17390291
 ] 

ASF GitHub Bot commented on HUDI-2254:
--

danny0405 opened a new pull request #3372:
URL: https://github.com/apache/hudi/pull/3372


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Builtin sort operator for flink bulk insert
> ---
>
> Key: HUDI-2254
> URL: https://issues.apache.org/jira/browse/HUDI-2254
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] danny0405 opened a new pull request #3372: [HUDI-2254] Builtin sort operator for flink bulk insert

2021-07-29 Thread GitBox


danny0405 opened a new pull request #3372:
URL: https://github.com/apache/hudi/pull/3372


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-2254) Builtin sort operator for flink bulk insert

2021-07-29 Thread Danny Chen (Jira)
Danny Chen created HUDI-2254:


 Summary: Builtin sort operator for flink bulk insert
 Key: HUDI-2254
 URL: https://issues.apache.org/jira/browse/HUDI-2254
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.9.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390290#comment-17390290
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

vinothchandar merged pull request #3371:
URL: https://github.com/apache/hudi/pull/3371


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2253] Refactoring few tests to reduce runningtime. DeltaStreamer and MultiDeltaStreamer tests. Bulk insert row writer tests (#3371)

2021-07-29 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 7bdae69  [HUDI-2253] Refactoring few tests to reduce runningtime. 
DeltaStreamer and MultiDeltaStreamer tests. Bulk insert row writer tests (#3371)
7bdae69 is described below

commit 7bdae69053afc5ef604a15806d78317cb976f2ce
Author: Sivabalan Narayanan 
AuthorDate: Fri Jul 30 01:22:26 2021 -0400

[HUDI-2253] Refactoring few tests to reduce runningtime. DeltaStreamer and 
MultiDeltaStreamer tests. Bulk insert row writer tests (#3371)

Co-authored-by: Sivabalan Narayanan 
---
 .../TestHoodieBulkInsertDataInternalWriter.java|   4 +-
 .../TestHoodieDataSourceInternalWriter.java|   9 +-
 .../TestHoodieBulkInsertDataInternalWriter.java|   4 +-
 .../TestHoodieDataSourceInternalBatchWrite.java|   7 +-
 .../functional/TestHoodieDeltaStreamer.java| 228 +--
 .../functional/TestHoodieDeltaStreamerBase.java| 245 +
 .../TestHoodieMultiTableDeltaStreamer.java |   4 +-
 7 files changed, 267 insertions(+), 234 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
 
b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
index 9735379..fd943b7 100644
--- 
a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
+++ 
b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
@@ -74,7 +74,7 @@ public class TestHoodieBulkInsertDataInternalWriter extends
 HoodieWriteConfig cfg = getWriteConfig(populateMetaFields);
 HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
 // execute N rounds
-for (int i = 0; i < 3; i++) {
+for (int i = 0; i < 2; i++) {
   String instantTime = "00" + i;
   // init writer
   HoodieBulkInsertDataInternalWriter writer = new 
HoodieBulkInsertDataInternalWriter(table, cfg, instantTime, 
RANDOM.nextInt(10), RANDOM.nextLong(), RANDOM.nextLong(),
@@ -82,7 +82,7 @@ public class TestHoodieBulkInsertDataInternalWriter extends
 
   int size = 10 + RANDOM.nextInt(1000);
   // write N rows to partition1, N rows to partition2 and N rows to 
partition3 ... Each batch should create a new RowCreateHandle and a new file
-  int batches = 5;
+  int batches = 3;
   Dataset totalInputRows = null;
 
   for (int j = 0; j < batches; j++) {
diff --git 
a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
 
b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
index 342e2ae..eea49e6 100644
--- 
a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
+++ 
b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
@@ -30,6 +30,7 @@ import org.apache.spark.sql.Row;
 import org.apache.spark.sql.catalyst.InternalRow;
 import org.apache.spark.sql.sources.v2.DataSourceOptions;
 import org.apache.spark.sql.sources.v2.writer.DataWriter;
+import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.params.ParameterizedTest;
 import org.junit.jupiter.params.provider.Arguments;
@@ -87,7 +88,7 @@ public class TestHoodieDataSourceInternalWriter extends
 }
 
 int size = 10 + RANDOM.nextInt(1000);
-int batches = 5;
+int batches = 2;
 Dataset totalInputRows = null;
 for (int j = 0; j < batches; j++) {
   String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[j 
% 3];
@@ -158,7 +159,7 @@ public class TestHoodieDataSourceInternalWriter extends
 int partitionCounter = 0;
 
 // execute N rounds
-for (int i = 0; i < 5; i++) {
+for (int i = 0; i < 2; i++) {
   String instantTime = "00" + i;
   // init writer
   HoodieDataSourceInternalWriter dataSourceInternalWriter =
@@ -168,7 +169,7 @@ public class TestHoodieDataSourceInternalWriter extends
   DataWriter writer = 
dataSourceInternalWriter.createWriterFactory().createDataWriter(partitionCounter++,
 RANDOM.nextLong(), RANDOM.nextLong());
 
   int size = 10 + RANDOM.nextInt(1000);
-  int batches = 5; // one batch per partition
+  int batches = 2; // one batch per partition
 
   for (int j = 0; j < batches; j++) {
 String partitionPath = 
HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[j % 3];
@@ -195,6 +196,8 @@ public class TestHoodieDataSourceInternalWriter extends
 }
   }
 
+  // takes up lot of running time with CI.
+  @Disabled
   

[GitHub] [hudi] vinothchandar merged pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


vinothchandar merged pull request #3371:
URL: https://github.com/apache/hudi/pull/3371


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390289#comment-17390289
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

vinothchandar commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679655658



##
File path: 
hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
##
@@ -195,6 +196,8 @@ public void testMultipleDataSourceWrites(boolean 
populateMetaFields) throws Exce
 }
   }
 
+  // takes up lot of running time with CI.
+  @Disabled

Review comment:
   but do we need this re-enabled at some point?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on a change in pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


vinothchandar commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679655658



##
File path: 
hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
##
@@ -195,6 +196,8 @@ public void testMultipleDataSourceWrites(boolean 
populateMetaFields) throws Exce
 }
   }
 
+  // takes up lot of running time with CI.
+  @Disabled

Review comment:
   but do we need this re-enabled at some point?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ziudu commented on issue #3344: [SUPPORT]Best way to ingest a large number of tables

2021-07-29 Thread GitBox


ziudu commented on issue #3344:
URL: https://github.com/apache/hudi/issues/3344#issuecomment-889636377


   We did a few POCs and found:
   
   1. The java-client does not support MOR either, so we will not use it.
   2. MultiTableDeltaStreamer did not work correctly in continuous mode. It did 
work to some extent for MOR in single-run mode, but we prefer not to rely on it 
since the docs say MOR is not supported. In addition, MultiTableDeltaStreamer 
runs the ingestions serially, not in parallel.
   
   For the moment we will stick to DeltaStreamer jobs, launched regularly 
(every 5-10 minutes) to process change data from Debezium or GoldenGate. Let's 
see how that holds up for 1000 tables.
   
   However, we don't think this is optimal, because:
   - Each DeltaStreamer needs at least one Spark executor, usually with 2 GB of 
memory, while most of our tables see only a very small amount of change data 
(<1 MB) in a 5-10 minute window. We would therefore need a large Hadoop cluster 
with enough memory for ingestion, transformation and PrestoSQL.
   - We run Spark on YARN, and it takes about 10 seconds to create each YARN 
application for a DeltaStreamer, which is also not ideal.
   
   Our final thought:
   Is it possible to write a single long-running Spark application that listens 
to multiple change-data topics and writes the changes in parallel to Hudi 
tables on Hadoop via PySpark DataFrames?
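
To make that final thought concrete, below is a minimal hedged sketch (not an official Hudi recommendation) of one long-running Spark application that starts a streaming write per change topic, so many low-volume tables share a single executor pool. The topic list, record schema, key/precombine columns, broker address and paths are all hypothetical; it assumes the spark-sql-kafka connector and the Hudi Spark bundle are on the classpath, and it is written in Scala rather than PySpark purely for brevity:

```scala
// Hypothetical multi-table ingestion sketch: one Spark application, one streaming
// query per change topic, all sharing the same executors.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

object MultiTableHudiStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-table-hudi-ingest").getOrCreate()
    val brokers  = "kafka:9092"                        // hypothetical broker address
    val basePath = "hdfs:///lake/hudi"                 // hypothetical lake root
    val topics   = Seq("orders", "customers")          // one change topic per source table
    val changeSchema = new StructType()                // simplified change-record schema
      .add("id", LongType)
      .add("ts", LongType)
      .add("payload", StringType)

    topics.foreach { topic =>
      spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topic)
        .load()
        .select(from_json(col("value").cast("string"), changeSchema).as("r"))
        .select("r.*")
        .writeStream
        .format("org.apache.hudi")
        .option("hoodie.table.name", topic)
        .option("hoodie.datasource.write.recordkey.field", "id")
        .option("hoodie.datasource.write.precombine.field", "ts")
        .option("hoodie.datasource.write.partitionpath.field", "")
        .option("hoodie.datasource.write.keygenerator.class",
          "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
        .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
        .option("checkpointLocation", s"$basePath/checkpoints/$topic")
        .outputMode("append")
        .start(s"$basePath/$topic")
    }
    spark.streams.awaitAnyTermination()                // keep every stream running
  }
}
```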


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390285#comment-17390285
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Today, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, via the --schedule or 
> --instant-time config.
> To trigger a clustering run, a user has to:
>  # Submit a HoodieClusteringJob with --schedule to build a clustering plan.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again with --instant-time to execute the 
> created plan.
> The pain point is that this takes too many steps and requires manually copying 
> and pasting the instant time from the log file, so the flow cannot be 
> automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, so --instant-time is 
> required here. This is the default value.|
> |schedule|Build a clustering plan.|
> |*scheduleAndExecute*|Build a clustering plan and execute it immediately.|
> With --mode scheduleAndExecute, users can now build a clustering plan and 
> execute it at once with a single HoodieClusteringJob run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-889626943


   @hudi-bot run azure
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] mkk1490 commented on issue #3313: [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key

2021-07-29 Thread GitBox


mkk1490 commented on issue #3313:
URL: https://github.com/apache/hudi/issues/3313#issuecomment-889627136


   @nsivabalan I set the row_writer property to False and ingested the data. Now 
the timestamp gets converted to its epoch-seconds value (a long) in the hoodie 
key:
   
![image](https://user-images.githubusercontent.com/16716227/127602181-65d0075d-4757-4280-aa51-75592fb03fa8.png)
   
   This actually solves my issue, since during upsert the key stays in sync with 
the IDL key. But bulk_insert with row.writer set to False is very slow; it takes 
about double the time for the same data ingestion.
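
For context, a hedged sketch of the switch being discussed. The operation and row-writer keys are standard Hudi write options, but the exact key names should be verified against the Hudi version in use; df and basePath stand in for an input DataFrame and the table path, and the usual record key / precombine / table name options are assumed to be set as well:

```scala
// Bulk insert with the row writer disabled: slower, but the generated record
// keys match what the regular (upsert) key-generation path produces.
df.write.format("org.apache.hudi")
  .option("hoodie.datasource.write.operation", "bulk_insert")
  .option("hoodie.datasource.write.row.writer.enable", "false")
  .mode("append")
  .save(basePath)
```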


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390284#comment-17390284
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-889626943


   @hudi-bot run azure
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Today, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, via the --schedule or 
> --instant-time config.
> To trigger a clustering run, a user has to:
>  # Submit a HoodieClusteringJob with --schedule to build a clustering plan.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again with --instant-time to execute the 
> created plan.
> The pain point is that this takes too many steps and requires manually copying 
> and pasting the instant time from the log file, so the flow cannot be 
> automated.
>  
> I have raised a PR that adds a new config named --mode (-m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, so --instant-time is 
> required here. This is the default value.|
> |schedule|Build a clustering plan.|
> |*scheduleAndExecute*|Build a clustering plan and execute it immediately.|
> With --mode scheduleAndExecute, users can now build a clustering plan and 
> execute it at once with a single HoodieClusteringJob run.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch asf-site updated: Travis CI build asf-site

2021-07-29 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e2f53b1  Travis CI build asf-site
e2f53b1 is described below

commit e2f53b137d78d0f68cc1aaf3a191d9a6679d9d53
Author: CI 
AuthorDate: Fri Jul 30 04:32:18 2021 +

Travis CI build asf-site
---
 content/docs/powered_by.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/powered_by.html b/content/docs/powered_by.html
index f6b2312..b89cb7b 100644
--- a/content/docs/powered_by.html
+++ b/content/docs/powered_by.html
@@ -566,7 +566,7 @@ Data Summit Connect, May, 2021
   https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-available-in-amazon-emr/;>“New
 features from Apache hudi in Amazon EMR”
   https://aws.amazon.com/blogs/big-data/build-a-data-lake-using-amazon-kinesis-data-streams-for-amazon-dynamodb-and-apache-hudi/;>“Build
 a data lake using amazon kinesis data stream for amazon dynamodb and apache 
hudi” - Amazon AWS
   https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-athena-expands-apache-hudi-support/;>“Amazon
 Athena expands Apache Hudi support” - Amazon AWS
-  “Part1:
 Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read 
optimized queries”
+  https://aws.amazon.com/blogs/big-data/part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/;>“Part1:
 Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read 
optimized queries” - Amazon AWS
 
 
 Powered by


[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390277#comment-17390277
 ] 

ASF GitHub Bot commented on HUDI-2243:
--

codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679638563



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, 
PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession =_
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+initPath()
+initSparkContexts()
+spark = sqlContext.sparkSession
+initTestDataGenerator()
+initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+cleanupSparkContexts()
+cleanupTestDataGenerator()
+cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+initMetaClient(tableType)
+val _spark = spark
+import _spark.implicits._
+
+// First write
+val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+df1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val firstCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Second write
+val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+df2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Append)
+  .save(basePath)
+metaClient.reloadActiveTimeline()
+val secondCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Third write

Review comment:
   Wondering what happens when clean commits are interleaved in between, say 
as.of.instant is 1002 and there are a couple of clean commits before that. I 
believe the behavior would be the same as what we have today when the latest 
instant is passed?
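
To ground the question, a hedged sketch of the kind of time-travel read under discussion, reusing basePath and the firstCommit timestamp captured in the quoted test (the as.of.instant option name is taken from this review thread and may differ in the merged PR):

```scala
// Read the table as of the first commit; rows written by later commits
// (and, per the question above, effects of later clean commits) should not be visible.
val asOfFirst = spark.read
  .format("org.apache.hudi")
  .option("as.of.instant", firstCommit)
  .load(basePath)
asOfFirst.select("id", "name", "value", "version").show(false)
```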




-- 
This is an automated message from the Apache Git Service.
To 

[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table

2021-07-29 Thread GitBox


codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679638563



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, 
PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession =_
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+initPath()
+initSparkContexts()
+spark = sqlContext.sparkSession
+initTestDataGenerator()
+initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+cleanupSparkContexts()
+cleanupTestDataGenerator()
+cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+initMetaClient(tableType)
+val _spark = spark
+import _spark.implicits._
+
+// First write
+val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+df1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val firstCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Second write
+val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+df2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Append)
+  .save(basePath)
+metaClient.reloadActiveTimeline()
+val secondCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Third write

Review comment:
   Wondering what happens when clean commits are interleaved in between, 
say as.of.instant is 1002 and there are a couple of clean commits before that. I 
believe the behavior would be the same as we have today when the latest instant is 
passed? 
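
   For context, a minimal sketch of the time travel read under discussion, assuming the 
`as.of.instant` read option this PR introduces; `secondCommit` and `basePath` are the values 
captured in the quoted test, and the instant chosen is illustrative:
   ```scala
   // Read the table as of a captured completed instant (time travel read).
   val asOfDf = spark.read.format("org.apache.hudi")
     .option("as.of.instant", secondCommit)
     .load(basePath)
   asOfDf.show(false)
   ```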




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390276#comment-17390276
 ] 

ASF GitHub Bot commented on HUDI-2243:
--

codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679637630



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java
##
@@ -58,6 +58,11 @@
  */
 Stream<HoodieBaseFile> getLatestBaseFiles();
 
+/**

Review comment:
   nit: 
   ```
   /**
 * Stream all the latest version data files across partitions, with the precondition
 * that commitTime(file) is before maxCommitTime.
*/
   ```
   
   More in line with the existing doc. What do you think? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Time Travel Query For Hoodie Table
> --
>
> Key: HUDI-2243
> URL: https://issues.apache.org/jira/browse/HUDI-2243
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
>
> Support time travel query for hoodie table, for both COW and MOR tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table

2021-07-29 Thread GitBox


codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679637630



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java
##
@@ -58,6 +58,11 @@
  */
 Stream<HoodieBaseFile> getLatestBaseFiles();
 
+/**

Review comment:
   nit: 
   ```
   /**
 * Stream all the latest version data files across partitions, with the precondition
 * that commitTime(file) is before maxCommitTime.
*/
   ```
   
   More in line with the existing doc. What do you think? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390275#comment-17390275
 ] 

ASF GitHub Bot commented on HUDI-2243:
--

codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, 
PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession =_
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+initPath()
+initSparkContexts()
+spark = sqlContext.sparkSession
+initTestDataGenerator()
+initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+cleanupSparkContexts()
+cleanupTestDataGenerator()
+cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+initMetaClient(tableType)
+val _spark = spark
+import _spark.implicits._
+
+// First write
+val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+df1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val firstCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Second write
+val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+df2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)

Review comment:
   Shouldn't we set this (and other instances below) to `tableType` just 
like on line 70?
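
   A minimal sketch of the suggested fix, assuming the second write is meant to reuse the 
parameterized table type rather than hard-coding COW; everything else mirrors the quoted test:
   ```scala
   // Second write, using the same table type option as the first write.
   df2.write.format("org.apache.hudi")
     .options(commonOpts)
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name()) // was COW_TABLE_TYPE_OPT_VAL
     .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
     .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
     .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
     .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
     .mode(SaveMode.Append)
     .save(basePath)
   ```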




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Time Travel Query For Hoodie Table
> --
>
> Key: HUDI-2243
> URL: https://issues.apache.org/jira/browse/HUDI-2243
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
> 

[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table

2021-07-29 Thread GitBox


codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, 
PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession =_
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+initPath()
+initSparkContexts()
+spark = sqlContext.sparkSession
+initTestDataGenerator()
+initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+cleanupSparkContexts()
+cleanupTestDataGenerator()
+cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+initMetaClient(tableType)
+val _spark = spark
+import _spark.implicits._
+
+// First write
+val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+df1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val firstCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Second write
+val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+df2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)

Review comment:
   Shouldn't we set this (and other instances below) to `tableType` just 
like on line 70?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390273#comment-17390273
 ] 

ASF GitHub Bot commented on HUDI-2243:
--

codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, 
PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession =_
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+initPath()
+initSparkContexts()
+spark = sqlContext.sparkSession
+initTestDataGenerator()
+initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+cleanupSparkContexts()
+cleanupTestDataGenerator()
+cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+initMetaClient(tableType)
+val _spark = spark
+import _spark.implicits._
+
+// First write
+val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+df1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val firstCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Second write
+val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+df2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)

Review comment:
   Shouldn't we set this to `tableType` just like on line 70?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support Time Travel Query For Hoodie Table
> --
>
> Key: HUDI-2243
> URL: https://issues.apache.org/jira/browse/HUDI-2243
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Major
> 

[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table

2021-07-29 Thread GitBox


codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, 
PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession =_
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+"hoodie.bulkinsert.shuffle.parallelism" -> "2",
+"hoodie.delete.shuffle.parallelism" -> "1",
+DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+initPath()
+initSparkContexts()
+spark = sqlContext.sparkSession
+initTestDataGenerator()
+initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+cleanupSparkContexts()
+cleanupTestDataGenerator()
+cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+initMetaClient(tableType)
+val _spark = spark
+import _spark.implicits._
+
+// First write
+val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+df1.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+  .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+  .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+  .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+  .option(KEYGENERATOR_CLASS_OPT_KEY.key, 
classOf[NonpartitionedKeyGenerator].getName)
+  .mode(SaveMode.Overwrite)
+  .save(basePath)
+
+val firstCommit = 
metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+// Second write
+val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+df2.write.format("org.apache.hudi")
+  .options(commonOpts)
+  .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, 
DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)

Review comment:
   Shouldn't we set this to `tableType` just like on line 70?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2101) support z-order for hudi

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390272#comment-17390272
 ] 

ASF GitHub Bot commented on HUDI-2101:
--

hudi-bot edited a comment on pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571


   
   ## CI report:
   
   * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> support z-order for hudi
> 
>
> Key: HUDI-2101
> URL: https://issues.apache.org/jira/browse/HUDI-2101
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> support z-order for hudi to optimize the query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571


   
   ## CI report:
   
   * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390268#comment-17390268
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps to trigger a clustering, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process can't be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m in short: 
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means --instant-time 
> is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and execute 
> this plan at once using HoodieClusteringJob.
>  
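
A hedged sketch of the one-step flow described above; --base-path and --table-name are assumed 
existing HoodieClusteringJob flags and the paths are placeholders, while --mode scheduleAndExecute 
is the new option:

```scala
// Schedule a clustering plan and execute it in the same run, instead of the
// two-step --schedule then --instant-time flow.
val args = Array(
  "--base-path", "file:///tmp/hudi_table",   // placeholder table path
  "--table-name", "hudi_table",              // placeholder table name
  "--mode", "scheduleAndExecute")
org.apache.hudi.utilities.HoodieClusteringJob.main(args)
```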



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390261#comment-17390261
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

hudi-bot edited a comment on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692


   
   ## CI report:
   
   * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692


   
   ## CI report:
   
   * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-2234) MERGE INTO works only ON primary key

2021-07-29 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei reassigned HUDI-2234:


Assignee: pengzhiwei

> MERGE INTO works only ON primary key
> 
>
> Key: HUDI-2234
> URL: https://issues.apache.org/jira/browse/HUDI-2234
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Major
>
> {code:sql}
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (id int, name string, price double, ts long) 
> using hudi options(primaryKey = 'id', precombineField = 'ts') location 
> 'file:///tmp/hudi-h4-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120);
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120);
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120);
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (id int, name string, price double, ts long) using 
> hudi options(primaryKey = 'id', precombineField = 'ts') partitioned by (ts) 
> location 'file:///tmp/hudi-h4-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 120);
> MERGE INTO hudi_fixed 
> USING (select id, name, price, ts from hudi_gh_ext_fixed) updates
> ON hudi_fixed.name = updates.name
> WHEN MATCHED THEN
>   UPDATE SET *
> WHEN NOT MATCHED
>   THEN INSERT *;
> -- java.lang.IllegalArgumentException: Merge Key[name] is not Equal to the 
> defined primary key[id] in table hudi_fixed
> --at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425)
> --at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:146)
> --at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> --at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> --at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> --at 
> org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390257#comment-17390257
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r679615918



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1115,6 +1125,47 @@ public void testAsyncClusteringServiceWithCompaction() 
throws Exception {
 });
   }
 
+  @Test

Review comment:
   nice idea, changed

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -118,20 +134,41 @@ public static void main(String[] args) {
 jsc.stop();
   }
 
+  private static void validateRunningMode(Config cfg) {
+// --mode has a higher priority than --schedule
+// If we remove --schedule option in the future we need to change 
runningMode default value to EXECUTE
+if (StringUtils.isNullOrEmpty(cfg.runningMode)) {

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps to trigger a clustering, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process can't be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m in short: 
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means --instant-time 
> is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390255#comment-17390255
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r679615761



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -449,6 +451,14 @@ static void assertAtleastNDeltaCommits(int minExpected, 
String tablePath, FileSy
   assertTrue(minExpected <= numDeltaCommits, "Got=" + numDeltaCommits + ", 
exp >=" + minExpected);
 }
 
+static void assertAtLeastNCompletedReplaceCommits(int minExpected, String 
tablePath, DistributedFileSystem fs) {

Review comment:
   Sure, changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps to trigger a clustering, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process can't be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m in short: 
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means --instant-time 
> is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390256#comment-17390256
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r679615842



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1115,6 +1125,47 @@ public void testAsyncClusteringServiceWithCompaction() 
throws Exception {
 });
   }
 
+  @Test
+  public void testHoodieAsyncClusteringJobWithScheduleAndExecute() throws 
Exception {

Review comment:
   nice idea, changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to 
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config
>  # Copy the created clustering instant time from the log info.
>  # Submit the HoodieClusteringJob again to execute this created clustering 
> plan through the --instant-time config.
> The pain point is that there are too many steps to trigger a clustering, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process can't be automated.
>  
> I just raised a PR to offer a new config named --mode, or -m in short: 
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, which means --instant-time 
> is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a clustering plan and execute 
> this plan at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r679615918



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1115,6 +1125,47 @@ public void testAsyncClusteringServiceWithCompaction() 
throws Exception {
 });
   }
 
+  @Test

Review comment:
   nice idea, changed

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
##
@@ -118,20 +134,41 @@ public static void main(String[] args) {
 jsc.stop();
   }
 
+  private static void validateRunningMode(Config cfg) {
+// --mode has a higher priority than --schedule
+// If we remove --schedule option in the future we need to change 
runningMode default value to EXECUTE
+if (StringUtils.isNullOrEmpty(cfg.runningMode)) {

Review comment:
   done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r679615842



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -1115,6 +1125,47 @@ public void testAsyncClusteringServiceWithCompaction() 
throws Exception {
 });
   }
 
+  @Test
+  public void testHoodieAsyncClusteringJobWithScheduleAndExecute() throws 
Exception {

Review comment:
   nice idea, changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


zhangyue19921010 commented on a change in pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#discussion_r679615761



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##
@@ -449,6 +451,14 @@ static void assertAtleastNDeltaCommits(int minExpected, 
String tablePath, FileSy
   assertTrue(minExpected <= numDeltaCommits, "Got=" + numDeltaCommits + ", 
exp >=" + minExpected);
 }
 
+static void assertAtLeastNCompletedReplaceCommits(int minExpected, String 
tablePath, DistributedFileSystem fs) {

Review comment:
   Sure, changed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-2232) MERGE INTO fails with table having nested struct and partitioned by

2021-07-29 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei reassigned HUDI-2232:


Assignee: pengzhiwei

> MERGE INTO fails with table having nested struct and partitioned by
> -
>
> Key: HUDI-2232
> URL: https://issues.apache.org/jira/browse/HUDI-2232
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Blocker
> Fix For: 0.9.0
>
>
> {code:java}
> // TO reproduce
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (  id int,   name string,   price double,   ts 
> long,   repo struct) using hudi options(primaryKey = 
> 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (  id int,   name string,   price double,   ts long,  
>  repo struct) using hudi options(primaryKey = 'id', 
> precombineField = 'ts') partitioned by (ts) location 
> 'file:///tmp/hudi-h5-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 
> struct(234273476,"onnet/onnet-portal"), 130);
> select * from hudi_gh_ext_fixed;
> 20210727145240  20210727145240_0_6442266  id:3  77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet  3  AMZN  300.0  120  {"id":234273476,"name":"onnet/onnet-portal"}
> 20210727145301  20210727145301_0_6442269  id:2  77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet  2  UBER  150.0  120  {"id":234273476,"name":"onnet/onnet-portal"}
> 20210727145254  20210727145254_0_6442268  id:4  77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet  4  GOOG  300.0  120  {"id":234273476,"name":"onnet/onnet-portal"}
> select * from hudi_fixed;
> 20210727145325  20210727145325_0_6442270  id:2  ts=130  ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet  2  UBER  200.0  {"id":234273476,"name":"onnet/onnet-portal"}  130
> MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from 
> hudi_gh_ext_fixed) updates
> ON hudi_fixed.id = updates.id
> WHEN MATCHED THEN
>   UPDATE SET *
> WHEN NOT MATCHED
>   THEN INSERT *;
> -- java.lang.IllegalArgumentException: UnSupport StructType yet
> --  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122)
> --  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56)
> --  at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown
>  Source)
> --  at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2101) support z-order for hudi

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390253#comment-17390253
 ] 

ASF GitHub Bot commented on HUDI-2101:
--

hudi-bot edited a comment on pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571


   
   ## CI report:
   
   * 6912152293bb9336c060d41715dbae14527287a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207)
 
   * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> support z-order for hudi
> 
>
> Key: HUDI-2101
> URL: https://issues.apache.org/jira/browse/HUDI-2101
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> support z-order for hudi to optimize the query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571


   
   ## CI report:
   
   * 6912152293bb9336c060d41715dbae14527287a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207)
 
   * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390251#comment-17390251
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-889591087


   > @vingov @nsivabalan Shall we move the schema evolution subsection to a new 
page under documentation? It's gonna be a story of its own.
   
   Yes, we can do all structural changes after we land this version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2101) support z-order for hudi

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390250#comment-17390250
 ] 

ASF GitHub Bot commented on HUDI-2101:
--

hudi-bot edited a comment on pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571


   
   ## CI report:
   
   * 6912152293bb9336c060d41715dbae14527287a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207)
 
   * 4112d163dffa737e6bd8761796746d33f7e896cb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> support z-order for hudi
> 
>
> Key: HUDI-2101
> URL: https://issues.apache.org/jira/browse/HUDI-2101
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> support z-order for hudi to optimize the query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vingov commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)

2021-07-29 Thread GitBox


vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-889591087


   > @vingov @nsivabalan Shall we move the schema evolution subsection to a new 
page under documentation? It's gonna be a story of its own.
   
   Yes, we can do all structural changes after we land this version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571


   
   ## CI report:
   
   * 6912152293bb9336c060d41715dbae14527287a2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207)
 
   * 4112d163dffa737e6bd8761796746d33f7e896cb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2101) support z-order for hudi

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390249#comment-17390249
 ] 

ASF GitHub Bot commented on HUDI-2101:
--

xiarixiaoyao commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r679612078



##
File path: 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
##
@@ -229,6 +229,11 @@ public HoodieWriteMetadata<List<WriteStatus>> deletePartitions(HoodieEngineContext context, String
     throw new HoodieNotSupportedException("DeletePartitions is not supported yet");
   }
 
+  @Override
+  public HoodieWriteMetadata<List<WriteStatus>> optimize(HoodieEngineContext context, String instantTime, List<HoodieRecord<T>> records) {
+    throw new HoodieNotSupportedException("optimize data layouy is not supported yet");

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> support z-order for hudi
> 
>
> Key: HUDI-2101
> URL: https://issues.apache.org/jira/browse/HUDI-2101
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> support z-order for hudi to optimize the query



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-07-29 Thread GitBox


xiarixiaoyao commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r679612078



##
File path: 
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java
##
@@ -229,6 +229,11 @@ public HoodieWriteMetadata<List<WriteStatus>> deletePartitions(HoodieEngineContext context, String
     throw new HoodieNotSupportedException("DeletePartitions is not supported yet");
   }
 
+  @Override
+  public HoodieWriteMetadata<List<WriteStatus>> optimize(HoodieEngineContext context, String instantTime, List<HoodieRecord<T>> records) {
+    throw new HoodieNotSupportedException("optimize data layouy is not supported yet");

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390247#comment-17390247
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132)
 
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to:
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute the created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short):
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, so --instant-time 
> is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> it at once using HoodieClusteringJob.
>  
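
A minimal sketch of the scheduleAndExecute flow described in the ticket above. The mode names come from the description; the ClusteringRunner interface and its method names are hypothetical placeholders, not the actual HoodieClusteringJob API:

import java.util.Optional;

// Hypothetical interface standing in for whatever schedules/executes clustering.
interface ClusteringRunner {
  Optional<String> schedulePlan();        // returns the new clustering instant, if one was created
  void executePlan(String instantTime);   // executes a previously scheduled plan
}

final class ClusteringModeDispatch {
  static void run(ClusteringRunner runner, String mode, String instantTime) {
    switch (mode) {
      case "schedule":                    // only create a plan
        runner.schedulePlan();
        break;
      case "execute":                     // default mode, requires --instant-time
        runner.executePlan(instantTime);
        break;
      case "scheduleAndExecute":          // new mode: no manual copy/paste of the instant time
        runner.schedulePlan().ifPresent(runner::executePlan);
        break;
      default:
        throw new IllegalArgumentException("Unknown --mode: " + mode);
    }
  }
}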



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132)
 
   * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132)
 
   * abefb17f2c42c06e9c81ec26c6561172fedf4add UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390246#comment-17390246
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249


   
   ## CI report:
   
   * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132)
 
   * abefb17f2c42c06e9c81ec26c6561172fedf4add UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to either build a 
> clustering plan or execute an existing plan, through the --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to:
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule 
> config.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute the created clustering 
> plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps, and the 
> instant time has to be copied and pasted from the log file manually, so the 
> process cannot be automated.
>  
> I just raised a PR to offer a new config named --mode (or -m for short):
> ||--mode||remarks||
> |execute|Execute a cluster plan at the given instant, so --instant-time 
> is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute 
> it at once using HoodieClusteringJob.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390245#comment-17390245
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

codope commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-889586829


   @vingov @nsivabalan Shall we move the schema evolution subsection to a new 
page under documentation? It's gonna be a story of its own.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codope commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)

2021-07-29 Thread GitBox


codope commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-889586829


   @vingov @nsivabalan Shall we move the schema evolution subsection to a new 
page under documentation? It's gonna be a story of its own.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: Fixing article link (#3370)

2021-07-29 Thread garyli
This is an automated email from the ASF dual-hosted git repository.

garyli pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 7945698  Fixing article link (#3370)
7945698 is described below

commit 794569814773d2bc777132d0e9e4d6553a56b443
Author: Sivabalan Narayanan 
AuthorDate: Thu Jul 29 22:37:24 2021 -0400

Fixing article link (#3370)
---
 docs/_docs/1_4_powered_by.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/_docs/1_4_powered_by.md b/docs/_docs/1_4_powered_by.md
index 1c52f15..3ae92a9 100644
--- a/docs/_docs/1_4_powered_by.md
+++ b/docs/_docs/1_4_powered_by.md
@@ -194,7 +194,7 @@ You can check out [our blog pages](https://hudi.apache.org/blog.html) for conten
 23. ["New features from Apache hudi in Amazon EMR"](https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-available-in-amazon-emr/)
 24. ["Build a data lake using amazon kinesis data stream for amazon dynamodb and apache hudi"](https://aws.amazon.com/blogs/big-data/build-a-data-lake-using-amazon-kinesis-data-streams-for-amazon-dynamodb-and-apache-hudi/) - Amazon AWS
 25. ["Amazon Athena expands Apache Hudi support"](https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-athena-expands-apache-hudi-support/) - Amazon AWS
-26. ["Part1: Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read optimized queries"](part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/)
+26. ["Part1: Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read optimized queries"](https://aws.amazon.com/blogs/big-data/part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/) - Amazon AWS
 
 ## Powered by
 


[GitHub] [hudi] garyli1019 merged pull request #3370: [MINOR] Fixing an article Hyperlink

2021-07-29 Thread GitBox


garyli1019 merged pull request #3370:
URL: https://github.com/apache/hudi/pull/3370


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2253:
--
Component/s: Testing

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390239#comment-17390239
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

hudi-bot edited a comment on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692


   
   ## CI report:
   
   * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390238#comment-17390238
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

danny0405 commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679603117



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
   Nice catch ~




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2252) Replace read full data with read latest commit data in flink stream read

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390240#comment-17390240
 ] 

ASF GitHub Bot commented on HUDI-2252:
--

hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348


   
   ## CI report:
   
   * fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace read full data with read  latest commit data in flink stream read
> -
>
> Key: HUDI-2252
> URL: https://issues.apache.org/jira/browse/HUDI-2252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3368: [HUDI-2252] Default consumes from the latest instant for flink streaming reader

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348


   
   ## CI report:
   
   * fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692


   
   ## CI report:
   
   * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


danny0405 commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679603117



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
   Nice catch ~




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390237#comment-17390237
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

hudi-bot commented on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692


   
   ## CI report:
   
   * 723eb6da23126ad85bbc7f62a182e025026462e7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot commented on pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


hudi-bot commented on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692


   
   ## CI report:
   
   * 723eb6da23126ad85bbc7f62a182e025026462e7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390236#comment-17390236
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

nsivabalan commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679602005



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
   Previously TestHoodieMultiTableDeltaStreamer extended from 
TestHoodieDeltaStreamer and so tests in TestHoodieDeltaStreamer were running 
twice. This refactoring will ensure that TestHoodieDeltaStreamer tests run only 
once. 
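
A compact sketch of the inheritance change being described (the class names here are simplified placeholders, not the real test classes): JUnit also runs @Test methods inherited from a superclass, so extending a concrete test class re-executes all of its tests.

import org.junit.jupiter.api.Test;

// Shared fixtures live in an abstract base with no @Test methods.
abstract class DeltaStreamerTestBase { /* common setup and helpers */ }

class DeltaStreamerTests extends DeltaStreamerTestBase {
  @Test
  void testIngest() { /* runs exactly once */ }
}

// Before the refactoring this extended DeltaStreamerTests, so testIngest()
// was discovered and executed a second time under this class as well.
class MultiTableTests extends DeltaStreamerTestBase {
  @Test
  void testMultiTableIngest() { /* only the multi-table tests run here */ }
}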




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


nsivabalan commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679602005



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;
 
 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
   Previously TestHoodieMultiTableDeltaStreamer extended from 
TestHoodieDeltaStreamer and so tests in TestHoodieDeltaStreamer were running 
twice. This refactoring will ensure that TestHoodieDeltaStreamer tests run only 
once. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2253:
-
Labels: pull-request-available  (was: )

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390235#comment-17390235
 ] 

ASF GitHub Bot commented on HUDI-2253:
--

nsivabalan opened a new pull request #3371:
URL: https://github.com/apache/hudi/pull/3371


   - DeltaStreamer and MultiTableDeltaStreamer tests. 
   - Bulk insert row writer tests
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan opened a new pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.

2021-07-29 Thread GitBox


nsivabalan opened a new pull request #3371:
URL: https://github.com/apache/hudi/pull/3371


   - DeltaStreamer and MultiTableDeltaStreamer tests. 
   - Bulk insert row writer tests
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2253:
--
Fix Version/s: 0.9.0

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2253:
-

 Summary: Reduce CI run time for deltastreamer and bulk insert row 
writer tests
 Key: HUDI-2253
 URL: https://issues.apache.org/jira/browse/HUDI-2253
 Project: Apache Hudi
  Issue Type: Test
Reporter: sivabalan narayanan


Reduce CI run time for deltastreamer and bulk insert row writer tests

 

org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2253:
--
Status: In Progress  (was: Open)

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests

2021-07-29 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2253:
-

Assignee: sivabalan narayanan

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
>  Issue Type: Test
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>  
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1848) Add support for HMS in Hive-sync-tool

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390234#comment-17390234
 ] 

ASF GitHub Bot commented on HUDI-1848:
--

stym06 commented on pull request #2879:
URL: https://github.com/apache/hudi/pull/2879#issuecomment-889577462


   What parameters are required to be passed to sync with HMS? Can we use the 
Thrift URL?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add support for HMS in Hive-sync-tool
> -
>
> Key: HUDI-1848
> URL: https://issues.apache.org/jira/browse/HUDI-1848
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Jagmeet Bali
>Priority: Minor
>  Labels: pull-request-available, sev:normal
>
> Add support for HMS in Hive-sync-tool
> Currently there are two ways to run DDL queries in hive-sync-tool. 
> This work adds on top of 
> [https://github.com/apache/hudi/pull/2532|https://github.com/apache/hudi/pull/2532/files]
> and adds a pluggable way to support a 
> new way of running DDL queries using HMS. 
>  
> Different DDL executors can be selected via different syncConfig options:
> useJDBC true -> JDBCExecutor will be used
> useJDBC false -> QlHiveQueryExecutor will be used
> useHMS true -> HMSDDLExecutor will be used.
>  
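
A hedged sketch of the pluggable selection described above. The executor names come from the ticket text; the DDLExecutor interface, the flag names and the precedence (useHMS checked before useJDBC) are assumptions for illustration, not the exact hive-sync-tool code:

// Hypothetical shape of the pluggable DDL execution strategy.
interface DDLExecutor {
  void runSQL(String ddl);
}

final class DdlExecutorChoice {
  // Returns the executor class name that would be instantiated for the given flags.
  static String choose(boolean useJdbc, boolean useHms) {
    if (useHms) {
      return "HMSDDLExecutor";        // DDL through the Hive Metastore client API
    }
    return useJdbc
        ? "JDBCExecutor"              // DDL over a HiveServer2 JDBC connection
        : "QlHiveQueryExecutor";      // DDL through the embedded Hive QL driver
  }
}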



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] stym06 commented on pull request #2879: [HUDI-1848] Adding support for HMS for running DDL queries in hive-sy…

2021-07-29 Thread GitBox


stym06 commented on pull request #2879:
URL: https://github.com/apache/hudi/pull/2879#issuecomment-889577462


   What parameters are required to be passed to sync with HMS? Can we use the 
Thrift URL?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] fengjian428 commented on issue #3327: [SUPPORT] ingetst avro nested array field error occur

2021-07-29 Thread GitBox


fengjian428 commented on issue #3327:
URL: https://github.com/apache/hudi/issues/3327#issuecomment-889576271


   > @fengjian428 I will try to reproduce this. Please take a look at 
https://hudi.apache.org/docs/writing_data.html#schema-evolution
   > The exception says that the 'array' field was not found. Was the schema changed in 
between, and are there partitions with the older schema as well?
   
   No, I can reproduce it with a new table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2252) Replace read full data with read latest commit data in flink stream read

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390229#comment-17390229
 ] 

ASF GitHub Bot commented on HUDI-2252:
--

hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348


   
   ## CI report:
   
   * cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
 
   * fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace read full data with read  latest commit data in flink stream read
> -
>
> Key: HUDI-2252
> URL: https://issues.apache.org/jira/browse/HUDI-2252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3368: [HUDI-2252] Default consumes from the latest instant for flink streaming reader

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348


   
   ## CI report:
   
   * cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
 
   * fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2252) Replace read full data with read latest commit data in flink stream read

2021-07-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390227#comment-17390227
 ] 

ASF GitHub Bot commented on HUDI-2252:
--

hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348


   
   ## CI report:
   
   * cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
 
   * fa0e1ada155715e783630f0fb2ff3120b0ca683c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Replace read full data with read  latest commit data in flink stream read
> -
>
> Key: HUDI-2252
> URL: https://issues.apache.org/jira/browse/HUDI-2252
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3368: [HUDI-2252] Default consumes from the latest instant for flink streaming reader

2021-07-29 Thread GitBox


hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348


   
   ## CI report:
   
   * cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
 
   * fa0e1ada155715e783630f0fb2ff3120b0ca683c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-1987) Fix non partition table hive meta sync for flink writer

2021-07-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra resolved HUDI-1987.
-
Resolution: Fixed

> Fix non partition table hive meta sync for flink writer
> ---
>
> Key: HUDI-1987
> URL: https://issues.apache.org/jira/browse/HUDI-1987
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1987) Fix non partition table hive meta sync for flink writer

2021-07-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-1987:

Status: In Progress  (was: Open)

> Fix non partition table hive meta sync for flink writer
> ---
>
> Key: HUDI-1987
> URL: https://issues.apache.org/jira/browse/HUDI-1987
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1986) Skip creating marker files for flink merge handle

2021-07-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-1986:

Status: In Progress  (was: Open)

> Skip creating marker files for flink merge handle
> -
>
> Key: HUDI-1986
> URL: https://issues.apache.org/jira/browse/HUDI-1986
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
>
> Skip creating the marker files for flink merge handle to make it more robust.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1986) Skip creating marker files for flink merge handle

2021-07-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra resolved HUDI-1986.
-
Fix Version/s: 0.9.0
   Resolution: Fixed

> Skip creating marker files for flink merge handle
> -
>
> Key: HUDI-1986
> URL: https://issues.apache.org/jira/browse/HUDI-1986
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Skip creating the marker files for flink merge handle to make it more robust.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1980) Optimize the code to prevent other exceptions from causing resources not to be closed

2021-07-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra resolved HUDI-1980.
-
Fix Version/s: 0.9.0
 Assignee: Wei
   Resolution: Fixed

> Optimize the code to prevent other exceptions from causing resources not to 
> be closed
> -
>
> Key: HUDI-1980
> URL: https://issues.apache.org/jira/browse/HUDI-1980
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Wei
>Assignee: Wei
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
>  When *HoodieHiveClient* initializes resources, some exceptions may cause 
> resources to fail to close.
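
A generic sketch of the failure mode and one common guard (the class and method names are hypothetical, not the actual HoodieHiveClient code): if a later initialization step throws, the resource acquired earlier must still be closed, otherwise it leaks.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class ResourceSafeInit {
  static Connection open(String jdbcUrl) throws SQLException {
    Connection conn = DriverManager.getConnection(jdbcUrl);
    try {
      conn.setAutoCommit(true);   // stands in for any later init step that can throw
      return conn;
    } catch (RuntimeException | SQLException e) {
      conn.close();               // close what was already acquired before rethrowing
      throw e;
    }
  }
}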



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1980) Optimize the code to prevent other exceptions from causing resources not to be closed

2021-07-29 Thread Udit Mehrotra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-1980:

Status: In Progress  (was: Open)

> Optimize the code to prevent other exceptions from causing resources not to 
> be closed
> -
>
> Key: HUDI-1980
> URL: https://issues.apache.org/jira/browse/HUDI-1980
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Wei
>Priority: Critical
>  Labels: pull-request-available
>
>  When *HoodieHiveClient* initializes resources, some exceptions may cause 
> resources to fail to close.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

