[GitHub] [hudi] dongkelun closed pull request #3376: [HUDI-2259]Fix the exception of Merge Into when source table with columnAliases

2021-08-01 Thread GitBox


dongkelun closed pull request #3376:
URL: https://github.com/apache/hudi/pull/3376


   






[jira] [Commented] (HUDI-2259) Merge Into when source table with columnAliases throws exception

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391112#comment-17391112
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

dongkelun closed pull request #3376:
URL: https://github.com/apache/hudi/pull/3376


   




> Merge Into when source table with columnAliases throws exception
> 
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  





[GitHub] [hudi] dongkelun opened a new pull request #3377: [HUDI-2259]Fix the exception of Merge Into when source table with col…

2021-08-01 Thread GitBox


dongkelun opened a new pull request #3377:
URL: https://github.com/apache/hudi/pull/3377


   Fix the exception of Merge Into when the source table has column aliases.
   The case where the target table has column aliases is still not supported; I am 
not sure whether that is because Hudi itself does not support that case yet.
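   For context, here is a minimal sketch (spark-shell style, reusing the `tableName` 
variable from the HUDI-2259 reproduction) of how the source subquery can be rewritten 
while the fix is pending; the aliased-SELECT workaround below is only an editor's 
illustration, not part of this PR:

```scala
// The failing shape puts column aliases on the source subquery:
//   using ( select 1, 'a1', 12, 1003 ) s0 (id, name, price, ts)
// A possible workaround is to name the columns inside the SELECT itself,
// so that no columnAliases clause is needed on the source table.
spark.sql(
  s"""
     | merge into $tableName as t0
     | using (
     |   select 1 as id, 'a1' as name, 12 as price, 1003 as ts
     | ) s0
     | on s0.id = t0.id
     | when matched then update set *
     | when not matched then insert *
     |""".stripMargin)
```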






[jira] [Commented] (HUDI-2259) Merge Into when source table with columnAliases throws exception

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391115#comment-17391115
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

dongkelun opened a new pull request #3377:
URL: https://github.com/apache/hudi/pull/3377


   Fix the exception of Merge Into when the source table has column aliases.
   The case where the target table has column aliases is still not supported; I am 
not sure whether that is because Hudi itself does not support that case yet.




> Merge Into when source table with columnAliases throws exception
> 
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  





[GitHub] [hudi] hudi-bot commented on pull request #3377: [HUDI-2259]Fix the exception of Merge Into when source table with col…

2021-08-01 Thread GitBox


hudi-bot commented on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890478045


   
   ## CI report:
   
   * bd1fca2aedb3a1e01096cb244a2d55a1e4d11048 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2259) Merge Into when source table with columnAliases throws exception

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391116#comment-17391116
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

hudi-bot commented on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890478045


   
   ## CI report:
   
   * bd1fca2aedb3a1e01096cb244a2d55a1e4d11048 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Merge Into when source table with columnAliases throws exception
> 
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  





[GitHub] [hudi] hudi-bot edited a comment on pull request #3377: [HUDI-2259]Fix the exception of Merge Into when source table with col…

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890478045


   
   ## CI report:
   
   * bd1fca2aedb3a1e01096cb244a2d55a1e4d11048 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1292)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2259) Merge Into when source table with columnAliases throws exception

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391117#comment-17391117
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

hudi-bot edited a comment on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890478045


   
   ## CI report:
   
   * bd1fca2aedb3a1e01096cb244a2d55a1e4d11048 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1292)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Merge Into when source table with columnAliases throws exception
> 
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  





[GitHub] [hudi] leesf commented on a change in pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-08-01 Thread GitBox


leesf commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r680482852



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieOptimizeConfig.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.config;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Hoodie Configs for Data layout optimize.
+ */
+public class HoodieOptimizeConfig extends HoodieConfig {
+  // Any Data layout optimize params can be saved with this prefix
+  public static final String DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX = 
"hoodie.data.layout.optimize.";
+  public static final ConfigProperty DATA_LAYOUT_STRATEGY = ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "strategy")
+  .defaultValue("z-order")
+  .sinceVersion("0.10.0")
+  .withDocumentation("config to provide a way to optimize data layout for 
table, current only support z-order and hilbert");
+
+  public static final ConfigProperty DATA_LAYOUT_BUILD_CURVE_METHOD = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "build.curve.optimize.method")
+  .defaultValue("directly")
+  .sinceVersion("0.10.0")
+  .withDocumentation("Config to provide whether use directly/sample method 
to build curve optimize for data layout,"
+  + " build curve_optimize by directly method is faster than by sample 
method, however sample method produce a better data layout");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SAMPLE_NUMBER 
= ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sample.number")
+  .defaultValue("20")
+  .sinceVersion("0.10.0")
+  .withDocumentation("when set" + DATA_LAYOUT_BUILD_CURVE_METHOD.key() + " 
to sample method, sample number need to be set for it."
+  + " larger number means better layout result, but more memory 
consumer");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SORT_COLUMNS = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sort.columns")
+  .defaultValue("")
+  .sinceVersion("0.10.0")
+  .withDocumentation("sort columns for build curve optimize. default value 
is empty string which means no sort."
+  + " more sort columns you specify, the worse data layout result. No 
more than 4 are recommended");
+
+  public static final ConfigProperty DATA_LAYOUT_DATA_SKIPPING_ENABLE = 
ConfigProperty

Review comment:
   > sorry, i cannot get the point. may be i miss somethings
   
   ConfigProperty
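   For orientation, a hedged sketch of how the keys defined in the draft 
`HoodieOptimizeConfig` quoted above might be set as Spark datasource options; since 
PR #3330 is still under review, the key names and defaults may change, and `df` / 
`basePath` are assumed placeholders:

```scala
// Sketch only: keys are taken from the draft config class above (unmerged),
// so they may be renamed before release. The usual Hudi write options
// (table name, record key, precombine field, ...) are omitted for brevity.
df.write.format("hudi")
  .option("hoodie.data.layout.optimize.strategy", "z-order")
  .option("hoodie.data.layout.optimize.build.curve.optimize.method", "sample")
  .option("hoodie.data.layout.optimize.curve.optimize.sample.number", "20")
  .option("hoodie.data.layout.optimize.curve.optimize.sort.columns", "id,ts")
  .mode("append")
  .save(basePath)
```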








[GitHub] [hudi] leesf commented on a change in pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-08-01 Thread GitBox


leesf commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r680482852



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieOptimizeConfig.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.config;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Hoodie Configs for Data layout optimize.
+ */
+public class HoodieOptimizeConfig extends HoodieConfig {
+  // Any Data layout optimize params can be saved with this prefix
+  public static final String DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX = 
"hoodie.data.layout.optimize.";
+  public static final ConfigProperty DATA_LAYOUT_STRATEGY = ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "strategy")
+  .defaultValue("z-order")
+  .sinceVersion("0.10.0")
+  .withDocumentation("config to provide a way to optimize data layout for 
table, current only support z-order and hilbert");
+
+  public static final ConfigProperty DATA_LAYOUT_BUILD_CURVE_METHOD = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "build.curve.optimize.method")
+  .defaultValue("directly")
+  .sinceVersion("0.10.0")
+  .withDocumentation("Config to provide whether use directly/sample method 
to build curve optimize for data layout,"
+  + " build curve_optimize by directly method is faster than by sample 
method, however sample method produce a better data layout");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SAMPLE_NUMBER 
= ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sample.number")
+  .defaultValue("20")
+  .sinceVersion("0.10.0")
+  .withDocumentation("when set" + DATA_LAYOUT_BUILD_CURVE_METHOD.key() + " 
to sample method, sample number need to be set for it."
+  + " larger number means better layout result, but more memory 
consumer");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SORT_COLUMNS = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sort.columns")
+  .defaultValue("")
+  .sinceVersion("0.10.0")
+  .withDocumentation("sort columns for build curve optimize. default value 
is empty string which means no sort."
+  + " more sort columns you specify, the worse data layout result. No 
more than 4 are recommended");
+
+  public static final ConfigProperty DATA_LAYOUT_DATA_SKIPPING_ENABLE = 
ConfigProperty

Review comment:
   > sorry, i cannot get the point. may be i miss somethings
   
   `ConfigProperty`








[jira] [Commented] (HUDI-2101) support z-order for hudi

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391123#comment-17391123
 ] 

ASF GitHub Bot commented on HUDI-2101:
--

leesf commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r680482852



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieOptimizeConfig.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.config;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Hoodie Configs for Data layout optimize.
+ */
+public class HoodieOptimizeConfig extends HoodieConfig {
+  // Any Data layout optimize params can be saved with this prefix
+  public static final String DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX = 
"hoodie.data.layout.optimize.";
+  public static final ConfigProperty DATA_LAYOUT_STRATEGY = ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "strategy")
+  .defaultValue("z-order")
+  .sinceVersion("0.10.0")
+  .withDocumentation("config to provide a way to optimize data layout for 
table, current only support z-order and hilbert");
+
+  public static final ConfigProperty DATA_LAYOUT_BUILD_CURVE_METHOD = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "build.curve.optimize.method")
+  .defaultValue("directly")
+  .sinceVersion("0.10.0")
+  .withDocumentation("Config to provide whether use directly/sample method 
to build curve optimize for data layout,"
+  + " build curve_optimize by directly method is faster than by sample 
method, however sample method produce a better data layout");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SAMPLE_NUMBER 
= ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sample.number")
+  .defaultValue("20")
+  .sinceVersion("0.10.0")
+  .withDocumentation("when set" + DATA_LAYOUT_BUILD_CURVE_METHOD.key() + " 
to sample method, sample number need to be set for it."
+  + " larger number means better layout result, but more memory 
consumer");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SORT_COLUMNS = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sort.columns")
+  .defaultValue("")
+  .sinceVersion("0.10.0")
+  .withDocumentation("sort columns for build curve optimize. default value 
is empty string which means no sort."
+  + " more sort columns you specify, the worse data layout result. No 
more than 4 are recommended");
+
+  public static final ConfigProperty DATA_LAYOUT_DATA_SKIPPING_ENABLE = 
ConfigProperty

Review comment:
   > sorry, i cannot get the point. may be i miss somethings
   
   ConfigProperty






> support z-order for hudi
> 
>
> Key: HUDI-2101
> URL: https://issues.apache.org/jira/browse/HUDI-2101
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> support z-order for hudi to optimize the query





[jira] [Commented] (HUDI-2101) support z-order for hudi

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391124#comment-17391124
 ] 

ASF GitHub Bot commented on HUDI-2101:
--

leesf commented on a change in pull request #3330:
URL: https://github.com/apache/hudi/pull/3330#discussion_r680482852



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieOptimizeConfig.java
##
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.config;
+
+import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
+
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.Properties;
+
+/**
+ * Hoodie Configs for Data layout optimize.
+ */
+public class HoodieOptimizeConfig extends HoodieConfig {
+  // Any Data layout optimize params can be saved with this prefix
+  public static final String DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX = 
"hoodie.data.layout.optimize.";
+  public static final ConfigProperty DATA_LAYOUT_STRATEGY = ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "strategy")
+  .defaultValue("z-order")
+  .sinceVersion("0.10.0")
+  .withDocumentation("config to provide a way to optimize data layout for 
table, current only support z-order and hilbert");
+
+  public static final ConfigProperty DATA_LAYOUT_BUILD_CURVE_METHOD = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "build.curve.optimize.method")
+  .defaultValue("directly")
+  .sinceVersion("0.10.0")
+  .withDocumentation("Config to provide whether use directly/sample method 
to build curve optimize for data layout,"
+  + " build curve_optimize by directly method is faster than by sample 
method, however sample method produce a better data layout");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SAMPLE_NUMBER 
= ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sample.number")
+  .defaultValue("20")
+  .sinceVersion("0.10.0")
+  .withDocumentation("when set" + DATA_LAYOUT_BUILD_CURVE_METHOD.key() + " 
to sample method, sample number need to be set for it."
+  + " larger number means better layout result, but more memory 
consumer");
+
+  public static final ConfigProperty DATA_LAYOUT_CURVE_OPTIMIZE_SORT_COLUMNS = 
ConfigProperty
+  .key(DATA_LAYOUT_OPTIMIZE_PARAM_PREFIX + "curve.optimize.sort.columns")
+  .defaultValue("")
+  .sinceVersion("0.10.0")
+  .withDocumentation("sort columns for build curve optimize. default value 
is empty string which means no sort."
+  + " more sort columns you specify, the worse data layout result. No 
more than 4 are recommended");
+
+  public static final ConfigProperty DATA_LAYOUT_DATA_SKIPPING_ENABLE = 
ConfigProperty

Review comment:
   > sorry, i cannot get the point. may be i miss somethings
   
   `ConfigProperty`






> support z-order for hudi
> 
>
> Key: HUDI-2101
> URL: https://issues.apache.org/jira/browse/HUDI-2101
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> support z-order for hudi to optimize the query





[GitHub] [hudi] hudi-bot edited a comment on pull request #3377: [HUDI-2259]Fix the exception of Merge Into when source table with col…

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890478045


   
   ## CI report:
   
   * bd1fca2aedb3a1e01096cb244a2d55a1e4d11048 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1292)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-2259) Merge Into when source table with columnAliases throws exception

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391127#comment-17391127
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

hudi-bot edited a comment on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890478045


   
   ## CI report:
   
   * bd1fca2aedb3a1e01096cb244a2d55a1e4d11048 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1292)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Merge Into when source table with columnAliases throws exception
> 
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  





[jira] [Commented] (HUDI-2199) DynamoDB based external index implementation

2021-08-01 Thread Biswajit mohapatra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391128#comment-17391128
 ] 

Biswajit mohapatra commented on HUDI-2199:
--

So we will also need properties for DynamoDB. In EMR we have a table named 
emrfs; can we have a similar table, called hudifs, that gets created if it 
doesn't exist?

> DynamoDB based external index implementation
> 
>
> Key: HUDI-2199
> URL: https://issues.apache.org/jira/browse/HUDI-2199
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: Biswajit mohapatra
>Priority: Major
>
> We have a HBaseIndex, that provides uses with ability to store fileID <=> 
> recordKey mappings in an external kv store, for fast lookups during upsert 
> operations. We can potentially create a similar one for DynamoDB. 
> We just use a single column family in HBase, so we should be able to largely 
> re-use the code/key-value schema across them even. 
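To make the key-value schema described above concrete, here is a hypothetical Scala 
model of a single index entry as it might be stored as a DynamoDB item; the attribute 
names and the choice of record key as the partition key are assumptions for discussion, 
not a decided design:

```scala
// Assumed shape: one item per record key, mirroring the single HBase column
// family (partition path, file id, commit time) described in the issue.
case class HoodieRecordIndexEntry(
    recordKey: String,      // candidate DynamoDB partition key (PK)
    partitionPath: String,  // e.g. "2021/08/01"
    fileId: String,         // file group currently holding the record
    commitTime: String)     // Hudi instant time, e.g. "20210801101500"

val entry = HoodieRecordIndexEntry("uuid-0001", "2021/08/01", "f1-0", "20210801101500")
```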





[jira] [Comment Edited] (HUDI-2199) DynamoDB based external index implementation

2021-08-01 Thread Biswajit mohapatra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391128#comment-17391128
 ] 

Biswajit mohapatra edited comment on HUDI-2199 at 8/1/21, 9:54 AM:
---

So we will also need properties for DynamoDB. In EMR we have a table named 
emrfs; can we have a similar table, called hudifs, that gets created if it 
doesn't exist?

Also, can you tell me the HBase class name I can look at to get an idea?


was (Author: biswajit11):
so we will need properties for dynamodb also , like in emr we have a table name 
emrfs , can we have a table like that called as hudifs which gets created if 
the table doesn't exist ?

> DynamoDB based external index implementation
> 
>
> Key: HUDI-2199
> URL: https://issues.apache.org/jira/browse/HUDI-2199
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: Biswajit mohapatra
>Priority: Major
>
> We have a HBaseIndex, that provides uses with ability to store fileID <=> 
> recordKey mappings in an external kv store, for fast lookups during upsert 
> operations. We can potentially create a similar one for DynamoDB. 
> We just use a single column family in HBase, so we should be able to largely 
> re-use the code/key-value schema across them even. 





[GitHub] [hudi] cdmikechen opened a new pull request #3378: [HUDI-83] Fix Timestamp type read by Hive

2021-08-01 Thread GitBox


cdmikechen opened a new pull request #3378:
URL: https://github.com/apache/hudi/pull/3378


   ## What is the purpose of the pull request
   
   This pull request lets Hive read timestamp-type column data correctly.
   
   The problem was initially reported in JIRA 
[HUDI-83](https://issues.apache.org/jira/browse/HUDI-83) and in the related issue 
https://github.com/apache/hudi/issues/2544.
   
   ## Brief change log

   - Change `HoodieParquetInputFormat` to use a custom `ParquetInputFormat` 
named `HudiAvroParquetInputFormat`.
   - In `HudiAvroParquetInputFormat` we use a custom `RecordReader` named 
`HudiAvroParquetReader`. In this class we use `AvroReadSupport` so that Hive 
can read the Parquet data as an Avro GenericRecord.
   - Use 
`org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.avroToArrayWritable`
 to transform the GenericRecord into an ArrayWritable. At the same time, 
timestamp handling for the different Hive 2 and Hive 3 cases is added to this 
method (a rough sketch of the conversion follows below).
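
   The core of that timestamp handling is converting the Avro long value according to 
its logical type before wrapping it in a Hive writable. A rough, hypothetical sketch 
(the helper name is the editor's, and the Hive 2 vs Hive 3 writable differences are 
deliberately omitted):

```scala
import java.sql.Timestamp
import java.util.concurrent.TimeUnit

// Hypothetical helper: map an Avro timestamp value (a long annotated with a
// timestamp-millis or timestamp-micros logical type) to java.sql.Timestamp.
// Sub-millisecond precision of timestamp-micros values is dropped here.
def toSqlTimestamp(value: Long, logicalType: String): Timestamp = logicalType match {
  case "timestamp-millis" => new Timestamp(value)
  case "timestamp-micros" => new Timestamp(TimeUnit.MICROSECONDS.toMillis(value))
  case other => throw new IllegalArgumentException(s"unsupported logical type: $other")
}
```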
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
`org.apache.hudi.hadoop.TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimestamp`
   
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.






[jira] [Updated] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2021-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-83:
---
Labels: pull-request-available sev:critical user-support-issues  (was: 
sev:critical user-support-issues)

> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 





[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391129#comment-17391129
 ] 

ASF GitHub Bot commented on HUDI-83:


cdmikechen opened a new pull request #3378:
URL: https://github.com/apache/hudi/pull/3378


   ## What is the purpose of the pull request
   
   This pull request lets Hive read timestamp-type column data correctly.
   
   The problem was initially reported in JIRA 
[HUDI-83](https://issues.apache.org/jira/browse/HUDI-83) and in the related issue 
https://github.com/apache/hudi/issues/2544.
   
   ## Brief change log

   - Change `HoodieParquetInputFormat` to use a custom `ParquetInputFormat` 
named `HudiAvroParquetInputFormat`.
   - In `HudiAvroParquetInputFormat` we use a custom `RecordReader` named 
`HudiAvroParquetReader`. In this class we use `AvroReadSupport` so that Hive 
can read the Parquet data as an Avro GenericRecord.
   - Use 
`org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.avroToArrayWritable`
 to transform the GenericRecord into an ArrayWritable. At the same time, 
timestamp handling for the different Hive 2 and Hive 3 cases is added to this 
method.
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
`org.apache.hudi.hadoop.TestHoodieParquetInputFormat.testHoodieParquetInputFormatReadTimestamp`
   
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 





[GitHub] [hudi] hudi-bot commented on pull request #3378: [HUDI-83] Fix Timestamp type read by Hive

2021-08-01 Thread GitBox


hudi-bot commented on pull request #3378:
URL: https://github.com/apache/hudi/pull/3378#issuecomment-890488457


   
   ## CI report:
   
   * e5686c25f71a8fa0568639270ba710e0fe9bbdd2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391131#comment-17391131
 ] 

ASF GitHub Bot commented on HUDI-83:


hudi-bot commented on pull request #3378:
URL: https://github.com/apache/hudi/pull/3378#issuecomment-890488457


   
   ## CI report:
   
   * e5686c25f71a8fa0568639270ba710e0fe9bbdd2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 





[GitHub] [hudi] hudi-bot edited a comment on pull request #3378: [HUDI-83] Fix Timestamp type read by Hive

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3378:
URL: https://github.com/apache/hudi/pull/3378#issuecomment-890488457


   
   ## CI report:
   
   * e5686c25f71a8fa0568639270ba710e0fe9bbdd2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1293)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391132#comment-17391132
 ] 

ASF GitHub Bot commented on HUDI-83:


hudi-bot edited a comment on pull request #3378:
URL: https://github.com/apache/hudi/pull/3378#issuecomment-890488457


   
   ## CI report:
   
   * e5686c25f71a8fa0568639270ba710e0fe9bbdd2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1293)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   




> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 





[jira] [Comment Edited] (HUDI-2199) DynamoDB based external index implementation

2021-08-01 Thread Biswajit mohapatra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391128#comment-17391128
 ] 

Biswajit mohapatra edited comment on HUDI-2199 at 8/1/21, 10:11 AM:


So we will also need properties for DynamoDB. In EMR we have a table named 
emrfs; can we have a similar table, called hudifs, that gets created if it 
doesn't exist?

Just one question: does it create a table for each table ingested, or does it 
use one generic table for every table ingested into Hudi?

Also, can you tell me the HBase class name I can look at to get an idea?


was (Author: biswajit11):
so we will need properties for dynamodb also , like in emr we have a table name 
emrfs , can we have a table like that called as hudifs which gets created if 
the table doesn't exist ?

 

also can you say me the class name for hbase from where i can get an idea ?

> DynamoDB based external index implementation
> 
>
> Key: HUDI-2199
> URL: https://issues.apache.org/jira/browse/HUDI-2199
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: Biswajit mohapatra
>Priority: Major
>
> We have a HBaseIndex, that provides uses with ability to store fileID <=> 
> recordKey mappings in an external kv store, for fast lookups during upsert 
> operations. We can potentially create a similar one for DynamoDB. 
> We just use a single column family in HBase, so we should be able to largely 
> re-use the code/key-value schema across them even. 





[jira] [Comment Edited] (HUDI-2199) DynamoDB based external index implementation

2021-08-01 Thread Biswajit mohapatra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391128#comment-17391128
 ] 

Biswajit mohapatra edited comment on HUDI-2199 at 8/1/21, 10:15 AM:


So we will need properties for DynamoDB as well. For example, on EMR there is a table named emrfs; can we have a similar table, say hudifs, which gets created if it doesn't already exist?

Just one question: does it create a separate table for each table ingested, or does it use one generic table for all tables ingested into Hudi?

Also, could you tell me the HBase class name I can look at to get an idea?

EDIT

I was just checking some of the HBase index implementation classes in Hudi.

I found out that it stores the data in this format:

partition_path, fileID, commitTime

Since DynamoDB has the concept of a PK and an SK, can you let me know what the PK and SK for this would be, and can you give an example of the data that gets stored here?

 

 


was (Author: biswajit11):
So we will need properties for DynamoDB as well. For example, on EMR there is a table named emrfs; can we have a similar table, say hudifs, which gets created if it doesn't already exist?

Just one question: does it create a separate table for each table ingested, or does it use one generic table for all tables ingested into Hudi?

Also, could you tell me the HBase class name I can look at to get an idea?

> DynamoDB based external index implementation
> 
>
> Key: HUDI-2199
> URL: https://issues.apache.org/jira/browse/HUDI-2199
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: Biswajit mohapatra
>Priority: Major
>
> We have an HBaseIndex that provides users with the ability to store fileID <=> 
> recordKey mappings in an external kv store, for fast lookups during upsert 
> operations. We can potentially create a similar one for DynamoDB. 
> We just use a single column family in HBase, so we should be able to largely 
> re-use the code/key-value schema across them even. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-2199) DynamoDB based external index implementation

2021-08-01 Thread Biswajit mohapatra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391128#comment-17391128
 ] 

Biswajit mohapatra edited comment on HUDI-2199 at 8/1/21, 10:42 AM:


So we will need properties for DynamoDB as well. For example, on EMR there is a table named emrfs; can we have a similar table, say hudifs, which gets created if it doesn't already exist?

Just one question: does it create a separate table for each table ingested, or does it use one generic table for all tables ingested into Hudi?

Also, could you tell me the HBase class name I can look at to get an idea?

EDIT

I was just checking some of the HBase index implementation classes in Hudi.

I found out that it stores the data in this format:

partition_path -> is it a string?

fileID -> a string or a number?

commitTime -> what is the format of the time?

Since DynamoDB has the concept of a PK and an SK, can you let me know what the PK and SK for this would be, and can you give an example of the data that gets stored here?

 

 


was (Author: biswajit11):
So we will need properties for DynamoDB as well. For example, on EMR there is a table named emrfs; can we have a similar table, say hudifs, which gets created if it doesn't already exist?

Just one question: does it create a separate table for each table ingested, or does it use one generic table for all tables ingested into Hudi?

Also, could you tell me the HBase class name I can look at to get an idea?

EDIT

I was just checking some of the HBase index implementation classes in Hudi.

I found out that it stores the data in this format:

partition_path, fileID, commitTime

Since DynamoDB has the concept of a PK and an SK, can you let me know what the PK and SK for this would be, and can you give an example of the data that gets stored here?

 

 

> DynamoDB based external index implementation
> 
>
> Key: HUDI-2199
> URL: https://issues.apache.org/jira/browse/HUDI-2199
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: Biswajit mohapatra
>Priority: Major
>
> We have an HBaseIndex that provides users with the ability to store fileID <=> 
> recordKey mappings in an external kv store, for fast lookups during upsert 
> operations. We can potentially create a similar one for DynamoDB. 
> We just use a single column family in HBase, so we should be able to largely 
> re-use the code/key-value schema across them even. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-2199) DynamoDB based external index implementation

2021-08-01 Thread Biswajit mohapatra (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391128#comment-17391128
 ] 

Biswajit mohapatra edited comment on HUDI-2199 at 8/1/21, 10:43 AM:


So we will need properties for DynamoDB as well. For example, on EMR there is a table named emrfs; can we have a similar table, say hudifs, which gets created if it doesn't already exist?

Just one question: does it create a separate table for each table ingested, or does it use one generic table for all tables ingested into Hudi?

Also, could you tell me the HBase class name I can look at to get an idea?

EDIT

I was just checking some of the HBase index implementation classes in Hudi.

I found out that it stores the data in this format:

partition_path -> is it a string?

fileID -> a string or a number?

commitTime -> new Date().getTime();

Since DynamoDB has the concept of a PK and an SK, can you let me know what the PK and SK for this would be, and can you give an example of the data that gets stored here?
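For illustration only, and not an authoritative answer to the PK/SK question above: a minimal sketch of how one index item could be laid out if the Hudi record key were used as the DynamoDB partition key, mirroring the recordKey -> (partitionPath, fileId, commitTime) mapping kept by the HBase index. The class, field names, and example values below are assumptions, not a committed design.

```java
// Illustrative only: one possible DynamoDB item layout for a record-level index.
public class DynamoDbIndexItemSketch {
  // Partition key (PK): the Hudi record key. A sort key (SK) is not strictly
  // required for point lookups by record key; the partition path could serve
  // as an SK if a composite key is preferred.
  public final String recordKey;
  // Plain (non-key) attributes carried with the item.
  public final String partitionPath; // e.g. "2021/08/01"
  public final String fileId;        // Hudi file group id, a UUID-like string
  public final String commitTime;    // Hudi instant time, e.g. "20210801101500"

  public DynamoDbIndexItemSketch(String recordKey, String partitionPath,
                                 String fileId, String commitTime) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
    this.fileId = fileId;
    this.commitTime = commitTime;
  }

  public static void main(String[] args) {
    // Example of the data one item might hold for a single record.
    DynamoDbIndexItemSketch item = new DynamoDbIndexItemSketch(
        "uuid-0001", "2021/08/01", "e2b5d4a0-1c1f-4c5e-9c7a-000000000000-0", "20210801101500");
    System.out.println(item.recordKey + " -> " + item.partitionPath
        + ", " + item.fileId + ", " + item.commitTime);
  }
}
```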

 

 


was (Author: biswajit11):
So we will need properties for DynamoDB as well. For example, on EMR there is a table named emrfs; can we have a similar table, say hudifs, which gets created if it doesn't already exist?

Just one question: does it create a separate table for each table ingested, or does it use one generic table for all tables ingested into Hudi?

Also, could you tell me the HBase class name I can look at to get an idea?

EDIT

I was just checking some of the HBase index implementation classes in Hudi.

I found out that it stores the data in this format:

partition_path -> is it a string?

fileID -> a string or a number?

commitTime -> what is the format of the time?

Since DynamoDB has the concept of a PK and an SK, can you let me know what the PK and SK for this would be, and can you give an example of the data that gets stored here?

 

 

> DynamoDB based external index implementation
> 
>
> Key: HUDI-2199
> URL: https://issues.apache.org/jira/browse/HUDI-2199
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Index
>Reporter: Vinoth Chandar
>Assignee: Biswajit mohapatra
>Priority: Major
>
> We have an HBaseIndex that provides users with the ability to store fileID <=> 
> recordKey mappings in an external kv store, for fast lookups during upsert 
> operations. We can potentially create a similar one for DynamoDB. 
> We just use a single column family in HBase, so we should be able to largely 
> re-use the code/key-value schema across them even. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3378: [HUDI-83] Fix Timestamp type read by Hive

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3378:
URL: https://github.com/apache/hudi/pull/3378#issuecomment-890488457


   
   ## CI report:
   
   * e5686c25f71a8fa0568639270ba710e0fe9bbdd2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1293)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391134#comment-17391134
 ] 

ASF GitHub Bot commented on HUDI-83:


hudi-bot edited a comment on pull request #3378:
URL: https://github.com/apache/hudi/pull/3378#issuecomment-890488457


   
   ## CI report:
   
   * e5686c25f71a8fa0568639270ba710e0fe9bbdd2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1293)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Map Timestamp type in spark to corresponding Timestamp type in Hive during 
> Hive sync
> 
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration, Usability
>Affects Versions: 0.9.0
>Reporter: Vinoth Chandar
>Assignee: cdmikechen
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.9.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] & related issues 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] jsbali commented on a change in pull request #3364: [HUDI-2248] Fixing the closing of hms client

2021-08-01 Thread GitBox


jsbali commented on a change in pull request #3364:
URL: https://github.com/apache/hudi/pull/3364#discussion_r680533347



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##
@@ -295,7 +295,7 @@ public void close() {
 try {
   ddlExecutor.close();
   if (client != null) {
-client.close();
+Hive.closeCurrent();
 client = null;

Review comment:
   Sure @yanghua will take a look and fix those as well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2248) Unable to shutdown local metastore client

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391194#comment-17391194
 ] 

ASF GitHub Bot commented on HUDI-2248:
--

jsbali commented on a change in pull request #3364:
URL: https://github.com/apache/hudi/pull/3364#discussion_r680533347



##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
##
@@ -295,7 +295,7 @@ public void close() {
 try {
   ddlExecutor.close();
   if (client != null) {
-client.close();
+Hive.closeCurrent();
 client = null;

Review comment:
   Sure @yanghua will take a look and fix those as well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Unable to shutdown local metastore client
> -
>
> Key: HUDI-2248
> URL: https://issues.apache.org/jira/browse/HUDI-2248
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: Jagmeet Bali
>Priority: Minor
>  Labels: pull-request-available
>
> https://github.com/apache/hudi/issues/3187



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vingov commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)

2021-08-01 Thread GitBox


vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890585823


   Thanks for the review @nsivabalan!
   
   1. Updated the roadmap link to streaming data lake's latest blog link.
   2. The "Next" link shows the latest master branch docs; "Next" means the docs for the next, unreleased version. It is not selected by default; only the latest stable release docs are shown by default. The "Next" version makes it easier for developers to validate their docs changes. There is a flag to hide the "Next" link, so if you still prefer to hide it, let me know and I can disable it.
   3. Added Robinhood to the powered-by page.
   4. Yes, we can make use of the tabs to separate out the spark/flink configs. I see that there are already a few changes in the latest unreleased version, so we can take that up later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391242#comment-17391242
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890585823


   Thanks for the review @nsivabalan!
   
   1. Updated the roadmap link to streaming data lake's latest blog link.
   2. The "Next" link shows the latest master branch docs; "Next" means the docs for the next, unreleased version. It is not selected by default; only the latest stable release docs are shown by default. The "Next" version makes it easier for developers to validate their docs changes. There is a flag to hide the "Next" link, so if you still prefer to hide it, let me know and I can disable it.
   3. Added Robinhood to the powered-by page.
   4. Yes, we can make use of the tabs to separate out the spark/flink configs. I see that there are already a few changes in the latest unreleased version, so we can take that up later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vingov commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)

2021-08-01 Thread GitBox


vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890587187


   Moreover, for the next version, there is a banner added to let users know 
that these docs are for the unreleased version:
   
   https://user-images.githubusercontent.com/1142498/127785355-e39f3aa4-1011-4259-83a1-9a806e687839.png
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391243#comment-17391243
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890587187


   Moreover, for the next version, there is a banner added to let users know 
that these docs are for the unreleased version:
   
   https://user-images.githubusercontent.com/1142498/127785355-e39f3aa4-1011-4259-83a1-9a806e687839.png
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] lw309637554 merged pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering

2021-08-01 Thread GitBox


lw309637554 merged pull request #3259:
URL: https://github.com/apache/hudi/pull/3259


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering (#3259)

2021-08-01 Thread liway
This is an automated email from the ASF dual-hosted git repository.

liway pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new dde57b2  [HUDI-2164] Let users build cluster plan and execute this 
plan at once using HoodieClusteringJob for async clustering (#3259)
dde57b2 is described below

commit dde57b293cd22ef1bd89f44054f109a52fc20e56
Author: zhangyue19921010 <69956021+zhangyue19921...@users.noreply.github.com>
AuthorDate: Mon Aug 2 08:07:59 2021 +0800

[HUDI-2164] Let users build cluster plan and execute this plan at once 
using HoodieClusteringJob for async clustering (#3259)

* add --mode schedule/execute/scheduleandexecute

* fix checkstyle

* add UT testHoodieAsyncClusteringJobWithScheduleAndExecute

* log changed

* try to make ut success

* try to fix ut

* modify ut

* review changed

* code review

* code review

* code review

* code review

Co-authored-by: yuezhang 
---
 .../apache/hudi/utilities/HoodieClusteringJob.java | 105 +---
 .../functional/TestHoodieDeltaStreamer.java| 139 ++---
 2 files changed, 207 insertions(+), 37 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
index a4dc741..8f74892 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java
@@ -31,7 +31,9 @@ import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.table.HoodieTableMetaClient;
 import org.apache.hudi.common.table.TableSchemaResolver;
 import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
 import org.apache.hudi.exception.HoodieException;
+
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 import org.apache.spark.api.java.JavaRDD;
@@ -49,6 +51,9 @@ public class HoodieClusteringJob {
   private transient FileSystem fs;
   private TypedProperties props;
   private final JavaSparkContext jsc;
+  public static final String EXECUTE = "execute";
+  public static final String SCHEDULE = "schedule";
+  public static final String SCHEDULE_AND_EXECUTE = "scheduleandexecute";
 
   public HoodieClusteringJob(JavaSparkContext jsc, Config cfg) {
 this.cfg = cfg;
@@ -71,8 +76,8 @@ public class HoodieClusteringJob {
 public String basePath = null;
 @Parameter(names = {"--table-name", "-tn"}, description = "Table name", 
required = true)
 public String tableName = null;
-@Parameter(names = {"--instant-time", "-it"}, description = "Clustering 
Instant time, only need when cluster. "
-+ "And schedule clustering can generate it.", required = false)
+@Parameter(names = {"--instant-time", "-it"}, description = "Clustering 
Instant time, only need when set --mode execute. "
++ "When set \"--mode scheduleAndExecute\" this instant-time will 
be ignored.", required = false)
 public String clusteringInstantTime = null;
 @Parameter(names = {"--parallelism", "-pl"}, description = "Parallelism 
for hoodie insert", required = false)
 public int parallelism = 1;
@@ -83,8 +88,14 @@ public class HoodieClusteringJob {
 @Parameter(names = {"--retry", "-rt"}, description = "number of retries", 
required = false)
 public int retry = 0;
 
-@Parameter(names = {"--schedule", "-sc"}, description = "Schedule 
clustering")
+@Parameter(names = {"--schedule", "-sc"}, description = "Schedule 
clustering @desperate soon please use \"--mode schedule\" instead")
 public Boolean runSchedule = false;
+
+@Parameter(names = {"--mode", "-m"}, description = "Set job mode: Set 
\"schedule\" means make a cluster plan; "
++ "Set \"execute\" means execute a cluster plan at given instant 
which means --instant-time is needed here; "
++ "Set \"scheduleAndExecute\" means make a cluster plan first and 
execute that plan immediately", required = false)
+public String runningMode = null;
+
 @Parameter(names = {"--help", "-h"}, help = true)
 public Boolean help = false;
 
@@ -101,15 +112,17 @@ public class HoodieClusteringJob {
   public static void main(String[] args) {
 final Config cfg = new Config();
 JCommander cmd = new JCommander(cfg, null, args);
-if (cfg.help || args.length == 0 || (!cfg.runSchedule && 
cfg.clusteringInstantTime == null)) {
+
+if (cfg.help || args.length == 0) {
   cmd.usage();
   System.exit(1);
 }
+
 final JavaSparkContext jsc = UtilHelpers.buildSparkContext("clustering-" + 
cfg.tableName, cfg.sparkMaster, cfg.sparkMemory);
 HoodieClusteringJob clusteringJob = new HoodieClusteringJob(jsc, cfg);
 

[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391254#comment-17391254
 ] 

ASF GitHub Bot commented on HUDI-2164:
--

lw309637554 merged pull request #3259:
URL: https://github.com/apache/hudi/pull/3259


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> For now, Hudi can let users submit a HoodieClusteringJob to build a 
> clustering plan or execute a clustering plan through --schedule or 
> --instant-time config.
> If users want to trigger a clustering job, they have to
>  # Submit a HoodieClusteringJob to build a clustering plan through the --schedule config.
>  # Copy the created clustering instant time from the log output.
>  # Submit the HoodieClusteringJob again to execute the created clustering plan through the --instant-time config.
> The pain point is that triggering a clustering takes too many steps and requires copying and pasting the instant time from the log file manually, so it cannot be automated.
>  
> I have just raised a PR to offer a new config named --mode, or -m in short:
> ||--mode||remarks||
> |execute|Execute a cluster plan at given instant which means --instant-time 
> is needed here. default value. |
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan 
> immediately|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute it at once using HoodieClusteringJob.
>  
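For illustration only, a minimal sketch of invoking the utility in the new scheduleAndExecute mode. Only the --mode, --schedule and --instant-time options are described above (and --table-name appears in the committed change earlier in this thread); the --base-path flag and the example values are assumptions, and in practice the job is typically packaged and launched through spark-submit with the hudi-utilities bundle rather than called in-process as shown here.

```java
import org.apache.hudi.utilities.HoodieClusteringJob;

public class ClusteringJobInvocationSketch {
  public static void main(String[] args) {
    // "scheduleandexecute" builds a clustering plan and executes it immediately,
    // so no --instant-time needs to be copied from the logs.
    String[] clusteringArgs = new String[] {
        "--base-path", "/tmp/hudi_trips_cow",   // assumed table location
        "--table-name", "hudi_trips_cow",       // assumed table name
        "--mode", "scheduleandexecute"
    };
    HoodieClusteringJob.main(clusteringArgs);
  }
}
```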



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)

2021-08-01 Thread GitBox


vinothchandar commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890644163


   @nsivabalan let's keep this to just reflecting the current site as-is and avoid adding new links or powered-by? We can do these in smaller steps afterwards.
   
   @vingov Once we land this, I assume Travis will be broken and we will have to fix the automatic site build in a follow-on PR? Just trying to understand the plan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391257#comment-17391257
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

vinothchandar commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890644163


   @nsivabalan let's keep this to just reflecting the current site as-is and avoid adding new links or powered-by? We can do these in smaller steps afterwards.
   
   @vingov Once we land this, I assume Travis will be broken and we will have to fix the automatic site build in a follow-on PR? Just trying to understand the plan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


vinothchandar commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680614863



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -157,6 +157,11 @@
   .withDocumentation("When enabled, populates all meta fields. When 
disabled, no meta fields are populated "
   + "and incremental queries will not be functional. This is only 
meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty HOODIE_TABLE_KEY_GENERATOR_CLASS 
= ConfigProperty
+  .key("hoodie.datasource.write.keygenerator.class")

Review comment:
   do we really write this with the `hoodie.datasource` prefix? This is a table-level property, not just for the datasource write. Let's fix this?

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/CompactionTestBase.java
##
@@ -198,7 +198,7 @@ protected void executeCompaction(String 
compactionInstantTime, SparkRDDWriteClie
 assertEquals(latestCompactionCommitTime, compactionInstantTime,
 "Expect compaction instant time to be the latest commit time");
 assertEquals(expectedNumRecs,
-HoodieClientTestUtils.countRecordsSince(jsc, basePath, sqlContext, 
timeline, "000"),
+HoodieClientTestUtils.countRecordsWithOptionalSince(jsc, basePath, 
sqlContext, timeline, Option.of("000")),

Review comment:
   rename `countRecordOptionallySince` 

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##
@@ -36,16 +38,20 @@
 
   private String basePath;
 
+  private Option hoodieVirtualKeyInfoOpt = 
Option.empty();
+
   public HoodieRealtimeFileSplit() {
 super();
   }
 
-  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, 
List deltaLogPaths, String maxCommitTime)
+  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, 
List deltaLogPaths, String maxCommitTime,
+ Option 
hoodieVirtualKeyInfoOpt)

Review comment:
   I think it's okay to drop the `Opt` suffix everywhere. `virtualKeyInfo` is concise and already conveys the meaning, and `virtualKeyInfo.isPresent()` will again convey what `Opt` would have conveyed.
   
   Also, let's drop the `hoodie` prefix everywhere in the variables as well.

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -52,8 +53,15 @@
*/
   String getBasePath();
 
+  /**
+   * Returns Virtual key info if meta fields are disabled.
+   * @return
+   */
+  Option getHoodieVirtualKeyInfoOpt();

Review comment:
   fix name.

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -204,23 +226,34 @@ private static Configuration 
addProjectionField(Configuration conf, String field
 return conf;
   }
 
-  public static void addRequiredProjectionFields(Configuration configuration) {
+  public static void addRequiredProjectionFields(Configuration configuration, 
Option hoodieVirtualKeyInfoOpt) {

Review comment:
   naming

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##
@@ -211,8 +213,11 @@ protected String getCommitActionType() {
   
.withBitCaskDiskMapCompressionEnabled(config.getCommonConfig().isBitCaskDiskMapCompressionEnabled())
   .build();
 
+  HoodieTableConfig tableConfig = 
table.getMetaClient().getTableConfig();
   
recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, 
scanner, readerSchema,
-  table.getMetaClient().getTableConfig().getPayloadClass()));
+  tableConfig.getPayloadClass(),
+  tableConfig.populateMetaFields() ? Option.empty() : 
Option.of(tableConfig.getRecordKeyFieldProp()),

Review comment:
   we can use `Option<Pair<String, String>>` anywhere both a record key and partition path field need to be passed. 
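   For illustration only, a minimal sketch of the shape being suggested, written with java.util.Optional and the commons-lang3 Pair rather than Hudi's own Option and Pair types; the method name and parameters are made up.

```java
import java.util.Optional;
import org.apache.commons.lang3.tuple.Pair;

final class VirtualKeyFieldsSketch {
  // Empty when meta fields are populated; otherwise carries
  // (recordKeyField, partitionPathField) together as one value.
  static Optional<Pair<String, String>> keyFields(boolean populateMetaFields,
                                                  String recordKeyField,
                                                  String partitionPathField) {
    return populateMetaFields
        ? Optional.empty()
        : Optional.of(Pair.of(recordKeyField, partitionPathField));
  }
}
```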

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##
@@ -80,6 +82,10 @@
   private final HoodieTableMetaClient hoodieTableMetaClient;
   // Merge strategy to use when combining records from log
   private final String payloadClassFQN;
+  // simple recordKey field
+  private Option simpleRecordKeyField = Option.empty();

Review comment:
   lets use a Pair

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##
@@ -60,6 +66,16 @@ public String getBasePath() {
 return basePath;
   }
 
+  @Override
+  public void setHoodieVirtualKeyInfoOpt(Option 
hoodieVirtualKeyInfoOpt) {

Review comment:
   fix the getter and setter names

##
File path: 
hudi-common/src/main/java/org/a

[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391265#comment-17391265
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

vinothchandar commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680614863



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -157,6 +157,11 @@
   .withDocumentation("When enabled, populates all meta fields. When 
disabled, no meta fields are populated "
   + "and incremental queries will not be functional. This is only 
meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty HOODIE_TABLE_KEY_GENERATOR_CLASS 
= ConfigProperty
+  .key("hoodie.datasource.write.keygenerator.class")

Review comment:
   do we really write this with the `hoodie.datasource` prefix? This is a table-level property, not just for the datasource write. Let's fix this?

##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/CompactionTestBase.java
##
@@ -198,7 +198,7 @@ protected void executeCompaction(String 
compactionInstantTime, SparkRDDWriteClie
 assertEquals(latestCompactionCommitTime, compactionInstantTime,
 "Expect compaction instant time to be the latest commit time");
 assertEquals(expectedNumRecs,
-HoodieClientTestUtils.countRecordsSince(jsc, basePath, sqlContext, 
timeline, "000"),
+HoodieClientTestUtils.countRecordsWithOptionalSince(jsc, basePath, 
sqlContext, timeline, Option.of("000")),

Review comment:
   rename `countRecordOptionallySince` 

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##
@@ -36,16 +38,20 @@
 
   private String basePath;
 
+  private Option hoodieVirtualKeyInfoOpt = 
Option.empty();
+
   public HoodieRealtimeFileSplit() {
 super();
   }
 
-  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, 
List deltaLogPaths, String maxCommitTime)
+  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, 
List deltaLogPaths, String maxCommitTime,
+ Option 
hoodieVirtualKeyInfoOpt)

Review comment:
   I think it's okay to drop the `Opt` suffix everywhere. `virtualKeyInfo` is concise and already conveys the meaning, and `virtualKeyInfo.isPresent()` will again convey what `Opt` would have conveyed.
   
   Also, let's drop the `hoodie` prefix everywhere in the variables as well.

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -52,8 +53,15 @@
*/
   String getBasePath();
 
+  /**
+   * Returns Virtual key info if meta fields are disabled.
+   * @return
+   */
+  Option getHoodieVirtualKeyInfoOpt();

Review comment:
   fix name.

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -204,23 +226,34 @@ private static Configuration 
addProjectionField(Configuration conf, String field
 return conf;
   }
 
-  public static void addRequiredProjectionFields(Configuration configuration) {
+  public static void addRequiredProjectionFields(Configuration configuration, 
Option hoodieVirtualKeyInfoOpt) {

Review comment:
   naming

##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##
@@ -211,8 +213,11 @@ protected String getCommitActionType() {
   
.withBitCaskDiskMapCompressionEnabled(config.getCommonConfig().isBitCaskDiskMapCompressionEnabled())
   .build();
 
+  HoodieTableConfig tableConfig = 
table.getMetaClient().getTableConfig();
   
recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, 
scanner, readerSchema,
-  table.getMetaClient().getTableConfig().getPayloadClass()));
+  tableConfig.getPayloadClass(),
+  tableConfig.populateMetaFields() ? Option.empty() : 
Option.of(tableConfig.getRecordKeyFieldProp()),

Review comment:
   we can use `Option<Pair<String, String>>` anywhere both a record key and partition path field need to be passed. 

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##
@@ -80,6 +82,10 @@
   private final HoodieTableMetaClient hoodieTableMetaClient;
   // Merge strategy to use when combining records from log
   private final String payloadClassFQN;
+  // simple recordKey field
+  private Option simpleRecordKeyField = Option.empty();

Review comment:
   lets use a Pair

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##
@@ -60,6 +66,16 @@ public S

[GitHub] [hudi] vingov commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)

2021-08-01 Thread GitBox


vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890669535


   @vinothchandar - I have fixed the Travis script as well; similar to the jekyll script, it will build the site, move it into the content folder, and push it back to asf-site. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1985) Website re-design implementation

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391266#comment-17391266
 ] 

ASF GitHub Bot commented on HUDI-1985:
--

vingov commented on pull request #3366:
URL: https://github.com/apache/hudi/pull/3366#issuecomment-890669535


   @vinothchandar - I have fixed the Travis script as well; similar to the jekyll script, it will build the site, move it into the content folder, and push it back to asf-site. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Website re-design implementation
> 
>
> Key: HUDI-1985
> URL: https://issues.apache.org/jira/browse/HUDI-1985
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Assignee: Vinoth Govindarajan
>Priority: Blocker
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
>
> To provide better navigation and organization of Hudi website's info, we have 
> done a re-design of the web pages.
> Previous discussion
> [https://github.com/apache/hudi/issues/2905]
>  
> See the wireframe and final design in 
> [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6]
> (login Figma to comment)
> The design is ready for implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680643364



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -157,6 +157,11 @@
   .withDocumentation("When enabled, populates all meta fields. When 
disabled, no meta fields are populated "
   + "and incremental queries will not be functional. This is only 
meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty HOODIE_TABLE_KEY_GENERATOR_CLASS 
= ConfigProperty
+  .key("hoodie.datasource.write.keygenerator.class")

Review comment:
   My intention here was to just reuse the same config someone uses with the Spark datasource, rather than introducing new configs. But I guess we don't follow that for the partition path or record keys. Will fix it. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391291#comment-17391291
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680643364



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -157,6 +157,11 @@
   .withDocumentation("When enabled, populates all meta fields. When 
disabled, no meta fields are populated "
   + "and incremental queries will not be functional. This is only 
meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty HOODIE_TABLE_KEY_GENERATOR_CLASS 
= ConfigProperty
+  .key("hoodie.datasource.write.keygenerator.class")

Review comment:
   My intention here was to just reuse the same config someone uses with the Spark datasource, rather than introducing new configs. But I guess we don't follow that for the partition path or record keys. Will fix it. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680643619



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, 
WriteOperationType op
 && Boolean.parseBoolean((String) 
properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), 
HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue( {
   throw new 
HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already 
disabled for the table. Can't be re-enabled back");
 }
+
+// meta fields can be disabled only with SimpleKeyGenerator
+if (!getTableConfig().populateMetaFields()
+&& 
!properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(),
 "org.apache.hudi.keygen.SimpleKeyGenerator")

Review comment:
   I did respond to that already, and I left reviewer notes before too. SimpleKeyGenerator is not visible from this class, so I had to hard-code it. Not sure I understand your suggestion. 

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, 
WriteOperationType op
 && Boolean.parseBoolean((String) 
properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), 
HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue( {
   throw new 
HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already 
disabled for the table. Can't be re-enabled back");
 }
+
+// meta fields can be disabled only with SimpleKeyGenerator
+if (!getTableConfig().populateMetaFields()
+&& 
!properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(),
 "org.apache.hudi.keygen.SimpleKeyGenerator")

Review comment:
   I did respond to that already, and I left reviewer notes before too. SimpleKeyGenerator is not visible from this class, so I had to hard-code it. Not sure I understand your suggestion. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391292#comment-17391292
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680643619



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, 
WriteOperationType op
 && Boolean.parseBoolean((String) 
properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), 
HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue( {
   throw new 
HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already 
disabled for the table. Can't be re-enabled back");
 }
+
+// meta fields can be disabled only with SimpleKeyGenerator
+if (!getTableConfig().populateMetaFields()
+&& 
!properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(),
 "org.apache.hudi.keygen.SimpleKeyGenerator")

Review comment:
   I did respond to that already, and I left reviewer notes before too. SimpleKeyGenerator is not visible from this class, so I had to hard-code it. Not sure I understand your suggestion. 

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, 
WriteOperationType op
 && Boolean.parseBoolean((String) 
properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), 
HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue( {
   throw new 
HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already 
disabled for the table. Can't be re-enabled back");
 }
+
+// meta fields can be disabled only with SimpleKeyGenerator
+if (!getTableConfig().populateMetaFields()
+&& 
!properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(),
 "org.apache.hudi.keygen.SimpleKeyGenerator")

Review comment:
   I did respond to that already, and I left reviewer notes before too. SimpleKeyGenerator is not visible from this class, so I had to hard-code it. Not sure I understand your suggestion. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680647264



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -157,6 +157,11 @@
   .withDocumentation("When enabled, populates all meta fields. When 
disabled, no meta fields are populated "
   + "and incremental queries will not be functional. This is only 
meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty HOODIE_TABLE_KEY_GENERATOR_CLASS 
= ConfigProperty
+  .key("hoodie.datasource.write.keygenerator.class")

Review comment:
   I will follow a similar naming convention to the record key and partition path configs:
   hoodie.table.keygenerator.class




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391295#comment-17391295
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680647264



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -157,6 +157,11 @@
   .withDocumentation("When enabled, populates all meta fields. When 
disabled, no meta fields are populated "
   + "and incremental queries will not be functional. This is only 
meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty HOODIE_TABLE_KEY_GENERATOR_CLASS 
= ConfigProperty
+  .key("hoodie.datasource.write.keygenerator.class")

Review comment:
   I will follow a similar naming convention to the record key and partition path configs:
   hoodie.table.keygenerator.class




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1842) [SQL] Spark Sql Support For The Exists Hoodie Table

2021-08-01 Thread Sagar Sumit (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391296#comment-17391296
 ] 

Sagar Sumit commented on HUDI-1842:
---

> table name has to match the table name as per hoodie.properties.

This is true; otherwise the write will throw an exception as follows:
HoodieException: hoodie table with name hudi_trips_cow already exists at 
file:/tmp/hudi_trips_cow
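For illustration (assumed values): the table name is pinned in the table's .hoodie/hoodie.properties, so a later write against the same base path with a different table name trips the check reported above.

```
# /tmp/hudi_trips_cow/.hoodie/hoodie.properties (excerpt, assumed values)
hoodie.table.name=hudi_trips_cow
hoodie.table.type=COPY_ON_WRITE
```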

> [SQL] Spark Sql Support For The Exists Hoodie Table
> ---
>
> Key: HUDI-1842
> URL: https://issues.apache.org/jira/browse/HUDI-1842
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 0.9.0
>
>
> In order to support Spark SQL for Hudi, we persist some table properties to
> hoodie.properties, e.g. primaryKey, preCombineField, partition columns.
> For existing hoodie tables, these properties are missing. We need to do some
> work in UpgradeDowngrade to support Spark SQL for existing tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680653697



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -70,13 +78,24 @@
*/
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option 
hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
 InputSplitUtils.writeString(getBasePath(), out);
 InputSplitUtils.writeString(getMaxCommitTime(), out);
 out.writeInt(getDeltaLogPaths().size());
 for (String logFilePath : getDeltaLogPaths()) {
   InputSplitUtils.writeString(logFilePath, out);
 }
+if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
   ifPresent returns void. Will try using .map().orElse().
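
   For illustration only, a minimal sketch (not the actual Hudi code) of serializing an optional value to a DataOutput with an explicit presence flag, written with java.util.Optional rather than Hudi's Option. One practical reason an explicit isPresent()/get() branch is often simpler here is that a lambda passed to map() or ifPresent() cannot throw the checked IOException directly.

```java
import java.io.DataOutput;
import java.io.IOException;
import java.util.Optional;

final class OptionalFieldWriterSketch {
  // Write a presence flag first so the reader knows whether a payload follows,
  // then write the payload only when the value is present.
  static void writeOptionalString(Optional<String> value, DataOutput out) throws IOException {
    out.writeBoolean(value.isPresent());
    if (value.isPresent()) {
      out.writeUTF(value.get());
    }
  }
}
```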




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391302#comment-17391302
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680653697



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -70,13 +78,24 @@
*/
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option 
hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
 InputSplitUtils.writeString(getBasePath(), out);
 InputSplitUtils.writeString(getMaxCommitTime(), out);
 out.writeInt(getDeltaLogPaths().size());
 for (String logFilePath : getDeltaLogPaths()) {
   InputSplitUtils.writeString(logFilePath, out);
 }
+if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
   ifPresent returns avoid. will try using .map().orElse() 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680653697



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -70,13 +78,24 @@
*/
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
 InputSplitUtils.writeString(getBasePath(), out);
 InputSplitUtils.writeString(getMaxCommitTime(), out);
 out.writeInt(getDeltaLogPaths().size());
 for (String logFilePath : getDeltaLogPaths()) {
   InputSplitUtils.writeString(logFilePath, out);
 }
+if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
   ifPresent returns a void. Will try to see if there are other options. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391303#comment-17391303
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680653697



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -70,13 +78,24 @@
*/
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
 InputSplitUtils.writeString(getBasePath(), out);
 InputSplitUtils.writeString(getMaxCommitTime(), out);
 out.writeInt(getDeltaLogPaths().size());
 for (String logFilePath : getDeltaLogPaths()) {
   InputSplitUtils.writeString(logFilePath, out);
 }
+if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
   ifPresent returns a void. Will try to see if there are other options. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on a change in pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680658525



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -70,13 +78,24 @@
*/
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
 InputSplitUtils.writeString(getBasePath(), out);
 InputSplitUtils.writeString(getMaxCommitTime(), out);
 out.writeInt(getDeltaLogPaths().size());
 for (String logFilePath : getDeltaLogPaths()) {
   InputSplitUtils.writeString(logFilePath, out);
 }
+if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
   I could not find any other way to do this. We are not mapping to 
anything; we are just doing X or Y depending on whether it's present or not. So, 
we can't use .map().orElse(). I checked the other APIs in Option, but could not 
find any that fit. 
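
   For context, here is a minimal sketch of the serialization pattern under discussion: write a boolean presence flag first, then the payload only when the optional is present, and read it back symmetrically. It deliberately substitutes java.util.Optional and a plain String for Hudi's Option and HoodieVirtualKeyInfo, so it is only an illustration of why a value-returning map().orElse() chain fits poorly when both branches are side effects on the DataOutput.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Optional;

// Simplified stand-in for the RealtimeSplit discussion: serialize an optional
// value by writing a presence flag, then the payload only when present.
public class OptionalFieldSerde {

  static void write(DataOutput out, Optional<String> virtualKeyInfo) throws IOException {
    if (!virtualKeyInfo.isPresent()) {
      out.writeBoolean(false);               // nothing follows
    } else {
      out.writeBoolean(true);                // payload follows
      out.writeUTF(virtualKeyInfo.get());
    }
  }

  static Optional<String> read(DataInput in) throws IOException {
    return in.readBoolean() ? Optional.of(in.readUTF()) : Optional.empty();
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    write(new DataOutputStream(buf), Optional.of("recordKey=uuid,partition=ts"));
    Optional<String> roundTripped =
        read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
    System.out.println(roundTripped.orElse("<absent>"));
  }
}
{code}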




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391317#comment-17391317
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

nsivabalan commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680658525



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##
@@ -70,13 +78,24 @@
*/
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
 InputSplitUtils.writeString(getBasePath(), out);
 InputSplitUtils.writeString(getMaxCommitTime(), out);
 out.writeInt(getDeltaLogPaths().size());
 for (String logFilePath : getDeltaLogPaths()) {
   InputSplitUtils.writeString(logFilePath, out);
 }
+if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
   I could not find any other way to do this. We are not mapping to 
anything; we are just doing X or Y depending on whether it's present or not. So, 
we can't use .map().orElse(). I checked the other APIs in Option, but could not 
find any that fit. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#issuecomment-883851530


   
   ## CI report:
   
   * c3bdfdd1e47790f46f4263a7fb5242c01dd02188 UNKNOWN
   * 40cbb91efb34e6360849a42056de6343cc1251d6 UNKNOWN
   * 4d941eb79ab5e769b2643c5ad1b5b2d767a0af17 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1277)
 
   * 7c05ed3df1d3e9bff709d9aba9168f7bb7b1ac54 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391325#comment-17391325
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

hudi-bot edited a comment on pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#issuecomment-883851530


   
   ## CI report:
   
   * c3bdfdd1e47790f46f4263a7fb5242c01dd02188 UNKNOWN
   * 40cbb91efb34e6360849a42056de6343cc1251d6 UNKNOWN
   * 4d941eb79ab5e769b2643c5ad1b5b2d767a0af17 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1277)
 
   * 7c05ed3df1d3e9bff709d9aba9168f7bb7b1ac54 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#issuecomment-883851530


   
   ## CI report:
   
   * c3bdfdd1e47790f46f4263a7fb5242c01dd02188 UNKNOWN
   * 40cbb91efb34e6360849a42056de6343cc1251d6 UNKNOWN
   * 4d941eb79ab5e769b2643c5ad1b5b2d767a0af17 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1277)
 
   * 7c05ed3df1d3e9bff709d9aba9168f7bb7b1ac54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1295)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391344#comment-17391344
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

hudi-bot edited a comment on pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#issuecomment-883851530


   
   ## CI report:
   
   * c3bdfdd1e47790f46f4263a7fb5242c01dd02188 UNKNOWN
   * 40cbb91efb34e6360849a42056de6343cc1251d6 UNKNOWN
   * 4d941eb79ab5e769b2643c5ad1b5b2d767a0af17 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1277)
 
   * 7c05ed3df1d3e9bff709d9aba9168f7bb7b1ac54 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1295)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

董可伦 updated HUDI-2259:
--
Summary: Support referencing subquery with column aliases by table alias in 
merge into  (was: Merge Into when source table with columnAliases throws 
exception)

> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

董可伦 updated HUDI-2259:
--
Issue Type: Improvement  (was: Bug)

> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2232) [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread pengzhiwei (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengzhiwei updated HUDI-2232:
-
Summary: [SQL] MERGE INTO fails with table having nested struct  (was: 
[SQL] MERGE INTO fails with table having nested struct and partioned by)

> [SQL] MERGE INTO fails with table having nested struct
> --
>
> Key: HUDI-2232
> URL: https://issues.apache.org/jira/browse/HUDI-2232
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 0.9.0
>
>
> {code:java}
> // TO reproduce
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (  id int,   name string,   price double,   ts 
> long,   repo struct) using hudi options(primaryKey = 
> 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (  id int,   name string,   price double,   ts long,  
>  repo struct) using hudi options(primaryKey = 'id', 
> precombineField = 'ts') partitioned by (ts) location 
> 'file:///tmp/hudi-h5-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 
> struct(234273476,"onnet/onnet-portal"), 130);
> select * from hudi_gh_ext_fixed;
> 20210727145240  20210727145240_0_6442266  id:3
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet  3 
> AMZN  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145301  
> 20210727145301_0_6442269  id:2
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet  2 
> UBER  150.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145254  
> 20210727145254_0_6442268  id:4
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet  4 
> GOOG  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}
> select * from hudi_fixed;
> 20210727145325  20210727145325_0_6442270  id:2  ts=130  
> ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet  2 
> UBER  200.0 {"id":234273476,"name":"onnet/onnet-portal"}  130
> MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from 
> hudi_gh_ext_fixed) updates ON hudi_fixed.id = updates.id WHEN MATCHED THEN 
> UPDATE SET * WHEN NOT MATCHED THEN INSERT *;
> -- java.lang.IllegalArgumentException: UnSupport StructType yet--  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122)--
>   at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56)--
>   at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown
>  Source)--  at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 opened a new pull request #3379: [HUDI-2232] [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread GitBox


pengzhiwei2018 opened a new pull request #3379:
URL: https://github.com/apache/hudi/pull/3379


   
   
   ## What is the purpose of the pull request
   
   Fix the bug that MERGE INTO fails when the table has a nested struct type
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2232) [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391353#comment-17391353
 ] 

ASF GitHub Bot commented on HUDI-2232:
--

pengzhiwei2018 opened a new pull request #3379:
URL: https://github.com/apache/hudi/pull/3379


   
   
   ## What is the purpose of the pull request
   
   Fix the bug that MERGE INTO fails when the table has a nested struct type
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [SQL] MERGE INTO fails with table having nested struct
> --
>
> Key: HUDI-2232
> URL: https://issues.apache.org/jira/browse/HUDI-2232
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 0.9.0
>
>
> {code:java}
> // TO reproduce
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (  id int,   name string,   price double,   ts 
> long,   repo struct) using hudi options(primaryKey = 
> 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (  id int,   name string,   price double,   ts long,  
>  repo struct) using hudi options(primaryKey = 'id', 
> precombineField = 'ts') partitioned by (ts) location 
> 'file:///tmp/hudi-h5-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 
> struct(234273476,"onnet/onnet-portal"), 130);
> select * from hudi_gh_ext_fixed;
> 20210727145240  20210727145240_0_6442266  id:3
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet  3 
> AMZN  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145301  
> 20210727145301_0_6442269  id:2
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet  2 
> UBER  150.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145254  
> 20210727145254_0_6442268  id:4
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet  4 
> GOOG  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}
> select * from hudi_fixed;
> 20210727145325  20210727145325_0_6442270  id:2  ts=130  
> ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet  2 
> UBER  200.0 {"id":234273476,"name":"onnet/onnet-portal"}  130
> MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from 
> hudi_gh_ext_fixed) updates ON hudi_fixed.id = updates.id WHEN MATCHED THEN 
> UPDATE SET * WHEN NOT MATCHED THEN INSERT *;
> -- java.lang.IllegalArgumentException: UnSupport StructType yet--  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122)--
>   at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56)--
>   at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown
>  Source)--  at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2232) [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2232:
-
Labels: pull-request-available release-blocker  (was: release-blocker)

> [SQL] MERGE INTO fails with table having nested struct
> --
>
> Key: HUDI-2232
> URL: https://issues.apache.org/jira/browse/HUDI-2232
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> {code:java}
> // TO reproduce
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (  id int,   name string,   price double,   ts 
> long,   repo struct) using hudi options(primaryKey = 
> 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (  id int,   name string,   price double,   ts long,  
>  repo struct) using hudi options(primaryKey = 'id', 
> precombineField = 'ts') partitioned by (ts) location 
> 'file:///tmp/hudi-h5-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 
> struct(234273476,"onnet/onnet-portal"), 130);
> select * from hudi_gh_ext_fixed;
> 20210727145240  20210727145240_0_6442266  id:3
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet  3 
> AMZN  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145301  
> 20210727145301_0_6442269  id:2
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet  2 
> UBER  150.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145254  
> 20210727145254_0_6442268  id:4
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet  4 
> GOOG  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}
> select * from hudi_fixed;
> 20210727145325  20210727145325_0_6442270  id:2  ts=130  
> ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet  2 
> UBER  200.0 {"id":234273476,"name":"onnet/onnet-portal"}  130
> MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from 
> hudi_gh_ext_fixed) updates ON hudi_fixed.id = updates.id WHEN MATCHED THEN 
> UPDATE SET * WHEN NOT MATCHED THEN INSERT *;
> -- java.lang.IllegalArgumentException: UnSupport StructType yet--  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122)--
>   at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56)--
>   at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown
>  Source)--  at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot commented on pull request #3379: [HUDI-2232] [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread GitBox


hudi-bot commented on pull request #3379:
URL: https://github.com/apache/hudi/pull/3379#issuecomment-890755067


   
   ## CI report:
   
   * 7d52a51109644e9655673c6b014f9d0a742b1ca1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] dongkelun opened a new pull request #3380: [HUDI-2259]Support referencing subquery with column aliases by table alias in me…

2021-08-01 Thread GitBox


dongkelun opened a new pull request #3380:
URL: https://github.com/apache/hudi/pull/3380


   What changes were proposed in this pull request?
   Support referencing subquery with column aliases by table alias in merge into
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2232) [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391355#comment-17391355
 ] 

ASF GitHub Bot commented on HUDI-2232:
--

hudi-bot commented on pull request #3379:
URL: https://github.com/apache/hudi/pull/3379#issuecomment-890755067


   
   ## CI report:
   
   * 7d52a51109644e9655673c6b014f9d0a742b1ca1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [SQL] MERGE INTO fails with table having nested struct
> --
>
> Key: HUDI-2232
> URL: https://issues.apache.org/jira/browse/HUDI-2232
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> {code:java}
> // TO reproduce
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (  id int,   name string,   price double,   ts 
> long,   repo struct) using hudi options(primaryKey = 
> 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (  id int,   name string,   price double,   ts long,  
>  repo struct) using hudi options(primaryKey = 'id', 
> precombineField = 'ts') partitioned by (ts) location 
> 'file:///tmp/hudi-h5-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 
> struct(234273476,"onnet/onnet-portal"), 130);
> select * from hudi_gh_ext_fixed;
> 20210727145240  20210727145240_0_6442266  id:3
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet  3 
> AMZN  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145301  
> 20210727145301_0_6442269  id:2
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet  2 
> UBER  150.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145254  
> 20210727145254_0_6442268  id:4
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet  4 
> GOOG  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}
> select * from hudi_fixed;
> 20210727145325  20210727145325_0_6442270  id:2  ts=130  
> ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet  2 
> UBER  200.0 {"id":234273476,"name":"onnet/onnet-portal"}  130
> MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from 
> hudi_gh_ext_fixed) updates ON hudi_fixed.id = updates.id WHEN MATCHED THEN 
> UPDATE SET * WHEN NOT MATCHED THEN INSERT *;
> -- java.lang.IllegalArgumentException: UnSupport StructType yet--  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122)--
>   at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56)--
>   at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown
>  Source)--  at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391356#comment-17391356
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

dongkelun opened a new pull request #3380:
URL: https://github.com/apache/hudi/pull/3380


   What changes were proposed in this pull request?
   Support referencing subquery with column aliases by table alias in merge into
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3379: [HUDI-2232] [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3379:
URL: https://github.com/apache/hudi/pull/3379#issuecomment-890755067


   
   ## CI report:
   
   * 7d52a51109644e9655673c6b014f9d0a742b1ca1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1296)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3380: [HUDI-2259]Support referencing subquery with column aliases by table alias in me…

2021-08-01 Thread GitBox


hudi-bot commented on pull request #3380:
URL: https://github.com/apache/hudi/pull/3380#issuecomment-890756612


   
   ## CI report:
   
   * 1c1ade9c67059a909278dc0704267a50684a967f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2232) [SQL] MERGE INTO fails with table having nested struct

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391357#comment-17391357
 ] 

ASF GitHub Bot commented on HUDI-2232:
--

hudi-bot edited a comment on pull request #3379:
URL: https://github.com/apache/hudi/pull/3379#issuecomment-890755067


   
   ## CI report:
   
   * 7d52a51109644e9655673c6b014f9d0a742b1ca1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1296)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [SQL] MERGE INTO fails with table having nested struct
> --
>
> Key: HUDI-2232
> URL: https://issues.apache.org/jira/browse/HUDI-2232
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
> Fix For: 0.9.0
>
>
> {code:java}
> // TO reproduce
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (  id int,   name string,   price double,   ts 
> long,   repo struct) using hudi options(primaryKey = 
> 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, 
> struct(234273476,"onnet/onnet-portal"));
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (  id int,   name string,   price double,   ts long,  
>  repo struct) using hudi options(primaryKey = 'id', 
> precombineField = 'ts') partitioned by (ts) location 
> 'file:///tmp/hudi-h5-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 
> struct(234273476,"onnet/onnet-portal"), 130);
> select * from hudi_gh_ext_fixed;
> 20210727145240  20210727145240_0_6442266  id:3
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet  3 
> AMZN  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145301  
> 20210727145301_0_6442269  id:2
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet  2 
> UBER  150.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145254  
> 20210727145254_0_6442268  id:4
> 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet  4 
> GOOG  300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}
> select * from hudi_fixed;
> 20210727145325  20210727145325_0_6442270  id:2  ts=130  
> ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet  2 
> UBER  200.0 {"id":234273476,"name":"onnet/onnet-portal"}  130
> MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from 
> hudi_gh_ext_fixed) updates ON hudi_fixed.id = updates.id WHEN MATCHED THEN 
> UPDATE SET * WHEN NOT MATCHED THEN INSERT *;
> -- java.lang.IllegalArgumentException: UnSupport StructType yet--  at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122)--
>   at 
> org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56)--
>   at 
> org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown
>  Source)--  at 
> org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391358#comment-17391358
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

hudi-bot commented on pull request #3380:
URL: https://github.com/apache/hudi/pull/3380#issuecomment-890756612


   
   ## CI report:
   
   * 1c1ade9c67059a909278dc0704267a50684a967f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3380: [HUDI-2259]Support referencing subquery with column aliases by table alias in me…

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3380:
URL: https://github.com/apache/hudi/pull/3380#issuecomment-890756612


   
   ## CI report:
   
   * 1c1ade9c67059a909278dc0704267a50684a967f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1297)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391363#comment-17391363
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

hudi-bot edited a comment on pull request #3380:
URL: https://github.com/apache/hudi/pull/3380#issuecomment-890756612


   
   ## CI report:
   
   * 1c1ade9c67059a909278dc0704267a50684a967f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1297)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] dongkelun commented on pull request #3380: [HUDI-2259]Support referencing subquery with column aliases by table alias in me…

2021-08-01 Thread GitBox


dongkelun commented on pull request #3380:
URL: https://github.com/apache/hudi/pull/3380#issuecomment-890760974


After I submitted PR 
[https://github.com/apache/hudi/pull/3377](https://github.com/apache/hudi/pull/3377), 
I wondered whether Spark had the same problem, so I looked at the latest Spark 
code and found that Spark had already solved it in 
[https://github.com/apache/spark/pull/31444](https://github.com/apache/spark/pull/31444). 
To stay consistent with Spark, I resubmitted this new PR.
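
   A minimal sketch of a workaround form (assuming the `spark` session and `tableName` from the issue description above): declaring the aliases inside the subquery's select list, instead of as columnAliases on `s0`, lets the `ON` clause resolve `s0.id` even without the support added by this PR.

```scala
// Hedged sketch, not part of this PR: an equivalent MERGE INTO that avoids the
// columnAliases code path by aliasing the columns in the select list itself.
spark.sql(
  s"""
     | merge into $tableName as t0
     | using (
     |   select 1 as id, 'a1' as name, 12 as price, 1003 as ts
     | ) s0
     | on s0.id = t0.id
     | when matched and s0.id != 1 then update set *
     | when matched and s0.id = 1 then delete
     | when not matched then insert *
     |""".stripMargin)
```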


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391365#comment-17391365
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

dongkelun commented on pull request #3380:
URL: https://github.com/apache/hudi/pull/3380#issuecomment-890760974


After I submitted PR 
[https://github.com/apache/hudi/pull/3377](https://github.com/apache/hudi/pull/3377), 
I wondered whether Spark had the same problem, so I looked at the latest Spark 
code and found that Spark had already solved it in 
[https://github.com/apache/spark/pull/31444](https://github.com/apache/spark/pull/31444). 
To stay consistent with Spark, I resubmitted this new PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] dongkelun closed pull request #3377: [HUDI-2259]Fix the exception of Merge Into when source table with col…

2021-08-01 Thread GitBox


dongkelun closed pull request #3377:
URL: https://github.com/apache/hudi/pull/3377


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] dongkelun commented on pull request #3377: [HUDI-2259]Fix the exception of Merge Into when source table with col…

2021-08-01 Thread GitBox


dongkelun commented on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890763076


   Please see this new PR 
[https://github.com/apache/hudi/pull/3380](https://github.com/apache/hudi/pull/3380)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391369#comment-17391369
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

dongkelun closed pull request #3377:
URL: https://github.com/apache/hudi/pull/3377


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2259) Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391370#comment-17391370
 ] 

ASF GitHub Bot commented on HUDI-2259:
--

dongkelun commented on pull request #3377:
URL: https://github.com/apache/hudi/pull/3377#issuecomment-890763076


   Please see this new PR 
[https://github.com/apache/hudi/pull/3380](https://github.com/apache/hudi/pull/3380)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support referencing subquery with column aliases by table alias in merge into
> -
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2259) [SQL]Support referencing subquery with column aliases by table alias in merge into

2021-08-01 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HUDI-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

董可伦 updated HUDI-2259:
--
Summary: [SQL]Support referencing subquery with column aliases by table 
alias in merge into  (was: Support referencing subquery with column aliases by 
table alias in merge into)

> [SQL]Support referencing subquery with column aliases by table alias in merge 
> into
> --
>
> Key: HUDI-2259
> URL: https://issues.apache.org/jira/browse/HUDI-2259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> val tableName = "test_hudi_table"
> spark.sql(
>  s"""
>  | create table ${tableName} (
>  | id int,
>  | name string,
>  | price double,
>  | ts long
>  |) using hudi
>  | options (
>  | primaryKey = 'id',
>  | type = 'cow'
>  | )
>  | location '/tmp/${tableName}'
>  |""".stripMargin)
> spark.sql(
>  s"""
>  | merge into $tableName as t0
>  | using (
>  | select 1, 'a1', 12, 1003
>  | ) s0 (id,name,price,ts)
>  | on s0.id = t0.id
>  | when matched and id != 1 then update set *
>  | when matched and s0.id = 1 then delete
>  | when not matched then insert *
>  """.stripMargin)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot 
> resolve 's0.id in (`s0.id` = `t0.id`), the input columns is: [id#4, name#5, 
> price#6, ts#7, _hoodie_commit_time#8, _hoodie_commit_seqno#9, 
> _hoodie_record_key#10, _hoodie_partition_path#11, _hoodie_file_name#12, 
> id#13, name#14, price#15, ts#16L];Exception in thread "main" 
> org.apache.spark.sql.AnalysisException: Cannot resolve 's0.id in (`s0.id` = 
> `t0.id`), the input columns is: [id#4, name#5, price#6, ts#7, 
> _hoodie_commit_time#8, _hoodie_commit_seqno#9, _hoodie_record_key#10, 
> _hoodie_partition_path#11, _hoodie_file_name#12, id#13, name#14, price#15, 
> ts#16L]; at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences.org$apache$spark$sql$hudi$analysis$HoodieResolveReferences$$resolveExpressionFrom(HoodieAnalysis.scala:292)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:160)
>  at 
> org.apache.spark.sql.hudi.analysis.HoodieResolveReferences$$anonfun$apply$1.applyOrElse(HoodieAnalysis.scala:103)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1$$anonfun$apply$1.apply(AnalysisHelper.scala:90)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:89)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsUp$1.apply(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsUp(AnalysisHelper.scala:86)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #3328: [HUDI-2208] Support Bulk Insert For Spark Sql

2021-08-01 Thread GitBox


pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r680693753



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
 RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
 PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
 PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> 
isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   Just like the upsert operation, where Hudi does the combine automatically, we can 
do this for the user here too, which is much friendlier for our users.
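
   For illustration only (not the actual Hudi source), a minimal sketch of how the two options added in the diff above could end up in the write-options map; the literal key strings are assumed to correspond to `ENABLE_ROW_WRITER_OPT_KEY` and `HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP`, and both flag values are hypothetical.

```scala
// Hypothetical inputs: bulk insert requested, and the table declares a primaryKey.
val enableBulkInsert = true
val isPrimaryKeyTable = true

val extraWriteOptions: Map[String, String] = Map(
  // Row-writer path used by bulk insert (assumed key for ENABLE_ROW_WRITER_OPT_KEY).
  "hoodie.datasource.write.row.writer.enable" -> enableBulkInsert.toString,
  // Combine (de-duplicate) incoming records by record key before inserting,
  // mirroring what upsert already does for primary-key tables
  // (assumed key for COMBINE_BEFORE_INSERT_PROP).
  "hoodie.combine.before.insert" -> isPrimaryKeyTable.toString
)
```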




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2208) [SQL] Support Bulk Insert For Spark Sql

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391377#comment-17391377
 ] 

ASF GitHub Bot commented on HUDI-2208:
--

pengzhiwei2018 commented on a change in pull request #3328:
URL: https://github.com/apache/hudi/pull/3328#discussion_r680693753



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala
##
@@ -243,6 +256,8 @@ object InsertIntoHoodieTableCommand {
 RECORDKEY_FIELD_OPT_KEY.key -> primaryColumns.mkString(","),
 PARTITIONPATH_FIELD_OPT_KEY.key -> partitionFields,
 PAYLOAD_CLASS_OPT_KEY.key -> payloadClassName,
+ENABLE_ROW_WRITER_OPT_KEY.key -> enableBulkInsert.toString,
+HoodieWriteConfig.COMBINE_BEFORE_INSERT_PROP.key -> 
isPrimaryKeyTable.toString, // if the table has primaryKey, enable the combine

Review comment:
   Just like the upsert operation, where Hudi does the combine automatically, we can 
do this for the user here too, which is much friendlier for our users.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [SQL] Support Bulk Insert For Spark Sql
> ---
>
> Key: HUDI-2208
> URL: https://issues.apache.org/jira/browse/HUDI-2208
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: pengzhiwei
>Assignee: pengzhiwei
>Priority: Blocker
>  Labels: pull-request-available, release-blocker
>
> Support the bulk insert for spark sql



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yihua commented on a change in pull request #3233: [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency

2021-08-01 Thread GitBox


yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680693810



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronous, while other types of 
requests
+ * are handled synchronous.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  
Each batch
+ * processing thread adds new markers to a marker file.  Given that marker 
file operation
+ * can take time, multiple concurrent threads can run at the same, while they 
operate
+ * on different marker files storing mutually exclusive marker entries.  At 
any given
+ * time, a marker file is touched by at most one thread to guarantee 
consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker 
creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in 
StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map> fileMarkersMap = new 
HashMap<>();
+  // A list of pending futures from async marker creation requests
+  private final List createMarkerFutures = new 
ArrayList<>();
+  // A list of use status of marker files. {@code true} means the file is in 
use by a {@code BatchCreateMar

[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391378#comment-17391378
 ] 

ASF GitHub Bot commented on HUDI-1138:
--

yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680693810



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronous, while other types of 
requests
+ * are handled synchronous.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  
Each batch
+ * processing thread adds new markers to a marker file.  Given that marker 
file operation
+ * can take time, multiple concurrent threads can run at the same, while they 
operate
+ * on different marker files storing mutually exclusive marker entries.  At 
any given
+ * time, a marker file is touched by at most one thread to guarantee 
consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker 
creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in 
StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map> fileMarkersMap = new 
Ha

[GitHub] [hudi] yihua commented on a change in pull request #3233: [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency

2021-08-01 Thread GitBox


yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680694481



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronous, while other types of 
requests
+ * are handled synchronous.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  
Each batch
+ * processing thread adds new markers to a marker file.  Given that marker 
file operation
+ * can take time, multiple concurrent threads can run at the same, while they 
operate
+ * on different marker files storing mutually exclusive marker entries.  At 
any given
+ * time, a marker file is touched by at most one thread to guarantee 
consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker 
creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in 
StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map> fileMarkersMap = new 
HashMap<>();
+  // A list of pending futures from async marker creation requests
+  private final List createMarkerFutures = new 
ArrayList<>();
+  // A list of use status of marker files. {@code true} means the file is in 
use by a {@code BatchCreateMar

[GitHub] [hudi] hudi-bot edited a comment on pull request #3315: [HUDI-2177][HUDI-2200] Adding virtual keys support for MOR

2021-08-01 Thread GitBox


hudi-bot edited a comment on pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#issuecomment-883851530


   
   ## CI report:
   
   * c3bdfdd1e47790f46f4263a7fb5242c01dd02188 UNKNOWN
   * 40cbb91efb34e6360849a42056de6343cc1251d6 UNKNOWN
   * 7c05ed3df1d3e9bff709d9aba9168f7bb7b1ac54 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1295)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391379#comment-17391379
 ] 

ASF GitHub Bot commented on HUDI-1138:
--

yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680694481



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronous, while other types of 
requests
+ * are handled synchronous.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  
Each batch
+ * processing thread adds new markers to a marker file.  Given that marker 
file operation
+ * can take time, multiple concurrent threads can run at the same, while they 
operate
+ * on different marker files storing mutually exclusive marker entries.  At 
any given
+ * time, a marker file is touched by at most one thread to guarantee 
consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker 
creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in 
StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map> fileMarkersMap = new 
Ha

[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391380#comment-17391380
 ] 

ASF GitHub Bot commented on HUDI-2177:
--

hudi-bot edited a comment on pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#issuecomment-883851530


   
   ## CI report:
   
   * c3bdfdd1e47790f46f4263a7fb5242c01dd02188 UNKNOWN
   * 40cbb91efb34e6360849a42056de6343cc1251d6 UNKNOWN
   * 7c05ed3df1d3e9bff709d9aba9168f7bb7b1ac54 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1295)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Virtual keys support for Compaction
> ---
>
> Key: HUDI-2177
> URL: https://issues.apache.org/jira/browse/HUDI-2177
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] yihua commented on a change in pull request #3233: [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency

2021-08-01 Thread GitBox


yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680698426



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronous, while other types of 
requests
+ * are handled synchronous.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  
Each batch
+ * processing thread adds new markers to a marker file.  Given that marker 
file operation
+ * can take time, multiple concurrent threads can run at the same, while they 
operate
+ * on different marker files storing mutually exclusive marker entries.  At 
any given
+ * time, a marker file is touched by at most one thread to guarantee 
consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker 
creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in 
StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map> fileMarkersMap = new 
HashMap<>();
+  // A list of pending futures from async marker creation requests
+  private final List createMarkerFutures = new 
ArrayList<>();
+  // A list of use status of marker files. {@code true} means the file is in 
use by a {@code BatchCreateMar

[GitHub] [hudi] yihua commented on a change in pull request #3233: [HUDI-1138] Add timeline-server-based marker file strategy for improving marker-related latency

2021-08-01 Thread GitBox


yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680698723



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronous, while other types of 
requests
+ * are handled synchronous.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  
Each batch
+ * processing thread adds new markers to a marker file.  Given that marker 
file operation
+ * can take time, multiple concurrent threads can run at the same, while they 
operate
+ * on different marker files storing mutually exclusive marker entries.  At 
any given
+ * time, a marker file is touched by at most one thread to guarantee 
consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker 
creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in 
StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map> fileMarkersMap = new 
HashMap<>();
+  // A list of pending futures from async marker creation requests
+  private final List createMarkerFutures = new 
ArrayList<>();
+  // A list of use status of marker files. {@code true} means the file is in 
use by a {@code BatchCreateMar

[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391384#comment-17391384
 ] 

ASF GitHub Bot commented on HUDI-1138:
--

yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680698426



##
File path: 
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronously, while other types of requests
+ * are handled synchronously.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  Each batch
+ * processing thread adds new markers to a marker file.  Given that marker file operations
+ * can take time, multiple concurrent threads can run at the same time, while they operate
+ * on different marker files storing mutually exclusive marker entries.  At any given
+ * time, a marker file is touched by at most one thread to guarantee consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map<String, Set<String>> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map<String, Map<Integer, StringBuilder>> fileMarkersMap = new HashMap<>();

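The two cache fields quoted above ({markerDirPath -> all markers} and {markerDirPath -> {markerFileIndex -> markers}}) can be pictured with a small self-contained sketch. The helper below is hypothetical, not the PR's API; addMarker and getAllMarkers are invented names, and the only thing taken from the quoted comments is how the two maps are keyed and kept in step when a marker is recorded.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the two in-memory caches described in the quoted
// comments: a set of all markers per marker directory, plus a StringBuilder of
// buffered entries per (marker directory, marker file index).
public class MarkerCacheSketch {
  // {markerDirPath -> all markers}
  private final Map<String, Set<String>> allMarkersMap = new HashMap<>();
  // {markerDirPath -> {markerFileIndex -> buffered marker entries}}
  private final Map<String, Map<Integer, StringBuilder>> fileMarkersMap = new HashMap<>();

  // Records a new marker both in the directory-wide set and in the buffer of the
  // MARKERS<fileIndex> file it will eventually be flushed to.
  public void addMarker(String markerDirPath, int fileIndex, String markerName) {
    allMarkersMap.computeIfAbsent(markerDirPath, k -> new HashSet<>()).add(markerName);
    fileMarkersMap.computeIfAbsent(markerDirPath, k -> new HashMap<>())
        .computeIfAbsent(fileIndex, k -> new StringBuilder())
        .append(markerName)
        .append('\n');
  }

  public Set<String> getAllMarkers(String markerDirPath) {
    return allMarkersMap.getOrDefault(markerDirPath, new HashSet<>());
  }

  public static void main(String[] args) {
    MarkerCacheSketch cache = new MarkerCacheSketch();
    cache.addMarker("/tmp/hoodie/.temp/001", 0, "2021/08/01/file1.parquet.marker.CREATE");
    cache.addMarker("/tmp/hoodie/.temp/001", 1, "2021/08/01/file2.parquet.marker.MERGE");
    System.out.println(cache.getAllMarkers("/tmp/hoodie/.temp/001"));
  }
}
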
[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2021-08-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391387#comment-17391387
 ] 

ASF GitHub Bot commented on HUDI-1138:
--

yihua commented on a change in pull request #3233:
URL: https://github.com/apache/hudi/pull/3233#discussion_r680698723



##
File path: hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/MarkerHandler.java
##
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.timeline.service.handlers;
+
+import org.apache.hudi.common.config.SerializableConfiguration;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.metrics.Registry;
+import org.apache.hudi.common.model.IOType;
+import org.apache.hudi.common.table.view.FileSystemViewManager;
+import org.apache.hudi.common.util.collection.ImmutablePair;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.timeline.service.TimelineService;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import io.javalin.Context;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.util.StringUtils;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStreamWriter;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+
+import static org.apache.hudi.timeline.service.RequestHandler.jsonifyResult;
+
+/**
+ * REST Handler servicing marker requests.
+ *
+ * The marker creation requests are handled asynchronously, while other types of requests
+ * are handled synchronously.
+ *
+ * Marker creation requests are batch processed periodically by a thread.  Each batch
+ * processing thread adds new markers to a marker file.  Given that marker file operations
+ * can take time, multiple concurrent threads can run at the same time, while they operate
+ * on different marker files storing mutually exclusive marker entries.  At any given
+ * time, a marker file is touched by at most one thread to guarantee consistency.
+ * Below is an example of running batch processing threads.
+ *
+ *   |-| batch interval
+ * Thread 1  |-->| writing to MARKERS1
+ * Thread 2|-->| writing to MARKERS2
+ * Thread 3   |-->| writing to MARKERS3
+ */
+public class MarkerHandler extends Handler {
+  public static final String MARKERS_FILENAME_PREFIX = "MARKERS";
+  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
+  private static final Logger LOG = LogManager.getLogger(MarkerHandler.class);
+  // Margin time for scheduling the processing of the next batch of marker creation requests
+  private static final long SCHEDULING_MARGIN_TIME_MS = 5L;
+
+  private final Registry metricsRegistry;
+  private final ScheduledExecutorService executorService;
+  // A cached copy of all markers in memory
+  // Mapping: {markerDirPath -> all markers}
+  private final Map<String, Set<String>> allMarkersMap = new HashMap<>();
+  // A cached copy of marker entries in each marker file, stored in StringBuilder for efficient appending
+  // Mapping: {markerDirPath -> {markerFileIndex -> markers}}
+  private final Map<String, Map<Integer, StringBuilder>> fileMarkersMap = new HashMap<>();
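The field comment "a list of pending futures from async marker creation requests" implies that each create-marker request is answered through a CompletableFuture that a batch-processing thread completes after the marker file write finishes. The sketch below is an assumed illustration of that pattern, not the PR's implementation; PendingMarkerRequestsSketch, submit, and completeBatch are invented names.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of future-based handling of marker creation requests:
// each request returns immediately with a CompletableFuture, and the periodic
// batch processor completes all pending futures once the batch has been written.
public class PendingMarkerRequestsSketch {
  // Pending futures from async marker creation requests, completed per batch.
  private final List<CompletableFuture<Boolean>> pendingFutures = new ArrayList<>();

  // Called on the request thread: registers the request and returns a future
  // the HTTP layer can wait on (or attach a callback to) for the response.
  public synchronized CompletableFuture<Boolean> submit(String markerName) {
    System.out.println("queued marker creation for " + markerName);
    CompletableFuture<Boolean> future = new CompletableFuture<>();
    pendingFutures.add(future);
    return future;
  }

  // Called on the batch-processing thread once the marker file has been written:
  // completes every pending request with the write result.
  public synchronized void completeBatch(boolean success) {
    for (CompletableFuture<Boolean> future : pendingFutures) {
      future.complete(success);
    }
    pendingFutures.clear();
  }

  public static void main(String[] args) {
    PendingMarkerRequestsSketch handler = new PendingMarkerRequestsSketch();
    CompletableFuture<Boolean> response =
        handler.submit("2021/08/01/file1.parquet.marker.CREATE");
    response.thenAccept(ok -> System.out.println("marker created: " + ok));
    handler.completeBatch(true);   // simulates the batch thread finishing its write
  }
}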