[GitHub] [hudi] hudi-bot removed a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-972587261


   
   ## CI report:
   
   * 44f4575ea51c6dfdbe6424d0d236b242f8bc81ba Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3445)
 
   * eb3383de0645ad08dcd959924b9f3fc6588823b1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3462)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


hudi-bot commented on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-972619707


   
   ## CI report:
   
   * eb3383de0645ad08dcd959924b9f3fc6588823b1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3462)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-order Layout Optimization

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972617117


   
   ## CI report:
   
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   * 24f44584a951375290c52ab04602401af943d845 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3464)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-order Layout Optimization

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972594379


   
   ## CI report:
   
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   * 24f44584a951375290c52ab04602401af943d845 UNKNOWN
   
   




[GitHub] [hudi] nsivabalan edited a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


nsivabalan edited a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-968058611


   I tested this patch locally. I am seeing a stack trace if the 
hudi-defaults.conf file is not present. We should definitely fix this. For 
users who don't have this file, a one-line warning would be enough. 
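
A minimal sketch of the one-line-warning behavior suggested above, not the actual Hudi implementation. The class `GlobalPropsSketch` and the method `loadGlobalProps` are hypothetical names used only for illustration; the default directory `/etc/hudi/conf` and the file name `hudi-defaults.conf` are taken from the log output in this thread. It also assumes the file can be parsed as `java.util.Properties`, which may not match Hudi's own parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper for illustration; not Hudi's DFSPropertiesConfiguration.
final class GlobalPropsSketch {
    // Fallback directory seen in the log when HUDI_CONF_DIR is not set.
    static final String DEFAULT_CONF_DIR = "/etc/hudi/conf";
    static final String CONF_FILE_NAME = "hudi-defaults.conf";

    // Returns the properties found in <confDir>/hudi-defaults.conf.
    // A missing or unreadable file yields empty properties and a single
    // warning line instead of an exception with a full stack trace.
    static Properties loadGlobalProps(String confDir) {
        Properties props = new Properties();
        String dir = (confDir == null || confDir.isEmpty()) ? DEFAULT_CONF_DIR : confDir;
        Path file = Paths.get(dir, CONF_FILE_NAME);
        if (!Files.isRegularFile(file)) {
            System.err.println("WARN: " + file + " not found, continuing without global defaults");
            return props;
        }
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in); // assumption: key=value lines
        } catch (IOException e) {
            System.err.println("WARN: could not read " + file + ": " + e.getMessage());
        }
        return props;
    }
}
```

A caller could pass the environment value directly, for example `GlobalPropsSketch.loadGlobalProps(System.getenv("HUDI_CONF_DIR"))`; a `null` value falls back to the default directory.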
   
   
   Excerpt from running our Spark quick start guide with this patch:

   
   ```
   scala> df.write.format("hudi").
        |   options(getQuickstartWriteConfigs).
        |   option(PRECOMBINE_FIELD.key(), "ts").
        |   option(RECORDKEY_FIELD.key(), "uuid").
        |   option(PARTITIONPATH_FIELD.key(), "partitionpath").
        |   option(TBL_NAME.key(), tableName).
        |   mode(Overwrite).
        |   save(basePath)
   21/11/13 07:07:57 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   21/11/13 07:07:57 ERROR DFSPropertiesConfiguration: Error reading in properties from dfs
   java.io.FileNotFoundException: File file:/etc/hudi/conf does not exist
       at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
       at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
       at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
       at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:123)
       at org.apache.hudi.common.config.DFSPropertiesConfiguration.loadGlobalProps(DFSPropertiesConfiguration.java:95)
       at org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:57)
       at org.apache.hudi.HoodieWriterUtils$.parametersWithWriteDefaults(HoodieWriterUtils.scala:47)
       at org.apache.hudi.HoodieSparkSqlWriter$.mergeParamsAndGetHoodieConfig(HoodieSparkSqlWriter.scala:733)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:88)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
       at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
       at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:55)
       at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:60)
       at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:62)
       at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:64)
   ```

[GitHub] [hudi] nsivabalan edited a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


nsivabalan edited a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-968060623


   Tested scenarios:
   1. HUDI_CONF_DIR not set: fails as reported above. Needs fixing. 
   2. HUDI_CONF_DIR set to a valid directory that does not contain the file 
(hudi-defaults.conf): again a similar issue. Attaching the stack trace below. 
Needs fixing. 
   3. HUDI_CONF_DIR set to a directory that contains hudi-defaults.conf: writes 
pick up the config from this file. 
   4. Same as (3), but overriding via df.write.hudi a property that is already 
in hudi-defaults.conf: the value given in spark-shell with df.write.hudi takes 
precedence. 
   5. Two spark-shells with the env set properly in both: both are able to pick 
up the configs from hudi-defaults.conf.
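
The precedence seen in scenarios (3) and (4) can be sketched as a two-layer merge: values from hudi-defaults.conf form the base, and options passed on the writer override them. This is only an illustration of that ordering, not Hudi's actual merge code; `ConfigPrecedenceSketch` and `resolve` are hypothetical names, and the property keys in the usage note are example values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "writer options win over file defaults".
final class ConfigPrecedenceSketch {
    // Start from the file defaults, then let explicitly passed
    // writer options replace any key they share with the defaults.
    static Map<String, String> resolve(Map<String, String> fileDefaults,
                                       Map<String, String> writerOptions) {
        Map<String, String> resolved = new LinkedHashMap<>(fileDefaults);
        resolved.putAll(writerOptions);
        return resolved;
    }
}
```

For example, if the file sets `hoodie.upsert.shuffle.parallelism` to `2` and the writer passes `8` for the same key, the resolved value is `8`, while keys present only in the file keep their file values.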
   
   
   Stacktrace for (2)
   
   ```
   scala> df.write.format("hudi").
        |   options(getQuickstartWriteConfigs).
        |   option(PRECOMBINE_FIELD.key(), "ts").
        |   option(RECORDKEY_FIELD.key(), "uuid").
        |   option(PARTITIONPATH_FIELD.key(), "partitionpath").
        |   option(TBL_NAME.key(), tableName).
        |   mode(Overwrite).
        |   save(basePath)
   21/11/13 07:18:53 ERROR DFSPropertiesConfiguration: Error reading in properties from dfs
   java.io.FileNotFoundException: File file:/tmp/hudi_conf/hudi-defaults.conf does not exist
       at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
       at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
       at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
       at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
       at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
       at org.apache.hudi.common.config.DFSPropertiesConfiguration.addPropsFromFile(DFSPropertiesConfiguration.java:123)
       at org.apache.hudi.common.config.DFSPropertiesConfiguration.loadGlobalProps(DFSPropertiesConfiguration.java:92)
       at org.apache.hudi.common.config.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:57)
       at org.apache.hudi.HoodieWriterUtils$.parametersWithWriteDefaults(HoodieWriterUtils.scala:47)
       at org.apache.hudi.HoodieSparkSqlWriter$.mergeParamsAndGetHoodieConfig(HoodieSparkSqlWriter.scala:733)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:88)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:162)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
       at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
       at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
       at $line25.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:55)
   ```

[GitHub] [hudi] hudi-bot commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972606180


   
   ## CI report:
   
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   * b9383b77280419d54fa09206c768ca17a3683fb4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3463)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972591270


   
   ## CI report:
   
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   * b9383b77280419d54fa09206c768ca17a3683fb4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-order Layout Optimization

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972594379


   
   ## CI report:
   
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   * 24f44584a951375290c52ab04602401af943d845 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-order Layout Optimization

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972592878


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   * 24f44584a951375290c52ab04602401af943d845 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-order Layout Optimization

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972585151


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-order Layout Optimization

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972592878


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   * 24f44584a951375290c52ab04602401af943d845 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972591270


   
   ## CI report:
   
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   * b9383b77280419d54fa09206c768ca17a3683fb4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972589905


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   * b9383b77280419d54fa09206c768ca17a3683fb4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4029: [HUDI-2790] Fix the changelog mode of HoodieTableSource

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4029:
URL: https://github.com/apache/hudi/pull/4029#issuecomment-972589960


   
   ## CI report:
   
   * c6817fa8582f0ff62479a4a31a327d6db2dc0ca8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3458)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4029: [HUDI-2790] Fix the changelog mode of HoodieTableSource

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4029:
URL: https://github.com/apache/hudi/pull/4029#issuecomment-972567125


   
   ## CI report:
   
   * c6817fa8582f0ff62479a4a31a327d6db2dc0ca8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3458)
 
   
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


xiarixiaoyao commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972589911


   @leesf  @vinothchandar @alexeykudinkin  
   I have addressed all comments, updated the code, and added more test cases. 
Could you help review this PR again? Thanks.
   






[GitHub] [hudi] hudi-bot commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972589905


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   * b9383b77280419d54fa09206c768ca17a3683fb4 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order add fix some bugs

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972577078


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-972586010


   
   ## CI report:
   
   * 44f4575ea51c6dfdbe6424d0d236b242f8bc81ba Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3445)
 
   * eb3383de0645ad08dcd959924b9f3fc6588823b1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


hudi-bot commented on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-972587261


   
   ## CI report:
   
   * 44f4575ea51c6dfdbe6424d0d236b242f8bc81ba Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3445)
 
   * eb3383de0645ad08dcd959924b9f3fc6588823b1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3462)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


hudi-bot commented on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-972586010


   
   ## CI report:
   
   * 44f4575ea51c6dfdbe6424d0d236b242f8bc81ba Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3445)
 
   * eb3383de0645ad08dcd959924b9f3fc6588823b1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #3416: [HUDI-2362] Add external config file support

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-972305777


   
   ## CI report:
   
   * 44f4575ea51c6dfdbe6424d0d236b242f8bc81ba Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3445)
 
   
   




[jira] [Commented] (HUDI-1870) Move spark avro serialization class into hudi repo

2021-11-17 Thread Yann Byron (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445674#comment-17445674
 ] 

Yann Byron commented on HUDI-1870:
--

[~xushiyan]

Got it, and will fix it.

> Move spark avro serialization class into hudi repo
> --
>
> Key: HUDI-1870
> URL: https://issues.apache.org/jira/browse/HUDI-1870
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Gary Li
>Assignee: Yann Byron
>Priority: Blocker
>  Labels: sev:critical
> Fix For: 0.10.0
>
>
> In Spark 3.1.1, the Avro serialization-related classes became private. We 
> need to move those classes into Hudi's repo.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972584061


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972585151


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3461)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972538242


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972584061


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   * e3e66985c79e4081759a4c3dcd5837b85803d4bd UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972577078


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3460)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972575950


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972575950


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   * 23ef0dc3cf648cef640263c06432fd7fc4708327 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-972575976


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 3d9f428cef5b085b386bf4ab2ad8cae0953bc9fe Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3457)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-971332508


   
   ## CI report:
   
   * 9a11df297537d5dc7e68deb032f9c8ad70b0e049 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3423)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-972556249


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 27ee85977d48cc464b806e4d4fa3e79a82b69822 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3439)
 
   * 3d9f428cef5b085b386bf4ab2ad8cae0953bc9fe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3457)
 
   
   




[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order

2021-11-17 Thread GitBox


xiarixiaoyao commented on a change in pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#discussion_r751938029



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/DataSkippingUtils.scala
##
@@ -179,7 +179,7 @@ object DataSkippingUtils {
   def getIndexFiles(conf: Configuration, indexPath: String): Seq[FileStatus] = {
     val basePath = new Path(indexPath)
     basePath.getFileSystem(conf)
-      .listStatus(basePath).filterNot(f => f.getPath.getName.endsWith(".parquet"))
+      .listStatus(basePath).filter(f => f.getPath.getName.endsWith(".parquet"))
   }

Review comment:
   Very sorry about that.
   My local code uses `filter`; I mistakenly wrote `filterNot` when
submitting.
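
   To make the difference discussed in this review comment concrete, here is a minimal Java sketch of the two predicates. It is not the Hudi code: the class `IndexFileFilter`, its method names, and the sample file names are hypothetical, and plain strings stand in for `FileStatus` paths. It only shows that keeping names ending in `.parquet` and dropping them select complementary sets.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IndexFileFilter {
    // Keeps only names ending in ".parquet" (analogous to Scala's filter in the fix).
    public static List<String> keepParquet(List<String> names) {
        return names.stream()
                .filter(n -> n.endsWith(".parquet"))
                .collect(Collectors.toList());
    }

    // Drops names ending in ".parquet" (analogous to the accidental filterNot).
    public static List<String> dropParquet(List<String> names) {
        return names.stream()
                .filter(n -> !n.endsWith(".parquet"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical directory listing, for illustration only.
        List<String> names = List.of("part-0001.parquet", "_SUCCESS", "part-0002.parquet");
        System.out.println(keepParquet(names)); // [part-0001.parquet, part-0002.parquet]
        System.out.println(dropParquet(names)); // [_SUCCESS]
    }
}
```

   With the negated predicate, a listing like the one above would return only the non-parquet marker file, which is presumably why the reviewer flagged the change.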








[GitHub] [hudi] yanghua commented on pull request #4019: [MINOR] Add RocketMQ to hudi lake landing page

2021-11-17 Thread GitBox


yanghua commented on pull request #4019:
URL: https://github.com/apache/hudi/pull/4019#issuecomment-972571274


   +1






[GitHub] [hudi] yanghua commented on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-11-17 Thread GitBox


yanghua commented on pull request #3813:
URL: https://github.com/apache/hudi/pull/3813#issuecomment-972569536


   @danny0405 Double check?






[GitHub] [hudi] dik111 opened a new issue #4030: [SUPPORT] Flink uses updated fields to update data

2021-11-17 Thread GitBox


dik111 opened a new issue #4030:
URL: https://github.com/apache/hudi/issues/4030


   In some cases we can only obtain the updated fields. For example, table A has
three fields (a, b, c), but only two of them (a, b) appear in the
update changelog. Can Flink support updating only a subset of fields?
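
   The behaviour being asked for can be sketched as a field-wise merge. The Java below is only an illustration under stated assumptions: `PartialUpdateSketch` and `merge` are hypothetical names, rows are modelled as plain maps, and nothing here is a Hudi or Flink API. Fields present in the changelog record overwrite the stored value, and fields absent from it keep the stored value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialUpdateSketch {
    // Merges a changelog record into the stored row: fields present in the
    // update overwrite the stored value, absent fields keep the stored value.
    public static Map<String, Object> merge(Map<String, Object> stored,
                                            Map<String, Object> update) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(update);
        return merged;
    }

    public static void main(String[] args) {
        // Stored row of table A with fields (a, b, c); values are made up.
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("a", 1);
        stored.put("b", "old");
        stored.put("c", 3.0);

        // Changelog record carrying only (a, b).
        Map<String, Object> update = new LinkedHashMap<>();
        update.put("a", 1);
        update.put("b", "new");

        System.out.println(merge(stored, update)); // {a=1, b=new, c=3.0}
    }
}
```

   This sketch cannot tell an omitted field apart from a field explicitly set to null, which a real implementation would have to decide how to handle.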






[GitHub] [hudi] hudi-bot commented on pull request #4029: [HUDI-2790] Fix the changelog mode of HoodieTableSource

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4029:
URL: https://github.com/apache/hudi/pull/4029#issuecomment-972567125


   
   ## CI report:
   
   * c6817fa8582f0ff62479a4a31a327d6db2dc0ca8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3458)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4029: [HUDI-2790] Fix the changelog mode of HoodieTableSource

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4029:
URL: https://github.com/apache/hudi/pull/4029#issuecomment-972565837


   
   ## CI report:
   
   * c6817fa8582f0ff62479a4a31a327d6db2dc0ca8 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4029: [HUDI-2790] Fix the changelog mode of HoodieTableSource

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4029:
URL: https://github.com/apache/hudi/pull/4029#issuecomment-972565837


   
   ## CI report:
   
   * c6817fa8582f0ff62479a4a31a327d6db2dc0ca8 UNKNOWN
   
   




[jira] [Updated] (HUDI-2790) Fix the changelog mode of HoodieTableSource

2021-11-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2790:
-
Labels: pull-request-available  (was: )

> Fix the changelog mode of HoodieTableSource
> ---
>
> Key: HUDI-2790
> URL: https://issues.apache.org/jira/browse/HUDI-2790
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Affects Versions: 0.9.0
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[GitHub] [hudi] danny0405 opened a new pull request #4029: [HUDI-2790] Fix the changelog mode of HoodieTableSource

2021-11-17 Thread GitBox


danny0405 opened a new pull request #4029:
URL: https://github.com/apache/hudi/pull/4029


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Created] (HUDI-2790) Fix the changelog mode of HoodieTableSource

2021-11-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-2790:


 Summary: Fix the changelog mode of HoodieTableSource
 Key: HUDI-2790
 URL: https://issues.apache.org/jira/browse/HUDI-2790
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Affects Versions: 0.9.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0








[jira] [Resolved] (HUDI-2789) Flink batch upsert for non partitioned table does not work

2021-11-17 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2789.
--

> Flink batch upsert for non partitioned table does not work
> --
>
> Key: HUDI-2789
> URL: https://issues.apache.org/jira/browse/HUDI-2789
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Commented] (HUDI-2789) Flink batch upsert for non partitioned table does not work

2021-11-17 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445652#comment-17445652
 ] 

Danny Chen commented on HUDI-2789:
--

Fixed via master branch: 71a2ae0fd6163d3108c6557ce6295f649246adcb

> Flink batch upsert for non partitioned table does not work
> --
>
> Key: HUDI-2789
> URL: https://issues.apache.org/jira/browse/HUDI-2789
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[hudi] branch master updated (2d3f2a3 -> 71a2ae0)

2021-11-17 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


    from 2d3f2a3  [HUDI-2734] Setting default metadata enable as false for Java (#4003)
     add 71a2ae0  [HUDI-2789] Flink batch upsert for non partitioned table does not work (#4028)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/configuration/OptionsResolver.java  |  8 
 .../java/org/apache/hudi/sink/utils/Pipelines.java  | 21 ++---
 .../apache/hudi/table/HoodieDataSourceITCase.java   |  2 ++
 .../org/apache/hudi/utils/TestConfigurations.java   |  5 +
 4 files changed, 25 insertions(+), 11 deletions(-)


[GitHub] [hudi] danny0405 merged pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


danny0405 merged pull request #4028:
URL: https://github.com/apache/hudi/pull/4028


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4028:
URL: https://github.com/apache/hudi/pull/4028#issuecomment-972539248


   
   ## CI report:
   
   * ce9a5f9f58fba43ec1a98a2cd898fc25a5ca620e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3456)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4028:
URL: https://github.com/apache/hudi/pull/4028#issuecomment-972560107


   
   ## CI report:
   
   * ce9a5f9f58fba43ec1a98a2cd898fc25a5ca620e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3456)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-972556249


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 27ee85977d48cc464b806e4d4fa3e79a82b69822 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3439)
 
   * 3d9f428cef5b085b386bf4ab2ad8cae0953bc9fe Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3457)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-972541462


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 27ee85977d48cc464b806e4d4fa3e79a82b69822 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3439)
 
   * 3d9f428cef5b085b386bf4ab2ad8cae0953bc9fe UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-972541462


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 27ee85977d48cc464b806e4d4fa3e79a82b69822 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3439)
 
   * 3d9f428cef5b085b386bf4ab2ad8cae0953bc9fe UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4016: [HUDI-2675] Fix the exception 'Not an Avro data file' when archive and clean

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4016:
URL: https://github.com/apache/hudi/pull/4016#issuecomment-971744837


   
   ## CI report:
   
   * aec4dde1fb90319de9cf0c6f34771c0f193ccfd9 UNKNOWN
   * f426cb3cc3513d1baf26a70fdcb18114ffe5ddc5 UNKNOWN
   * 32ec46f289e4ecf4c1e66241ea954a9f3b34e9a6 UNKNOWN
   * 46a706bfce715e88ec2d2d53fc6d81815e7471ac UNKNOWN
   * 27ee85977d48cc464b806e4d4fa3e79a82b69822 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3439)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4028:
URL: https://github.com/apache/hudi/pull/4028#issuecomment-972539248


   
   ## CI report:
   
   * ce9a5f9f58fba43ec1a98a2cd898fc25a5ca620e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3456)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4028:
URL: https://github.com/apache/hudi/pull/4028#issuecomment-972538262


   
   ## CI report:
   
   * ce9a5f9f58fba43ec1a98a2cd898fc25a5ca620e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972538242


   
   ## CI report:
   
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972500657


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4028:
URL: https://github.com/apache/hudi/pull/4028#issuecomment-972538262


   
   ## CI report:
   
   * ce9a5f9f58fba43ec1a98a2cd898fc25a5ca620e UNKNOWN
   
   




[jira] [Updated] (HUDI-2789) Flink batch upsert for non partitioned table does not work

2021-11-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2789:
-
Labels: pull-request-available  (was: )

> Flink batch upsert for non partitioned table does not work
> --
>
> Key: HUDI-2789
> URL: https://issues.apache.org/jira/browse/HUDI-2789
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Reporter: Danny Chen
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] danny0405 opened a new pull request #4028: [HUDI-2789] Flink batch upsert for non partitioned table does not work

2021-11-17 Thread GitBox


danny0405 opened a new pull request #4028:
URL: https://github.com/apache/hudi/pull/4028


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Created] (HUDI-2789) Flink batch upsert for non partitioned table does not work

2021-11-17 Thread Danny Chen (Jira)
Danny Chen created HUDI-2789:


 Summary: Flink batch upsert for non partitioned table does not work
 Key: HUDI-2789
 URL: https://issues.apache.org/jira/browse/HUDI-2789
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0








[GitHub] [hudi] hudi-bot commented on pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4025:
URL: https://github.com/apache/hudi/pull/4025#issuecomment-972533498


   
   ## CI report:
   
   * 284f739772e4ff8f337a8d2c6d4c72302be4cdaa UNKNOWN
   * 8dc9c77f5035e0ab54a4d731a43ff545e053d9f1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3454)
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4025:
URL: https://github.com/apache/hudi/pull/4025#issuecomment-972498000


   
   ## CI report:
   
   * 5e9a288bf8a8634662f7d3fa42abe7253d80d6b1 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3451)
   * 284f739772e4ff8f337a8d2c6d4c72302be4cdaa UNKNOWN
   * 8dc9c77f5035e0ab54a4d731a43ff545e053d9f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3454)
   
   




[jira] [Comment Edited] (HUDI-2234) MERGE INTO works only ON primary key

2021-11-17 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445622#comment-17445622
 ] 

Raymond Xu edited comment on HUDI-2234 at 11/18/21, 4:39 AM:
-

 
{code:sql}
create table if not exists h1 
(id int, name string, price double, ts long) using hudi 
location '/tmp/hudi/tb1' 
options (type='cow', primaryKey='id', preCombineField='ts');

create table if not exists h2 
(id int, name string, price double, ts long) using hudi 
location '/tmp/hudi/tb2' 
options (type='cow', primaryKey='id', preCombineField='ts');

insert into h1 select 3, 'AMZN', 300, 120;
insert into h1 select 2, 'UBER', 300, 120;
insert into h1 select 4, 'GOOG', 300, 120;
insert into h2 select 2, 'UBER', 200, 120;

merge into h1 as target 
using (select id, name, price, ts from h2) source 
on target.name = source.name 
when matched then update set * 
when not matched then insert *;{code}
Verified this with Spark 3.1.2 using Hudi 0.10.0 SNAPSHOT; MERGE INTO worked 
with a non-primary-key column in the ON condition.

 


was (Author: xushiyan):
 
{code:sql}
create table if not exists h1 
(id int, name string, price double, ts long) using hudi 
location '/tmp/hudi/tb1' 
options (type='cow', primaryKey='id', precombineField='ts');

create table if not exists h2 
(id int, name string, price double, ts long) using hudi 
location '/tmp/hudi/tb2' 
options (type='cow', primaryKey='id', precombineField='ts');

insert into h1 select 3, 'AMZN', 300, 120;
insert into h1 select 2, 'UBER', 300, 120;
insert into h1 select 4, 'GOOG', 300, 120;
insert into h2 select 2, 'UBER', 200, 120;

merge into h1 as target 
using (select id, name, price, ts from h2) source 
on target.name = source.name 
when matched then update set * 
when not matched then insert *;{code}
Verified this with Spark 3.1.2 using Hudi 0.10.0 SNAPSHOT; MERGE INTO worked 
with a non-primary-key column in the ON condition.

 

> MERGE INTO works only ON primary key
> 
>
> Key: HUDI-2234
> URL: https://issues.apache.org/jira/browse/HUDI-2234
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Spark Integration
> Reporter: Sagar Sumit
> Assignee: Yann Byron
> Priority: Blocker
> Fix For: 0.10.0
>
>
> {code:sql}
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (id int, name string, price double, ts long) using hudi options(primaryKey = 'id', precombineField = 'ts') location 'file:///tmp/hudi-h4-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120);
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120);
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120);
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (id int, name string, price double, ts long) using hudi options(primaryKey = 'id', precombineField = 'ts') partitioned by (ts) location 'file:///tmp/hudi-h4-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 120);
> MERGE INTO hudi_fixed 
> USING (select id, name, price, ts from hudi_gh_ext_fixed) updates
> ON hudi_fixed.name = updates.name
> WHEN MATCHED THEN
>   UPDATE SET *
> WHEN NOT MATCHED
>   THEN INSERT *;
> -- java.lang.IllegalArgumentException: Merge Key[name] is not Equal to the defined primary key[id] in table hudi_fixed
> --   at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425)
> --   at org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:146)
> --   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> --   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> --   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> --   at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
> {code}





[GitHub] [hudi] JoshuaZhuCN edited a comment on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

2021-11-17 Thread GitBox


JoshuaZhuCN edited a comment on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-971393339


   @xushiyan Hi,Here is my test code:
   
   ```
   import com.leqee.sparktool.date.DateUtil
   import com.leqee.sparktool.hoodie.HoodieProp
   import com.leqee.sparktool.spark.SparkTool
   import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions, DataSourceOptionsHelper}
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.{Row, SaveMode, SparkSession}
   import org.apache.spark.sql.functions.{col, lit}
   import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType, TimestampType}
   import org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER
   import org.apache.hudi.common.config.HoodieMetadataConfig
   import org.apache.hudi.common.model.HoodieCleaningPolicy
   import org.apache.hudi.config._
   import org.apache.hudi.index.HoodieIndex
   import org.apache.hudi.keygen.constant.KeyGeneratorOptions
   
   object Test4 {
       def main(args: Array[String]): Unit = {
           val AD_DATE = "1980-01-01 00:00:00"
           val spark = SparkSession
               .builder()
               .config(
                   new SparkConf()
                       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                       .set("spark.master", "local[2]")
               )
               .getOrCreate()
   
           val data = Seq(
               Row(1, "A", 10, DateUtil.now()),
               Row(2, "B", 20, DateUtil.now()),
               Row(3, "C", 30, DateUtil.now()))
   
           val schema = StructType(List(
               StructField("id", IntegerType),
               StructField("name", StringType),
               StructField("age", IntegerType),
               StructField("dt", StringType)))
   
           val df = spark.createDataFrame(spark.sparkContext.makeRDD(data), schema)
   
           df.show(false)
   
           df.write.format("org.apache.hudi")
               .option(DataSourceWriteOptions.RECORDKEY_FIELD.key(), "id")
               .option(DataSourceWriteOptions.PRECOMBINE_FIELD.key(), "id")
               .option(HoodieWriteConfig.TBL_NAME.key(), "tb_hbase_test")
               .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
               .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
               .option("hoodie.index.type", "HBASE")
               .option("hoodie.index.hbase.zkport", "2181")
               .option("hoodie.hbase.index.update.partition.path", "true")
               .option("hoodie.index.hbase.max.qps.fraction", "1")
               .option("hoodie.index.hbase.min.qps.fraction", "1000")
               .option("hoodie.index.hbase.table", "hudi:tb_hbase_test")
               .option("hoodie.index.hbase.zknode.path", "/hbase")
               .option("hoodie.index.hbase.get.batch.size", "1000")
               .option("hoodie.index.hbase.zkquorum", "127.0.0.1")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "100")
               .option("hoodie.index.hbase.sleep.ms.for.get.batch", "10")
               .option("hoodie.index.hbase.max.qps.per.region.server", "1000")
               .option("hoodie.index.hbase.zk.session_timeout_ms", "5000")
               .option("hoodie.index.hbase.desired_puts_time_in_secs", "3600")
               .option(HoodieProp.INDEX_HBASE_ZKPORT_PROP, HoodieProp.INDEX_HBASE_ZKPORT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_PROP, HoodieProp.INDEX_HBASE_UPDATE_PARTITION_PATH_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_PROP, HoodieProp.INDEX_HBASE_QPS_ALLOCATOR_CLASS_NAME_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_PROP, HoodieProp.INDEX_HBASE_PUT_BATCH_SIZE_AUTO_COMPUTE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_PROP, HoodieProp.INDEX_HBASE_ROLLBACK_SYNC_ENABLE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_PROP, HoodieProp.INDEX_HBASE_GET_BATCH_SIZE_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_PROP, HoodieProp.INDEX_HBASE_ZKPATH_QPS_ROOT_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_PROP, HoodieProp.INDEX_HBASE_MAX_QPS_PER_REGION_SERVER_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_MAX_QPS_FRACTION_PROP, HoodieProp.INDEX_HABSE_MAX_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HABSE_MIN_QPS_FRACTION_PROP, HoodieProp.INDEX_HBASE_MIN_QPS_FRACTION_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_PROP, HoodieProp.INDEX_HBASE_ZK_CONNECTION_TIMEOUT_MS_VALUE_DEFAULT)
               .option(HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_PROP, HoodieProp.INDEX_HBASE_COMPUTE_QPS_DYNAMICALLY_VALUE_DEFAULT)
   

[jira] [Closed] (HUDI-2234) MERGE INTO works only ON primary key

2021-11-17 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2234.




[jira] [Resolved] (HUDI-2234) MERGE INTO works only ON primary key

2021-11-17 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-2234.
--



[jira] [Commented] (HUDI-2234) MERGE INTO works only ON primary key

2021-11-17 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445622#comment-17445622
 ] 

Raymond Xu commented on HUDI-2234:
--

 
{code:sql}
create table if not exists h1 
(id int, name string, price double, ts long) using hudi 
location '/tmp/hudi/tb1' 
options (type='cow', primaryKey='id', precombineField='ts');

create table if not exists h2 
(id int, name string, price double, ts long) using hudi 
location '/tmp/hudi/tb2' 
options (type='cow', primaryKey='id', precombineField='ts');

insert into h1 select 3, 'AMZN', 300, 120;
insert into h1 select 2, 'UBER', 300, 120;
insert into h1 select 4, 'GOOG', 300, 120;
insert into h2 select 2, 'UBER', 200, 120;

merge into h1 as target 
using (select id, name, price, ts from h2) source 
on target.name = source.name 
when matched then update set * 
when not matched then insert *;{code}
Verified this with Spark 3.1.2 using Hudi 0.10.0 SNAPSHOT; MERGE INTO worked 
with a non-primary-key column in the ON condition.

 



[GitHub] [hudi] hudi-bot removed a comment on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #3813:
URL: https://github.com/apache/hudi/pull/3813#issuecomment-972475571


   
   ## CI report:
   
   * 9c1e75f71938f6c57104cf80d5bc5ce010f6fdb3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3431)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3434)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3437)
 
   * 08173246c29fda1c9880f5ad897cf61423523e1a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3453)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-11-17 Thread GitBox


hudi-bot commented on pull request #3813:
URL: https://github.com/apache/hudi/pull/3813#issuecomment-972510246


   
   ## CI report:
   
   * 08173246c29fda1c9880f5ad897cf61423523e1a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3453)
 
   
   




[jira] [Updated] (HUDI-2788) Z-ordering Layout Optimization Strategy fails w/ Data Skipping enabled

2021-11-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2788:
-
Labels: pull-request-available  (was: )

> Z-ordering Layout Optimization Strategy fails w/ Data Skipping enabled
> --
>
> Key: HUDI-2788
> URL: https://issues.apache.org/jira/browse/HUDI-2788
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Index
>Reporter: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> While testing Z-ordering in a test environment, I discovered the following 
> issues:
>  # Queries fail for tables with Clustering (Z-ordering Layout) and 
> data skipping enabled, because they are unable to read the `_SUCCESS` file 
> (automatically created by Spark)
>  # Some of the original query predicates are translated incorrectly into 
> Z-index table predicates (`!=`, `not like`, etc.)
>  # The join that merges indexes across commits always checks the first 
> column for null (instead of the Nth column) when picking the result of the merge
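
The third issue can be illustrated with a small, self-contained sketch. It is not Hudi's merge code: the class and method names are hypothetical, and it assumes each index row is an array of nullable per-column statistics where a newer value, when present, should win over the older one.

```java
final class IndexMergeSketch {
    // Intended behaviour (assumed): decide column by column, taking the newer
    // statistic when it is present and falling back to the older one otherwise.
    static Long[] mergePerColumn(Long[] older, Long[] newer) {
        Long[] merged = new Long[older.length];
        for (int n = 0; n < older.length; n++) {
            merged[n] = newer[n] != null ? newer[n] : older[n];
        }
        return merged;
    }

    // Faulty behaviour as described in the report: the null check always looks
    // at column 0, so a null in the Nth column of the newer row leaks through.
    static Long[] mergeFirstColumnOnly(Long[] older, Long[] newer) {
        Long[] merged = new Long[older.length];
        for (int n = 0; n < older.length; n++) {
            merged[n] = newer[0] != null ? newer[n] : older[n];
        }
        return merged;
    }

    public static void main(String[] args) {
        Long[] older = {1L, 5L};
        Long[] newer = {2L, null}; // second column missing in the newer row
        System.out.println(java.util.Arrays.toString(mergePerColumn(older, newer)));
        System.out.println(java.util.Arrays.toString(mergeFirstColumnOnly(older, newer)));
    }
}
```

With these inputs the per-column variant keeps the older value for the second column, while the first-column-only variant carries the null into the merged row.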





[GitHub] [hudi] hudi-bot commented on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972500657


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3455)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [HUDI-2788][WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972484501


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4025:
URL: https://github.com/apache/hudi/pull/4025#issuecomment-972490440


   
   ## CI report:
   
   * 5e9a288bf8a8634662f7d3fa42abe7253d80d6b1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3451)
 
   * 284f739772e4ff8f337a8d2c6d4c72302be4cdaa UNKNOWN
   * 8dc9c77f5035e0ab54a4d731a43ff545e053d9f1 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4025:
URL: https://github.com/apache/hudi/pull/4025#issuecomment-972498000


   
   ## CI report:
   
   * 5e9a288bf8a8634662f7d3fa42abe7253d80d6b1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3451)
 
   * 284f739772e4ff8f337a8d2c6d4c72302be4cdaa UNKNOWN
   * 8dc9c77f5035e0ab54a4d731a43ff545e053d9f1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3454)
 
   
   




[jira] [Created] (HUDI-2788) Z-ordering Layout Optimization Strategy fails w/ Data Skipping enabled

2021-11-17 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-2788:
-

 Summary: Z-ordering Layout Optimization Strategy fails w/ Data 
Skipping enabled
 Key: HUDI-2788
 URL: https://issues.apache.org/jira/browse/HUDI-2788
 Project: Apache Hudi
  Issue Type: Bug
  Components: Index
Reporter: Alexey Kudinkin
 Fix For: 0.10.0


While testing Z-ordering in a test environment, I discovered the following 
issues:
 # Queries fail for tables with Clustering (Z-ordering Layout) and 
data skipping enabled, because they are unable to read the `_SUCCESS` file 
(automatically created by Spark)
 # Some of the original query predicates are translated incorrectly into 
Z-index table predicates (`!=`, `not like`, etc.)
 # The join that merges indexes across commits always checks the first 
column for null (instead of the Nth column) when picking the result of the merge
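
To make the second issue concrete, here is a minimal sketch of why `!=` needs its own translation when pruning files by column statistics. It assumes the index keeps a per-file minimum and maximum for a column and ignores nulls; the class and method names are hypothetical and are not taken from Hudi.

```java
final class SkippingPredicateSketch {
    // `col = v` can match in a file only if v lies within [min, max].
    static boolean mayContainEq(long min, long max, long v) {
        return min <= v && v <= max;
    }

    // `col != v` can match unless every value in the file equals v,
    // which the statistics show as min == max == v.
    static boolean mayContainNotEq(long min, long max, long v) {
        return !(min == v && max == v);
    }

    // Tempting but wrong: negating the `=` check skips any file whose range
    // merely includes v, even though it also holds other values.
    static boolean mayContainNotEqNaive(long min, long max, long v) {
        return !mayContainEq(min, max, v);
    }

    public static void main(String[] args) {
        // A file with values in [1, 10] certainly has rows where col != 5.
        System.out.println(mayContainNotEq(1, 10, 5));
        System.out.println(mayContainNotEqNaive(1, 10, 5));
    }
}
```

The naive negation would prune the `[1, 10]` file for `col != 5` and silently drop rows, which is the kind of incorrect translation a data-skipping layer has to avoid.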





[GitHub] [hudi] prashantwason commented on a change in pull request #4023: [HUDI-2472] Enabling metadata table for TestHoodieMergeOnReadTable and TestHoodieCompactor

2021-11-17 Thread GitBox


prashantwason commented on a change in pull request #4023:
URL: https://github.com/apache/hudi/pull/4023#discussion_r751883243



##
File path: 
hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestTable.java
##
@@ -131,6 +131,7 @@ protected HoodieTestTable(String basePath, FileSystem fs, 
HoodieTableMetaClient
 this.basePath = basePath;
 this.fs = fs;
 this.metaClient = metaClient;
+testTableState = HoodieTestTableState.of();

Review comment:
   Where is this used?








[jira] [Updated] (HUDI-2716) InLineFS support for S3FS

2021-11-17 Thread Manoj Govindassamy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HUDI-2716:
-
Summary: InLineFS support for S3FS  (was: Fix InLineFS path conversions for 
S3FS paths)

> InLineFS support for S3FS
> -
>
> Key: HUDI-2716
> URL: https://issues.apache.org/jira/browse/HUDI-2716
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> With S3, the path format is "s3a://path", which has two slashes ("//") after 
> the scheme. The inline file system couldn't handle it: after wrapping the path 
> as inline and unwrapping it again, the resolved path comes out as "s3a:/path". 
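
One plausible way such a collapse can arise is slash normalization that does not treat the scheme separator specially. The sketch below is only an illustration under that assumption: the class, the method names, and the sample path are hypothetical, and the actual InLineFS conversion code may work differently.

```java
final class InlinePathSketch {
    // Naive normalization: collapses every run of slashes, including the
    // "//" that separates the scheme from the bucket.
    static String normalizeNaive(String path) {
        return path.replaceAll("/{2,}", "/");
    }

    // Scheme-aware variant: leaves the "scheme://" prefix untouched and only
    // collapses duplicate slashes in the remainder of the path.
    static String normalizeKeepingScheme(String path) {
        int idx = path.indexOf("://");
        if (idx < 0) {
            return normalizeNaive(path);
        }
        String prefix = path.substring(0, idx + 3);
        return prefix + path.substring(idx + 3).replaceAll("/{2,}", "/");
    }

    public static void main(String[] args) {
        String s3Path = "s3a://bucket/dir/file.log"; // hypothetical path
        System.out.println(normalizeNaive(s3Path));
        System.out.println(normalizeKeepingScheme(s3Path));
    }
}
```

The naive variant turns the sample path into the single-slash form described in the report, while the scheme-aware variant round-trips it unchanged.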





[GitHub] [hudi] hudi-bot removed a comment on pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4025:
URL: https://github.com/apache/hudi/pull/4025#issuecomment-972473090


   
   ## CI report:
   
   * 5e9a288bf8a8634662f7d3fa42abe7253d80d6b1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3451)
 
   * 284f739772e4ff8f337a8d2c6d4c72302be4cdaa UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4025: [HUDI-2742] - Added s3 object filter to support multiple S3EventsHood…

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4025:
URL: https://github.com/apache/hudi/pull/4025#issuecomment-972490440


   
   ## CI report:
   
   * 5e9a288bf8a8634662f7d3fa42abe7253d80d6b1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3451)
 
   * 284f739772e4ff8f337a8d2c6d4c72302be4cdaa UNKNOWN
   * 8dc9c77f5035e0ab54a4d731a43ff545e053d9f1 UNKNOWN
   
   




[GitHub] [hudi] liujinhui1994 edited a comment on issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2021-11-17 Thread GitBox


liujinhui1994 edited a comment on issue #4027:
URL: https://github.com/apache/hudi/issues/4027#issuecomment-972486708


   If this is a usage problem on my side, please let me know.
   If you need me to provide any other information, please let me know.
   Thanks






[GitHub] [hudi] liujinhui1994 commented on issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2021-11-17 Thread GitBox


liujinhui1994 commented on issue #4027:
URL: https://github.com/apache/hudi/issues/4027#issuecomment-972486708


   If this is a usage problem on my side, please let me know.
   If you need me to provide any other information, please let me know.






[GitHub] [hudi] liujinhui1994 commented on issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2021-11-17 Thread GitBox


liujinhui1994 commented on issue #4027:
URL: https://github.com/apache/hudi/issues/4027#issuecomment-972486455


   
![image](https://user-images.githubusercontent.com/25769285/142346521-c6c53e17-6d94-47d2-859e-01c6f31648b6.png)
   






[GitHub] [hudi] liujinhui1994 commented on issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2021-11-17 Thread GitBox


liujinhui1994 commented on issue #4027:
URL: https://github.com/apache/hudi/issues/4027#issuecomment-972485283


   
![image](https://user-images.githubusercontent.com/25769285/142346330-f90a656e-3401-452a-9be4-4b320cd68275.png)
   






[GitHub] [hudi] 0x574C closed issue #3971: [SUPPORT] HoodieFlinkStreamer parameter bug

2021-11-17 Thread GitBox


0x574C closed issue #3971:
URL: https://github.com/apache/hudi/issues/3971


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972483415


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4026: [WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972484501


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 UNKNOWN
   
   




[jira] [Assigned] (HUDI-1974) Run pyspark and validate that it works correctly with all hudi versions

2021-11-17 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra reassigned HUDI-1974:
-

Assignee: Rajesh Mahindra

> Run pyspark and validate that it works correctly with all hudi versions
> ---
>
> Key: HUDI-1974
> URL: https://issues.apache.org/jira/browse/HUDI-1974
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Nishith Agarwal
>Assignee: Rajesh Mahindra
>Priority: Blocker
> Fix For: 0.10.0
>
>






[jira] [Assigned] (HUDI-2761) IllegalArgException from timeline server when serving getLastestBaseFiles with multi-writer

2021-11-17 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra reassigned HUDI-2761:
-

Assignee: Manoj Govindassamy  (was: sivabalan narayanan)

> IllegalArgException from timeline server when serving getLastestBaseFiles 
> with multi-writer
> ---
>
> Key: HUDI-2761
> URL: https://issues.apache.org/jira/browse/HUDI-2761
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Blocker
> Fix For: 0.10.0
>
> Attachments: Screen Shot 2021-11-15 at 8.27.11 AM.png, Screen Shot 
> 2021-11-15 at 8.27.33 AM.png, Screen Shot 2021-11-15 at 8.28.03 AM.png, 
> Screen Shot 2021-11-15 at 8.28.25 AM.png
>
>
> When concurrent writers try to ingest into Hudi, we occasionally run into an 
> IllegalArgumentException as shown below. Even though the exception is seen, 
> the actual write succeeds. 
> Here is what is happening, from my understanding. 
>  
> Let's say the table's latest commit is C3. 
> Writer1 tries to commit C4, writer2 tries to do C5, and writer3 tries to do C6 
> (all 3 are non-overlapping and so are expected to succeed). 
> I started C4 from writer1, then switched to writer2 and triggered C5, and 
> then did the same for writer3. 
> C4 went through fine for writer1 and succeeded. 
> For writer2, when the timeline got instantiated, its latest snapshot was C3, 
> but when it received the getLatestBaseFiles() request, the latest commit was 
> C4, and so it fails. A similar issue happened w/ writer3 as well. 
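
The sequence described above can be modelled with a short sketch. This is not the timeline server's code: the class and method names are hypothetical, and it simply assumes the server rejects a request whenever the client's last known instant differs from the latest completed instant the server sees.

```java
import java.util.Arrays;
import java.util.List;

final class TimelineSyncSketch {
    // Returns true when the client's view ends at the same completed instant
    // as the server's timeline (assumed to be ordered oldest to newest).
    static boolean clientIsInSync(String clientLastInstant, List<String> serverCompletedInstants) {
        if (serverCompletedInstants.isEmpty()) {
            return clientLastInstant == null;
        }
        String serverLast = serverCompletedInstants.get(serverCompletedInstants.size() - 1);
        return serverLast.equals(clientLastInstant);
    }

    public static void main(String[] args) {
        // Writer2 built its view when C3 was the latest commit.
        String writer2LastKnown = "C3";
        // Before writer1 finishes, both sides agree.
        System.out.println(clientIsInSync(writer2LastKnown, Arrays.asList("C3")));
        // After writer1 completes C4, the same request looks stale to the server.
        System.out.println(clientIsInSync(writer2LastKnown, Arrays.asList("C3", "C4")));
    }
}
```

Under this model the second check fails only because another writer completed a commit in between, which matches the observation that the write itself still succeeds.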
>  
>  
> {code:java}
> scala> df.write.format("hudi").
>      |   options(getQuickstartWriteConfigs).
>      |   option(PRECOMBINE_FIELD.key(), "created_at").
>      |   option(RECORDKEY_FIELD.key(), "other").
>      |   option(PARTITIONPATH_FIELD.key(), "type").
>      |   option("hoodie.cleaner.policy.failed.writes","LAZY").
>      |   
> option("hoodie.write.concurrency.mode","OPTIMISTIC_CONCURRENCY_CONTROL").
>      |   
> option("hoodie.write.lock.provider","org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider").
>      |   option("hoodie.write.lock.zookeeper.url","localhost").
>      |   option("hoodie.write.lock.zookeeper.port","2181").
>      |   option("hoodie.write.lock.zookeeper.lock_key","locks").
>      |   
> option("hoodie.write.lock.zookeeper.base_path","/tmp/mw_testing/.locks").
>      |   option(TBL_NAME.key(), tableName).
>      |   mode(Append).
>      |   save(basePath)
> 21/11/15 07:47:33 WARN HoodieSparkSqlWriter$: Commit time 2025074733457
> 21/11/15 07:47:35 WARN EmbeddedTimelineService: Started embedded timeline 
> server at 10.0.0.202:57644
> [Stage 2:>                                                        (0          
>                                                           21/11/15 07:47:39 
> ERROR RequestHandler: Got runtime exception servicing request 
> partition=CreateEvent=2025074301094=file%3A%2Ftmp%2Fmw_testing%2Ftrial2=2025074301094=ce963fe977a9d2176fadecf16c223cb3b98d7f6f7aaaf41cd7855eb098aee47d
> java.lang.IllegalArgumentException: Last known instant from client was 
> 2025074301094 but server has the following timeline 
> [[2025074301094__commit__COMPLETED], 
> [2025074731908__commit__COMPLETED]]
>     at 
> org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
>     at 
> org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:510)
>     at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
>     at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
>     at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
>     at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
>     at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
>     at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
>     at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
>     at 
> io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
>     at 
> org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>     at 
> org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
>     at 
> org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
>     at 
> org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>     at 
> org.apache.hudi.org.apache.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
>     at 
> org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>     at 
> 

[jira] [Assigned] (HUDI-2590) Validate Diff key gen w/ and w/o glob path with and w/o metadata enabled

2021-11-17 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra reassigned HUDI-2590:
-

Assignee: Manoj Govindassamy  (was: sivabalan narayanan)

> Validate Diff key gen w/ and w/o glob path with and w/o metadata enabled
> 
>
> Key: HUDI-2590
> URL: https://issues.apache.org/jira/browse/HUDI-2590
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Assigned] (HUDI-2527) Flaky test: TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict

2021-11-17 Thread Rajesh Mahindra (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Mahindra reassigned HUDI-2527:
-

Assignee: Manoj Govindassamy  (was: sivabalan narayanan)

> Flaky test: 
> TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict
> -
>
> Key: HUDI-2527
> URL: https://issues.apache.org/jira/browse/HUDI-2527
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Manoj Govindassamy
>Priority: Blocker
> Fix For: 0.10.0
>
>
>  
> {code:java}
> [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 61.795 s <<< FAILURE! - in org.apache.hudi.client.TestHoodieClientMultiWriter
> [ERROR] org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(HoodieTableType)[1]  Time elapsed: 9.689 s <<< ERROR!
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hudi.exception.HoodieHeartbeatException: Unable to generate heartbeat
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict(TestHoodieClientMultiWriter.java:227)
> Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieHeartbeatException: Unable to generate heartbeat
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.lambda$testMultiWriterWithAsyncTableServicesWithConflict$5(TestHoodieClientMultiWriter.java:205)
> Caused by: org.apache.hudi.exception.HoodieHeartbeatException: Unable to generate heartbeat
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.createCommitWithInserts(TestHoodieClientMultiWriter.java:285)
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.lambda$testMultiWriterWithAsyncTableServicesWithConflict$5(TestHoodieClientMultiWriter.java:202)
> Caused by: org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access '/tmp/junit213441136342269/dataset/.hoodie/.heartbeat/.007.crc': No such file or directory
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.createCommitWithInserts(TestHoodieClientMultiWriter.java:285)
>     at org.apache.hudi.client.TestHoodieClientMultiWriter.lambda$testMultiWriterWithAsyncTableServicesWithConflict$5(TestHoodieClientMultiWriter.java:202)
>  
> [ERROR] Errors: 
> [ERROR]   TestHoodieClientMultiWriter.testMultiWriterWithAsyncTableServicesWithConflict:227 » Execution
> {code}
>  
>  
> https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=2352=logs=600e7de6-e133-5e69-e615-50ee129b3c08=bbbd7bcc-ae73-56b8-887a-cd2d6deaafc7
>  
> The test case does not make sense for a COW table; COW should be removed from 
> the test params.
> Consider rewriting the prep logic.





[GitHub] [hudi] hudi-bot commented on pull request #4026: [WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot commented on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972483415


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   * 35605e6aa6afa0c5333dfcc5a3b63f767789be09 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4026: [WIP] Fixing issues w/ Z-ordering

2021-11-17 Thread GitBox


hudi-bot removed a comment on pull request #4026:
URL: https://github.com/apache/hudi/pull/4026#issuecomment-972469707


   
   ## CI report:
   
   * 90cf3fd6c12fb8d4b0b302deebf3e60f96603842 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3452)
 
   
   




[GitHub] [hudi] yanghua commented on a change in pull request #3888: [HUDI-2624] Implement Non Index type for HUDI

2021-11-17 Thread GitBox


yanghua commented on a change in pull request #3888:
URL: https://github.com/apache/hudi/pull/3888#discussion_r751875479



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestNonIndex.scala
##
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.hudi.common.testutils.RawTripTestPayload.recordsToStrings
+import org.apache.hudi.config.{HoodieIndexConfig, HoodieWriteConfig}
+import org.apache.hudi.index.HoodieIndex
+import org.apache.hudi.testutils.{DataSourceTestUtils, HoodieClientTestBase}
+import org.junit.jupiter.api.Test
+
+import scala.collection.JavaConversions._
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.{HoodieRecord, WriteOperationType}
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions
+import org.apache.spark.sql.{Dataset, Row, SaveMode}
+
+import scala.collection.JavaConverters
+
+class TestNonIndex extends HoodieClientTestBase {
+  val commonOpts = Map(
+"hoodie.insert.shuffle.parallelism" -> "4",
+"hoodie.upsert.shuffle.parallelism" -> "4",
+KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key() -> "partition",
+HoodieIndexConfig.INDEX_TYPE.key() -> 
HoodieIndex.IndexType.NON_INDEX.name(),
+HoodieWriteConfig.KEYGENERATOR_CLASS_NAME.key() -> 
"org.apache.hudi.keygen.EmptyKeyGenerator"
+  )
+
+  @Test
+  def testNonIndexMORInsert(): Unit = {
+    val spark = sqlContext.sparkSession
+
+    val records1 = recordsToStrings(dataGen.generateInserts("001", 100)).toList
+    // first insert, parquet files
+    val inputDF1: Dataset[Row] = spark.read.json(spark.sparkContext.parallelize(records1, 2))
+    inputDF1.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.OPERATION.key(), DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+      .option(DataSourceWriteOptions.TABLE_TYPE.key(), DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    // second insert, log files
+    val records2 = recordsToStrings(dataGen.generateInserts("002", 100)).toList
+    // first insert, parquet files

Review comment:
   ?

##
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestEmptyKeyGenerator.java
##
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
+import org.apache.hudi.testutils.KeyGeneratorTestUtilities;
+import org.apache.spark.sql.Row;
+
+import org.junit.jupiter.api.Assertions;
+import org.junit.jupiter.api.Test;
+
+public class TestEmptyKeyGenerator extends KeyGeneratorTestUtilities {
+
+  private TypedProperties getCommonProps() {
+    TypedProperties properties = new TypedProperties();
+    properties.put(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key(), "true");
+    return properties;
+  }
+
+  private TypedProperties getPropertiesWithoutPartitionPathProp() {
+    return getCommonProps();
+  }
+
+  private TypedProperties 

[GitHub] [hudi] vinothchandar commented on pull request #4013: [HUDI-2778] Optimize statistics collection related codes and add some docs for z-order

2021-11-17 Thread GitBox


vinothchandar commented on pull request #4013:
URL: https://github.com/apache/hudi/pull/4013#issuecomment-972482783


   cc @alexeykudinkin can you review this once as well






[jira] [Updated] (HUDI-2314) Add DynamoDb based lock provider

2021-11-17 Thread sivabalan narayanan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2314:
--
Status: Closed  (was: Patch Available)

> Add DynamoDb based lock provider
> 
>
> Key: HUDI-2314
> URL: https://issues.apache.org/jira/browse/HUDI-2314
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Wenning Ding
>Assignee: Wenning Ding
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Similar to the ZooKeeper and Hive Metastore based lock providers, we need to
> add a DynamoDB based lock provider. The benefit of a DynamoDB based lock
> provider is that customers who use AWS EMR can share lock information across
> different EMR clusters, and it is easier for them to configure.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Reopened] (HUDI-2314) Add DynamoDb based lock provider

2021-11-17 Thread sivabalan narayanan (Jira)


 [ https://issues.apache.org/jira/browse/HUDI-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reopened HUDI-2314:
---

> Add DynamoDb based lock provider
> 
>
> Key: HUDI-2314
> URL: https://issues.apache.org/jira/browse/HUDI-2314
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Wenning Ding
>Assignee: Wenning Ding
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Similar to the ZooKeeper and Hive Metastore based lock providers, we need to
> add a DynamoDB based lock provider. The benefit of a DynamoDB based lock
> provider is that customers who use AWS EMR can share lock information across
> different EMR clusters, and it is easier for them to configure.




