(hudi) branch asf-site updated: Update docker_demo.md (#10522)

2024-01-19 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 053c05794e2 Update docker_demo.md (#10522)
053c05794e2 is described below

commit 053c05794e29ad6af179360507f3e27e6a6b81ac
Author: Dan Roscigno 
AuthorDate: Sat Jan 20 02:06:40 2024 -0500

Update docker_demo.md (#10522)

* Update docker_demo.md

Based on my experience trying the demo and on issue #10262, I am suggesting 
that the tag 0.14.1 be used for the demo instead of the master branch.

Additionally, the `mvn` command should pin specific versions: 
`-Dscala-2.11 -Dspark2.4`
---
 website/versioned_docs/version-0.14.1/docker_demo.md | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/website/versioned_docs/version-0.14.1/docker_demo.md 
b/website/versioned_docs/version-0.14.1/docker_demo.md
index 0564bce20a7..ae8b232dad1 100644
--- a/website/versioned_docs/version-0.14.1/docker_demo.md
+++ b/website/versioned_docs/version-0.14.1/docker_demo.md
@@ -49,9 +49,10 @@ The first step is to build Hudi. **Note** This step builds 
Hudi on default suppo
 
 NOTE: Make sure you've cloned the [Hudi 
repository](https://github.com/apache/hudi) first. 
 
-```java
+```bash
 cd 
-mvn clean package -Pintegration-tests -DskipTests
+git checkout release-0.14.1
+mvn clean package -Pintegration-tests -DskipTests -Dscala-2.11 -Dspark2.4
 ```
 
 ### Bringing up Demo Cluster
@@ -134,9 +135,10 @@ $ docker ps
 
 :::note Please note the following for Mac AArch64 users
 
-   The demo must be built and run using the master branch. We currently 
plan to include support starting with the
-0.13.0 release. 
+   The demo must be built and run using the release-0.14.1 tag. 
Presto and Trino are not currently supported in the demo. 
+   You will see warnings that there is no history server for your 
architecture. You can ignore these. 
+   You will see the warning "Unable to load native-hadoop library for your 
platform... using builtin-java classes where applicable." You can ignore this. 

 
 :::
 
@@ -339,7 +341,7 @@ After executing the above command, you will notice
 
 1. A hive table named `stock_ticks_cow` created which supports Snapshot and 
Incremental queries on Copy On Write table.
 2. Two new tables `stock_ticks_mor_rt` and `stock_ticks_mor_ro` created for 
the Merge On Read table. The former
-supports Snapshot and Incremental queries (providing near-real time data) 
while the latter supports ReadOptimized queries.
+supports Snapshot and Incremental queries (providing near-real time data) 
while the latter supports ReadOptimized queries. 
`http://namenode:50070/explorer.html#/user/hive/warehouse/stock_ticks_mor`
 
 
 ### Step 4 (a): Run Hive Queries



[hudi] branch master updated (61fc3c03a6 -> 59f652a19c)

2022-08-08 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 61fc3c03a6 [HUDI-4447] fix SQL metasync when perform delete table 
operation (#6180)
 add 59f652a19c [HUDI-4424] Add new compaction trigger strategy: 
NUM_COMMITS_AFTER_REQ… (#6144)

No new revisions were added by this update.

Summary of changes:
 .../action/compact/CompactionTriggerStrategy.java  |  2 +
 .../compact/ScheduleCompactionActionExecutor.java  | 23 +++
 .../table/action/compact/TestInlineCompaction.java | 74 ++
 .../apache/hudi/common/util/CompactionUtils.java   | 30 +
 4 files changed, 129 insertions(+)
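A minimal sketch of opting into the new trigger strategy through standard Hudi write properties. The full enum name is an assumption here, since the subject line above truncates it at "NUM_COMMITS_AFTER_REQ…":

```java
import java.util.Properties;

public class CompactionTriggerExample {
  public static Properties compactionProps() {
    Properties props = new Properties();
    props.setProperty("hoodie.compact.inline", "true");
    // Assumed full name of the strategy truncated in the subject line.
    props.setProperty("hoodie.compact.inline.trigger.strategy", "NUM_COMMITS_AFTER_LAST_REQUEST");
    props.setProperty("hoodie.compact.inline.max.delta.commits", "5");
    return props;
  }
}
```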



[hudi] branch master updated (1ea1e659c2 -> e5faf2cc84)

2022-07-26 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 1ea1e659c2 [HUDI-4474] Infer metasync configs (#6217)
 add e5faf2cc84 [HUDI-4210] Create custom hbase index to solve data skew 
issue on hbase regions (#5797)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/config/HoodieHBaseIndexConfig.java |  4 
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  4 
 .../hbase/RebalancedSparkHoodieHBaseIndex.java}| 26 +++---
 .../hudi/index/hbase/SparkHoodieHBaseIndex.java| 10 ++---
 4 files changed, 23 insertions(+), 21 deletions(-)
 copy 
hudi-client/{hudi-client-common/src/main/java/org/apache/hudi/table/storage/HoodieDefaultLayout.java
 => 
hudi-spark-client/src/main/java/org/apache/hudi/index/hbase/RebalancedSparkHoodieHBaseIndex.java}
 (60%)
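The usual remedy for HBase region skew, and plausibly the idea behind RebalancedSparkHoodieHBaseIndex, is to salt the row key with a stable hash-derived bucket prefix so that lexicographically adjacent record keys spread across regions. A hedged sketch; the method and key format below are illustrative, not the merged code:

```java
public class SaltedKeySketch {
  /** Prefix the record key with a bucket derived from a stable hash. */
  static String saltedKey(String recordKey, int numBuckets) {
    int bucket = Math.floorMod(recordKey.hashCode(), numBuckets);
    // The same computation applies on lookup, so index gets stay point reads.
    return String.format("%03d:%s", bucket, recordKey);
  }
}
```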



[hudi] branch master updated: [HUDI-4065] Add FileBasedLockProvider (#6071)

2022-07-18 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 382d19e85b [HUDI-4065] Add FileBasedLockProvider (#6071)
382d19e85b is described below

commit 382d19e85b06d0fc7f4ad37bb4e4eae3f5e76b78
Author: 冯健 
AuthorDate: Tue Jul 19 07:52:47 2022 +0800

[HUDI-4065] Add FileBasedLockProvider (#6071)
---
 .../lock/FileSystemBasedLockProvider.java  | 152 +
 .../org/apache/hudi/config/HoodieLockConfig.java   |   9 +-
 .../hudi/client/TestFileBasedLockProvider.java | 135 ++
 .../hudi/client/TestHoodieClientMultiWriter.java   |  87 +++-
 .../hudi/common/config/LockConfiguration.java  |   2 +
 5 files changed, 349 insertions(+), 36 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/FileSystemBasedLockProvider.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/FileSystemBasedLockProvider.java
new file mode 100644
index 00..96a42e8409
--- /dev/null
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/FileSystemBasedLockProvider.java
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.client.transaction.lock;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.config.LockConfiguration;
+import org.apache.hudi.common.fs.FSUtils;
+import org.apache.hudi.common.lock.LockProvider;
+import org.apache.hudi.common.lock.LockState;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieIOException;
+import org.apache.hudi.exception.HoodieLockException;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.concurrent.TimeUnit;
+
+import static 
org.apache.hudi.common.config.LockConfiguration.FILESYSTEM_LOCK_EXPIRE_PROP_KEY;
+import static 
org.apache.hudi.common.config.LockConfiguration.FILESYSTEM_LOCK_PATH_PROP_KEY;
+
+/**
+ * A FileSystem based lock. This {@link LockProvider} implementation allows 
locking table operations
+ * using DFS. Users may need to manually clean up the lock path if the 
writeClient crashes and never runs again.
+ * NOTE: This only works for DFS with atomic create/delete operations
+ */
+public class FileSystemBasedLockProvider implements LockProvider, 
Serializable {
+
+  private static final Logger LOG = 
LogManager.getLogger(FileSystemBasedLockProvider.class);
+
+  private static final String LOCK_FILE_NAME = "lock";
+
+  private final int lockTimeoutMinutes;
+  private transient FileSystem fs;
+  private transient Path lockFile;
+  protected LockConfiguration lockConfiguration;
+
+  public FileSystemBasedLockProvider(final LockConfiguration 
lockConfiguration, final Configuration configuration) {
+checkRequiredProps(lockConfiguration);
+this.lockConfiguration = lockConfiguration;
+String lockDirectory = 
lockConfiguration.getConfig().getString(FILESYSTEM_LOCK_PATH_PROP_KEY, null);
+if (StringUtils.isNullOrEmpty(lockDirectory)) {
+  lockDirectory = 
lockConfiguration.getConfig().getString(HoodieWriteConfig.BASE_PATH.key())
++ Path.SEPARATOR + HoodieTableMetaClient.METAFOLDER_NAME;
+}
+this.lockTimeoutMinutes = 
lockConfiguration.getConfig().getInteger(FILESYSTEM_LOCK_EXPIRE_PROP_KEY);
+this.lockFile = new Path(lockDirectory + Path.SEPARATOR + LOCK_FILE_NAME);
+this.fs = FSUtils.getFs(this.lockFile.toString(), configuration);
+  }
+
+  @Override
+  public void close() {
+synchronized (LOCK_FILE_NAME) {
+  try {
+fs.delete(this.lockFile, true);
+  } catch (IOException e) {
+throw new 
HoodieLock
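A hedged sketch of wiring this provider into a writer via properties. The two filesystem lock keys are assumptions mirroring the FILESYSTEM_LOCK_PATH_PROP_KEY and FILESYSTEM_LOCK_EXPIRE_PROP_KEY constants imported above:

```java
import java.util.Properties;

public class FileSystemLockExample {
  public static Properties lockProps() {
    Properties props = new Properties();
    props.setProperty("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider");
    // Assumed keys behind FILESYSTEM_LOCK_PATH_PROP_KEY / FILESYSTEM_LOCK_EXPIRE_PROP_KEY.
    props.setProperty("hoodie.write.lock.filesystem.path", "hdfs:///tmp/hudi/.locks");
    props.setProperty("hoodie.write.lock.filesystem.expire", "10"); // minutes, per lockTimeoutMinutes
    return props;
  }
}
```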

[jira] [Updated] (HUDI-4409) Improve LockManager wait logic when catch exception

2022-07-18 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-4409:
---
Summary: Improve LockManager wait logic when catch exception  (was: 
LockManager improve wait time logic)

> Improve LockManager wait logic when catch exception
> ---
>
> Key: HUDI-4409
> URL: https://issues.apache.org/jira/browse/HUDI-4409
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> public void lock() {
>   if 
> (writeConfig.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl())
>  {
> LockProvider lockProvider = getLockProvider();
> int retryCount = 0;
> boolean acquired = false;
> while (retryCount <= maxRetries) {
>   try {
> acquired = 
> lockProvider.tryLock(writeConfig.getLockAcquireWaitTimeoutInMs(), 
> TimeUnit.MILLISECONDS);
> if (acquired) {
>   break;
> }
> LOG.info("Retrying to acquire lock...");
> Thread.sleep(maxWaitTimeInMs);
>   } catch (HoodieLockException | InterruptedException e) {
> if (retryCount >= maxRetries) {
>   throw new HoodieLockException("Unable to acquire lock, lock object 
> ", e);
> }
>   } finally {
> retryCount++;
>   }
> }
> if (!acquired) {
>   throw new HoodieLockException("Unable to acquire lock, lock object " + 
> lockProvider.getLock());
> }
>   }
> } {code}
> We should put the sleep inside the catch block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
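The fix that was eventually merged (see the #6122 commit notification further down) does exactly this, moving the back-off sleep into the catch block. A self-contained sketch of the resulting loop, with the lock and exception types simplified from the snippet above:

```java
import java.util.concurrent.TimeUnit;

public class RetryWithBackoffSketch {
  interface TryLock {
    boolean tryLock(long timeout, TimeUnit unit) throws Exception;
  }

  static void lock(TryLock lockProvider, int maxRetries, long maxWaitTimeInMs) throws Exception {
    int retryCount = 0;
    boolean acquired = false;
    while (retryCount <= maxRetries) {
      try {
        acquired = lockProvider.tryLock(maxWaitTimeInMs, TimeUnit.MILLISECONDS);
        if (acquired) {
          break;
        }
        Thread.sleep(maxWaitTimeInMs);
      } catch (Exception e) {
        if (retryCount >= maxRetries) {
          throw new RuntimeException("Unable to acquire lock", e);
        }
        Thread.sleep(maxWaitTimeInMs); // the proposed change: also back off when tryLock throws
      } finally {
        retryCount++;
      }
    }
    if (!acquired) {
      throw new RuntimeException("Unable to acquire lock");
    }
  }
}
```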


[jira] [Assigned] (HUDI-4409) Improve LockManager wait logic when catch exception

2022-07-18 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-4409:
--

Assignee: liujinhui

> Improve LockManager wait logic when catch exception
> ---
>
> Key: HUDI-4409
> URL: https://issues.apache.org/jira/browse/HUDI-4409
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> public void lock() {
>   if 
> (writeConfig.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl())
>  {
> LockProvider lockProvider = getLockProvider();
> int retryCount = 0;
> boolean acquired = false;
> while (retryCount <= maxRetries) {
>   try {
> acquired = 
> lockProvider.tryLock(writeConfig.getLockAcquireWaitTimeoutInMs(), 
> TimeUnit.MILLISECONDS);
> if (acquired) {
>   break;
> }
> LOG.info("Retrying to acquire lock...");
> Thread.sleep(maxWaitTimeInMs);
>   } catch (HoodieLockException | InterruptedException e) {
> if (retryCount >= maxRetries) {
>   throw new HoodieLockException("Unable to acquire lock, lock object 
> ", e);
> }
>   } finally {
> retryCount++;
>   }
> }
> if (!acquired) {
>   throw new HoodieLockException("Unable to acquire lock, lock object " + 
> lockProvider.getLock());
> }
>   }
> } {code}
> We should put the sleep inside the catch block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Closed] (HUDI-4409) Improve LockManager wait logic when catch exception

2022-07-18 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-4409.
--
Resolution: Done

> Improve LockManager wait logic when catch exception
> ---
>
> Key: HUDI-4409
> URL: https://issues.apache.org/jira/browse/HUDI-4409
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> {code:java}
> public void lock() {
>   if 
> (writeConfig.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl())
>  {
> LockProvider lockProvider = getLockProvider();
> int retryCount = 0;
> boolean acquired = false;
> while (retryCount <= maxRetries) {
>   try {
> acquired = 
> lockProvider.tryLock(writeConfig.getLockAcquireWaitTimeoutInMs(), 
> TimeUnit.MILLISECONDS);
> if (acquired) {
>   break;
> }
> LOG.info("Retrying to acquire lock...");
> Thread.sleep(maxWaitTimeInMs);
>   } catch (HoodieLockException | InterruptedException e) {
> if (retryCount >= maxRetries) {
>   throw new HoodieLockException("Unable to acquire lock, lock object 
> ", e);
> }
>   } finally {
> retryCount++;
>   }
> }
> if (!acquired) {
>   throw new HoodieLockException("Unable to acquire lock, lock object " + 
> lockProvider.getLock());
> }
>   }
> } {code}
> We should put the sleep inside the catch block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-4409) Improve LockManager wait logic when catch exception

2022-07-18 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-4409:
---
Fix Version/s: 0.12.0

> Improve LockManager wait logic when catch exception
> ---
>
> Key: HUDI-4409
> URL: https://issues.apache.org/jira/browse/HUDI-4409
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.0
>
>
> {code:java}
> public void lock() {
>   if 
> (writeConfig.getWriteConcurrencyMode().supportsOptimisticConcurrencyControl())
>  {
> LockProvider lockProvider = getLockProvider();
> int retryCount = 0;
> boolean acquired = false;
> while (retryCount <= maxRetries) {
>   try {
> acquired = 
> lockProvider.tryLock(writeConfig.getLockAcquireWaitTimeoutInMs(), 
> TimeUnit.MILLISECONDS);
> if (acquired) {
>   break;
> }
> LOG.info("Retrying to acquire lock...");
> Thread.sleep(maxWaitTimeInMs);
>   } catch (HoodieLockException | InterruptedException e) {
> if (retryCount >= maxRetries) {
>   throw new HoodieLockException("Unable to acquire lock, lock object 
> ", e);
> }
>   } finally {
> retryCount++;
>   }
> }
> if (!acquired) {
>   throw new HoodieLockException("Unable to acquire lock, lock object " + 
> lockProvider.getLock());
> }
>   }
> } {code}
> We should put the sleep inside the catch block



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[hudi] branch master updated: [HUDI-4409] Improve LockManager wait logic when catch exception (#6122)

2022-07-18 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 1959b843b7 [HUDI-4409] Improve LockManager wait logic when catch 
exception (#6122)
1959b843b7 is described below

commit 1959b843b706babed8c16ee31c6fc266871d709f
Author: liujinhui <965147...@qq.com>
AuthorDate: Mon Jul 18 22:45:52 2022 +0800

[HUDI-4409] Improve LockManager wait logic when catch exception (#6122)
---
 .../java/org/apache/hudi/client/transaction/lock/LockManager.java| 5 +
 1 file changed, 5 insertions(+)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LockManager.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LockManager.java
index ca15c4fdc2..6ebae44fd4 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LockManager.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/transaction/lock/LockManager.java
@@ -74,6 +74,11 @@ public class LockManager implements Serializable, 
AutoCloseable {
   if (retryCount >= maxRetries) {
 throw new HoodieLockException("Unable to acquire lock, lock object 
", e);
   }
+  try {
+Thread.sleep(maxWaitTimeInMs);
+  } catch (InterruptedException ex) {
+// ignore InterruptedException here
+  }
 } finally {
   retryCount++;
 }



[hudi] branch master updated (0ff34b6974 -> 7689e62cd9)

2022-06-17 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 0ff34b6974 [HUDI-4214] improve repeat init write schema in 
ExpressionPayload (#5820)
 add 7689e62cd9 [HUDI-4265] Deprecate useless targetTableName parameter in 
HoodieMultiTableDeltaStreamer (#5883)

No new revisions were added by this update.

Summary of changes:
 .../deltastreamer/HoodieMultiTableDeltaStreamer.java| 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)



[hudi] branch master updated: [HUDI-4218] Expose the real exception information when an exception occurs in the tableExists method (#5827)

2022-06-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c291b05699 [HUDI-4218] Expose the real exception 
information when an exception occurs in the tableExists method (#5827)
c291b05699 is described below

commit c291b056996f7c5c2c25ad75f5ac57dd64028327
Author: 董可伦 
AuthorDate: Wed Jun 15 18:10:35 2022 +0800

[HUDI-4218] Expose the real exception information when an 
exception occurs in the tableExists method (#5827)
---
 hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
index f389695f7b..175ba3d66f 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
@@ -403,7 +403,7 @@ public class UtilHelpers {
   
statement.setQueryTimeout(Integer.parseInt(options.get(JDBCOptions.JDBC_QUERY_TIMEOUT())));
   statement.executeQuery();
 } catch (SQLException e) {
-  return false;
+  throw new HoodieException(e);
 }
 return true;
   }



[hudi] branch asf-site updated: [MINOR] Fix incorrect full-width comma usage in the doc DDL demo (#5721)

2022-05-31 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 9761fde718 [MINOR] Fix incorrect full-width comma usage in the doc DDL 
demo (#5721)
9761fde718 is described below

commit 9761fde718642238705833ea1b4b0cc5930634f1
Author: 木木夕120 
AuthorDate: Tue May 31 19:58:59 2022 +0800

[MINOR] Fix incorrect full-width comma usage in the doc DDL demo (#5721)
---
 website/docs/table_management.md  | 2 +-
 website/versioned_docs/version-0.10.0/table_management.md | 2 +-
 website/versioned_docs/version-0.10.1/table_management.md | 2 +-
 website/versioned_docs/version-0.11.0/table_management.md | 2 +-
 website/versioned_docs/version-0.9.0/quick-start-guide.md | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/website/docs/table_management.md b/website/docs/table_management.md
index 92cb6092aa..762b7c5916 100644
--- a/website/docs/table_management.md
+++ b/website/docs/table_management.md
@@ -82,7 +82,7 @@ Here is an example of creating a COW partitioned table.
 create table if not exists hudi_table_p0 (
 id bigint,
 name string,
-dt string，
+dt string,
 hh string  
 ) using hudi
 options (
diff --git a/website/versioned_docs/version-0.10.0/table_management.md 
b/website/versioned_docs/version-0.10.0/table_management.md
index 76c02edc6d..ad7af11c55 100644
--- a/website/versioned_docs/version-0.10.0/table_management.md
+++ b/website/versioned_docs/version-0.10.0/table_management.md
@@ -82,7 +82,7 @@ Here is an example of creating a COW partitioned table.
 create table if not exists hudi_table_p0 (
 id bigint,
 name string,
-dt string，
+dt string,
 hh string  
 ) using hudi
 options (
diff --git a/website/versioned_docs/version-0.10.1/table_management.md 
b/website/versioned_docs/version-0.10.1/table_management.md
index 76c02edc6d..ad7af11c55 100644
--- a/website/versioned_docs/version-0.10.1/table_management.md
+++ b/website/versioned_docs/version-0.10.1/table_management.md
@@ -82,7 +82,7 @@ Here is an example of creating a COW partitioned table.
 create table if not exists hudi_table_p0 (
 id bigint,
 name string,
-dt string，
+dt string,
 hh string  
 ) using hudi
 options (
diff --git a/website/versioned_docs/version-0.11.0/table_management.md 
b/website/versioned_docs/version-0.11.0/table_management.md
index 92cb6092aa..762b7c5916 100644
--- a/website/versioned_docs/version-0.11.0/table_management.md
+++ b/website/versioned_docs/version-0.11.0/table_management.md
@@ -82,7 +82,7 @@ Here is an example of creating a COW partitioned table.
 create table if not exists hudi_table_p0 (
 id bigint,
 name string,
-dt string，
+dt string,
 hh string  
 ) using hudi
 options (
diff --git a/website/versioned_docs/version-0.9.0/quick-start-guide.md 
b/website/versioned_docs/version-0.9.0/quick-start-guide.md
index 7196332003..6a7e1cc7c8 100644
--- a/website/versioned_docs/version-0.9.0/quick-start-guide.md
+++ b/website/versioned_docs/version-0.9.0/quick-start-guide.md
@@ -251,7 +251,7 @@ Here is an example of creating an external COW partitioned 
table.
 create table if not exists hudi_table_p0 (
 id bigint,
 name string,
-dt string，
+dt string,
 hh string  
 ) using hudi
 location '/tmp/hudi/hudi_table_p0'



[hudi] branch master updated: [MINOR] Minor fixes to exception log and removing unwanted metrics flush in integ test (#5646)

2022-05-20 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 7d02b1fd3c [MINOR] Minor fixes to exception log and removing unwanted 
metrics flush in integ test (#5646)
7d02b1fd3c is described below

commit 7d02b1fd3c74abfbd118f69a10a8c106cc900a3e
Author: Sivabalan Narayanan 
AuthorDate: Fri May 20 19:27:35 2022 -0400

[MINOR] Minor fixes to exception log and removing unwanted metrics flush in 
integ test (#5646)
---
 .../org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java| 3 ---
 .../main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java   | 2 +-
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git 
a/hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java
 
b/hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java
index 0183f52c2a..ab80df0d6a 100644
--- 
a/hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java
+++ 
b/hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java
@@ -117,9 +117,6 @@ public class DagScheduler {
   if (curRound < workflowDag.getRounds()) {
 new 
DelayNode(workflowDag.getIntermittentDelayMins()).execute(executionContext, 
curRound);
   }
-
-  // After each level, report and flush the metrics
-  Metrics.flush();
 } while (curRound++ < workflowDag.getRounds());
 log.info("Finished workloads");
   }
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
index 8f44b8b7d0..a1a804b9ed 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
@@ -846,7 +846,7 @@ public class DeltaSync implements Serializable {
   }
   return newWriteSchema;
 } catch (Exception e) {
-  throw new HoodieException("Failed to fetch schema from table.");
+  throw new HoodieException("Failed to fetch schema from table ", e);
 }
   }
 



[hudi] branch master updated: [HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ configuration (#5287)

2022-05-07 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9625d16937 [HUDI-3849] AvroDeserializer supports 
AVRO_REBASE_MODE_IN_READ configuration (#5287)
9625d16937 is described below

commit 9625d16937954a54420384b41f964e48cba8cc2f
Author: cxzl25 
AuthorDate: Sat May 7 15:39:14 2022 +0800

[HUDI-3849] AvroDeserializer supports AVRO_REBASE_MODE_IN_READ 
configuration (#5287)
---
 .../org/apache/spark/sql/avro/HoodieSpark3_2AvroDeserializer.scala   | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/avro/HoodieSpark3_2AvroDeserializer.scala
 
b/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/avro/HoodieSpark3_2AvroDeserializer.scala
index 0275e2f635..d839c73032 100644
--- 
a/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/avro/HoodieSpark3_2AvroDeserializer.scala
+++ 
b/hudi-spark-datasource/hudi-spark3/src/main/scala/org/apache/spark/sql/avro/HoodieSpark3_2AvroDeserializer.scala
@@ -18,13 +18,14 @@
 package org.apache.spark.sql.avro
 
 import org.apache.avro.Schema
-import org.apache.hudi.HoodieSparkUtils
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.DataType
 
 class HoodieSpark3_2AvroDeserializer(rootAvroType: Schema, rootCatalystType: 
DataType)
   extends HoodieAvroDeserializer {
 
-  private val avroDeserializer = new AvroDeserializer(rootAvroType, 
rootCatalystType, "EXCEPTION")
+  private val avroDeserializer = new AvroDeserializer(rootAvroType, 
rootCatalystType,
+SQLConf.get.getConf(SQLConf.AVRO_REBASE_MODE_IN_READ))
 
   def deserialize(data: Any): Option[Any] = avroDeserializer.deserialize(data)
 }
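With this change, the deserializer honors the session's configured rebase mode instead of the hardcoded "EXCEPTION". A hedged usage sketch; the conf key name is an assumption based on Spark 3.2's SQLConf.AVRO_REBASE_MODE_IN_READ:

```java
import org.apache.spark.sql.SparkSession;

public class AvroRebaseModeExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().master("local[*]").getOrCreate();
    // Assumed key; accepted values are EXCEPTION, CORRECTED, and LEGACY.
    spark.conf().set("spark.sql.avro.datetimeRebaseModeInRead", "LEGACY");
  }
}
```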



[jira] [Closed] (HUDI-184) Integrate Hudi with Apache Flink

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-184.
-
Resolution: Implemented

This feature has been tracked via 
https://issues.apache.org/jira/browse/HUDI-1521

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink is valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Reopened] (HUDI-184) Integrate Hudi with Apache Flink

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reopened HUDI-184:
---

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink is valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-609) Implement a Flink specific HoodieIndex

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-609.
-
Resolution: Won't Do

> Implement a Flink specific HoodieIndex
> --
>
> Key: HUDI-609
> URL: https://issues.apache.org/jira/browse/HUDI-609
> Project: Apache Hudi
>  Issue Type: Sub-task
>    Reporter: vinoyang
>    Assignee: vinoyang
>Priority: Major
>
> Indexing is a key step in hudi's write flow. {{HoodieIndex}} is the abstract 
> superclass of all index implementations. Currently, {{HoodieIndex}} is 
> coupled with Spark in its design. However, HUDI-538 is restructuring 
> hudi-client so that hudi can be decoupled from Spark. After that, we would 
> get an engine-agnostic implementation of {{HoodieIndex}}, and by extending 
> that class, we could implement a Flink-specific index.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (HUDI-608) Implement a flink datastream execution context

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-608.
-
Resolution: Won't Do

> Implement a flink datastream execution context
> --
>
> Key: HUDI-608
> URL: https://issues.apache.org/jira/browse/HUDI-608
> Project: Apache Hudi
>  Issue Type: Sub-task
>    Reporter: vinoyang
>    Assignee: vinoyang
>Priority: Major
>
> Currently {{HoodieWriteClient}} does something like 
> `hoodieRecordRDD.map().sort()` internally. If we want to support a Flink 
> DataStream as the object, then we need to somehow define an abstraction like 
> {{HoodieExecutionContext}} which will have a common set of map(T) -> T, 
> filter(), and repartition() methods (sketched after this message). There 
> will be a subclass like {{HoodieFlinkDataStreamExecutionContext}} which will 
> implement it in Flink-specific ways and hand back the transformed T object.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
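A hypothetical sketch of the abstraction the ticket describes. Since the issue was closed "Won't Do", none of these names exist in the codebase, and the function types are simplified (real code would need serializable variants):

```java
import java.util.function.Function;
import java.util.function.Predicate;

// T is the engine's distributed collection (e.g. an RDD or a DataStream) and
// R the record type, so the write client never references Spark classes directly.
public interface HoodieExecutionContext<T, R> {
  T map(T data, Function<R, R> fn);
  T filter(T data, Predicate<R> fn);
  T repartition(T data, int parallelism);
}
```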


[jira] [Closed] (HUDI-184) Integrate Hudi with Apache Flink

2022-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-184.
-
Resolution: Won't Do

> Integrate Hudi with Apache Flink
> 
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: writer-core
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink is valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2418) Support HiveSchemaProvider

2021-12-02 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2418:
---
Summary: Support HiveSchemaProvider   (was: add HiveSchemaProvider )

> Support HiveSchemaProvider 
> ---
>
> Key: HUDI-2418
> URL: https://issues.apache.org/jira/browse/HUDI-2418
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> When using DeltaStreamer to migrate an existing Hive table, it is better to 
> have a HiveSchemaProvider instead of an Avro schema file (see the sketch 
> after this message).
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
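A hypothetical sketch of the idea, not the merged implementation: read the table schema from the Hive metastore through Spark and convert it to Avro, so no hand-written schema file is needed. The class and method names are illustrative:

```java
import org.apache.avro.Schema;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.avro.SchemaConverters;
import org.apache.spark.sql.types.StructType;

public class HiveSchemaProviderSketch {
  /** Derive an Avro schema from an existing Hive table instead of a schema file. */
  public static Schema sourceSchema(SparkSession spark, String db, String table) {
    StructType struct = spark.table(db + "." + table).schema();
    return SchemaConverters.toAvroType(struct, false, table, db);
  }
}
```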


[hudi] branch asf-site updated: [MINOR] Fix RocketMQ logo in landing page (#4061)

2021-11-21 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new c57cc91  [MINOR] Fix RocketMQ logo in landing page (#4061)
c57cc91 is described below

commit c57cc91bb7d5c49461713cffcc2bf461799a694a
Author: leesf <490081...@qq.com>
AuthorDate: Mon Nov 22 10:16:10 2021 +0800

[MINOR] Fix RocketMQ logo in landing page (#4061)
---
 website/static/assets/images/hudi-lake.png | Bin 150248 -> 152033 bytes
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/website/static/assets/images/hudi-lake.png 
b/website/static/assets/images/hudi-lake.png
index 103c040..4e6f9cf 100644
Binary files a/website/static/assets/images/hudi-lake.png and 
b/website/static/assets/images/hudi-lake.png differ


[hudi] branch master updated (aec5d11 -> 4d884bd)

2021-11-17 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from aec5d11  Check --source-avro-schema-path  parameter (#3987)
 add 4d884bd  [MINOR] Fix typo, 'Hooide' corrected to 'Hoodie' (#4007)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/sql/hudi/HoodieOptionConfig.scala | 2 +-
 .../org/apache/spark/sql/hudi/command/CreateHoodieTableCommand.scala  | 2 +-
 .../test/scala/org/apache/spark/sql/hudi/TestHoodieOptionConfig.scala | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)


[jira] [Created] (HUDI-2699) Remove duplicated zookeeper with tests classifier exists in bundles

2021-11-05 Thread vinoyang (Jira)
vinoyang created HUDI-2699:
--

 Summary: Remove duplicated zookeeper with tests classifier exists 
in bundles
 Key: HUDI-2699
 URL: https://issues.apache.org/jira/browse/HUDI-2699
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2643) Remove duplicated hbase-common with tests classifier exists in bundles

2021-11-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2643.
--
Resolution: Done

13b637ddc3ab9fba51e303cfa0343a496e476d26

> Remove duplicated hbase-common with tests classifier exists in bundles
> --
>
> Key: HUDI-2643
> URL: https://issues.apache.org/jira/browse/HUDI-2643
> Project: Apache Hudi
>  Issue Type: Sub-task
>    Reporter: vinoyang
>    Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2643) Remove duplicated hbase-common with tests classifier exists in bundles

2021-11-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2643:
---
Fix Version/s: 0.10.0

> Remove duplicated hbase-common with tests classifier exists in bundles
> --
>
> Key: HUDI-2643
> URL: https://issues.apache.org/jira/browse/HUDI-2643
> Project: Apache Hudi
>  Issue Type: Sub-task
>    Reporter: vinoyang
>    Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2643] Remove duplicated hbase-common with tests classifier exists in bundles (#3886)

2021-11-01 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 13b637d  [HUDI-2643] Remove duplicated hbase-common with tests 
classifier exists in bundles (#3886)
13b637d is described below

commit 13b637ddc3ab9fba51e303cfa0343a496e476d26
Author: vinoyang 
AuthorDate: Mon Nov 1 20:11:00 2021 +0800

[HUDI-2643] Remove duplicated hbase-common with tests classifier exists in 
bundles (#3886)
---
 dependencies/hudi-flink-bundle_2.11.txt |  1 -
 dependencies/hudi-flink-bundle_2.12.txt |  7 +++---
 dependencies/hudi-hadoop-mr-bundle.txt  |  8 +--
 dependencies/hudi-integ-test-bundle.txt | 35 -
 dependencies/hudi-spark-bundle_2.11.txt |  1 -
 dependencies/hudi-spark-bundle_2.12.txt |  1 -
 dependencies/hudi-spark3-bundle_2.12.txt|  1 -
 dependencies/hudi-utilities-bundle_2.11.txt |  1 -
 dependencies/hudi-utilities-bundle_2.12.txt |  7 +++---
 packaging/hudi-flink-bundle/pom.xml |  4 
 packaging/hudi-hadoop-mr-bundle/pom.xml | 10 +
 packaging/hudi-spark-bundle/pom.xml |  4 
 packaging/hudi-utilities-bundle/pom.xml |  4 
 13 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/dependencies/hudi-flink-bundle_2.11.txt 
b/dependencies/hudi-flink-bundle_2.11.txt
index 9252d0a..7ece1e8 100644
--- a/dependencies/hudi-flink-bundle_2.11.txt
+++ b/dependencies/hudi-flink-bundle_2.11.txt
@@ -133,7 +133,6 @@ hamcrest-core/org.hamcrest/1.3//hamcrest-core-1.3.jar
 hbase-annotations/org.apache.hbase/1.2.3//hbase-annotations-1.2.3.jar
 hbase-client/org.apache.hbase/1.2.3//hbase-client-1.2.3.jar
 hbase-common/org.apache.hbase/1.2.3//hbase-common-1.2.3.jar
-hbase-common/org.apache.hbase/1.2.3/tests/hbase-common-1.2.3-tests.jar
 hbase-hadoop-compat/org.apache.hbase/1.2.3//hbase-hadoop-compat-1.2.3.jar
 hbase-hadoop2-compat/org.apache.hbase/1.2.3//hbase-hadoop2-compat-1.2.3.jar
 hbase-prefix-tree/org.apache.hbase/1.2.3//hbase-prefix-tree-1.2.3.jar
diff --git a/dependencies/hudi-flink-bundle_2.12.txt 
b/dependencies/hudi-flink-bundle_2.12.txt
index 84eacdc..d7566b5 100644
--- a/dependencies/hudi-flink-bundle_2.12.txt
+++ b/dependencies/hudi-flink-bundle_2.12.txt
@@ -134,7 +134,6 @@ hamcrest-core/org.hamcrest/1.3//hamcrest-core-1.3.jar
 hbase-annotations/org.apache.hbase/1.2.3//hbase-annotations-1.2.3.jar
 hbase-client/org.apache.hbase/1.2.3//hbase-client-1.2.3.jar
 hbase-common/org.apache.hbase/1.2.3//hbase-common-1.2.3.jar
-hbase-common/org.apache.hbase/1.2.3/tests/hbase-common-1.2.3-tests.jar
 hbase-hadoop-compat/org.apache.hbase/1.2.3//hbase-hadoop-compat-1.2.3.jar
 hbase-hadoop2-compat/org.apache.hbase/1.2.3//hbase-hadoop2-compat-1.2.3.jar
 hbase-prefix-tree/org.apache.hbase/1.2.3//hbase-prefix-tree-1.2.3.jar
@@ -163,10 +162,10 @@ 
htrace-core/org.apache.htrace/3.1.0-incubating//htrace-core-3.1.0-incubating.jar
 httpclient/org.apache.httpcomponents/4.4.1//httpclient-4.4.1.jar
 httpcore/org.apache.httpcomponents/4.4.1//httpcore-4.4.1.jar
 ivy/org.apache.ivy/2.4.0//ivy-2.4.0.jar
-jackson-annotations/com.fasterxml.jackson.core/2.6.7//jackson-annotations-2.6.7.jar
+jackson-annotations/com.fasterxml.jackson.core/2.10.0//jackson-annotations-2.10.0.jar
 jackson-core-asl/org.codehaus.jackson/1.9.13//jackson-core-asl-1.9.13.jar
-jackson-core/com.fasterxml.jackson.core/2.6.7//jackson-core-2.6.7.jar
-jackson-databind/com.fasterxml.jackson.core/2.6.7.3//jackson-databind-2.6.7.3.jar
+jackson-core/com.fasterxml.jackson.core/2.10.0//jackson-core-2.10.0.jar
+jackson-databind/com.fasterxml.jackson.core/2.10.0//jackson-databind-2.10.0.jar
 jackson-jaxrs/org.codehaus.jackson/1.9.13//jackson-jaxrs-1.9.13.jar
 jackson-mapper-asl/org.codehaus.jackson/1.9.13//jackson-mapper-asl-1.9.13.jar
 jackson-xc/org.codehaus.jackson/1.9.13//jackson-xc-1.9.13.jar
diff --git a/dependencies/hudi-hadoop-mr-bundle.txt 
b/dependencies/hudi-hadoop-mr-bundle.txt
index a9c4afe..bcc2659 100644
--- a/dependencies/hudi-hadoop-mr-bundle.txt
+++ b/dependencies/hudi-hadoop-mr-bundle.txt
@@ -70,7 +70,6 @@ hamcrest-core/org.hamcrest/1.3//hamcrest-core-1.3.jar
 hbase-annotations/org.apache.hbase/1.2.3//hbase-annotations-1.2.3.jar
 hbase-client/org.apache.hbase/1.2.3//hbase-client-1.2.3.jar
 hbase-common/org.apache.hbase/1.2.3//hbase-common-1.2.3.jar
-hbase-common/org.apache.hbase/1.2.3/tests/hbase-common-1.2.3-tests.jar
 hbase-hadoop-compat/org.apache.hbase/1.2.3//hbase-hadoop-compat-1.2.3.jar
 hbase-hadoop2-compat/org.apache.hbase/1.2.3//hbase-hadoop2-compat-1.2.3.jar
 hbase-prefix-tree/org.apache.hbase/1.2.3//hbase-prefix-tree-1.2.3.jar
@@ -85,7 +84,9 @@ 
jackson-annotations/com.fasterxml.jackson.core/2.6.7//jackson-annotations-2.6.7.
 jackson-core-asl/org.codehaus.jackson/1.9.13//jackson-core-asl-1.9.13.jar
 jackson-core/com.fasterxml.jackson.core/2.6.7//jackson-core-2.6.7.jar

[jira] [Created] (HUDI-2643) Remove duplicated hbase-common with tests classifier exists in bundles

2021-10-28 Thread vinoyang (Jira)
vinoyang created HUDI-2643:
--

 Summary: Remove duplicated hbase-common with tests classifier 
exists in bundles
 Key: HUDI-2643
 URL: https://issues.apache.org/jira/browse/HUDI-2643
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2614.
--
Resolution: Done

b1c4acf0aeb0f3d650c8e704828b1c2b0d2b5b40

> Remove duplicated hadoop-hdfs with tests classifier exists in bundles
> -
>
> Key: HUDI-2614
> URL: https://issues.apache.org/jira/browse/HUDI-2614
> Project: Apache Hudi
>  Issue Type: Sub-task
>    Reporter: vinoyang
>    Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-26 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2614:
---
Fix Version/s: 0.10.0

> Remove duplicated hadoop-hdfs with tests classifier exists in bundles
> -
>
> Key: HUDI-2614
> URL: https://issues.apache.org/jira/browse/HUDI-2614
> Project: Apache Hudi
>  Issue Type: Sub-task
>    Reporter: vinoyang
>    Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (e3fc746 -> b1c4acf)

2021-10-26 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from e3fc746  [HUDI-2625] Revert "[HUDI-2005] Avoiding direct fs calls in 
HoodieLogFileReader (#3757)" (#3863)
 add b1c4acf  [HUDI-2614] Remove duplicated hadoop-hdfs with tests 
classifier exists in bundles (#3864)

No new revisions were added by this update.

Summary of changes:
 dependencies/hudi-flink-bundle_2.11.txt |  1 -
 dependencies/hudi-flink-bundle_2.12.txt | 13 +--
 dependencies/hudi-hive-sync-bundle.txt  |  5 
 dependencies/hudi-integ-test-bundle.txt | 36 +
 dependencies/hudi-kafka-connect-bundle.txt  |  1 -
 dependencies/hudi-spark-bundle_2.11.txt |  1 -
 dependencies/hudi-spark-bundle_2.12.txt |  4 +---
 dependencies/hudi-spark3-bundle_2.12.txt|  4 +---
 dependencies/hudi-utilities-bundle_2.11.txt |  1 -
 dependencies/hudi-utilities-bundle_2.12.txt | 10 
 hudi-client/hudi-client-common/pom.xml  |  1 +
 hudi-client/hudi-java-client/pom.xml| 22 ++
 hudi-integ-test/pom.xml |  1 +
 hudi-spark-datasource/hudi-spark/pom.xml| 22 ++
 hudi-sync/hudi-hive-sync/pom.xml|  1 +
 packaging/hudi-integ-test-bundle/pom.xml|  1 +
 pom.xml |  1 +
 17 files changed, 87 insertions(+), 38 deletions(-)


[jira] [Created] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-24 Thread vinoyang (Jira)
vinoyang created HUDI-2614:
--

 Summary: Remove duplicated hadoop-hdfs with tests classifier 
exists in bundles
 Key: HUDI-2614
 URL: https://issues.apache.org/jira/browse/HUDI-2614
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: vinoyang
Assignee: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-24 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2600:
---
Fix Version/s: 0.10.0

> Remove duplicated hadoop-common with tests classifier exists in bundles
> ---
>
> Key: HUDI-2600
> URL: https://issues.apache.org/jira/browse/HUDI-2600
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Release & Administrative
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We found many duplicated dependencies in the generated dependency list; 
> `hadoop-common` is one of them:
> {code:java}
> hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
> hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-24 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2600.
--
Resolution: Done

220bf6a7e6f5cdf0efbbbee9df6852a8b2288570

> Remove duplicated hadoop-common with tests classifier exists in bundles
> ---
>
> Key: HUDI-2600
> URL: https://issues.apache.org/jira/browse/HUDI-2600
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Release & Administrative
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We found many duplicated dependencies in the generated dependency list; 
> `hadoop-common` is one of them:
> {code:java}
> hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
> hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles (#3847)

2021-10-24 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 220bf6a  [HUDI-2600] Remove duplicated hadoop-common with tests 
classifier exists in bundles (#3847)
220bf6a is described below

commit 220bf6a7e6f5cdf0efbbbee9df6852a8b2288570
Author: vinoyang 
AuthorDate: Mon Oct 25 13:45:28 2021 +0800

[HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in 
bundles (#3847)
---
 dependencies/hudi-flink-bundle_2.11.txt  | 6 +++---
 dependencies/hudi-hive-sync-bundle.txt   | 7 +--
 dependencies/hudi-kafka-connect-bundle.txt   | 3 +--
 dependencies/hudi-spark-bundle_2.11.txt  | 3 +--
 dependencies/hudi-timeline-server-bundle.txt | 1 -
 dependencies/hudi-utilities-bundle_2.11.txt  | 3 +--
 hudi-client/hudi-client-common/pom.xml   | 1 +
 hudi-sync/hudi-hive-sync/pom.xml | 1 +
 hudi-timeline-service/pom.xml| 1 +
 9 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/dependencies/hudi-flink-bundle_2.11.txt 
b/dependencies/hudi-flink-bundle_2.11.txt
index b97995c..4414594 100644
--- a/dependencies/hudi-flink-bundle_2.11.txt
+++ b/dependencies/hudi-flink-bundle_2.11.txt
@@ -64,7 +64,7 @@ commons-lang/commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/org.apache.commons/3.1//commons-lang3-3.1.jar
 commons-logging/commons-logging/1.2//commons-logging-1.2.jar
 commons-math/org.apache.commons/2.2//commons-math-2.2.jar
-commons-math3/org.apache.commons/3.1.1//commons-math3-3.1.1.jar
+commons-math3/org.apache.commons/3.5//commons-math3-3.5.jar
 commons-net/commons-net/3.1//commons-net-3.1.jar
 commons-pool/commons-pool/1.6//commons-pool-1.6.jar
 config/com.typesafe/1.3.3//config-1.3.3.jar
@@ -107,6 +107,7 @@ 
force-shading/org.apache.flink/1.13.1//force-shading-1.13.1.jar
 grizzled-slf4j_2.11/org.clapper/1.3.2//grizzled-slf4j_2.11-1.3.2.jar
 groovy-all/org.codehaus.groovy/2.4.4//groovy-all-2.4.4.jar
 gson/com.google.code.gson/2.3.1//gson-2.3.1.jar
+guava/com.google.guava/12.0.1//guava-12.0.1.jar
 
guice-assistedinject/com.google.inject.extensions/3.0//guice-assistedinject-3.0.jar
 guice-servlet/com.google.inject.extensions/3.0//guice-servlet-3.0.jar
 guice/com.google.inject/3.0//guice-3.0.jar
@@ -114,7 +115,6 @@ 
hadoop-annotations/org.apache.hadoop/2.7.3//hadoop-annotations-2.7.3.jar
 hadoop-auth/org.apache.hadoop/2.7.3//hadoop-auth-2.7.3.jar
 hadoop-client/org.apache.hadoop/2.7.3//hadoop-client-2.7.3.jar
 hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
-hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
 hadoop-hdfs/org.apache.hadoop/2.7.3//hadoop-hdfs-2.7.3.jar
 hadoop-hdfs/org.apache.hadoop/2.7.3/tests/hadoop-hdfs-2.7.3-tests.jar
 
hadoop-mapreduce-client-app/org.apache.hadoop/2.7.3//hadoop-mapreduce-client-app-2.7.3.jar
@@ -132,7 +132,7 @@ 
hadoop-yarn-server-resourcemanager/org.apache.hadoop/2.7.2//hadoop-yarn-server-r
 
hadoop-yarn-server-web-proxy/org.apache.hadoop/2.7.2//hadoop-yarn-server-web-proxy-2.7.2.jar
 hamcrest-core/org.hamcrest/1.3//hamcrest-core-1.3.jar
 hbase-annotations/org.apache.hbase/1.2.3//hbase-annotations-1.2.3.jar
-hbase-client/org.apache.hbase/1.1.1//hbase-client-1.1.1.jar
+hbase-client/org.apache.hbase/1.2.3//hbase-client-1.2.3.jar
 hbase-common/org.apache.hbase/1.2.3//hbase-common-1.2.3.jar
 hbase-common/org.apache.hbase/1.2.3/tests/hbase-common-1.2.3-tests.jar
 hbase-hadoop-compat/org.apache.hbase/1.2.3//hbase-hadoop-compat-1.2.3.jar
diff --git a/dependencies/hudi-hive-sync-bundle.txt 
b/dependencies/hudi-hive-sync-bundle.txt
index aefcfbb..f80ee31 100644
--- a/dependencies/hudi-hive-sync-bundle.txt
+++ b/dependencies/hudi-hive-sync-bundle.txt
@@ -56,7 +56,6 @@ 
hadoop-annotations/org.apache.hadoop/2.7.3//hadoop-annotations-2.7.3.jar
 hadoop-auth/org.apache.hadoop/2.7.3//hadoop-auth-2.7.3.jar
 hadoop-client/org.apache.hadoop/2.7.3//hadoop-client-2.7.3.jar
 hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
-hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
 hadoop-hdfs/org.apache.hadoop/2.7.3//hadoop-hdfs-2.7.3.jar
 hadoop-hdfs/org.apache.hadoop/2.7.3/tests/hadoop-hdfs-2.7.3-tests.jar
 
hadoop-mapreduce-client-app/org.apache.hadoop/2.7.3//hadoop-mapreduce-client-app-2.7.3.jar
@@ -87,9 +86,7 @@ 
jackson-annotations/com.fasterxml.jackson.core/2.6.7//jackson-annotations-2.6.7.
 jackson-core-asl/org.codehaus.jackson/1.9.13//jackson-core-asl-1.9.13.jar
 jackson-core/com.fasterxml.jackson.core/2.6.7//jackson-core-2.6.7.jar
 
jackson-databind/com.fasterxml.jackson.core/2.6.7.3//jackson-databind-2.6.7.3.jar
-jackson-jaxrs/org.codehaus.jackson/1.9.13//jackson-jaxrs-1.9.13.jar
 jackson-mapper-asl/org.codehaus.jackson/1.9.13//jackson-mapper-asl-1.9.13.jar
-jackson-xc/org.codehaus.jackson/1.9.13//jackson-xc-1.9.13.jar
 jamon-runtime/org.jamon/2.4.1//jamon

[hudi] branch master updated: [MINOR] Show source table operator details on the flink web when reading hudi table (#3842)

2021-10-24 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 91845e2  [MINOR] Show source table operator details on the flink web 
when reading hudi table (#3842)
91845e2 is described below

commit 91845e241da242cede95f705b0637331ce9222ff
Author: mincwang <33626973+mincw...@users.noreply.github.com>
AuthorDate: Sun Oct 24 23:18:01 2021 +0800

[MINOR] Show source table operator details on the flink web when reading 
hudi table (#3842)
---
 .../java/org/apache/hudi/table/HoodieTableSource.java | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git 
a/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java 
b/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
index 4e193fa..f0dbffd 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java
@@ -180,7 +180,7 @@ public class HoodieTableSource implements
   conf, FilePathUtils.toFlinkPath(path), 
maxCompactionMemoryInBytes, getRequiredPartitionPaths());
   InputFormat inputFormat = getInputFormat(true);
   OneInputStreamOperatorFactory 
factory = StreamReadOperator.factory((MergeOnReadInputFormat) inputFormat);
-  SingleOutputStreamOperator source = 
execEnv.addSource(monitoringFunction, "split_monitor")
+  SingleOutputStreamOperator source = 
execEnv.addSource(monitoringFunction, getSourceOperatorName("split_monitor"))
   .setParallelism(1)
   .transform("split_reader", typeInfo, factory)
   .setParallelism(conf.getInteger(FlinkOptions.READ_TASKS));
@@ -188,7 +188,7 @@ public class HoodieTableSource implements
 } else {
   InputFormatSourceFunction func = new 
InputFormatSourceFunction<>(getInputFormat(), typeInfo);
   DataStreamSource source = execEnv.addSource(func, 
asSummaryString(), typeInfo);
-  return 
source.name("bounded_source").setParallelism(conf.getInteger(FlinkOptions.READ_TASKS));
+  return 
source.name(getSourceOperatorName("bounded_source")).setParallelism(conf.getInteger(FlinkOptions.READ_TASKS));
 }
   }
 };
@@ -266,6 +266,21 @@ public class HoodieTableSource implements
 return requiredPartitions;
   }
 
+  private String getSourceOperatorName(String operatorName) {
+String[] schemaFieldNames = this.schema.getColumnNames().toArray(new 
String[0]);
+List fields = Arrays.stream(this.requiredPos)
+.mapToObj(i -> schemaFieldNames[i])
+.collect(Collectors.toList());
+StringBuilder sb = new StringBuilder();
+sb.append(operatorName)
+.append("(")
+
.append("table=").append(Collections.singletonList(conf.getString(FlinkOptions.TABLE_NAME)))
+.append(", ")
+.append("fields=").append(fields)
+.append(")");
+return sb.toString();
+  }
+
   @Nullable
   private Set getRequiredPartitionPaths() {
 if (this.requiredPartitions == null) {
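As a concrete example derived from getSourceOperatorName above: reading a Hudi table named t1 with a projection of uuid and age would label the sources on the Flink web UI as `split_monitor(table=[t1], fields=[uuid, age])` for streaming reads and `bounded_source(table=[t1], fields=[uuid, age])` for bounded reads.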


[jira] [Closed] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-10-22 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2592.
--
Resolution: Fixed

> NumberFormatException: Zero length BigInteger when write.precombine.field is 
> decimal type
> -
>
> Key: HUDI-2592
> URL: https://issues.apache.org/jira/browse/HUDI-2592
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Matrix42
>Assignee: Matrix42
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0, 0.11.0
>
>
> When write.precombine.field is a decimal type, the written decimal will be 
> an empty byte array, and reads will throw NumberFormatException: Zero length 
> BigInteger, like below:
> {code:java}
> 2021-10-20 17:14:03
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:302)
> at 
> org.apache.flink.table.data.DecimalData.fromUnscaledBytes(DecimalData.java:223)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createDecimalConverter$4dc14f00$1(AvroToRowDataConverters.java:158)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createNullableConverter$4568343a$1(AvroToRowDataConverters.java:94)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createRowConverter$68595fbd$1(AvroToRowDataConverters.java:75)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$1.hasNext(MergeOnReadInputFormat.java:300)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$LogFileOnlyIterator.reachedEnd(MergeOnReadInputFormat.java:362)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:202)
> at 
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
> {code}
> Analysis:
>  
> HoodieAvroUtils.getNestedFieldVal is invoked to extract the precombine field, 
> which in turn calls convertValueForAvroLogicalTypes. When the field is a 
> decimal type, the ByteBuffer is consumed by the conversion, so it should be 
> rewound afterwards.
> {code:java}
> private static Object convertValueForAvroLogicalTypes(Schema fieldSchema, 
> Object fieldValue) {
>   if (fieldSchema.getLogicalType() == LogicalTypes.date()) {
> return LocalDate.ofEpochDay(Long.parseLong(fieldValue.toString()));
>   } else if (fieldSchema.getLogicalType() instanceof LogicalTypes.Decimal) {
> Decimal dc = (Decimal) fieldSchema.getLogicalType();
> DecimalConversion decimalConversion = new DecimalConversion();
> if (fieldSchema.getType() == Schema.Type.FIXED) {
>   return decimalConversion.fromFixed((GenericFixed) fieldValue, 
> fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> } else if (fieldSchema.getType() == Schema.Type.BYTES) {
>   // this method consumes the byteBuffer
>   return decimalConversion.fromBytes((ByteBuffer) fieldValue, fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> }
>   }
>   return fieldValue;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
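
A minimal, self-contained sketch of the rewind approach described above
(illustrative only, not the exact patch that landed in HoodieAvroUtils; it
assumes Avro's Conversions.DecimalConversion API):

{code:java}
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class DecimalRewindSketch {

  // Convert a BYTES-backed Avro decimal without leaving the buffer consumed.
  static BigDecimal readDecimal(Schema fieldSchema, ByteBuffer buffer) {
    LogicalTypes.Decimal dc = (LogicalTypes.Decimal) fieldSchema.getLogicalType();
    Conversions.DecimalConversion conversion = new Conversions.DecimalConversion();
    BigDecimal value = conversion.fromBytes(buffer, fieldSchema,
        LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
    // fromBytes advances the buffer position; rewinding restores it so a later
    // reader (e.g. the writer serializing the record) still sees the bytes.
    buffer.rewind();
    return value;
  }
}
{code}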


[jira] [Reopened] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-10-22 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reopened HUDI-2592:




[jira] [Commented] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-10-22 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432925#comment-17432925
 ] 

vinoyang commented on HUDI-2592:


[~Matrix42] I have given you Jira contributor permission. Thanks for your 
contribution!



[jira] [Assigned] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-10-22 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2592:
--

Assignee: Matrix42



[jira] [Updated] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-10-22 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2592:
---
Status: Closed  (was: Patch Available)



[hudi] branch master updated (84ca981 -> 499af7c)

2021-10-22 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 84ca981  [HUDI-2553] Metadata table compaction trigger max delta 
commits (#3794)
 add 499af7c  [HUDI-2592] Fix write empty array when write.precombine.field 
is decimal type (#3837)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/avro/HoodieAvroUtils.java | 11 +++---
 .../org/apache/hudi/avro/TestHoodieAvroUtils.java  | 40 +-
 2 files changed, 39 insertions(+), 12 deletions(-)


[jira] [Created] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-22 Thread vinoyang (Jira)
vinoyang created HUDI-2600:
--

 Summary: Remove duplicated hadoop-common with tests classifier 
exists in bundles
 Key: HUDI-2600
 URL: https://issues.apache.org/jira/browse/HUDI-2600
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Release & Administrative
Reporter: vinoyang


We found many duplicated dependencies in the generated dependency list; 
`hadoop-common` is one of them:
{code:java}
hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
{code}
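
The entries quoted above follow the layout artifactId/groupId/version/classifier/fileName,
with an empty classifier slot for plain jars (hence the double slash); the `tests`
classifier is what makes the second hadoop-common entry a duplicate. A minimal
sketch that splits one entry under that assumed layout:

{code:java}
public class DependencyEntrySketch {
  public static void main(String[] args) {
    String line = "hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar";
    // Split into exactly five fields; the classifier may be empty.
    String[] parts = line.split("/", 5);
    System.out.printf("artifact=%s group=%s version=%s classifier=%s file=%s%n",
        parts[0], parts[1], parts[2],
        parts[3].isEmpty() ? "(none)" : parts[3], parts[4]);
  }
}
{code}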
 

 





[jira] [Assigned] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-22 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2600:
--

Assignee: vinoyang



[jira] [Closed] (HUDI-2507) Generate more dependency list file for other bundles

2021-10-21 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2507.
--
Resolution: Done

b480294e792b6344d37560587f8f6e170e210d14

> Generate more dependency list file for other bundles
> 
>
> Key: HUDI-2507
> URL: https://issues.apache.org/jira/browse/HUDI-2507
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Updated] (HUDI-2507) Generate more dependency list file for other bundles

2021-10-21 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2507:
---
Fix Version/s: 0.10.0



[hudi] branch master updated (aa3c4ec -> b480294)

2021-10-21 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from aa3c4ec  [HUDI-2583] Refactor TestWriteCopyOnWrite test cases (#3832)
 add b480294  [HUDI-2507] Generate more dependency list file for other 
bundles (#3773)

No new revisions were added by this update.

Summary of changes:
 .../hudi-flink-bundle_2.11.txt |   0
 .../hudi-flink-bundle_2.12.txt |   9 +-
 .../hudi-hadoop-mr-bundle.txt  |   1 -
 .../hudi-hive-sync-bundle.txt  |  14 +-
 .../hudi-integ-test-bundle.txt |  64 ++---
 .../hudi-kafka-connect-bundle.txt  | 143 +++
 .../hudi-presto-bundle.txt |   0
 .../hudi-spark-bundle_2.11.txt |   0
 .../hudi-spark-bundle_2.12.txt |   4 +-
 .../hudi-spark3-bundle_2.12.txt|  10 +-
 .../hudi-timeline-server-bundle.txt|   0
 .../hudi-utilities-bundle_2.11.txt |   0
 .../hudi-utilities-bundle_2.12.txt |   6 +-
 scripts/dependency.sh  | 155 +++--
 14 files changed, 202 insertions(+), 204 deletions(-)
 copy dev/dependencyList_hudi-flink-bundle_2.11.txt => dependencies/hudi-flink-bundle_2.11.txt (100%)
 rename dev/dependencyList_hudi-flink-bundle_2.11.txt => dependencies/hudi-flink-bundle_2.12.txt (97%)
 copy dev/dependencyList_hudi-presto-bundle.txt => dependencies/hudi-hadoop-mr-bundle.txt (99%)
 copy dev/dependencyList_hudi-presto-bundle.txt => dependencies/hudi-hive-sync-bundle.txt (91%)
 copy dev/dependencyList_hudi-utilities-bundle_2.11.txt => dependencies/hudi-integ-test-bundle.txt (87%)
 copy dev/dependencyList_hudi-utilities-bundle_2.11.txt => dependencies/hudi-kafka-connect-bundle.txt (70%)
 rename dev/dependencyList_hudi-presto-bundle.txt => dependencies/hudi-presto-bundle.txt (100%)
 copy dev/dependencyList_hudi-spark-bundle_2.11.txt => dependencies/hudi-spark-bundle_2.11.txt (100%)
 copy dev/dependencyList_hudi-spark-bundle_2.11.txt => dependencies/hudi-spark-bundle_2.12.txt (99%)
 rename dev/dependencyList_hudi-spark-bundle_2.11.txt => dependencies/hudi-spark3-bundle_2.12.txt (97%)
 rename dev/dependencyList_hudi-timeline-server-bundle.txt => dependencies/hudi-timeline-server-bundle.txt (100%)
 copy dev/dependencyList_hudi-utilities-bundle_2.11.txt => dependencies/hudi-utilities-bundle_2.11.txt (100%)
 rename dev/dependencyList_hudi-utilities-bundle_2.11.txt => dependencies/hudi-utilities-bundle_2.12.txt (98%)


[hudi] branch master updated: [MINOR] Fix typo, 'intance' corrected to 'instance' (#3788)

2021-10-19 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 46f0496  [MINOR] Fix typo,'intance' corrected to 'instance' (#3788)
46f0496 is described below

commit 46f0496a0838431cd8886ca882a902d801c4dfb8
Author: 董可伦 
AuthorDate: Tue Oct 19 23:16:48 2021 +0800

[MINOR] Fix typo,'intance' corrected to 'instance' (#3788)
---
 .../java/org/apache/hudi/table/action/clean/CleanActionExecutor.java| 2 +-
 .../org/apache/hudi/table/action/restore/BaseRestoreActionExecutor.java | 2 +-
 .../apache/hudi/table/action/rollback/BaseRollbackActionExecutor.java   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
index abe88b9..1b229ca 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanActionExecutor.java
@@ -211,7 +211,7 @@ public class CleanActionExecutor extends
 
   /**
    * Update metadata table if available. Any update to metadata table happens within data table lock.
-   * @param cleanMetadata intance of {@link HoodieCleanMetadata} to be applied to metadata.
+   * @param cleanMetadata instance of {@link HoodieCleanMetadata} to be applied to metadata.
    */
   private void writeMetadata(HoodieCleanMetadata cleanMetadata) {
     try {
diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/restore/BaseRestoreActionExecutor.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/restore/BaseRestoreActionExecutor.java
index 8b0085c..ac8f994 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/restore/BaseRestoreActionExecutor.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/restore/BaseRestoreActionExecutor.java
@@ -105,7 +105,7 @@ public abstract class BaseRestoreActionExecutor

[jira] [Created] (HUDI-2508) Build GA for the dependeny diff check workflow

2021-09-30 Thread vinoyang (Jira)
vinoyang created HUDI-2508:
--

 Summary: Build GA for the dependeny diff check workflow
 Key: HUDI-2508
 URL: https://issues.apache.org/jira/browse/HUDI-2508
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Usability
Reporter: vinoyang








[jira] [Assigned] (HUDI-2508) Build GA for the dependeny diff check workflow

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2508:
--

Assignee: vinoyang



[jira] [Assigned] (HUDI-2507) Generate more dependency list file for other bundles

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2507:
--

Assignee: vinoyang



[jira] [Created] (HUDI-2507) Generate more dependency list file for other bundles

2021-09-30 Thread vinoyang (Jira)
vinoyang created HUDI-2507:
--

 Summary: Generate more dependency list file for other bundles
 Key: HUDI-2507
 URL: https://issues.apache.org/jira/browse/HUDI-2507
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: Usability
Reporter: vinoyang








[jira] [Assigned] (HUDI-2506) Hudi dependency governance

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2506:
--

Assignee: vinoyang

> Hudi dependency governance
> --
>
> Key: HUDI-2506
> URL: https://issues.apache.org/jira/browse/HUDI-2506
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Usability
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>






[jira] [Closed] (HUDI-2440) Add dependency change diff script for dependency governace

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2440.
--
Resolution: Done

> Add dependency change diff script for dependency governace
> --
>
> Key: HUDI-2440
> URL: https://issues.apache.org/jira/browse/HUDI-2440
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability, Utilities
>    Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, Hudi's dependency management is chaotic; e.g. for 
> `hudi-spark-bundle_2.11`, the dependency list is as follows:
> {code:java}
> HikariCP/2.5.1//HikariCP-2.5.1.jar
> ST4/4.0.4//ST4-4.0.4.jar
> aircompressor/0.15//aircompressor-0.15.jar
> annotations/17.0.0//annotations-17.0.0.jar
> ant-launcher/1.9.1//ant-launcher-1.9.1.jar
> ant/1.6.5//ant-1.6.5.jar
> ant/1.9.1//ant-1.9.1.jar
> antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
> aopalliance/1.0//aopalliance-1.0.jar
> apache-curator/2.7.1//apache-curator-2.7.1.pom
> apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
> apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
> api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
> api-util/1.0.0-M20//api-util-1.0.0-M20.jar
> asm/3.1//asm-3.1.jar
> avatica-metrics/1.8.0//avatica-metrics-1.8.0.jar
> avatica/1.8.0//avatica-1.8.0.jar
> avro/1.8.2//avro-1.8.2.jar
> bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
> calcite-core/1.10.0//calcite-core-1.10.0.jar
> calcite-druid/1.10.0//calcite-druid-1.10.0.jar
> calcite-linq4j/1.10.0//calcite-linq4j-1.10.0.jar
> commons-beanutils-core/1.8.0//commons-beanutils-core-1.8.0.jar
> commons-beanutils/1.7.0//commons-beanutils-1.7.0.jar
> commons-cli/1.2//commons-cli-1.2.jar
> commons-codec/1.4//commons-codec-1.4.jar
> commons-collections/3.2.2//commons-collections-3.2.2.jar
> commons-compiler/2.7.6//commons-compiler-2.7.6.jar
> commons-compress/1.9//commons-compress-1.9.jar
> commons-configuration/1.6//commons-configuration-1.6.jar
> commons-daemon/1.0.13//commons-daemon-1.0.13.jar
> commons-dbcp/1.4//commons-dbcp-1.4.jar
> commons-digester/1.8//commons-digester-1.8.jar
> commons-el/1.0//commons-el-1.0.jar
> commons-httpclient/3.1//commons-httpclient-3.1.jar
> commons-io/2.4//commons-io-2.4.jar
> commons-lang/2.6//commons-lang-2.6.jar
> commons-lang3/3.1//commons-lang3-3.1.jar
> commons-logging/1.2//commons-logging-1.2.jar
> commons-math/2.2//commons-math-2.2.jar
> commons-math3/3.1.1//commons-math3-3.1.1.jar
> commons-net/3.1//commons-net-3.1.jar
> commons-pool/1.5.4//commons-pool-1.5.4.jar
> curator-client/2.7.1//curator-client-2.7.1.jar
> curator-framework/2.7.1//curator-framework-2.7.1.jar
> curator-recipes/2.7.1//curator-recipes-2.7.1.jar
> datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
> datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
> datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
> derby/10.10.2.0//derby-10.10.2.0.jar
> disruptor/3.3.0//disruptor-3.3.0.jar
> dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
> eigenbase-properties/1.1.5//eigenbase-properties-1.1.5.jar
> fastutil/7.0.13//fastutil-7.0.13.jar
> findbugs-annotations/1.3.9-1//findbugs-annotations-1.3.9-1.jar
> fluent-hc/4.4.1//fluent-hc-4.4.1.jar
> groovy-all/2.4.4//groovy-all-2.4.4.jar
> gson/2.3.1//gson-2.3.1.jar
> guava/14.0.1//guava-14.0.1.jar
> guice-assistedinject/3.0//guice-assistedinject-3.0.jar
> guice-servlet/3.0//guice-servlet-3.0.jar
> guice/3.0//guice-3.0.jar
> hadoop-annotations/2.7.3//hadoop-annotations-2.7.3.jar
> hadoop-auth/2.7.3//hadoop-auth-2.7.3.jar
> hadoop-client/2.7.3//hadoop-client-2.7.3.jar
> hadoop-common/2.7.3//hadoop-common-2.7.3.jar
> hadoop-common/2.7.3/tests/hadoop-common-2.7.3-tests.jar
> hadoop-hdfs/2.7.3//hadoop-hdfs-2.7.3.jar
> hadoop-hdfs/2.7.3/tests/hadoop-hdfs-2.7.3-tests.jar
> hadoop-mapreduce-client-app/2.7.3//hadoop-mapreduce-client-app-2.7.3.jar
> hadoop-mapreduce-client-common/2.7.3//hadoop-mapreduce-client-common-2.7.3.jar
> hadoop-mapreduce-client-core/2.7.3//hadoop-mapreduce-client-core-2.7.3.jar
> hadoop-mapreduce-client-jobclient/2.7.3//hadoop-mapreduce-client-jobclient-2.7.3.jar
> hadoop-mapreduce-client-shuffle/2.7.3//hadoop-mapreduce-client-shuffle-2.7.3.jar
> hadoop-yarn-api/2.7.3//hadoop-yarn-api-2.7.3.jar
> hadoop-yarn-client/2.7.3//hadoop-yarn-client-2.7.3.jar
> hadoop-yarn-common/2.7.3//hadoop-yarn-common-2.7.3.jar
> hadoop-yarn-registry/2.7.1//hadoop-yarn-registry

[jira] [Created] (HUDI-2506) Hudi dependency governance

2021-09-30 Thread vinoyang (Jira)
vinoyang created HUDI-2506:
--

 Summary: Hudi dependency governance
 Key: HUDI-2506
 URL: https://issues.apache.org/jira/browse/HUDI-2506
 Project: Apache Hudi
  Issue Type: Task
  Components: Usability
Reporter: vinoyang








[jira] [Reopened] (HUDI-2440) Add dependency change diff script for dependency governace

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reopened HUDI-2440:



[jira] [Updated] (HUDI-2440) Add dependency change diff script for dependency governace

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2440:
---
Parent: HUDI-2506
Issue Type: Sub-task  (was: Improvement)


[jira] [Closed] (HUDI-2440) Add dependency change diff script for dependency governace

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2440.
--
Resolution: Implemented

47ed91799943271f219419cf209793a98b3f09b5


[jira] [Updated] (HUDI-2440) Add dependency change diff script for dependency governace

2021-09-30 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2440:
---
Fix Version/s: 0.10.0


[hudi] branch master updated: [HUDI-2440] Add dependency change diff script for dependency governace (#3674)

2021-09-30 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 47ed917  [HUDI-2440] Add dependency change diff script for dependency 
governace (#3674)
47ed917 is described below

commit 47ed91799943271f219419cf209793a98b3f09b5
Author: vinoyang 
AuthorDate: Thu Sep 30 16:56:11 2021 +0800

[HUDI-2440] Add dependency change diff script for dependency governace 
(#3674)
---
 dev/dependencyList_hudi-flink-bundle_2.11.txt  | 296 +++
 dev/dependencyList_hudi-presto-bundle.txt  | 132 +
 dev/dependencyList_hudi-spark-bundle_2.11.txt  | 262 +
 dev/dependencyList_hudi-timeline-server-bundle.txt | 144 +
 dev/dependencyList_hudi-utilities-bundle_2.11.txt  | 324 +
 scripts/dependency.sh  | 127 
 6 files changed, 1285 insertions(+)
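
The governance idea behind this commit: commit a canonical dependency list per
bundle and fail the check when a fresh build's list diverges, so every dependency
change gets reviewed. A minimal sketch of that diff check (the real check is
scripts/dependency.sh; the second path below is hypothetical):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DependencyDiffSketch {
  public static void main(String[] args) throws IOException {
    // Committed canonical list vs. a freshly generated one (hypothetical path).
    List<String> committed = Files.readAllLines(Paths.get("dev/dependencyList_hudi-spark-bundle_2.11.txt"));
    List<String> current = Files.readAllLines(Paths.get("/tmp/current-deps.txt"));

    Set<String> added = new HashSet<>(current);
    added.removeAll(committed);   // entries only in the new build
    Set<String> removed = new HashSet<>(committed);
    removed.removeAll(current);   // entries that disappeared

    if (!added.isEmpty() || !removed.isEmpty()) {
      System.err.println("Dependency list changed. Added: " + added + ", removed: " + removed);
      System.exit(1);             // fail so the change must be reviewed and the list updated
    }
  }
}
{code}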

diff --git a/dev/dependencyList_hudi-flink-bundle_2.11.txt b/dev/dependencyList_hudi-flink-bundle_2.11.txt
new file mode 100644
index 000..b97995c
--- /dev/null
+++ b/dev/dependencyList_hudi-flink-bundle_2.11.txt
@@ -0,0 +1,296 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+HikariCP/com.zaxxer/2.5.1//HikariCP-2.5.1.jar
+ST4/org.antlr/4.0.4//ST4-4.0.4.jar
+aircompressor/io.airlift/0.15//aircompressor-0.15.jar
+akka-actor_2.11/com.typesafe.akka/2.5.21//akka-actor_2.11-2.5.21.jar
+akka-protobuf_2.11/com.typesafe.akka/2.5.21//akka-protobuf_2.11-2.5.21.jar
+akka-slf4j_2.11/com.typesafe.akka/2.5.21//akka-slf4j_2.11-2.5.21.jar
+akka-stream_2.11/com.typesafe.akka/2.5.21//akka-stream_2.11-2.5.21.jar
+annotations/org.jetbrains/17.0.0//annotations-17.0.0.jar
+ant-launcher/org.apache.ant/1.9.1//ant-launcher-1.9.1.jar
+ant/ant/1.6.5//ant-1.6.5.jar
+ant/org.apache.ant/1.9.1//ant-1.9.1.jar
+antlr-runtime/org.antlr/3.5.2//antlr-runtime-3.5.2.jar
+aopalliance/aopalliance/1.0//aopalliance-1.0.jar
+apache-curator/org.apache.curator/2.7.1//apache-curator-2.7.1.pom
+apacheds-i18n/org.apache.directory.server/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
+apacheds-kerberos-codec/org.apache.directory.server/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
+api-asn1-api/org.apache.directory.api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
+api-util/org.apache.directory.api/1.0.0-M20//api-util-1.0.0-M20.jar
+asm/asm/3.1//asm-3.1.jar
+audience-annotations/org.apache.yetus/0.11.0//audience-annotations-0.11.0.jar
+avatica-metrics/org.apache.calcite.avatica/1.8.0//avatica-metrics-1.8.0.jar
+avatica/org.apache.calcite.avatica/1.8.0//avatica-1.8.0.jar
+avro/org.apache.avro/1.10.0//avro-1.10.0.jar
+bijection-avro_2.11/com.twitter/0.9.7//bijection-avro_2.11-0.9.7.jar
+bijection-core_2.11/com.twitter/0.9.7//bijection-core_2.11-0.9.7.jar
+bonecp/com.jolbox/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
+calcite-core/org.apache.calcite/1.10.0//calcite-core-1.10.0.jar
+calcite-druid/org.apache.calcite/1.10.0//calcite-druid-1.10.0.jar
+calcite-linq4j/org.apache.calcite/1.10.0//calcite-linq4j-1.10.0.jar
+chill-java/com.twitter/0.7.6//chill-java-0.7.6.jar
+chill_2.11/com.twitter/0.7.6//chill_2.11-0.7.6.jar
+commons-beanutils-core/commons-beanutils/1.8.0//commons-beanutils-core-1.8.0.jar
+commons-beanutils/commons-beanutils/1.7.0//commons-beanutils-1.7.0.jar
+commons-cli/commons-cli/1.2//commons-cli-1.2.jar
+commons-codec/commons-codec/1.4//commons-codec-1.4.jar
+commons-collections/commons-collections/3.2.2//commons-collections-3.2.2.jar
+commons-compiler/org.codehaus.janino/2.7.6//commons-compiler-2.7.6.jar
+commons-compress/org.apache.commons/1.20//commons-compress-1.20.jar
+commons-configuration/commons-configuration/1.6//commons-configuration-1.6.jar
+commons-daemon/commons-daemon/1.0.13//commons-daemon-1.0.13.jar
+commons-dbcp/commons-dbcp/1.4//commons-dbcp-1.4.jar
+commons-digester/commons-digester/1.8//commons-digester-1.8.jar
+commons-el/commons-el/1.0//commons-el-1.0.jar
+commons-httpclient/commons-httpclient/3.0.1//commons-httpclient-3.0.1.jar
+commons-io/commons-io/2.4//commons-io-2.4.jar
+commons-lang/commons-lang/2.6//commons-lang-2.6.jar
+commons-lang3/or

[hudi] branch master updated (dd1bd62 -> 2f07e12)

2021-09-29 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from dd1bd62  [HUDI-2277] HoodieDeltaStreamer reading ORC files directly 
using ORCDFSSource (#3413)
 add 2f07e12  [MINOR] Fix typo Hooodie corrected to Hoodie & reuqired 
corrected to required (#3730)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/internal/DefaultSource.java   | 2 +-
 .../src/main/java/org/apache/hudi/spark3/internal/DefaultSource.java| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


[jira] [Closed] (HUDI-2487) An empty message in Kafka causes a task exception

2021-09-27 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2487.
--
Fix Version/s: (was: 0.9.0)
   0.10.0
   Resolution: Implemented

9067657a5ff313990c819065ad12d71fa8bb0f06

> An empty message in Kafka causes a task exception
> -
>
> Key: HUDI-2487
> URL: https://issues.apache.org/jira/browse/HUDI-2487
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: qianchutao
>Assignee: qianchutao
>Priority: Major
>  Labels: easyfix, newbie, pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> h1. Question:
>       When I use DeltaStreamer in upsert mode to write JSON data from Kafka 
> into Hudi and sync it to Hive tables, the task throws an exception if the 
> value of a Kafka message body is null.
> h2. Exception description:
> Lost task 0.1 in stage 2.0 (TID 24, 
> node-group-1UtpO.1f562475-6982-4b16-a50d-d19b0ebff950.com, executor 6): 
> org.apache.hudi.exception.HoodieException: The value of tmSmp can not be null
>  at 
> org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:463)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.lambda$readFromSource$d62e16$1(DeltaSync.java:389)
>  at 
> org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
>  at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:196)
>  at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:58)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:413)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1551)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> h1. The task Settings:
>  
> {code:java}
> hoodie.datasource.write.precombine.field=tmSmp
> hoodie.datasource.write.recordkey.field=subOrderId,activityId,ticketId
> hoodie.datasource.hive_sync.partition_fields=db,dt
> hoodie.datasource.write.partitionpath.field=db:SIMPLE,dt:SIMPLE
> hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
> hoodie.datasource.hive_sync.enable=true
> hoodie.datasource.meta.sync.enable=true
> hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
> hoodie.datasource.hive_sync.support_timestamp=true
> hoodie.datasource.hive_sync.auto_create_database=true
> hoodie.meta.sync.client.tool.class=org.apache.hudi.hive.HiveSyncTool
> hoodie.datasource.hive_sync.base_file_format=PARQUET
> {code}
>  
>  
> h1. Spark-submit Script parameter Settings:
>  
> {code:java}
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
> --source-ordering-field tmSmp \
> --table-type MERGE_ON_READ  \
> --target-table ${TABLE_NAME} \
> --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
> --enable-sync \
> --op UPSERT \
> --continuous \
> {code}
>  
>  
>        So I think some optimizations can be made to prevent the task from 
> throwing, such as filtering out Kafka messages with a null value.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
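The fix that lands in the commit below takes exactly the route the reporter 
suggests: filter null-valued records out of the Kafka RDD before mapping them 
to JSON strings. A minimal standalone sketch of that pattern (the class and 
method names are illustrative, not the actual JsonKafkaSource code):

{code:java}
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.api.java.JavaRDD;

public final class NullFilteringSketch {
  // Drop Kafka records whose value is null before casting, so tombstone/empty
  // messages cannot trigger the HoodieException reported above.
  public static JavaRDD<String> toNonNullJson(JavaRDD<ConsumerRecord<Object, Object>> kafkaRdd) {
    return kafkaRdd
        .filter(record -> record.value() != null) // skip null-valued messages
        .map(record -> (String) record.value());  // safe cast: nulls already removed
  }
}
{code}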


[hudi] branch master updated: [HUDI-2487] Fix JsonKafkaSource cannot filter empty messages from kafka (#3715)

2021-09-27 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9067657  [HUDI-2487] Fix JsonKafkaSource cannot filter empty messages 
from kafka (#3715)
9067657 is described below

commit 9067657a5ff313990c819065ad12d71fa8bb0f06
Author: qianchutao <72595723+qianchu...@users.noreply.github.com>
AuthorDate: Tue Sep 28 13:47:15 2021 +0800

[HUDI-2487] Fix JsonKafkaSource cannot filter empty messages from kafka 
(#3715)
---
 .../hudi/utilities/sources/JsonKafkaSource.java|  6 +-
 .../utilities/sources/TestJsonKafkaSource.java | 22 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
index cf9e905..39340d0 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JsonKafkaSource.java
@@ -69,7 +69,11 @@ public class JsonKafkaSource extends JsonSource {
 
   private JavaRDD<String> toRDD(OffsetRange[] offsetRanges) {
 return KafkaUtils.createRDD(sparkContext, offsetGen.getKafkaParams(), 
offsetRanges,
-LocationStrategies.PreferConsistent()).map(x -> (String) 
x.value());
+LocationStrategies.PreferConsistent()).filter(x -> {
+  String msgValue = (String) x.value();
+  //Filter null messages from Kafka to prevent Exceptions
+  return msgValue != null;
+}).map(x -> (String) x.value());
   }
 
   @Override
diff --git 
a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonKafkaSource.java
 
b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonKafkaSource.java
index da11035..2ed4c42 100644
--- 
a/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonKafkaSource.java
+++ 
b/hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonKafkaSource.java
@@ -151,6 +151,28 @@ public class TestJsonKafkaSource extends UtilitiesTestBase 
{
 assertEquals(Option.empty(), fetch4AsRows.getBatch());
   }
 
+  // test whether empty messages can be filtered
+  @Test
+  public void testJsonKafkaSourceFilterNullMsg() {
+// topic setup.
+testUtils.createTopic(TEST_TOPIC_NAME, 2);
+HoodieTestDataGenerator dataGenerator = new HoodieTestDataGenerator();
+TypedProperties props = createPropsForJsonSource(null, "earliest");
+
+Source jsonSource = new JsonKafkaSource(props, jsc, sparkSession, 
schemaProvider, metrics);
+SourceFormatAdapter kafkaSource = new SourceFormatAdapter(jsonSource);
+
+// 1. Extract without any checkpoint => get all the data, respecting 
sourceLimit
+assertEquals(Option.empty(), 
kafkaSource.fetchNewDataInAvroFormat(Option.empty(), 
Long.MAX_VALUE).getBatch());
+// Send  1000 non-null messages to Kafka
+testUtils.sendMessages(TEST_TOPIC_NAME, 
Helpers.jsonifyRecords(dataGenerator.generateInserts("000", 1000)));
+// Send  100 null messages to Kafka
+testUtils.sendMessages(TEST_TOPIC_NAME,new String[100]);
+InputBatch<JavaRDD<GenericRecord>> fetch1 = 
kafkaSource.fetchNewDataInAvroFormat(Option.empty(), Long.MAX_VALUE);
+// Verify that messages with null values are filtered
+assertEquals(1000, fetch1.getBatch().get().count());
+  }
+
   // test case with kafka offset reset strategy
   @Test
   public void testJsonKafkaSourceResetStrategy() {


[jira] [Closed] (HUDI-2447) Extract common parts from 'if' & Fix typo

2021-09-17 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2447.
--
Resolution: Done

> Extract common parts from 'if' & Fix typo
> -
>
> Key: HUDI-2447
> URL: https://issues.apache.org/jira/browse/HUDI-2447
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Extract common parts from 'if' & Fix typo



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2447) Extract common parts from 'if' & Fix typo

2021-09-17 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2447:
---
Priority: Minor  (was: Major)

> Extract common parts from 'if' & Fix typo
> -
>
> Key: HUDI-2447
> URL: https://issues.apache.org/jira/browse/HUDI-2447
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Extract common parts from 'if' & Fix typo



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (61d0096 -> 3a150ee)

2021-09-17 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 61d0096  [HUDI-2434] Make periodSeconds of GraphiteReporter 
configurable (#3667)
 add 3a150ee  [HUDI-2447] Extract common business logic & Fix typo (#3683)

No new revisions were added by this update.

Summary of changes:
 .../main/java/org/apache/hudi/dla/DLASyncTool.java | 28 ++
 1 file changed, 12 insertions(+), 16 deletions(-)


[jira] [Closed] (HUDI-2434) Add GraphiteReporter reporter periodSeconds config

2021-09-17 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2434.
--
Resolution: Done

> Add GraphiteReporter reporter periodSeconds config
> --
>
> Key: HUDI-2434
> URL: https://issues.apache.org/jira/browse/HUDI-2434
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2434] Make periodSeconds of GraphiteReporter configurable (#3667)

2021-09-17 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 61d0096  [HUDI-2434] Make periodSeconds of GraphiteReporter 
configurable (#3667)
61d0096 is described below

commit 61d009608899bc70c1372d5cb00a2f35e188c30c
Author: liujinhui <965147...@qq.com>
AuthorDate: Fri Sep 17 19:39:55 2021 +0800

[HUDI-2434] Make periodSeconds of GraphiteReporter configurable (#3667)
---
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  4 ++
 .../metrics/HoodieMetricsGraphiteConfig.java   | 11 
 .../hudi/metrics/MetricsGraphiteReporter.java  |  4 +-
 .../hudi/metrics/TestHoodieGraphiteMetrics.java| 60 ++
 4 files changed, 78 insertions(+), 1 deletion(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index c871253..7f0ec10 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -1475,6 +1475,10 @@ public class HoodieWriteConfig extends HoodieConfig {
 return getString(HoodieMetricsGraphiteConfig.GRAPHITE_METRIC_PREFIX_VALUE);
   }
 
+  public int getGraphiteReportPeriodSeconds() {
+return 
getInt(HoodieMetricsGraphiteConfig.GRAPHITE_REPORT_PERIOD_IN_SECONDS);
+  }
+
   public String getJmxHost() {
 return getString(HoodieMetricsJmxConfig.JMX_HOST_NAME);
   }
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/metrics/HoodieMetricsGraphiteConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/metrics/HoodieMetricsGraphiteConfig.java
index 12987a7..25c4c6a 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/metrics/HoodieMetricsGraphiteConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/metrics/HoodieMetricsGraphiteConfig.java
@@ -61,6 +61,12 @@ public class HoodieMetricsGraphiteConfig extends 
HoodieConfig {
   .sinceVersion("0.5.1")
   .withDocumentation("Standard prefix applied to all metrics. This helps 
to add datacenter, environment information for e.g");
 
+  public static final ConfigProperty<Integer> 
GRAPHITE_REPORT_PERIOD_IN_SECONDS = ConfigProperty
+  .key(GRAPHITE_PREFIX + ".report.period.seconds")
+  .defaultValue(30)
+  .sinceVersion("0.10.0")
+  .withDocumentation("Graphite reporting period in seconds. Default to 
30.");
+
   /**
* @deprecated Use {@link #GRAPHITE_SERVER_HOST_NAME} and its methods instead
*/
@@ -126,6 +132,11 @@ public class HoodieMetricsGraphiteConfig extends 
HoodieConfig {
   return this;
 }
 
+public HoodieMetricsGraphiteConfig.Builder periodSeconds(String 
periodSeconds) {
+  hoodieMetricsGraphiteConfig.setValue(GRAPHITE_REPORT_PERIOD_IN_SECONDS, 
periodSeconds);
+  return this;
+}
+
 public HoodieMetricsGraphiteConfig build() {
   
hoodieMetricsGraphiteConfig.setDefaults(HoodieMetricsGraphiteConfig.class.getName());
   return hoodieMetricsGraphiteConfig;
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/MetricsGraphiteReporter.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/MetricsGraphiteReporter.java
index 9855ac0..c6dff8f 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/MetricsGraphiteReporter.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/MetricsGraphiteReporter.java
@@ -42,6 +42,7 @@ public class MetricsGraphiteReporter extends MetricsReporter {
   private final HoodieWriteConfig config;
   private String serverHost;
   private int serverPort;
+  private final int periodSeconds;
 
   public MetricsGraphiteReporter(HoodieWriteConfig config, MetricRegistry 
registry) {
 this.registry = registry;
@@ -56,12 +57,13 @@ public class MetricsGraphiteReporter extends 
MetricsReporter {
 }
 
 this.graphiteReporter = createGraphiteReport();
+this.periodSeconds = config.getGraphiteReportPeriodSeconds();
   }
 
   @Override
   public void start() {
 if (graphiteReporter != null) {
-  graphiteReporter.start(30, TimeUnit.SECONDS);
+  graphiteReporter.start(periodSeconds, TimeUnit.SECONDS);
 } else {
   LOG.error("Cannot start as the graphiteReporter is null.");
 }
diff --git 
a/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metrics/TestHoodieGraphiteMetrics.java
 
b/hudi-client/hudi-client-common/src/test/java/org/apache/hudi/metrics/TestHoodieGraphiteMetrics.java
new file mode 100644
index 000..6ff7ee8
--- /dev/null
+
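For reference, the new knob slots in next to the existing Graphite properties; 
a hedged example (the host, port, and 60-second period are placeholder values):

{noformat}
hoodie.metrics.on=true
hoodie.metrics.reporter.type=GRAPHITE
hoodie.metrics.graphite.host=graphite.example.com
hoodie.metrics.graphite.port=2003
hoodie.metrics.graphite.report.period.seconds=60
{noformat}

When the property is omitted, the reporter keeps the previous behavior of 
reporting every 30 seconds, per the default in the diff above.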

[jira] [Created] (HUDI-2440) Add dependency change diff script for dependency governance

2021-09-16 Thread vinoyang (Jira)
vinoyang created HUDI-2440:
--

 Summary: Add dependency change diff script for dependency governance
 Key: HUDI-2440
 URL: https://issues.apache.org/jira/browse/HUDI-2440
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Utilities
Reporter: vinoyang
Assignee: vinoyang


Currently, Hudi's dependency management is chaotic; e.g., for 
`hudi-spark-bundle_2.11`, the dependency list is:
{code:java}
HikariCP/2.5.1//HikariCP-2.5.1.jar
ST4/4.0.4//ST4-4.0.4.jar
aircompressor/0.15//aircompressor-0.15.jar
annotations/17.0.0//annotations-17.0.0.jar
ant-launcher/1.9.1//ant-launcher-1.9.1.jar
ant/1.6.5//ant-1.6.5.jar
ant/1.9.1//ant-1.9.1.jar
antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
aopalliance/1.0//aopalliance-1.0.jar
apache-curator/2.7.1//apache-curator-2.7.1.pom
apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
api-util/1.0.0-M20//api-util-1.0.0-M20.jar
asm/3.1//asm-3.1.jar
avatica-metrics/1.8.0//avatica-metrics-1.8.0.jar
avatica/1.8.0//avatica-1.8.0.jar
avro/1.8.2//avro-1.8.2.jar
bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
calcite-core/1.10.0//calcite-core-1.10.0.jar
calcite-druid/1.10.0//calcite-druid-1.10.0.jar
calcite-linq4j/1.10.0//calcite-linq4j-1.10.0.jar
commons-beanutils-core/1.8.0//commons-beanutils-core-1.8.0.jar
commons-beanutils/1.7.0//commons-beanutils-1.7.0.jar
commons-cli/1.2//commons-cli-1.2.jar
commons-codec/1.4//commons-codec-1.4.jar
commons-collections/3.2.2//commons-collections-3.2.2.jar
commons-compiler/2.7.6//commons-compiler-2.7.6.jar
commons-compress/1.9//commons-compress-1.9.jar
commons-configuration/1.6//commons-configuration-1.6.jar
commons-daemon/1.0.13//commons-daemon-1.0.13.jar
commons-dbcp/1.4//commons-dbcp-1.4.jar
commons-digester/1.8//commons-digester-1.8.jar
commons-el/1.0//commons-el-1.0.jar
commons-httpclient/3.1//commons-httpclient-3.1.jar
commons-io/2.4//commons-io-2.4.jar
commons-lang/2.6//commons-lang-2.6.jar
commons-lang3/3.1//commons-lang3-3.1.jar
commons-logging/1.2//commons-logging-1.2.jar
commons-math/2.2//commons-math-2.2.jar
commons-math3/3.1.1//commons-math3-3.1.1.jar
commons-net/3.1//commons-net-3.1.jar
commons-pool/1.5.4//commons-pool-1.5.4.jar
curator-client/2.7.1//curator-client-2.7.1.jar
curator-framework/2.7.1//curator-framework-2.7.1.jar
curator-recipes/2.7.1//curator-recipes-2.7.1.jar
datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
derby/10.10.2.0//derby-10.10.2.0.jar
disruptor/3.3.0//disruptor-3.3.0.jar
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
eigenbase-properties/1.1.5//eigenbase-properties-1.1.5.jar
fastutil/7.0.13//fastutil-7.0.13.jar
findbugs-annotations/1.3.9-1//findbugs-annotations-1.3.9-1.jar
fluent-hc/4.4.1//fluent-hc-4.4.1.jar
groovy-all/2.4.4//groovy-all-2.4.4.jar
gson/2.3.1//gson-2.3.1.jar
guava/14.0.1//guava-14.0.1.jar
guice-assistedinject/3.0//guice-assistedinject-3.0.jar
guice-servlet/3.0//guice-servlet-3.0.jar
guice/3.0//guice-3.0.jar
hadoop-annotations/2.7.3//hadoop-annotations-2.7.3.jar
hadoop-auth/2.7.3//hadoop-auth-2.7.3.jar
hadoop-client/2.7.3//hadoop-client-2.7.3.jar
hadoop-common/2.7.3//hadoop-common-2.7.3.jar
hadoop-common/2.7.3/tests/hadoop-common-2.7.3-tests.jar
hadoop-hdfs/2.7.3//hadoop-hdfs-2.7.3.jar
hadoop-hdfs/2.7.3/tests/hadoop-hdfs-2.7.3-tests.jar
hadoop-mapreduce-client-app/2.7.3//hadoop-mapreduce-client-app-2.7.3.jar
hadoop-mapreduce-client-common/2.7.3//hadoop-mapreduce-client-common-2.7.3.jar
hadoop-mapreduce-client-core/2.7.3//hadoop-mapreduce-client-core-2.7.3.jar
hadoop-mapreduce-client-jobclient/2.7.3//hadoop-mapreduce-client-jobclient-2.7.3.jar
hadoop-mapreduce-client-shuffle/2.7.3//hadoop-mapreduce-client-shuffle-2.7.3.jar
hadoop-yarn-api/2.7.3//hadoop-yarn-api-2.7.3.jar
hadoop-yarn-client/2.7.3//hadoop-yarn-client-2.7.3.jar
hadoop-yarn-common/2.7.3//hadoop-yarn-common-2.7.3.jar
hadoop-yarn-registry/2.7.1//hadoop-yarn-registry-2.7.1.jar
hadoop-yarn-server-applicationhistoryservice/2.7.2//hadoop-yarn-server-applicationhistoryservice-2.7.2.jar
hadoop-yarn-server-common/2.7.2//hadoop-yarn-server-common-2.7.2.jar
hadoop-yarn-server-resourcemanager/2.7.2//hadoop-yarn-server-resourcemanager-2.7.2.jar
hadoop-yarn-server-web-proxy/2.7.2//hadoop-yarn-server-web-proxy-2.7.2.jar
hamcrest-core/1.3//hamcrest-core-1.3.jar
hbase-annotations/1.2.3//hbase-annotations-1.2.3.jar
hbase-client/1.2.3//hbase-client-1.2.3.jar
hbase-common/1.2.3//hbase-common-1.2.3.jar
hbase-common/1.2.3/tests/hbase-common-1.2.3-tests.jar
hbase-hadoop-compat/1.2.3//hbase-hadoop-compat-1.2.3.jar
hbase-hadoop2-compat/1.2.3//hbase-hadoop2-compat-1.2.3.jar
hbase-prefix-tree/1.2.3//hbase-prefix-tree-1.2.3.jar
hbase
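A minimal sketch of the proposed diff script in Java, assuming two plain-text 
files in the artifact-list format shown above, one line per dependency (the 
class name and CLI shape are illustrative, not the script eventually added):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

public final class DependencyDiff {
  // Usage: java DependencyDiff old-deps.txt new-deps.txt
  public static void main(String[] args) throws IOException {
    Set<String> oldDeps = new TreeSet<>(Files.readAllLines(Paths.get(args[0])));
    Set<String> newDeps = new TreeSet<>(Files.readAllLines(Paths.get(args[1])));
    // Entries present only in the new list are additions ...
    for (String dep : newDeps) {
      if (!oldDeps.contains(dep)) {
        System.out.println("+ " + dep);
      }
    }
    // ... and entries present only in the old list are removals.
    for (String dep : oldDeps) {
      if (!newDeps.contains(dep)) {
        System.out.println("- " + dep);
      }
    }
  }
}
{code}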

[jira] [Closed] (HUDI-2423) Separate some config logic from HoodieMetricsConfig into HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig

2021-09-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2423.
--
Resolution: Done

> Separate some config logic from HoodieMetricsConfig into 
> HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig
> ---
>
> Key: HUDI-2423
> URL: https://issues.apache.org/jira/browse/HUDI-2423
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2423) Separate some config logic from HoodieMetricsConfig into HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig

2021-09-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2423:
---
Fix Version/s: 0.10.0

> Separate some config logic from HoodieMetricsConfig into 
> HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig
> ---
>
> Key: HUDI-2423
> URL: https://issues.apache.org/jira/browse/HUDI-2423
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2423) Separate some config logic from HoodieMetricsConfig into HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig

2021-09-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2423:
---
Summary: Separate some config logic from HoodieMetricsConfig into 
HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig  (was:  Breakdown 
HoodieMetricsConfig into HoodieMetricsGraphiteConfig、HoodieMetricsJmxConfig...)

> Separate some config logic from HoodieMetricsConfig into 
> HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig
> ---
>
> Key: HUDI-2423
> URL: https://issues.apache.org/jira/browse/HUDI-2423
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2423] Separate some config logic from HoodieMetricsConfig into HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig (#3652)

2021-09-16 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2791fb9  [HUDI-2423] Separate some config logic from 
HoodieMetricsConfig into HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig 
(#3652)
2791fb9 is described below

commit 2791fb9a964b39ef9aaec83eafd080013186b2eb
Author: liujinhui <965147...@qq.com>
AuthorDate: Thu Sep 16 15:08:10 2021 +0800

[HUDI-2423] Separate some config logic from HoodieMetricsConfig into 
HoodieMetricsGraphiteConfig HoodieMetricsJmxConfig (#3652)
---
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  29 -
 .../config/{ => metrics}/HoodieMetricsConfig.java  | 112 +
 .../{ => metrics}/HoodieMetricsDatadogConfig.java  |   4 +-
 .../metrics/HoodieMetricsGraphiteConfig.java   | 134 +
 .../config/metrics/HoodieMetricsJmxConfig.java | 118 ++
 .../HoodieMetricsPrometheusConfig.java |  45 ++-
 .../metadata/HoodieBackedTableMetadataWriter.java  |  22 ++--
 .../datadog/TestHoodieMetricsDatadogConfig.java|   2 +-
 .../functional/TestHoodieBackedMetadata.java   |   7 +-
 9 files changed, 344 insertions(+), 129 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 4df7d0d..c871253 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -40,6 +40,11 @@ import 
org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
 import org.apache.hudi.common.table.view.FileSystemViewStorageConfig;
 import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.config.metrics.HoodieMetricsConfig;
+import org.apache.hudi.config.metrics.HoodieMetricsDatadogConfig;
+import org.apache.hudi.config.metrics.HoodieMetricsGraphiteConfig;
+import org.apache.hudi.config.metrics.HoodieMetricsJmxConfig;
+import org.apache.hudi.config.metrics.HoodieMetricsPrometheusConfig;
 import org.apache.hudi.execution.bulkinsert.BulkInsertSortMode;
 import org.apache.hudi.index.HoodieIndex;
 import org.apache.hudi.keygen.SimpleAvroKeyGenerator;
@@ -1459,23 +1464,23 @@ public class HoodieWriteConfig extends HoodieConfig {
   }
 
   public String getGraphiteServerHost() {
-return getString(HoodieMetricsConfig.GRAPHITE_SERVER_HOST_NAME);
+return getString(HoodieMetricsGraphiteConfig.GRAPHITE_SERVER_HOST_NAME);
   }
 
   public int getGraphiteServerPort() {
-return getInt(HoodieMetricsConfig.GRAPHITE_SERVER_PORT_NUM);
+return getInt(HoodieMetricsGraphiteConfig.GRAPHITE_SERVER_PORT_NUM);
   }
 
   public String getGraphiteMetricPrefix() {
-return getString(HoodieMetricsConfig.GRAPHITE_METRIC_PREFIX_VALUE);
+return getString(HoodieMetricsGraphiteConfig.GRAPHITE_METRIC_PREFIX_VALUE);
   }
 
   public String getJmxHost() {
-return getString(HoodieMetricsConfig.JMX_HOST_NAME);
+return getString(HoodieMetricsJmxConfig.JMX_HOST_NAME);
   }
 
   public String getJmxPort() {
-return getString(HoodieMetricsConfig.JMX_PORT_NUM);
+return getString(HoodieMetricsJmxConfig.JMX_PORT_NUM);
   }
 
   public int getDatadogReportPeriodSeconds() {
@@ -1777,6 +1782,8 @@ public class HoodieWriteConfig extends HoodieConfig {
 private boolean isMetadataConfigSet = false;
 private boolean isLockConfigSet = false;
 private boolean isPreCommitValidationConfigSet = false;
+private boolean isMetricsJmxConfigSet = false;
+private boolean isMetricsGraphiteConfigSet = false;
 
 public Builder withEngineType(EngineType engineType) {
   this.engineType = engineType;
@@ -1931,6 +1938,18 @@ public class HoodieWriteConfig extends HoodieConfig {
   return this;
 }
 
+public Builder withMetricsJmxConfig(HoodieMetricsJmxConfig 
metricsJmxConfig) {
+  writeConfig.getProps().putAll(metricsJmxConfig.getProps());
+  isMetricsJmxConfigSet = true;
+  return this;
+}
+
+public Builder withMetricsGraphiteConfig(HoodieMetricsGraphiteConfig 
mericsGraphiteConfig) {
+  writeConfig.getProps().putAll(mericsGraphiteConfig.getProps());
+  isMetricsGraphiteConfigSet = true;
+  return this;
+}
+
 public Builder withPreCommitValidatorConfig(HoodiePreCommitValidatorConfig 
validatorConfig) {
   writeConfig.getProps().putAll(validatorConfig.getProps());
   isPreCommitValidationConfigSet = true;
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieMetricsConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/co

[hudi] branch master updated (76554aa -> 86a7351)

2021-09-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 76554aa  [MINOR] Add document for DataSourceReadOptions (#3653)
 add 86a7351  [MINOR] Delete Redundant code (#3661)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java  | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)


[hudi] branch asf-site updated: [MINOR][DOCS] Fixed the broken link on the how to contribute page (#3663)

2021-09-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new cb436e2  [MINOR][DOCS] Fixed the broken link on the how to contribute 
page (#3663)
cb436e2 is described below

commit cb436e2d0af007bda2ba9df651f3a58b358695e6
Author: Vinoth Govindarajan 
AuthorDate: Tue Sep 14 23:44:21 2021 -0700

[MINOR][DOCS] Fixed the broken link on the how to contribute page (#3663)
---
 website/contribute/how-to-contribute.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/contribute/how-to-contribute.md 
b/website/contribute/how-to-contribute.md
index 1d2ff4d..137e5b2 100644
--- a/website/contribute/how-to-contribute.md
+++ b/website/contribute/how-to-contribute.md
@@ -33,7 +33,7 @@ Committers are chosen by a majority vote of the Apache Hudi 
[PMC](https://www.ap
 ## Code Contributions
 
 Useful resources for contributing can be found under the "Quick Links" left 
menu.
-Specifically, please refer to the detailed [contribution 
guide](/contribute/how-to-contribute).
+Specifically, please refer to the detailed [contribution 
guide](/contribute/developer-setup).
 
 ## Accounts
 


[hudi] branch master updated (627f20f -> 76554aa)

2021-09-15 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 627f20f  [HUDI-2430] Make decimal compatible with hudi for flink 
writer (#3658)
 add 76554aa  [MINOR] Add document for DataSourceReadOptions (#3653)

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/hudi/DataSourceOptions.scala| 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)


[jira] [Updated] (HUDI-2410) Fix getDefaultBootstrapIndexClass logical error

2021-09-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2410:
---
Description: 
 
{code:java}
public static String getDefaultBootstrapIndexClass(Properties props) {
  String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
  if ("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key()))) {
    defaultClass = NO_OP_BOOTSTRAP_INDEX_CLASS;
  }
  return defaultClass;
}
{code}
 

When hoodie.bootstrap.index.enable is not passed, the original logic still 
falls back to HFileBootstrapIndex; this default should not be decided here.

  was:
public static String getDefaultBootstrapIndexClass(Properties props) {
  String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
  if ("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key()))) {
    defaultClass = NO_OP_BOOTSTRAP_INDEX_CLASS;
  }
  return defaultClass;
}

When hoodie.bootstrap.index.enable is not passed, the original logic will 
follow HFileBootstrapIndex,

This should not be judged here


> Fix getDefaultBootstrapIndexClass logical error
> ---
>
> Key: HUDI-2410
> URL: https://issues.apache.org/jira/browse/HUDI-2410
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
>  
> {code:java}
> public static String getDefaultBootstrapIndexClass(Properties props) {
>   String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
>   if ("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key()))) {
>     defaultClass = NO_OP_BOOTSTRAP_INDEX_CLASS;
>   }
>   return defaultClass;
> }
> {code}
>  
> When hoodie.bootstrap.index.enable is not passed, the original logic still 
> falls back to HFileBootstrapIndex; this default should not be decided here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
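Read literally, the report asks that an absent hoodie.bootstrap.index.enable no 
longer imply HFileBootstrapIndex. A hedged sketch of that reading (one 
plausible shape only; the class-name constants are illustrative strings, and 
the actual patch merged for this issue touches HoodieTableConfig and 
HoodieTableMetaClient and may resolve it differently):

{code:java}
import java.util.Properties;

public final class BootstrapIndexDefaultSketch {
  static final String BOOTSTRAP_INDEX_ENABLE_KEY = "hoodie.bootstrap.index.enable";
  static final String HFILE_INDEX_CLASS = "org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex"; // illustrative
  static final String NO_OP_INDEX_CLASS = "org.apache.hudi.common.bootstrap.index.NoOpBootstrapIndex";  // illustrative

  // Pick the HFile-backed index only when bootstrap is explicitly enabled;
  // a missing flag no longer silently selects HFileBootstrapIndex.
  static String defaultBootstrapIndexClass(Properties props) {
    boolean enabled = Boolean.parseBoolean(
        props.getProperty(BOOTSTRAP_INDEX_ENABLE_KEY, "false"));
    return enabled ? HFILE_INDEX_CLASS : NO_OP_INDEX_CLASS;
  }
}
{code}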


[jira] [Closed] (HUDI-2410) Fix getDefaultBootstrapIndexClass logical error

2021-09-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2410.
--
Resolution: Fixed

9f3c4a2a7f565f7bcc32189a202a3d400ece23f1

> Fix getDefaultBootstrapIndexClass logical error
> ---
>
> Key: HUDI-2410
> URL: https://issues.apache.org/jira/browse/HUDI-2410
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> public static String getDefaultBootstrapIndexClass(Properties props) {
>   String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
>   if ("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key()))) {
>     defaultClass = NO_OP_BOOTSTRAP_INDEX_CLASS;
>   }
>   return defaultClass;
> }
> When hoodie.bootstrap.index.enable is not passed, the original logic still 
> falls back to HFileBootstrapIndex; this default should not be decided here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2410) Fix getDefaultBootstrapIndexClass logical error

2021-09-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2410:
--

Assignee: liujinhui

> Fix getDefaultBootstrapIndexClass logical error
> ---
>
> Key: HUDI-2410
> URL: https://issues.apache.org/jira/browse/HUDI-2410
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: bootstrap
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> public static String getDefaultBootstrapIndexClass(Properties props) {
>   String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
>   if ("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key()))) {
>     defaultClass = NO_OP_BOOTSTRAP_INDEX_CLASS;
>   }
>   return defaultClass;
> }
> When hoodie.bootstrap.index.enable is not passed, the original logic still 
> falls back to HFileBootstrapIndex; this default should not be decided here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (c79017c -> 9f3c4a2)

2021-09-13 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from c79017c  [HUDI-2397] Add  `--enable-sync` parameter (#3608)
 add 9f3c4a2   [HUDI-2410] Fix getDefaultBootstrapIndexClass logical error 
(#3633)

No new revisions were added by this update.

Summary of changes:
 .../src/test/java/org/apache/hudi/table/TestCleaner.java  |  2 +-
 .../org/apache/hudi/common/table/HoodieTableConfig.java   |  9 +
 .../apache/hudi/common/table/HoodieTableMetaClient.java   | 15 +++
 .../common/table/view/TestHoodieTableFileSystemView.java  |  7 ++-
 .../org/apache/hudi/common/testutils/HoodieTestUtils.java |  3 ++-
 .../java/org/apache/hudi/functional/TestBootstrap.java|  4 ++--
 6 files changed, 31 insertions(+), 9 deletions(-)


[jira] [Closed] (HUDI-2411) Remove unnecessary method overridden and note

2021-09-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2411.
--
Resolution: Done

44b9bc145e0d101bcc688f11c6a30ebcbb7a4a7d

> Remove unnecessary method overridden and note
> -
>
> Key: HUDI-2411
> URL: https://issues.apache.org/jira/browse/HUDI-2411
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Xianghu Wang
>Assignee: Xianghu Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (512ca42 -> 44b9bc1)

2021-09-10 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 512ca42  [MINOR] Correct the comment for the parallelism of tasks in 
FlinkOptions (#3634)
 add 44b9bc1  [HUDI-2411] Remove unnecessary method overriden and note 
(#3636)

No new revisions were added by this update.

Summary of changes:
 .../hudi/index/bloom/HoodieBaseBloomIndexCheckFunction.java | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)


[hudi] branch master updated: [MINOR] Correct the comment for the parallelism of tasks in FlinkOptions (#3634)

2021-09-09 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 512ca42  [MINOR] Correct the comment for the parallelism of tasks in 
FlinkOptions (#3634)
512ca42 is described below

commit 512ca42d14a29e5d8da02198345024f2f83999d9
Author: SteNicholas 
AuthorDate: Fri Sep 10 13:42:11 2021 +0800

[MINOR] Correct the comment for the parallelism of tasks in FlinkOptions 
(#3634)
---
 .../src/main/java/org/apache/hudi/configuration/FlinkOptions.java | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git 
a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java 
b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
index 6e0ff52..64b308d 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
@@ -325,19 +325,19 @@ public class FlinkOptions extends HoodieConfig {
   .key("write.index_bootstrap.tasks")
   .intType()
   .noDefaultValue()
-  .withDescription("Parallelism of tasks that do index bootstrap, default 
is 4");
+  .withDescription("Parallelism of tasks that do index bootstrap, default 
is the parallelism of the environment");
 
   public static final ConfigOption<Integer> BUCKET_ASSIGN_TASKS = ConfigOptions
   .key("write.bucket_assign.tasks")
   .intType()
   .noDefaultValue()
-  .withDescription("Parallelism of tasks that do bucket assign, default is 
4");
+  .withDescription("Parallelism of tasks that do bucket assign, default is 
the parallelism of the environment");
 
   public static final ConfigOption<Integer> WRITE_TASKS = ConfigOptions
   .key("write.tasks")
   .intType()
-  .defaultValue(4)
-  .withDescription("Parallelism of tasks that do actual write, default is 
4");
+  .noDefaultValue()
+  .withDescription("Parallelism of tasks that do actual write, default is 
the parallelism of the environment");
 
   public static final ConfigOption<Double> WRITE_TASK_MAX_SIZE = ConfigOptions
   .key("write.task.max.size")


[hudi] branch master updated: [MINOR] Remove unused variables (#3631)

2021-09-09 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4abcb4f  [MINOR] Remove unused variables (#3631)
4abcb4f is described below

commit 4abcb4f6591448ef1d9bbc9aa237758ae75ecba7
Author: Wei 
AuthorDate: Thu Sep 9 23:21:16 2021 +0800

[MINOR] Remove unused variables (#3631)
---
 .../org/apache/hudi/hive/replication/HiveSyncGlobalCommitConfig.java| 2 --
 1 file changed, 2 deletions(-)

diff --git 
a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/replication/HiveSyncGlobalCommitConfig.java
 
b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/replication/HiveSyncGlobalCommitConfig.java
index bce84e9..c3dd2af 100644
--- 
a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/replication/HiveSyncGlobalCommitConfig.java
+++ 
b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/replication/HiveSyncGlobalCommitConfig.java
@@ -46,10 +46,8 @@ public class HiveSyncGlobalCommitConfig extends 
GlobalHiveSyncConfig {
 
   public static String LOCAL_HIVE_SITE_URI = 
"hivesyncglobal.local_hive_site_uri";
   public static String REMOTE_HIVE_SITE_URI = 
"hivesyncglobal.remote_hive_site_uri";
-  public static String CONFIG_FILE_URI = "hivesyncglobal.config_file_uri";
   public static String REMOTE_BASE_PATH = "hivesyncglobal.remote_base_path";
   public static String LOCAL_BASE_PATH = "hivesyncglobal.local_base_path";
-  public static String RETRY_ATTEMPTS = "hivesyncglobal.retry_attempts";
   public static String REMOTE_HIVE_SERVER_JDBC_URLS = 
"hivesyncglobal.remote_hs2_jdbc_urls";
   public static String LOCAL_HIVE_SERVER_JDBC_URLS = 
"hivesyncglobal.local_hs2_jdbc_urls";
 


[jira] [Updated] (HUDI-2384) Allow log file size more than 2GB

2021-09-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2384:
---
Fix Version/s: 0.10.0

> Allow log file size more than 2GB
> -
>
> Key: HUDI-2384
> URL: https://issues.apache.org/jira/browse/HUDI-2384
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2384) Allow log file size more than 2GB

2021-09-01 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2384.
--
Resolution: Done

21fd6edfe7721c674b40877fbbdbac71b36bf782

> Allow log file size more than 2GB
> -
>
> Key: HUDI-2384
> URL: https://issues.apache.org/jira/browse/HUDI-2384
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: XiaoyuGeng
>Assignee: XiaoyuGeng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (38c9b85 -> 21fd6ed)

2021-09-01 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 38c9b85  [HUDI-2280] Use GitHub Actions to build different scala spark 
versions (#3556)
 add 21fd6ed  [HUDI-2384] Change log file size config to long (#3577)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/config/HoodieStorageConfig.java | 2 +-
 .../src/main/java/org/apache/hudi/config/HoodieWriteConfig.java   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)


[jira] [Closed] (HUDI-2320) Add support ByteArrayDeserializer in AvroKafkaSource

2021-08-29 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2320.
--
Resolution: Done

bf5a52e51bbeaa089995335a0a4c55884792e505

> Add support ByteArrayDeserializer in AvroKafkaSource
> 
>
> Key: HUDI-2320
> URL: https://issues.apache.org/jira/browse/HUDI-2320
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When the 'value.serializer' of the Kafka Avro producer is 
> 'org.apache.kafka.common.serialization.ByteArraySerializer', use the following 
> configuration:
> {code:java}
> --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
> --schemaprovider-class 
> org.apache.hudi.utilities.schema.JdbcbasedSchemaProvider \
> --hoodie-conf 
> "hoodie.deltastreamer.source.kafka.value.deserializer.class=org.apache.kafka.common.serialization.ByteArrayDeserializer"
> {code}
> For now, it will throw an exception:
> {code:java}
> java.lang.ClassCastException: [B cannot be cast to 
> org.apache.avro.generic.GenericRecord{code}
> After ByteArrayDeserializer is supported, the configuration above works 
> properly, and there is no need to provide 'schema.registry.url'; for example, 
> we can use the JdbcbasedSchemaProvider to get the source schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated: [HUDI-2320] Add support ByteArrayDeserializer in AvroKafkaSource (#3502)

2021-08-29 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new bf5a52e  [HUDI-2320] Add support ByteArrayDeserializer in 
AvroKafkaSource (#3502)
bf5a52e is described below

commit bf5a52e51bbeaa089995335a0a4c55884792e505
Author: 董可伦 
AuthorDate: Mon Aug 30 10:01:15 2021 +0800

[HUDI-2320] Add support ByteArrayDeserializer in AvroKafkaSource (#3502)
---
 hudi-utilities/pom.xml |  2 +-
 .../hudi/utilities/sources/AvroKafkaSource.java| 22 ++
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index 4dcc966..089b780 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -254,7 +254,7 @@
 
   com.twitter
   bijection-avro_${scala.binary.version}
-  0.9.3
+  0.9.7
 
 
 
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
index 500c412..ff8ea5a 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java
@@ -26,11 +26,13 @@ import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics;
 import org.apache.hudi.utilities.deser.KafkaAvroSchemaDeserializer;
 import org.apache.hudi.utilities.schema.SchemaProvider;
+import org.apache.hudi.utilities.sources.helpers.AvroConvertor;
 import org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen;
 import 
org.apache.hudi.utilities.sources.helpers.KafkaOffsetGen.CheckpointUtils;
 
 import org.apache.avro.generic.GenericRecord;
 import org.apache.kafka.common.serialization.StringDeserializer;
+import org.apache.kafka.common.serialization.ByteArrayDeserializer;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 import org.apache.spark.api.java.JavaRDD;
@@ -55,13 +57,15 @@ public class AvroKafkaSource extends AvroSource {
 
   private final KafkaOffsetGen offsetGen;
   private final HoodieDeltaStreamerMetrics metrics;
+  private final SchemaProvider schemaProvider;
+  private final String deserializerClassName;
 
   public AvroKafkaSource(TypedProperties props, JavaSparkContext sparkContext, 
SparkSession sparkSession,
   SchemaProvider schemaProvider, HoodieDeltaStreamerMetrics metrics) {
 super(props, sparkContext, sparkSession, schemaProvider);
 
 props.put(NATIVE_KAFKA_KEY_DESERIALIZER_PROP, StringDeserializer.class);
-String deserializerClassName = 
props.getString(DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().key(),
+deserializerClassName = 
props.getString(DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().key(),
 
DataSourceWriteOptions.KAFKA_AVRO_VALUE_DESERIALIZER_CLASS().defaultValue());
 
 try {
@@ -78,6 +82,7 @@ public class AvroKafkaSource extends AvroSource {
   throw new HoodieException(error, e);
 }
 
+this.schemaProvider = schemaProvider;
 this.metrics = metrics;
 offsetGen = new KafkaOffsetGen(props);
   }
@@ -91,12 +96,21 @@ public class AvroKafkaSource extends AvroSource {
   return new InputBatch<>(Option.empty(), 
CheckpointUtils.offsetsToStr(offsetRanges));
 }
JavaRDD<GenericRecord> newDataRDD = toRDD(offsetRanges);
-return new InputBatch<>(Option.of(newDataRDD), 
KafkaOffsetGen.CheckpointUtils.offsetsToStr(offsetRanges));
+return new InputBatch<>(Option.of(newDataRDD), 
CheckpointUtils.offsetsToStr(offsetRanges));
   }
 
   private JavaRDD<GenericRecord> toRDD(OffsetRange[] offsetRanges) {
-return KafkaUtils.createRDD(sparkContext, offsetGen.getKafkaParams(), 
offsetRanges,
-LocationStrategies.PreferConsistent()).map(obj -> (GenericRecord) 
obj.value());
+if (deserializerClassName.equals(ByteArrayDeserializer.class.getName())) {
+  if (schemaProvider == null) {
+throw new HoodieException("Please provide a valid schema provider 
class when use ByteArrayDeserializer!");
+  }
+  AvroConvertor convertor = new 
AvroConvertor(schemaProvider.getSourceSchema());
+  return KafkaUtils.createRDD(sparkContext, 
offsetGen.getKafkaParams(), offsetRanges,
+  LocationStrategies.PreferConsistent()).map(obj -> 
convertor.fromAvroBinary(obj.value()));
+} else {
+  return KafkaUtils.createRDD(sparkContext, offsetGen.getKafkaParams(), 
offsetRanges,
+  LocationStrategies.PreferConsistent()).map(obj -> 
(GenericRecord) obj.value());
+}
   }
 
   @Override
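With ByteArrayDeserializer in play, the raw Kafka value bytes must be decoded 
against the schema obtained from the SchemaProvider, which is what 
AvroConvertor.fromAvroBinary does in the new branch above. A minimal sketch of 
that decoding step using plain Avro APIs (a standalone illustration, not the 
AvroConvertor source):

{code:java}
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public final class AvroBinarySketch {
  // Decode Avro-binary bytes with the source schema; without a schema provider
  // this step is impossible, hence the HoodieException guard in the diff above.
  public static GenericRecord fromAvroBinary(byte[] bytes, Schema sourceSchema) throws IOException {
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(sourceSchema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    return reader.read(null, decoder);
  }
}
{code}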


[hudi] branch asf-site updated: [MINOR] Remove link to missing monitoring section (#3424)

2021-08-09 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new de2ea79  [MINOR] Remove link to missing monitoring section (#3424)
de2ea79 is described below

commit de2ea7970cdabf20c5a930a948a125cba261da35
Author: Damon P. Cortesi 
AuthorDate: Mon Aug 9 01:55:20 2021 -0700

[MINOR] Remove link to missing monitoring section (#3424)

The monitoring section doesn't exist in `deployment.md` so the link in the 
TOC was not working.

Unsure if it was removed or what happened, but this PR removes the link to 
the missing section.
---
 website/docs/deployment.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/website/docs/deployment.md b/website/docs/deployment.md
index 3b2366a..20bf723 100644
--- a/website/docs/deployment.md
+++ b/website/docs/deployment.md
@@ -13,7 +13,6 @@ Specifically, we will cover the following aspects.
  - [Upgrading Versions](#upgrading) : Picking up new releases of Hudi, 
guidelines and general best-practices.
  - [Migrating to Hudi](#migrating) : How to migrate your existing tables to 
Apache Hudi.
  - [Interacting via CLI](#cli) : Using the CLI to perform maintenance or 
deeper introspection.
- - [Monitoring](#monitoring) : Tracking metrics from your hudi tables using 
popular tools.
  - [Troubleshooting](#troubleshooting) : Uncovering, triaging and resolving 
issues in production.
  
 ## Deploying


[jira] [Closed] (HUDI-2225) Add compaction example

2021-08-02 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2225.
--
Fix Version/s: 0.9.0
   Resolution: Done

aa857beee00a764cee90d6e790ee4b0ab4ad4862

> Add compaction example
> --
>
> Key: HUDI-2225
> URL: https://issues.apache.org/jira/browse/HUDI-2225
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (b21ae68 -> aa857be)

2021-08-02 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from b21ae68  [MINOR] Improving runtime of TestStructuredStreaming by 2 
mins (#3382)
 add aa857be  [HUDI-2225] Add a compaction job in hudi-examples (#3347)

No new revisions were added by this update.

Summary of changes:
 .../examples/spark/HoodieMorCompactionJob.scala| 113 +
 1 file changed, 113 insertions(+)
 create mode 100644 
hudi-examples/src/main/scala/org/apache/hudi/examples/spark/HoodieMorCompactionJob.scala


[jira] [Assigned] (HUDI-2244) Fix database alreadyExistsException while hive sync

2021-07-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-2244:
--

Assignee: Zheng yunhong

> Fix database alreadyExistsException while hive sync
> ---
>
> Key: HUDI-2244
> URL: https://issues.apache.org/jira/browse/HUDI-2244
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Fix database alreadyExistsException while hive sync.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2244) Fix database alreadyExistsException while hive sync

2021-07-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2244.
--
Resolution: Fixed

eedfadeb46d5538bc7efb2c455469f1b42e9385e

> Fix database alreadyExistsException while hive sync
> ---
>
> Key: HUDI-2244
> URL: https://issues.apache.org/jira/browse/HUDI-2244
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Zheng yunhong
>Assignee: Zheng yunhong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Fix database alreadyExistsException while hive sync.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[hudi] branch master updated (91c2213 -> eedfade)

2021-07-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 91c2213  [HUDI-2245] BucketAssigner generates the fileId evenly to 
avoid data skew (#3362)
 add eedfade  [HUDI-2244] Fix database alreadyExists exception while hive 
sync (#3361)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/hive/HiveSyncTool.java|  4 +-
 .../org/apache/hudi/hive/HoodieHiveClient.java | 11 +++--
 .../org/apache/hudi/hive/TestHiveSyncTool.java | 47 ++
 .../apache/hudi/hive/testutils/HiveTestUtil.java   |  3 +-
 4 files changed, 56 insertions(+), 9 deletions(-)


[jira] [Closed] (HUDI-2230) "Task not serializable" exception due to non-serializable Codahale Timers

2021-07-28 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2230.
--
Resolution: Fixed

8105cf588e28820b9c021c9ed0e59e3f8b6efa71

> "Task not serializable" exception due to non-serializable Codahale Timers
> -
>
> Key: HUDI-2230
> URL: https://issues.apache.org/jira/browse/HUDI-2230
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.9.0
>Reporter: Dave Hagman
>Assignee: Dave Hagman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Steps to reproduce:
>  * Enable graphite metrics via props file. Example:
> {noformat}
> hoodie.metrics.on=true
> hoodie.metrics.reporter.type=GRAPHITE
> hoodie.metrics.graphite.host=
> hoodie.metrics.graphite.port=
> hoodie.metrics.graphite.metric.prefix=
> {noformat}
>  * Run the Deltastreamer
>  * Note the following exception:
> {noformat}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: Task not serializable
>   at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:165)
>   at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
>   at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:160)
>   at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:501)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:959)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1038)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1047)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: Task not serializable
>   at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:90)
>   at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:163)
>   ... 15 more
> Caused by: org.apache.hudi.exception.HoodieException: Task not serializable
>   at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:649)
>   at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2502)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:421)
>   at org.apache.spark.api.java.JavaRDDLike.map(JavaRDDLike.scala:93)
>   at org.apache.spark.api.java.JavaRDDLik

[hudi] branch master updated (8fef50e -> 8105cf5)

2021-07-28 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 8fef50e  [HUDI-2044] Integrate consumers with rocksDB and compression 
within External Spillable Map (#3318)
 add 8105cf5  [HUDI-2230] Make codahale times transient to avoid 
serializable exceptions (#3345)

No new revisions were added by this update.

Summary of changes:
 .../hudi/utilities/deltastreamer/HoodieDeltaStreamerMetrics.java| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)


[hudi] branch master updated (61148c1 -> 024cf01)

2021-07-26 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 61148c1  [HUDI-2176, 2178, 2179] Adding virtual key support to COW 
table (#3306)
 add 024cf01  [MINOR] Correct the words accroding in the comments to 
according (#3343)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/util/RowDataToAvroConverters.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[jira] [Updated] (HUDI-2216) the words 'fiels' in the comments is incorrect

2021-07-24 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2216:
---
Issue Type: Improvement  (was: Bug)

> the words 'fiels' in the comments is incorrect
> --
>
> Key: HUDI-2216
> URL: https://issues.apache.org/jira/browse/HUDI-2216
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
> Attachments: HUDI-2216.png
>
>
> The word 'fiels' in the comments of MergeIntoHoodieTableCommand is 
> incorrect; it should be 'fields'.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (HUDI-2216) the words 'fiels' in the comments is incorrect

2021-07-24 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-2216.
--
Resolution: Done

a91296f14a037a148d949b2380ad503677e688c7

> the words 'fiels' in the comments is incorrect
> --
>
> Key: HUDI-2216
> URL: https://issues.apache.org/jira/browse/HUDI-2216
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Trivial
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
> Attachments: HUDI-2216.png
>
>
> The word 'fiels' in the comments of MergeIntoHoodieTableCommand is 
> incorrect; it should be 'fields'.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2216) the words 'fiels' in the comments is incorrect

2021-07-24 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-2216:
---
Priority: Trivial  (was: Major)

> the words 'fiels' in the comments is incorrect
> --
>
> Key: HUDI-2216
> URL: https://issues.apache.org/jira/browse/HUDI-2216
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Trivial
>  Labels: documentation, pull-request-available
> Fix For: 0.9.0
>
> Attachments: HUDI-2216.png
>
>
> The word 'fiels' in the comments of MergeIntoHoodieTableCommand is 
> incorrect; it should be 'fields'.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

