[jira] [Commented] (HUDI-996) Use shared spark session provider

2020-07-28 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166871#comment-17166871
 ] 

Raymond Xu commented on HUDI-996:
-

Pausing the work on tagging more functional tests for the functional test
suite. The remaining work just repeats the pattern of the linked PR and can be
resumed anytime later.

> Use shared spark session provider 
> --
>
> Key: HUDI-996
> URL: https://issues.apache.org/jira/browse/HUDI-996
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> * Implement a shared Spark session provider to be used across test suites, so 
> that fewer Spark sessions and other mini servers are set up and torn down
>  * Add functional tests with similar setup logic to the test suites, to make 
> use of the shared Spark session
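The shared-provider idea described above can be sketched in plain Java. This is a hypothetical illustration of the pattern, not Hudi's actual implementation: the expensive resource (a SparkSession in Hudi's case, a plain `Object` stand-in here) is created lazily once and reused by every test, instead of being set up and torn down per test class.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the shared-provider pattern. In real code the
// resource would be a SparkSession obtained via
// SparkSession.builder()...getOrCreate(), torn down once at JVM exit
// (e.g. via a shutdown hook) rather than per test class.
public class SharedSessionProvider {
    // Counts how many times the expensive resource was built (for demonstration).
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    private static Object session; // stand-in for a SparkSession

    // Lazily create the shared instance; synchronized in case tests run in parallel.
    public static synchronized Object getOrCreate() {
        if (session == null) {
            BUILD_COUNT.incrementAndGet();
            session = new Object();
        }
        return session;
    }

    public static void main(String[] args) {
        Object a = getOrCreate();
        Object b = getOrCreate();
        System.out.println(a == b);            // true: same instance reused
        System.out.println(BUILD_COUNT.get()); // 1: built exactly once
    }
}
```

Every test suite then calls `getOrCreate()` in its setup instead of constructing its own session.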



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-996) Use shared spark session provider

2020-07-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-996:
---

Assignee: (was: Raymond Xu)

> Use shared spark session provider 
> --
>
> Key: HUDI-996
> URL: https://issues.apache.org/jira/browse/HUDI-996
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> * Implement a shared Spark session provider to be used across test suites, so 
> that fewer Spark sessions and other mini servers are set up and torn down
>  * Add functional tests with similar setup logic to the test suites, to make 
> use of the shared Spark session





[jira] [Updated] (HUDI-995) Organize test utils methods and classes

2020-07-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-995:

Status: Open  (was: New)

> Organize test utils methods and classes
> ---
>
> Key: HUDI-995
> URL: https://issues.apache.org/jira/browse/HUDI-995
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> * Move test utils classes to hudi-common where appropriate, e.g. 
> TestRawTripPayload, HoodieTestDataGenerator
>  * Organize test utils into separate utils classes like `TransformUtils` for 
> transformations, `SchemaUtils` for schema loading, etc.





[jira] [Commented] (HUDI-995) Organize test utils methods and classes

2020-07-28 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166868#comment-17166868
 ] 

Raymond Xu commented on HUDI-995:
-

[~yanghua] yes, there will be more incremental changes. Let me get back to the 
done criteria later, once I've gathered more info on a good stopping point.

> Organize test utils methods and classes
> ---
>
> Key: HUDI-995
> URL: https://issues.apache.org/jira/browse/HUDI-995
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> * Move test utils classes to hudi-common where appropriate, e.g. 
> TestRawTripPayload, HoodieTestDataGenerator
>  * Organize test utils into separate utils classes like `TransformUtils` for 
> transformations, `SchemaUtils` for schema loading, etc.





[jira] [Updated] (HUDI-995) Organize test utils methods and classes

2020-07-28 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-995:

Status: In Progress  (was: Open)

> Organize test utils methods and classes
> ---
>
> Key: HUDI-995
> URL: https://issues.apache.org/jira/browse/HUDI-995
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> * Move test utils classes to hudi-common where appropriate, e.g. 
> TestRawTripPayload, HoodieTestDataGenerator
>  * Organize test utils into separate utils classes like `TransformUtils` for 
> transformations, `SchemaUtils` for schema loading, etc.





Build failed in Jenkins: hudi-snapshot-deployment-0.5 #353

2020-07-28 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.32 KB...]

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Created] (HUDI-1129) AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

2020-07-28 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1129:


 Summary: AvroConversionUtils unable to handle avro to row 
transformation when passing evolved schema 
 Key: HUDI-1129
 URL: https://issues.apache.org/jira/browse/HUDI-1129
 Project: Apache Hudi
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Balaji Varadarajan


Unit test to repro : 
[https://github.com/apache/hudi/pull/1844/files#diff-2c3763c5782af9c3cbc02e2935211587R476]

Context in : 
[https://github.com/apache/hudi/issues/1845#issuecomment-665180775] (issue 2)





[jira] [Updated] (HUDI-1129) AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

2020-07-28 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-1129:
-
Status: Open  (was: New)

> AvroConversionUtils unable to handle avro to row transformation when passing 
> evolved schema 
> 
>
> Key: HUDI-1129
> URL: https://issues.apache.org/jira/browse/HUDI-1129
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Unit test to repro : 
> [https://github.com/apache/hudi/pull/1844/files#diff-2c3763c5782af9c3cbc02e2935211587R476]
> Context in : 
> [https://github.com/apache/hudi/issues/1845#issuecomment-665180775] (issue 2)





[hudi] branch hudi_test_suite_refactor updated (f4ff5d6 -> d2b5125)

2020-07-28 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


 discard f4ff5d6  [HUDI-394] Provide a basic implementation of test suite
 add 1aae437  [HUDI-1102] Add common useful Spark related and Table path 
detection utilities (#1841)
 add b71f25f  [HUDI-92] Provide reasonable names for Spark DAG stages in 
HUDI. (#1289)
 add 1ec89e9  [HUDI-839] Introducing support for rollbacks using marker 
files (#1756)
 add 5e7ab11  [HUDI-994] Move TestHoodieIndex test cases to unit tests 
(#1850)
 add 743ef32  [HUDI-871] Add support for Tencent Cloud Object Storage(COS) 
(#1855)
 add 12ef8c9  [HUDI-708] Add temps show and unit test for TempViewCommand 
(#1770)
 add 5b6026b  [HUDI-802] Fixing deletes for inserts in same batch in write 
path (#1792)
 add 9bd37ef  [MINOR] Fix flaky testUpsertsUpdatePartitionPath* tests 
(#1863)
 add a8bd76c  [HUDI-1029] In inline compaction mode, previously failed 
compactions needs to be retried before new compactions (#1857)
 add 3dd189e  [MINOR] Fix checkstyle issue on 
TestHoodieClientOnCopyOnWriteStorage (#1865)
 add c39778c  [HUDI-1113] Add user define metrics reporter (#1851)
 add f61cd10  [HUDI-985] Introduce rerun ci bot (#1693)
 add da10680  [HUDI-1037] Introduce a write committed callback hook and 
given a default http callback implementation (#1842)
 add c3279cd  [HUDI-1082] Fix minor bug in deciding the insert buckets 
(#1838)
 add 467d097  [MINOR] Add Databricks File System to StorageSchemes (#1877)
 add 0cb24e4  [MINOR] Use HoodieActiveTimeline.COMMIT_FORMATTER (#1874)
 add ca36c44  [HUDI-995] Move TestRawTripPayload and 
HoodieTestDataGenerator to hudi-common (#1873)
 add fa41921  [HUDI-703] Add test for HoodieSyncCommand (#1774)
 add 5e7931b  [MINOR] Fix master compilation failure (#1881)
 add b2763f4  [MINOR] Fixing default index parallelism for simple index 
(#1882)
 add d5b593b  [MINOR] change log.info to log.debug (#1883)
 add d2b5125  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (f4ff5d6)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (d2b5125)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .github/actions/bot/package.json   |  36 +++
 .github/actions/bot/src/action.js  | 100 +
 .github/workflows/bot.yml  |  43 
 .gitignore |   2 +
 README.md  |  12 +
 .../{base.properties => hoodie-incr.properties}|  14 +-
 docker/demo/config/hoodie-schema.avsc  | 145 
 ...n_commit_time_mor.sh => sync-validate.commands} |   7 +-
 hudi-cli/pom.xml   |  17 ++
 .../main/java/org/apache/hudi/cli/HoodieCLI.java   |  12 +
 .../hudi/cli/commands/HoodieSyncCommand.java   |   4 +-
 .../apache/hudi/cli/commands/TempViewCommand.java  |  45 ++--
 .../hudi/cli/utils/SparkTempViewProvider.java  |  18 ++
 .../apache/hudi/cli/utils/TempViewProvider.java|   7 +-
 .../cli/commands/TestArchivedCommitsCommand.java   |   4 +-
 .../hudi/cli/commands/TestCleansCommand.java   |   4 +-
 .../hudi/cli/commands/TestCommitsCommand.java  |   8 +-
 .../hudi/cli/commands/TestRepairsCommand.java  |   2 +-
 .../hudi/cli/commands/TestRollbacksCommand.java|   2 +-
 .../hudi/cli/commands/TestSavepointsCommand.java   |   2 +-
 .../apache/hudi/cli/commands/TestStatsCommand.java |   2 +-
 .../apache/hudi/cli/commands/TestTableCommand.java |   8 +-
 .../hudi/cli/commands/TestTempViewCommand.java |  84 +++
 .../hudi/cli/integ/ITTestCommitsCommand.java   |   2 +-
 .../cli/integ/ITTestHDFSParquetImportCommand.java  |   2 +-
 .../hudi/cli/integ/ITTestRepairsCommand.java   |   2 +-
 .../hudi/cli/integ/ITTestSavepointsCommand.java|   2 +-
 java => AbstractShellBaseIntegrationTest.java} |   6 +-
 .../testutils/AbstractShellIntegrationTest.java|  38 +---
 .../HoodieTestCommitMetadataGenerator.java |   4 +-
 .../hudi/callback/HoodieWriteCommitCallback.java   |  35 +++
 .../http/HoodieWriteCommitHttpCallbackClient.java  | 108 +
 .../common/HoodieWriteCommitCallbackMessage.java   

[jira] [Updated] (HUDI-1128) DeltaStreamer not handling avro records written with older schema

2020-07-28 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-1128:
-
Status: Open  (was: New)

> DeltaStreamer not handling avro records written with older schema
> -
>
> Key: HUDI-1128
> URL: https://issues.apache.org/jira/browse/HUDI-1128
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>  Labels: schema-evolution
> Fix For: 0.6.1
>
>
> Context:  [https://github.com/apache/hudi/issues/1845]
> Look at issue 1 of 
> [https://github.com/apache/hudi/issues/1845#issuecomment-665180775]
> When deserializing bytes to Avro in OverwriteWithLatestAvroPayload, we are 
> passing the latest schema, which fails when the original record was written 
> with an older schema
>  





[jira] [Created] (HUDI-1128) DeltaStreamer not handling avro records written with older schema

2020-07-28 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1128:


 Summary: DeltaStreamer not handling avro records written with 
older schema
 Key: HUDI-1128
 URL: https://issues.apache.org/jira/browse/HUDI-1128
 Project: Apache Hudi
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Balaji Varadarajan
 Fix For: 0.6.1


Context:  [https://github.com/apache/hudi/issues/1845]

Look at issue 1 of 
[https://github.com/apache/hudi/issues/1845#issuecomment-665180775]

When deserializing bytes to Avro in OverwriteWithLatestAvroPayload, we are 
passing the latest schema, which fails when the original record was written 
with an older schema
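The reader/writer schema contract described above can be illustrated with a hand-rolled sketch. This is not Avro's API (Avro performs this resolution internally when a datum reader is given both the writer schema and the evolved reader schema); the maps, field names, and default values below are hypothetical, chosen only to show why decoding must know the schema the record was *written* with:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of schema resolution: a record written with an older
// schema is projected onto an evolved schema, with fields the writer did
// not know about filled from the evolved schema's defaults.
public class SchemaResolutionSketch {

    // evolvedDefaults: field name -> default value, one entry per field of
    // the evolved (reader) schema.
    public static Map<String, Object> resolve(Map<String, Object> oldRecord,
                                              Map<String, Object> evolvedDefaults) {
        Map<String, Object> resolved = new LinkedHashMap<>(evolvedDefaults);
        resolved.putAll(oldRecord); // values the writer actually produced win
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Object> written = new HashMap<>();
        written.put("id", 1); // record written before "new_col" existed

        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("id", null);
        defaults.put("new_col", "n/a"); // field added by schema evolution

        System.out.println(resolve(written, defaults)); // {id=1, new_col=n/a}
    }
}
```

Decoding the old bytes directly against the latest schema, without this resolution step, is what fails in the reported issue.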

 





[hudi] branch master updated: [MINOR] change log.info to log.debug (#1883)

2020-07-28 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d5b593b  [MINOR] change log.info to log.debug (#1883)
d5b593b is described below

commit d5b593b7d952a39679cade2b18aadbdfb2dc3eed
Author: Bhavani Sudha Saktheeswaran 
AuthorDate: Tue Jul 28 09:49:03 2020 -0700

[MINOR] change log.info to log.debug (#1883)
---
 .../org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
index 3758b9b..e14fe7e 100644
--- 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
+++ 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
@@ -132,7 +132,7 @@ public abstract class AbstractRealtimeRecordReader {
 Schema hiveSchema = Schema.createRecord(writerSchema.getName(), 
writerSchema.getDoc(), writerSchema.getNamespace(),
 writerSchema.isError());
 hiveSchema.setFields(hiveSchemaFields);
-LOG.info("HIVE Schema is :" + hiveSchema.toString(true));
+LOG.debug("HIVE Schema is :" + hiveSchema.toString(true));
 return hiveSchema;
   }
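Note that even at debug level the argument string (including `hiveSchema.toString(true)`) is built eagerly before the call. A common follow-up is to guard expensive message construction behind a level check. The sketch below uses JDK logging with hypothetical names (the Hudi code uses log4j, where the equivalent guard is `LOG.isDebugEnabled()`):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: guard expensive log-message construction so it only happens
// when the target level is actually enabled.
public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    static int renderCount = 0; // counts expensive renderings, for demonstration

    static String expensiveRender() {
        renderCount++; // stands in for hiveSchema.toString(true)
        return "rendered schema";
    }

    public static void logSchema() {
        if (LOG.isLoggable(Level.FINE)) { // log4j equivalent: LOG.isDebugEnabled()
            LOG.fine("HIVE Schema is :" + expensiveRender());
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE (debug) disabled
        logSchema();
        System.out.println(renderCount); // 0: nothing was rendered
    }
}
```

Without the guard, the schema string would still be serialized on every call, only to be thrown away when debug logging is off.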
 



[hudi] branch hudi_test_suite_refactor updated (3b4ac10 -> f4ff5d6)

2020-07-28 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


 discard 3b4ac10  [HUDI-394] Provide a basic implementation of test suite
 add f4ff5d6  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (3b4ac10)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (f4ff5d6)


No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[hudi] branch master updated: [MINOR] Fixing default index parallelism for simple index (#1882)

2020-07-28 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new b2763f4  [MINOR] Fixing default index parallelism for simple index 
(#1882)
b2763f4 is described below

commit b2763f433b3efb92fdcc0e760a88a43eaa2e5be3
Author: Sivabalan Narayanan 
AuthorDate: Tue Jul 28 11:22:09 2020 -0400

[MINOR] Fixing default index parallelism for simple index (#1882)
---
 .../src/main/java/org/apache/hudi/config/HoodieIndexConfig.java   | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java 
b/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
index 4e974af..83e9f67 100644
--- a/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
+++ b/hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
@@ -65,9 +65,9 @@ public class HoodieIndexConfig extends DefaultHoodieConfig {
   public static final String SIMPLE_INDEX_USE_CACHING_PROP = 
"hoodie.simple.index.use.caching";
   public static final String DEFAULT_SIMPLE_INDEX_USE_CACHING = "true";
   public static final String SIMPLE_INDEX_PARALLELISM_PROP = 
"hoodie.simple.index.parallelism";
-  public static final String DEFAULT_SIMPLE_INDEX_PARALLELISM = "0";
+  public static final String DEFAULT_SIMPLE_INDEX_PARALLELISM = "50";
   public static final String GLOBAL_SIMPLE_INDEX_PARALLELISM_PROP = 
"hoodie.global.simple.index.parallelism";
-  public static final String DEFAULT_GLOBAL_SIMPLE_INDEX_PARALLELISM = "0";
+  public static final String DEFAULT_GLOBAL_SIMPLE_INDEX_PARALLELISM = "100";
 
   // 1B bloom filter checks happen in 250 seconds. 500ms to read a bloom 
filter.
   // 10M checks in 2500ms, thus amortizing the cost of reading bloom filter 
across partitions.
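The amortization arithmetic in the comment above checks out; using only the figures quoted there (back-of-envelope, not a benchmark):

```java
// Verify the comment's numbers: 1B bloom filter checks in 250 s implies
// 4000 checks/ms, so 10M checks take 2500 ms, and a single ~500 ms bloom
// filter read amortizes to 20% overhead per 10M-check batch.
public class BloomAmortization {
    public static void main(String[] args) {
        double checksPerMs = 1_000_000_000.0 / 250_000.0; // 4000.0 checks/ms
        double msFor10M = 10_000_000.0 / checksPerMs;     // 2500.0 ms
        double overhead = 500.0 / msFor10M;               // 0.2 (20%)
        System.out.println(checksPerMs); // 4000.0
        System.out.println(msFor10M);    // 2500.0
        System.out.println(overhead);    // 0.2
    }
}
```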



[hudi] branch asf-site updated: Travis CI build asf-site

2020-07-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 9a1b21c  Travis CI build asf-site
9a1b21c is described below

commit 9a1b21ce371b5aceab4541adc08c575f64fd3f02
Author: CI 
AuthorDate: Tue Jul 28 13:13:41 2020 +

Travis CI build asf-site
---
 content/cn/docs/configurations.html | 10 +++
 content/docs/configurations.html| 10 +++
 content/docs/metrics.html   | 52 +
 3 files changed, 72 insertions(+)

diff --git a/content/cn/docs/configurations.html 
b/content/cn/docs/configurations.html
index 6e49d8d..589e8d4 100644
--- a/content/cn/docs/configurations.html
+++ b/content/cn/docs/configurations.html
@@ -880,6 +880,16 @@ Hudi提供了一个选项,可以通过将对该分区中的插入作为对现
 属性: hoodie.metrics.datadog.metric.tags 
 Datadog指标标签(逗号分隔),将和指标数据一并发送。
 
+用户自定义发送器
+
+on(metricsOn = false)
+属性: hoodie.metrics.on 
+打开或关闭发送指标。默认情况下处于关闭状态。
+
+withReporterClass(className = “”)
+属性: hoodie.metrics.reporter.class 

+用于处理发送指标的用户自定义类,必须是AbstractUserDefinedMetricsReporter类的子类.
+
 内存配置
 控制由Hudi内部执行的压缩和合并的内存使用情况
 withMemoryConfig (HoodieMemoryConfig) 
diff --git a/content/docs/configurations.html b/content/docs/configurations.html
index db94025..69aab3f 100644
--- a/content/docs/configurations.html
+++ b/content/docs/configurations.html
@@ -857,6 +857,16 @@ HoodieWriteConfig can be built using a builder pattern as 
below.
 Property: hoodie.metrics.datadog.metric.tags 
 Datadog metric tags (comma-delimited) to be sent 
along with metrics data.
 
+USER DEFINED REPORTER
+
+on(metricsOn = false)
+hoodie.metrics.on 
+Turn on/off metrics reporting. off by 
default.
+
+withReporterClass(className = “”)
+Property: hoodie.metrics.reporter.class 
+User-defined class used to report metrics, must be a 
subclass of AbstractUserDefinedMetricsReporter.
+
 Memory configs
 Controls memory usage for compaction and merges, performed internally by 
Hudi
 withMemoryConfig (HoodieMemoryConfig) 
diff --git a/content/docs/metrics.html b/content/docs/metrics.html
index 4eb3020..f537328 100644
--- a/content/docs/metrics.html
+++ b/content/docs/metrics.html
@@ -371,6 +371,7 @@
   JmxMetricsReporter
   MetricsGraphiteReporter
   DatadogMetricsReporter
+  UserDefinedMetricsReporter
 
   
   HoodieMetrics
@@ -465,6 +466,57 @@ A reporter which publishes metric values to Datadog 
monitoring service via Datad
   prefix.table 
name.deltastreamer.hiveSyncDuration
 
 
+UserDefinedMetricsReporter
+
+Allows users to define a custom metrics reporter.
+
+Configurations
+The following is an example of UserDefinedMetricsReporter. More detailed 
configurations can be referenced here.
+
+hoodie.metrics.on=true
+hoodie.metrics.reporter.class=test.TestUserDefinedMetricsReporter
+
+
+Demo
+In this simple demo, TestMetricsReporter will print all gauges every 10 
seconds
+
+public static class TestUserDefinedMetricsReporter 
+extends AbstractUserDefinedMetricsReporter {
+  private static final Logger log = LogManager.getLogger(DummyMetricsReporter.class);
+
+  private ScheduledExecutorService exec = Executors.newScheduledThreadPool(1, r -> {
+  Thread t = Executors.defaultThreadFactory().newThread(r);
+  t.setDaemon(true);
+  return t;
+  });
+
+  public TestUserDefinedMetricsReporter(Properties props, MetricRegistry registry) {
+super(props, registry);
+  }
+
+  @Override
+  public void start() {
+exec.schedule(this::report, 10, TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void report() {
+this.getRegistry().getGauges().forEach((key, value) -> 
+  log.info("key: " + key + " value: " + value.getValue().toString()));
+  }
+
+  @Override
+  public Closeable getReporter() {
+return null;
+  }
+
+  @Override
+  public void stop() {
+exec.shutdown();
+  }
+}
+
+
 HoodieMetrics
 
 Once the Hudi writer is configured with the right table and environment for 
HoodieMetrics, it produces the following 
HoodieMetrics, that aid in debugging 
hudi tables



[hudi] branch asf-site updated: [DOC][HUDI-1123] add doc for user defined metrics reporter (#1879)

2020-07-28 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 7b12147  [DOC][HUDI-1123] add doc for user defined metrics reporter 
(#1879)
7b12147 is described below

commit 7b12147f7b2cb35598ee20ecd8caa2f25de5db47
Author: zherenyu831 <52404525+zherenyu...@users.noreply.github.com>
AuthorDate: Tue Jul 28 21:33:13 2020 +0900

[DOC][HUDI-1123] add doc for user defined metrics reporter (#1879)
---
 docs/_docs/2_4_configurations.cn.md | 10 +++
 docs/_docs/2_4_configurations.md| 11 
 docs/_docs/2_8_metrics.md   | 53 +
 3 files changed, 74 insertions(+)

diff --git a/docs/_docs/2_4_configurations.cn.md 
b/docs/_docs/2_4_configurations.cn.md
index a75419e..b577990 100644
--- a/docs/_docs/2_4_configurations.cn.md
+++ b/docs/_docs/2_4_configurations.cn.md
@@ -519,6 +519,16 @@ Hudi提供了一个选项,可以通过将对该分区中的插入作为对现
 属性: `hoodie.metrics.datadog.metric.tags` 
 Datadog指标标签(逗号分隔),将和指标数据一并发送。
 
+ 用户自定义发送器
+
+# on(metricsOn = false) {#on}
+属性: `hoodie.metrics.on` 
+打开或关闭发送指标。默认情况下处于关闭状态。
+
+# withReporterClass(className = "") {#withReporterClass}
+属性: `hoodie.metrics.reporter.class` 
+用于处理发送指标的用户自定义类,必须是AbstractUserDefinedMetricsReporter类的子类.
+
 ### 内存配置
 控制由Hudi内部执行的压缩和合并的内存使用情况
 [withMemoryConfig](#withMemoryConfig) (HoodieMemoryConfig) 
diff --git a/docs/_docs/2_4_configurations.md b/docs/_docs/2_4_configurations.md
index 6fc1049..627d148 100644
--- a/docs/_docs/2_4_configurations.md
+++ b/docs/_docs/2_4_configurations.md
@@ -483,6 +483,17 @@ Property: `hoodie.metrics.datadog.metric.host` 
 Property: `hoodie.metrics.datadog.metric.tags` 
 Datadog metric tags (comma-delimited) to be sent 
along with metrics data.
 
+ USER DEFINED REPORTER
+
+# on(metricsOn = false) {#on}
+`hoodie.metrics.on` 
+Turn on/off metrics reporting. off by default.
+
+# withReporterClass(className = "") {#withReporterClass}
+Property: `hoodie.metrics.reporter.class` 
+User-defined class used to report metrics, must be a 
subclass of AbstractUserDefinedMetricsReporter.
+
+
 ### Memory configs
 Controls memory usage for compaction and merges, performed internally by Hudi
 [withMemoryConfig](#withMemoryConfig) (HoodieMemoryConfig) 
diff --git a/docs/_docs/2_8_metrics.md b/docs/_docs/2_8_metrics.md
index e5043af..287053c 100644
--- a/docs/_docs/2_8_metrics.md
+++ b/docs/_docs/2_8_metrics.md
@@ -90,6 +90,59 @@ In this demo, we ran a `HoodieDeltaStreamer` job with 
`HoodieMetrics` turned on
 
  * `..deltastreamer.duration`
  * `..deltastreamer.hiveSyncDuration`
+ 
+### UserDefinedMetricsReporter
+
+Allows users to define a custom metrics reporter.
+
+ Configurations
+The following is an example of `UserDefinedMetricsReporter`. More detailed 
configurations can be referenced 
[here](configurations.html#user-defined-reporter).
+
+```properties
+hoodie.metrics.on=true
+hoodie.metrics.reporter.class=test.TestUserDefinedMetricsReporter
+```
+
+ Demo
+In this simple demo, TestMetricsReporter will print all gauges every 10 seconds
+
+```java
+public static class TestUserDefinedMetricsReporter 
+extends AbstractUserDefinedMetricsReporter {
+  private static final Logger log = 
LogManager.getLogger(DummyMetricsReporter.class);
+
+  private ScheduledExecutorService exec = Executors.newScheduledThreadPool(1, 
r -> {
+  Thread t = Executors.defaultThreadFactory().newThread(r);
+  t.setDaemon(true);
+  return t;
+  });
+
+  public TestUserDefinedMetricsReporter(Properties props, MetricRegistry 
registry) {
+super(props, registry);
+  }
+
+  @Override
+  public void start() {
+exec.schedule(this::report, 10, TimeUnit.SECONDS);
+  }
+
+  @Override
+  public void report() {
+this.getRegistry().getGauges().forEach((key, value) -> 
+  log.info("key: " + key + " value: " + value.getValue().toString()));
+  }
+
+  @Override
+  public Closeable getReporter() {
+return null;
+  }
+
+  @Override
+  public void stop() {
+exec.shutdown();
+  }
+}
+```
 
 ## HoodieMetrics
 



[hudi] branch hudi_test_suite_refactor updated (9e9f930 -> 3b4ac10)

2020-07-28 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


 discard 9e9f930  [HUDI-394] Provide a basic implementation of test suite
 add 3b4ac10  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (9e9f930)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (3b4ac10)


No new revisions were added by this update.

Summary of changes:
 docker/hoodie/hadoop/hive_base/pom.xml | 1 -
 1 file changed, 1 deletion(-)



[jira] [Comment Edited] (HUDI-1116) Support time travel using timestamp type

2020-07-28 Thread linshan-ma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166195#comment-17166195
 ] 

linshan-ma edited comment on HUDI-1116 at 7/28/20, 6:54 AM:


So sorry, I have been tied up with company affairs. Is anyone following this 
issue now? I may not have time to finish the ticket


was (Author: linshan):
So sorry, I have been tied up with company affairs. Is anyone following this 
issue now? I may not have time to finish the work

> Support time travel using timestamp type
> 
>
> Key: HUDI-1116
> URL: https://issues.apache.org/jira/browse/HUDI-1116
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: linshan-ma
>Priority: Major
>
>  
> {{Currently, we use commit time to mimic time-travel queries. We need the 
> ability to handle time travel with a proper timestamp provided.}}
> {{For e.g.: }}
> {{spark.read.format("hudi").option("timestampAsOf", 
> "2019-01-01").load("/path/to/my/table")}}





[jira] [Commented] (HUDI-1116) Support time travel using timestamp type

2020-07-28 Thread linshan-ma (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166195#comment-17166195
 ] 

linshan-ma commented on HUDI-1116:
--

So sorry, I have been tied up with company affairs. Is anyone following this 
issue now? I may not have time to finish the work

> Support time travel using timestamp type
> 
>
> Key: HUDI-1116
> URL: https://issues.apache.org/jira/browse/HUDI-1116
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: linshan-ma
>Priority: Major
>
>  
> {{Currently, we use commit time to mimic time-travel queries. We need the 
> ability to handle time travel with a proper timestamp provided.}}
> {{For e.g.: }}
> {{spark.read.format("hudi").option("timestampAsOf", 
> "2019-01-01").load("/path/to/my/table")}}





[hudi] branch master updated: [MINOR] Fix master compilation failure (#1881)

2020-07-28 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e7931b  [MINOR] Fix master compilation failure (#1881)
5e7931b is described below

commit 5e7931b1f9d586a44bcfa13e21917b24d4f78596
Author: Udit Mehrotra 
AuthorDate: Mon Jul 27 23:02:58 2020 -0700

[MINOR] Fix master compilation failure (#1881)

Co-authored-by: Udit Mehrotra 
---
 hudi-spark/src/test/java/HoodieJavaGenerateApp.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hudi-spark/src/test/java/HoodieJavaGenerateApp.java 
b/hudi-spark/src/test/java/HoodieJavaGenerateApp.java
index 64245e9..1160f2d 100644
--- a/hudi-spark/src/test/java/HoodieJavaGenerateApp.java
+++ b/hudi-spark/src/test/java/HoodieJavaGenerateApp.java
@@ -30,7 +30,7 @@ import org.apache.hudi.hive.NonPartitionedExtractor;
 import org.apache.hudi.keygen.NonpartitionedKeyGenerator;
 import org.apache.hudi.keygen.SimpleKeyGenerator;
 import org.apache.hudi.testutils.DataSourceTestUtils;
-import org.apache.hudi.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 import org.apache.spark.api.java.JavaSparkContext;