[hudi] branch hudi_test_suite_refactor updated (7cc6c55 -> 9e9f930)

2020-07-27 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/hudi.git.


omit 7cc6c55  [HUDI-394] Provide a basic implementation of test suite
 add 9e9f930  [HUDI-394] Provide a basic implementation of test suite

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (7cc6c55)
            \
             N -- N -- N   refs/heads/hudi_test_suite_refactor (9e9f930)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



Build failed in Jenkins: hudi-snapshot-deployment-0.5 #352

2020-07-27 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.32 KB...]

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark-bundle_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Updated] (HUDI-703) Add unit test for HoodieSyncCommand

2020-07-27 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-703:
--
Status: Open  (was: New)

> Add unit test for HoodieSyncCommand
> ---
>
> Key: HUDI-703
> URL: https://issues.apache.org/jira/browse/HUDI-703
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-703) Add unit test for HoodieSyncCommand

2020-07-27 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-703.
---
Fix Version/s: 0.6.0
   Resolution: Done

Done via master branch: fa419213f62d2006aeea228180302024977feb16

> Add unit test for HoodieSyncCommand
> ---
>
> Key: HUDI-703
> URL: https://issues.apache.org/jira/browse/HUDI-703
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>






[hudi] branch master updated: [HUDI-703] Add test for HoodieSyncCommand (#1774)

2020-07-27 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new fa41921  [HUDI-703] Add test for HoodieSyncCommand (#1774)
fa41921 is described below

commit fa419213f62d2006aeea228180302024977feb16
Author: hongdd 
AuthorDate: Tue Jul 28 08:31:43 2020 +0800

[HUDI-703] Add test for HoodieSyncCommand (#1774)
---
 docker/demo/config/hoodie-incr.properties  |  31 
 docker/demo/config/hoodie-schema.avsc  | 145 
 docker/demo/sync-validate.commands |  19 +++
 hudi-cli/pom.xml   |  17 ++
 .../hudi/cli/commands/HoodieSyncCommand.java   |   4 +-
 .../org/apache/hudi/integ/HoodieTestHiveBase.java  | 121 +
 .../java/org/apache/hudi/integ/ITTestBase.java |   4 +-
 .../org/apache/hudi/integ/ITTestHoodieSanity.java  |   6 +-
 .../integ/command/ITTestHoodieSyncCommand.java |  75 
 .../src/test/resources/hoodie-docker.properties|  18 ++
 hudi-spark/run_hoodie_app.sh   |   4 +-
 .../src/test/java/HoodieJavaGenerateApp.java   | 190 +
 12 files changed, 625 insertions(+), 9 deletions(-)

diff --git a/docker/demo/config/hoodie-incr.properties 
b/docker/demo/config/hoodie-incr.properties
new file mode 100644
index 000..95a6627
--- /dev/null
+++ b/docker/demo/config/hoodie-incr.properties
@@ -0,0 +1,31 @@
+
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+
+hoodie.upsert.shuffle.parallelism=2
+hoodie.insert.shuffle.parallelism=2
+hoodie.bulkinsert.shuffle.parallelism=2
+hoodie.datasource.write.recordkey.field=_row_key
+hoodie.datasource.write.partitionpath.field=partition
+hoodie.deltastreamer.schemaprovider.source.schema.file=file:///var/hoodie/ws/docker/demo/config/hoodie-schema.avsc
+hoodie.deltastreamer.schemaprovider.target.schema.file=file:///var/hoodie/ws/docker/demo/config/hoodie-schema.avsc
+hoodie.deltastreamer.source.hoodieincr.partition.fields=partition
+hoodie.deltastreamer.source.hoodieincr.path=/docker_hoodie_sync_valid_test
+hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt=true
+# hive sync
+hoodie.datasource.hive_sync.table=docker_hoodie_sync_valid_test_2
+hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:1
+hoodie.datasource.hive_sync.partition_fields=partition
\ No newline at end of file
diff --git a/docker/demo/config/hoodie-schema.avsc 
b/docker/demo/config/hoodie-schema.avsc
new file mode 100644
index 000..55e255f
--- /dev/null
+++ b/docker/demo/config/hoodie-schema.avsc
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+"type": "record",
+"name": "triprec",
+"fields": [
+{
+"name": "timestamp",
+"type": "double"
+},
+{
+"name": "_row_key",
+"type": "string"
+},
+{
+"name": "rider",
+"type": "string"
+},
+{
+"name": "driver",
+"type": "string"
+},
+{
+"name": "begin_lat",
+"type": "double"
+},
+{
+"name": "begin_lon",
+"type": "double"
+},
+{
+"name": "end_lat",
+"type": "double"
+},
+{
+  

[jira] [Assigned] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-07-27 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-1098:


Assignee: sivabalan narayanan

> Marker file finalizing may block on a data file that was never written
> --
>
> Key: HUDI-1098
> URL: https://issues.apache.org/jira/browse/HUDI-1098
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.6.0
>
>
> {code:java}
> // Ensure all files in delete list is actually present. This is mandatory for an eventually consistent FS.
> // Otherwise, we may miss deleting such files. If files are not found even after retries, fail the commit
> if (consistencyCheckEnabled) {
>   // This will either ensure all files to be deleted are present.
>   waitForAllFiles(jsc, groupByPartition, FileVisibility.APPEAR);
> }
> {code}
> We need to handle the case where the marker file was created, but we crashed 
> before the data file was created. 





[jira] [Updated] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-27 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1049:
-
Status: Closed  (was: Patch Available)

> In inline compaction mode, previously failed compactions needs to be retried 
> before new compactions 
> 
>
> Key: HUDI-1049
> URL: https://issues.apache.org/jira/browse/HUDI-1049
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> With Async compaction, previously failed compactions are retried before new 
> compactions are run. With inline compaction, this failure retry is not 
> getting done.
>  
> As async compaction is the de-facto mode for MOR table, we haven't noticed 
> this problem in the community. But, this was reported recently as part of 
> [https://github.com/apache/hudi/issues/1764#issuecomment-648882567]
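
The behaviour the ticket asks for can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not Hudi APIs, and instants are modelled as plain timestamp strings. It shows previously failed (still pending) compactions being queued ahead of a newly scheduled one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not Hudi's actual compaction scheduling code. */
final class InlineCompactionSketch {

    /**
     * Returns the order in which compactions would run: pending (previously
     * failed) instants first, oldest first, followed by the new instant.
     */
    static List<String> executionOrder(List<String> pendingInstants, String newInstant) {
        List<String> order = new ArrayList<>(pendingInstants);
        // Assumption: instants are fixed-width timestamps such as 20200727120000,
        // so lexicographic order matches chronological order.
        Collections.sort(order);
        order.add(newInstant);
        return order;
    }
}
```

The point of the sketch is only the ordering: the retry of older pending instants happens before any new compaction, which is what async compaction is described as doing already.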





[jira] [Resolved] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-27 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-1049.
--
Resolution: Fixed

> In inline compaction mode, previously failed compactions needs to be retried 
> before new compactions 
> 
>
> Key: HUDI-1049
> URL: https://issues.apache.org/jira/browse/HUDI-1049
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> With Async compaction, previously failed compactions are retried before new 
> compactions are run. With inline compaction, this failure retry is not 
> getting done.
>  
> As async compaction is the de-facto mode for MOR table, we haven't noticed 
> this problem in the community. But, this was reported recently as part of 
> [https://github.com/apache/hudi/issues/1764#issuecomment-648882567]





[jira] [Reopened] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-27 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-1049:
--

> In inline compaction mode, previously failed compactions needs to be retried 
> before new compactions 
> 
>
> Key: HUDI-1049
> URL: https://issues.apache.org/jira/browse/HUDI-1049
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> With Async compaction, previously failed compactions are retried before new 
> compactions are run. With inline compaction, this failure retry is not 
> getting done.
>  
> As async compaction is the de-facto mode for MOR table, we haven't noticed 
> this problem in the community. But, this was reported recently as part of 
> [https://github.com/apache/hudi/issues/1764#issuecomment-648882567]





[jira] [Assigned] (HUDI-242) Support Efficient bootstrap of large parquet datasets to Hudi

2020-07-27 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-242:
---

Assignee: Vinoth Chandar  (was: Balaji Varadarajan)

> Support Efficient bootstrap of large parquet datasets to Hudi
> -
>
> Key: HUDI-242
> URL: https://issues.apache.org/jira/browse/HUDI-242
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
>  Support Efficient bootstrap of large parquet tables





[jira] [Commented] (HUDI-1014) Design and Implement upgrade-downgrade infrastructure

2020-07-27 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166002#comment-17166002
 ] 

sivabalan narayanan commented on HUDI-1014:
---

With 0.6.0, Hoodie is switching to marker-based rollback, so the location and 
format of marker files will differ. If a dataset created before 0.6.0 has a 
pending commit that needs to be rolled back when it is first launched with 
0.6.0, some upgrade steps need to be done first: 

 
 * Collect info on all existing pending commit file slices. 
 * Delete old marker files and recreate new ones in the new format (w/ IOType). 
 * Proceed as usual. Every write operation will try to roll back any pending 
commit, and since marker-based rollback is enabled and the marker files are 
in the expected format, this should be taken care of automatically. 

 

Similarly, we might introduce a command in hudi-cli to downgrade from 0.6.0 to 
pre 0.6.0. Users need to do this downgrade step before launching hudi in an 
older version if their data set was created in 0.6.0. 
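
The upgrade steps listed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class, the method, the in-memory collections standing in for the file system, and the "<data file>.marker.<IOType>" naming are hypothetical and not taken from the Hudi code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the upgrade steps; not the Hudi implementation. */
final class MarkerUpgradeSketch {

    enum IOType { CREATE, MERGE, APPEND }

    /**
     * pendingFiles is the result of step 1: each data file of a pending commit
     * mapped to the IOType it was written with. Step 2 deletes the old markers
     * and returns the names of the recreated, new-format markers.
     */
    static List<String> recreateMarkers(Set<String> oldMarkers, Map<String, IOType> pendingFiles) {
        // Delete the old-format markers (modelled here as clearing a set).
        oldMarkers.clear();
        List<String> created = new ArrayList<>();
        // TreeMap gives a deterministic, sorted iteration order.
        for (Map.Entry<String, IOType> entry : new TreeMap<>(pendingFiles).entrySet()) {
            // Assumed naming scheme: data file name plus an IOType suffix.
            created.add(entry.getKey() + ".marker." + entry.getValue());
        }
        return created;
    }
}
```

After this step, the regular write path (step 3) would find markers in the format it expects and roll back the pending commit without further special handling.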

 

 

> Design and Implement upgrade-downgrade infrastructure
> -
>
> Key: HUDI-1014
> URL: https://issues.apache.org/jira/browse/HUDI-1014
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Writer Core
>Reporter: Vinoth Chandar
>Assignee: sivabalan narayanan
>Priority: Blocker
> Fix For: 0.6.0
>
>






[jira] [Created] (HUDI-1127) Handling late arriving Deletes

2020-07-27 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1127:
---

 Summary: Handling late arriving Deletes
 Key: HUDI-1127
 URL: https://issues.apache.org/jira/browse/HUDI-1127
 Project: Apache Hudi
  Issue Type: Improvement
  Components: DeltaStreamer, Writer Core
Reporter: Bhavani Sudha
Assignee: Bhavani Sudha
 Fix For: 0.6.1


Recently I was working on a [PR|https://github.com/apache/hudi/pull/1704] to 
enhance OverwriteWithLatestAvroPayload class to consider records in storage 
when merging. Briefly, this class will ignore older updates if the record in 
storage is the latest one ( based on the Precombine field). 

Based on this, the expectation is that every write operation is handled the 
same way: if it is older, it should be ignored. 
While at this, I identified that we cannot handle all Deletes the same way. 
This is because we process deletes in two main ways -
 * by adding and enabling a metadata field `_hoodie_is_deleted` in the 
original record and sending it as an UPSERT operation.
 * by using an empty payload using the EmptyHoodieRecordPayload and sending the 
write as a DELETE operation. 

While the former has an ordering field and can be processed as expected (older 
deletes will be ignored), the latter does not have any ordering field to 
identify whether it is an older delete or not, and hence will let the older 
delete go through.

Just opening this issue to track this gap. We would need to identify what is 
the right choice here and fix as needed.
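
To make the gap concrete, here is a minimal sketch of the merge decision described above. It is a simplified model under stated assumptions: the classes below are hypothetical and are not OverwriteWithLatestAvroPayload or EmptyHoodieRecordPayload, and the ordering (precombine) value is modelled as an optional long.

```java
import java.util.Optional;
import java.util.OptionalLong;

/** Hypothetical model of the merge decision; these are not Hudi's payload classes. */
final class LateDeleteSketch {

    /** An incoming write. orderingValue is empty for a DELETE sent with an empty payload. */
    static final class Incoming {
        final boolean isDelete;
        final OptionalLong orderingValue;
        final String data;

        Incoming(boolean isDelete, OptionalLong orderingValue, String data) {
            this.isDelete = isDelete;
            this.orderingValue = orderingValue;
            this.data = data;
        }
    }

    /** Returns what remains in storage after the merge; Optional.empty() means deleted. */
    static Optional<String> merge(String storedData, long storedOrdering, Incoming incoming) {
        // An older write can only be recognized (and ignored) when it carries an ordering value.
        if (incoming.orderingValue.isPresent()
                && incoming.orderingValue.getAsLong() < storedOrdering) {
            return Optional.of(storedData);
        }
        // Without an ordering value there is nothing to compare, so the write always wins.
        return incoming.isDelete ? Optional.<String>empty() : Optional.of(incoming.data);
    }
}
```

In this model a flagged delete with an older ordering value leaves the stored record untouched, while an empty-payload delete removes it no matter how late it arrives, which is the inconsistency this issue tracks.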





[jira] [Created] (HUDI-1126) code implementation to support structured streaming

2020-07-27 Thread linshan-ma (Jira)
linshan-ma created HUDI-1126:


 Summary: code implementation  to support  structured streaming
 Key: HUDI-1126
 URL: https://issues.apache.org/jira/browse/HUDI-1126
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: linshan-ma


code implementation to support structured streaming





[jira] [Created] (HUDI-1125) build framework to support structured streaming

2020-07-27 Thread linshan-ma (Jira)
linshan-ma created HUDI-1125:


 Summary: build framework  to support   structured streaming 
 Key: HUDI-1125
 URL: https://issues.apache.org/jira/browse/HUDI-1125
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: linshan-ma


build framework to support structured streaming

 





[jira] [Assigned] (HUDI-1125) build framework to support structured streaming

2020-07-27 Thread linshan-ma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

linshan-ma reassigned HUDI-1125:


Assignee: linshan-ma

> build framework  to support   structured streaming 
> ---
>
> Key: HUDI-1125
> URL: https://issues.apache.org/jira/browse/HUDI-1125
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: linshan-ma
>Assignee: linshan-ma
>Priority: Major
>
> build framework to support structured streaming
>  





[jira] [Commented] (HUDI-995) Organize test utils methods and classes

2020-07-27 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165635#comment-17165635
 ] 

vinoyang commented on HUDI-995:
---

[~rxu] Does this ticket still contain other unfinished PRs? If yes, I will not 
close it.

> Organize test utils methods and classes
> ---
>
> Key: HUDI-995
> URL: https://issues.apache.org/jira/browse/HUDI-995
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: pull-request-available
>
> * Move test utils classes to hudi-common where appropriate, e.g. 
> TestRawTripPayload, HoodieDataGenerator
>  * Organize test utils into separate utils classes like `TransformUtils` for 
> transformations, `SchemaUtils` for schema loading, etc
>  *





[hudi] branch master updated: [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common (#1873)

2020-07-27 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ca36c44  [HUDI-995] Move TestRawTripPayload and 
HoodieTestDataGenerator to hudi-common (#1873)
ca36c44 is described below

commit ca36c44cb3a081ce68282742f177f471fb7ed8c3
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Mon Jul 27 04:21:45 2020 -0700

[HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to 
hudi-common (#1873)
---
 .../hudi/cli/commands/TestCleansCommand.java   |  4 +-
 .../hudi/cli/commands/TestCommitsCommand.java  |  4 +-
 .../hudi/cli/commands/TestRepairsCommand.java  |  2 +-
 .../hudi/cli/commands/TestRollbacksCommand.java|  2 +-
 .../hudi/cli/commands/TestSavepointsCommand.java   |  2 +-
 .../apache/hudi/cli/commands/TestStatsCommand.java |  2 +-
 .../apache/hudi/cli/commands/TestTableCommand.java |  8 +-
 .../hudi/cli/integ/ITTestCommitsCommand.java   |  2 +-
 .../cli/integ/ITTestHDFSParquetImportCommand.java  |  2 +-
 .../hudi/cli/integ/ITTestRepairsCommand.java   |  2 +-
 .../hudi/cli/integ/ITTestSavepointsCommand.java|  2 +-
 .../HoodieTestCommitMetadataGenerator.java |  4 +-
 .../org/apache/hudi/client/TestClientRollback.java |  2 +-
 .../TestHoodieClientOnCopyOnWriteStorage.java  | 20 ++---
 .../java/org/apache/hudi/client/TestMultiFS.java   |  2 +-
 .../hudi/client/TestTableSchemaEvolution.java  | 20 ++---
 .../hudi/client/TestUpdateSchemaEvolution.java | 12 +--
 .../hudi/execution/TestBoundedInMemoryQueue.java   |  2 +-
 .../TestSparkBoundedInMemoryExecutor.java  |  2 +-
 .../org/apache/hudi/index/TestHoodieIndex.java | 31 
 .../hudi/index/bloom/TestHoodieBloomIndex.java | 46 ++--
 .../index/bloom/TestHoodieGlobalBloomIndex.java| 64 
 .../apache/hudi/index/hbase/TestHBaseIndex.java|  2 +-
 .../index/hbase/TestHBaseQPSResourceAllocator.java |  2 +-
 .../hudi/io/TestHoodieKeyLocationFetchHandle.java  |  8 +-
 .../org/apache/hudi/io/TestHoodieMergeHandle.java  |  2 +-
 .../hudi/io/TestHoodieTimelineArchiveLog.java  |  2 +-
 .../io/storage/TestHoodieFileWriterFactory.java|  2 +-
 .../java/org/apache/hudi/table/TestCleaner.java|  2 +-
 .../hudi/table/TestHoodieMergeOnReadTable.java |  6 +-
 .../commit/TestCopyOnWriteActionExecutor.java  | 26 +++
 .../table/action/commit/TestUpsertPartitioner.java |  2 +-
 .../table/action/compact/CompactionTestBase.java   |  7 +-
 .../table/action/compact/TestHoodieCompactor.java  |  2 +-
 .../rollback/HoodieClientRollbackTestBase.java |  7 +-
 .../TestCopyOnWriteRollbackActionExecutor.java |  9 ++-
 .../TestMergeOnReadRollbackActionExecutor.java |  9 ++-
 .../hudi/testutils/HoodieClientTestBase.java   |  5 +-
 .../hudi/testutils/HoodieClientTestHarness.java|  6 +-
 .../hudi/testutils/HoodieMergeOnReadTestUtils.java |  2 +
 .../hudi/testutils/MetadataMergeWriteStatus.java   | 85 ++
 .../common/fs/inline}/TestParquetInLining.java | 22 +++---
 .../common}/testutils/HoodieTestDataGenerator.java | 54 +++---
 .../hudi/common/testutils/RawTripTestPayload.java  | 84 -
 hudi-spark/src/test/java/HoodieJavaApp.java|  2 +-
 .../src/test/java/HoodieJavaStreamingApp.java  |  2 +-
 .../apache/hudi/testutils/DataSourceTestUtils.java |  3 +-
 .../apache/hudi/functional/TestDataSource.scala|  3 +-
 .../functional/TestHDFSParquetImporter.java|  2 +-
 .../functional/TestHoodieDeltaStreamer.java|  2 +-
 .../TestHoodieMultiTableDeltaStreamer.java |  2 +-
 .../functional/TestHoodieSnapshotCopier.java   |  2 +-
 .../functional/TestHoodieSnapshotExporter.java |  6 +-
 .../hudi/utilities/sources/TestKafkaSource.java|  2 +-
 .../utilities/testutils/UtilitiesTestBase.java |  6 +-
 .../testutils/sources/AbstractBaseTestSource.java  |  6 +-
 .../sources/AbstractDFSSourceTestBase.java |  2 +-
 57 files changed, 329 insertions(+), 294 deletions(-)

diff --git 
a/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestCleansCommand.java 
b/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestCleansCommand.java
index c14cf0b..3da7189 100644
--- a/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestCleansCommand.java
+++ b/hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestCleansCommand.java
@@ -33,10 +33,10 @@ import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
 import org.apache.hudi.common.table.timeline.versioning.TimelineLayoutVersion;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
 
 import 

[GitHub] [hudi] yanghua merged pull request #1873: [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common

2020-07-27 Thread GitBox


yanghua merged pull request #1873:
URL: https://github.com/apache/hudi/pull/1873


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hddong commented on pull request #1567: [HUDI-840]Clean blank file created by HoodieLogFormatWriter

2020-07-27 Thread GitBox


hddong commented on pull request #1567:
URL: https://github.com/apache/hudi/pull/1567#issuecomment-664250509


   @vinothchandar :
   
https://github.com/apache/hudi/blob/0cb24e4a2defd8e639437b6cd145a26f038ef1af/hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java#L382
   `readSchemaFromLogFile` may get the blank file and the blank file had been 
read before in `HoodieLogFileCommand`(modified to avoid reading the blank files)







[GitHub] [hudi] hddong edited a comment on pull request #1567: [HUDI-840]Clean blank file created by HoodieLogFormatWriter

2020-07-27 Thread GitBox


hddong edited a comment on pull request #1567:
URL: https://github.com/apache/hudi/pull/1567#issuecomment-664250509


   @vinothchandar :
   
https://github.com/apache/hudi/blob/0cb24e4a2defd8e639437b6cd145a26f038ef1af/hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java#L382
   `readSchemaFromLogFile` may read the blank file and the blank file had been 
read before in `HoodieLogFileCommand`(modified to avoid reading the blank files)







[GitHub] [hudi] zherenyu831 opened a new pull request #1879: [DOC][HUDI-1123] add doc for user defined metrics reporter

2020-07-27 Thread GitBox


zherenyu831 opened a new pull request #1879:
URL: https://github.com/apache/hudi/pull/1879


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   This pull request adds documentation for the user-defined metrics reporter.
   
   ## Brief change log
   Add UserDefinedMetricsReporter doc in docs/_docs/2_8_metrics.md 
   Add USER DEFINED REPORTER doc in docs/_docs/2_4_configurations.md 
   Add USER DEFINED REPORTER doc in docs/_docs/2_4_configurations.cn.md 
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.







[jira] [Updated] (HUDI-1123) Document the usage of user define metrics reporter

2020-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1123:
-
Labels: pull-request-available  (was: )

> Document the usage of user define metrics reporter
> --
>
> Key: HUDI-1123
> URL: https://issues.apache.org/jira/browse/HUDI-1123
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: leesf
>Assignee: Zheren Yu
>Priority: Major
>  Labels: pull-request-available
>



