[jira] [Commented] (HUDI-896) CI tests improvements

2020-05-13 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106884#comment-17106884
 ] 

Raymond Xu commented on HUDI-896:
-

https://github.com/apache/incubator-hudi/pull/1619
https://github.com/apache/incubator-hudi/pull/1623

> CI tests improvements
> -
>
> Key: HUDI-896
> URL: https://issues.apache.org/jira/browse/HUDI-896
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.0
>
>
> - Parallelize CI testing to reduce CI wait time
> - To avoid OOM, verify that bumping up heap space (to 2g) in Travis stays 
> within the limit
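The heap-space item above can be sanity-checked from inside the JVM itself; the sketch below is illustrative only (the class and method names are not from the Hudi codebase) and simply reads the max heap the CI runner actually granted, so a job can fail fast if the `-Xmx2g` bump did not take effect.

```java
// Sketch: verify the max heap configured by CI (e.g. via MAVEN_OPTS="-Xmx2g").
// Class and method names are illustrative, not part of Hudi.
public class HeapCheck {

    // Returns true if the JVM's max heap is at least the given number of bytes.
    static boolean hasHeapAtLeast(long bytes) {
        return Runtime.getRuntime().maxMemory() >= bytes;
    }

    public static void main(String[] args) {
        long twoGib = 2L * 1024 * 1024 * 1024;
        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("has >= 2g: " + hasHeapAtLeast(twoGib));
    }
}
```

Run with `java -Xmx2g HeapCheck` to confirm the setting; note that `maxMemory()` reports slightly less than the nominal `-Xmx` value on some JVMs, so a check with a small safety margin is more robust.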



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-13 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HUDI-889:

Status: In Progress  (was: Open)

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-13 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HUDI-889:

Status: Closed  (was: Patch Available)

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-13 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HUDI-889:

Status: Patch Available  (was: In Progress)

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[jira] [Updated] (HUDI-889) Writer supports useJdbc configuration when hive synchronization is enabled

2020-05-13 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated HUDI-889:

Status: Open  (was: New)

> Writer supports useJdbc configuration when hive synchronization is enabled
> --
>
> Key: HUDI-889
> URL: https://issues.apache.org/jira/browse/HUDI-889
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: dzcxzl
>Priority: Trivial
>
> hudi-hive-sync supports the useJdbc = false configuration, but the writer 
> does not provide this configuration at this stage





[jira] [Updated] (HUDI-896) CI tests improvements

2020-05-13 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-896:

Status: In Progress  (was: Open)

> CI tests improvements
> -
>
> Key: HUDI-896
> URL: https://issues.apache.org/jira/browse/HUDI-896
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.0
>
>
> - Parallelize CI testing to reduce CI wait time
> - To avoid OOM, verify that bumping up heap space (to 2g) in Travis stays 
> within the limit





[jira] [Updated] (HUDI-896) CI tests improvements

2020-05-13 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-896:

Status: Open  (was: New)

> CI tests improvements
> -
>
> Key: HUDI-896
> URL: https://issues.apache.org/jira/browse/HUDI-896
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.0
>
>
> - Parallelize CI testing to reduce CI wait time
> - To avoid OOM, verify that bumping up heap space (to 2g) in Travis stays 
> within the limit





[jira] [Created] (HUDI-896) CI tests improvements

2020-05-13 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-896:
---

 Summary: CI tests improvements
 Key: HUDI-896
 URL: https://issues.apache.org/jira/browse/HUDI-896
 Project: Apache Hudi (incubating)
  Issue Type: New Feature
  Components: Testing
Reporter: Raymond Xu
Assignee: Raymond Xu
 Fix For: 0.6.0


- Parallelize CI testing to reduce CI wait time
- To avoid OOM, verify that bumping up heap space (to 2g) in Travis stays within the limit





[jira] [Commented] (HUDI-767) Support transformation when export to Hudi

2020-05-13 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106877#comment-17106877
 ] 

Raymond Xu commented on HUDI-767:
-

[~vinoth] I'm trying to clear up my queue a bit, hence deferring this to 0.6.1, 
please feel free to tag to another version.

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.6.1
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer





[jira] [Assigned] (HUDI-767) Support transformation when export to Hudi

2020-05-13 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu reassigned HUDI-767:
---

Assignee: (was: Raymond Xu)

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.6.1
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer





[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2020-05-13 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-767:

Status: New  (was: Open)

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.1
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer





[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2020-05-13 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-767:

Fix Version/s: 0.6.1  (was: 0.6.0)

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
>  Labels: bug-bash-0.6.0
> Fix For: 0.6.1
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer





[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2020-05-13 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-767:

Labels:   (was: bug-bash-0.6.0)

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Major
> Fix For: 0.6.1
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer





[jira] [Commented] (HUDI-774) Spark to Avro converter incorrectly generates optional fields

2020-05-13 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106849#comment-17106849
 ] 

Raymond Xu commented on HUDI-774:
-

I think this is a very important fix. +1

For any custom schema provider, a union of null and another type with a 
default of null is very common. This bug would break such use cases.

cc [~vinoth] 

> Spark to Avro converter incorrectly generates optional fields
> -
>
> Key: HUDI-774
> URL: https://issues.apache.org/jira/browse/HUDI-774
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Alexander Filipchik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think https://issues.apache.org/jira/browse/SPARK-28008 is a good 
> description of what is happening.
>  
> It can cause a situation when schema in the MOR log files is incompatible 
> with the schema produced by RowBasedSchemaProvider, so compactions will stall.
>  
> I have a fix which is a bit hacky: post-process the schema produced by the 
> converter and
> 1) Make sure unions with null types have those null types at position 0
> 2) Make sure their default values are set to null
> I couldn't find a way to do a clean fix as some classes that are problematic 
> are from Hive and called from Spark.
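The union-ordering rule behind item 1 above can be illustrated with a small, self-contained sketch. Plain strings stand in for Avro `Schema` objects so the example runs without the Avro library, and all names here are hypothetical, not the actual fix: in Avro, a field's default value must match the union's first branch, so an optional field needs "null" at position 0 before a null default is legal.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the post-processing idea: move the "null" branch of a union to
// position 0 so that "default": null becomes legal under Avro's rule that a
// default must match the FIRST union branch. Names are hypothetical; strings
// stand in for Avro Schema objects.
public class UnionNormalizer {

    // Returns a copy of the union with "null" (if present) moved to the front.
    static List<String> nullFirst(List<String> unionTypes) {
        List<String> result = new ArrayList<>(unionTypes);
        if (result.remove("null")) {
            result.add(0, "null");
        }
        return result;
    }

    public static void main(String[] args) {
        // ["string", "null"] rejects "default": null; ["null", "string"] accepts it.
        System.out.println(nullFirst(List.of("string", "null")));
    }
}
```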





Build failed in Jenkins: hudi-snapshot-deployment-0.5 #277

2020-05-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.40 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-13 Thread hong dongdong (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106804#comment-17106804
 ] 

hong dongdong commented on HUDI-890:


Hi, [~bhavanisudha], please take a look when you are free:

https://jira.apache.org/jira/browse/HUDI-789

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>
>
> The following commits are included in this release.
>  * #1372 [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to 
> break the inheritance chain
>  * #1388 [HUDI-681] Remove embeddedTimelineService from HoodieReadClient
>  * #1350 [HUDI-629]: Replace Guava's Hashing with an equivalent in 
> NumericUtils.java
>  * #1505 [HUDI - 738] Add validation to DeltaStreamer to fail fast when 
> filterDupes is enabled on UPSERT mode.
>  * #1517 [HUDI-799] Use appropriate FS when loading configs
>  * #1406 [HUDI-713] Fix conversion of Spark array of struct type to Avro 
> schema
>  * #1394 [HUDI-656][Performance] Return a dummy Spark relation after writing 
> the DataFrame
>  * #1576 [HUDI-850] Avoid unnecessary listings in incremental cleaning mode
>  * #1421 [HUDI-724] Parallelize getSmallFiles for partitions
>  * #1330 [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned 
> by Date type columns
>  * #1413 Add constructor to HoodieROTablePathFilter
>  * #1415 [HUDI-539] Make ROPathFilter conf member serializable
>  * #1578 Add changes for presto mor queries
>  * #1506 [HUDI-782] Add support of Aliyun object storage service.
>  * #1432 [HUDI-716] Exception: Not an Avro data file when running 
> HoodieCleanClient.runClean
>  * #1422 [HUDI-400] Check upgrade from old plan to new plan for compaction
>  * #1448 [MINOR] Update DOAP with 0.5.2 Release
>  * #1466 [HUDI-742] Fix Java Math Exception
>  * #1416 [HUDI-717] Fixed usage of HiveDriver for DDL statements.
>  * #1427 [HUDI-727]: Copy default values of fields if not present when 
> rewriting incoming record with new schema
>  * #1515 [HUDI-795] Handle auto-deleted empty aux folder
>  * #1547 [MINOR]: Fix cli docs for DeltaStreamer
>  * #1580 [HUDI-852] adding check for table name for Append Save mode
>  * #1537 [MINOR] fixed building IndexFileFilter with a wrong condition in 
> HoodieGlobalBloomIndex class
>  * #1434 [HUDI-616] Fixed parquet files getting created on local FS





[GitHub] [incubator-hudi] EdwinGuo opened a new issue #1630: [SUPPORT] Latest commit does not have any schema in commit metadata

2020-05-13 Thread GitBox


EdwinGuo opened a new issue #1630:
URL: https://github.com/apache/incubator-hudi/issues/1630


   **_Tips before filing an issue_**
   
   - Have you gone through our 
[FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
   Yes
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   I have a COW table written with Hudi version 0.5.0. When I try to use the 
branch build of hudi-client 
(https://github.com/apache/incubator-hudi/commit/506447fd4fde4cd922f7aa8f4e17a7f0dc97), 
the latest commit does not have any schema in its commit metadata. The schema 
is supposed to exist under the extraMetadata section, but it is not there in 
my case.
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.5.0
   
   * Spark version : 2.4.5
   
   * Hive version : 2.3.6
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) :s3
   
   * Running on Docker? (yes/no) :no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch master updated: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine (#1513)

2020-05-13 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 83796b3  [HUDI-793] Adding proper default to hudi metadata fields and 
proper handling to rewrite routine (#1513)
83796b3 is described below

commit 83796b3189570182c68a9c41e57b356124c301ca
Author: Alexander Filipchik 
AuthorDate: Wed May 13 18:04:38 2020 -0700

[HUDI-793] Adding proper default to hudi metadata fields and proper 
handling to rewrite routine (#1513)

* Adding proper default to hudi metadata fields and proper handling to 
rewrite routine
* Handle fields declared with a null default

Co-authored-by: Alex Filipchik 
---
 .../main/java/org/apache/hudi/avro/HoodieAvroUtils.java | 17 +++--
 .../java/org/apache/hudi/avro/TestHoodieAvroUtils.java  |  7 ---
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git 
a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java 
b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
index d56b7d9..bffe8df 100644
--- a/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
+++ b/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.avro;
 
+import org.apache.avro.JsonProperties.Null;
 import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.exception.HoodieIOException;
 import org.apache.hudi.exception.SchemaCompatabilityException;
@@ -141,15 +142,15 @@ public class HoodieAvroUtils {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 
 parentFields.add(commitTimeField);
 parentFields.add(commitSeqnoField);
@@ -253,7 +254,11 @@ public class HoodieAvroUtils {
 GenericRecord newRecord = new GenericData.Record(newSchema);
 for (Schema.Field f : fieldsToWrite) {
   if (record.get(f.name()) == null) {
-newRecord.put(f.name(), f.defaultVal());
+if (f.defaultVal() instanceof Null) {
+  newRecord.put(f.name(), null);
+} else {
+  newRecord.put(f.name(), f.defaultVal());
+}
   } else {
 newRecord.put(f.name(), record.get(f.name()));
   }
diff --git 
a/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java 
b/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
index e2c1266..9c5e046 100644
--- a/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
+++ b/hudi-common/src/test/java/org/apache/hudi/avro/TestHoodieAvroUtils.java
@@ -47,12 +47,13 @@ public class TestHoodieAvroUtils {
   + "{\"name\": \"non_pii_col\", \"type\": \"string\"},"
   + "{\"name\": \"pii_col\", \"type\": \"string\", \"column_category\": 
\"user_profile\"}]}";
 
-
-  private static String SCHEMA_WITH_METADATA_FIELD = "{\"type\": 
\"record\",\"name\": \"testrec2\",\"fields\": [ "
+  private static String SCHEMA_WITH_METADATA_FIELD =
+  "{\"type\": \"record\",\"name\": \"testrec2\",\"fields\": [ "
   + "{\"name\": \"timestamp\",\"type\": \"double\"},{\"name\": 
\"_row_key\", \"type\": \"string\"},"
   + "{\"name\": \"non_pii_col\", \"type\": \"string\"},"
   + "{\"name\": \"pii_col\", \"type\": \"string\", \"column_category\": 
\"user_profile\"},"
-  + "{\"name\": \"_hoodie_commit_time\", \"type\": [\"null\", 
\"string\"]}]}";
+  + "{\"name\": \"_hoodie_commit_time\", \"type\": [\"null\", 
\"string\"]},"
+  + "{\"name\": \"nullable_field\",\"type\": [\"null\" 
,\"string\"],\"default\": null}]}";
 
   @Test
   

[GitHub] [incubator-hudi] vinothchandar merged pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-05-13 Thread GitBox


vinothchandar merged pull request #1513:
URL: https://github.com/apache/incubator-hudi/pull/1513


   







[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1513: [HUDI-793] Adding proper default to hudi metadata fields and proper handling to rewrite routine

2020-05-13 Thread GitBox


vinothchandar commented on a change in pull request #1513:
URL: https://github.com/apache/incubator-hudi/pull/1513#discussion_r424814160



##
File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
##
@@ -104,15 +105,15 @@ public static Schema addMetadataFields(Schema schema) {
 List parentFields = new ArrayList<>();
 
 Schema.Field commitTimeField =
-new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_TIME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field commitSeqnoField =
-new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.COMMIT_SEQNO_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field recordKeyField =
-new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.RECORD_KEY_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field partitionPathField =
-new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.PARTITION_PATH_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());
 Schema.Field fileNameField =
-new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", (Object) null);
+new Schema.Field(HoodieRecord.FILENAME_METADATA_FIELD, 
METADATA_FIELD_SCHEMA, "", NullNode.getInstance());

Review comment:
   Overloading `null` to denote the absence of a default value in Avro is 
confusing, to say the least (at least I learned that now). But I am trying to 
grok the actual change in behavior.
   
   This change seems to be orthogonal to the fix below. Effectively, with this 
change we are making the metadata fields nullable (as opposed to having no 
default values). While I agree metadata fields won't have nulls, having this 
safety is better.
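The second half of the change under discussion reduces to how the rewrite routine interprets a field's default. A minimal sketch follows; the `NULL_MARKER` sentinel is a stand-in for Avro's `JsonProperties.Null` marker object and an assumption on my part, not Hudi code:

```java
// Sketch of the rewrite-routine fix: Avro reports a field declared with
// "default": null via a special marker object (JsonProperties.Null), which
// must be translated to a real Java null before being written into a record.
// NULL_MARKER is a stand-in sentinel, not the actual Avro type.
public class DefaultResolver {

    static final Object NULL_MARKER = new Object();

    // Translate the schema-level default into the value to store in a record.
    static Object resolveDefault(Object schemaDefault) {
        return schemaDefault == NULL_MARKER ? null : schemaDefault;
    }

    public static void main(String[] args) {
        System.out.println(resolveDefault(NULL_MARKER)); // prints "null"
        System.out.println(resolveDefault("fallback"));  // prints "fallback"
    }
}
```

Without this translation, the marker object itself would be stored as the field value, which is exactly the kind of mismatch the patch guards against.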
   
   
   









[jira] [Commented] (HUDI-473) IllegalArgumentException in QuickstartUtils

2020-05-13 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106755#comment-17106755
 ] 

sivabalan narayanan commented on HUDI-473:
--

[~zhangpu-paul]: are you still facing the issue? Can you help answer 
pratyaksh's questions?

> IllegalArgumentException in QuickstartUtils 
> 
>
> Key: HUDI-473
> URL: https://issues.apache.org/jira/browse/HUDI-473
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Usability
>Reporter: zhangpu
>Priority: Minor
>  Labels: bug-bash-0.6.0, starter
>
>  First, call dataGen.generateInserts to write the data; then another process 
> calls dataGen.generateUpdates, which throws the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: bound must be 
> positive
>   at java.util.Random.nextInt(Random.java:388)
>   at 
> org.apache.hudi.QuickstartUtils$DataGenerator.generateUpdates(QuickstartUtils.java:163)
> Is the design reasonable?
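The exception above comes from `java.util.Random.nextInt(int)`, which requires a strictly positive bound; a hedged sketch of a defensive wrapper follows (illustrative only, not the actual QuickstartUtils code):

```java
import java.util.Random;

// Sketch of a defensive wrapper around Random.nextInt(int), which throws
// IllegalArgumentException ("bound must be positive") when bound <= 0.
// Illustrative only, not the actual QuickstartUtils fix.
public class SafeRandom {

    // Returns 0 when there is nothing to pick from, instead of throwing.
    static int safeNextInt(Random random, int bound) {
        return bound <= 0 ? 0 : random.nextInt(bound);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        System.out.println(safeNextInt(random, 0));  // no exception thrown
        System.out.println(safeNextInt(random, 10)); // value in [0, 10)
    }
}
```

In the reported scenario the bound is the number of existing keys, which is zero when a fresh process calls generateUpdates before any inserts, so a guard like this (or an explicit precondition error message) would avoid the confusing stack trace.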





[GitHub] [incubator-hudi] vinothchandar commented on pull request #1584: fix schema provider issue

2020-05-13 Thread GitBox


vinothchandar commented on pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-628315469


   Spending today and tomorrow on all the schema PRs.. stay tuned :) 







[GitHub] [incubator-hudi] xushiyan edited a comment on pull request #1584: fix schema provider issue

2020-05-13 Thread GitBox


xushiyan edited a comment on pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-628299745


   @pratyakshsharma I had a new proposed change reflected in the last commit. 
   
   The idea is to not throw an exception when no data is fetched; this loosens 
the requirement a bit, instead of always throwing and asking the user to set 
the class. If any data is fetched, the old requirement of setting a schema 
provider still applies.
   
   This should work for the ROW source case, where users can forget about the 
schema provider setting entirely. For all data source types, we don't care 
about the schema if no data is fetched.
   
   cc @vinothchandar @afilipchik 
   
   Please kindly verify the changes and see if the proposal works or if i 
overlooked any side effect. Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2020-05-13 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-818:


Assignee: lamber-ken

> Optimize the default value of hoodie.memory.merge.max.size option
> -
>
> Key: HUDI-818
> URL: https://issues.apache.org/jira/browse/HUDI-818
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: bug-bash-0.6.0, help-requested
> Fix For: 0.6.0
>
>
> The default value of hoodie.memory.merge.max.size option is incapable of 
> meeting their performance requirements
> [https://github.com/apache/incubator-hudi/issues/1491]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] nsivabalan commented on pull request #1596: [HUDI-863] get decimal properties from derived spark DataType

2020-05-13 Thread GitBox


nsivabalan commented on pull request #1596:
URL: https://github.com/apache/incubator-hudi/pull/1596#issuecomment-628305115


   @rolandjohann : can you assign this ticket to yourself. 
https://issues.apache.org/jira/browse/HUDI-863
   thanks. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1603: [HUDI-836] Add configs for Datadog metrics reporter

2020-05-13 Thread GitBox


xushiyan commented on pull request #1603:
URL: https://github.com/apache/incubator-hudi/pull/1603#issuecomment-628304763


   Note: blocked by #1572 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1572: [HUDI-836] Implement datadog metrics reporter

2020-05-13 Thread GitBox


xushiyan commented on pull request #1572:
URL: https://github.com/apache/incubator-hudi/pull/1572#issuecomment-628304476


   Note: blocked by #1623 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1623: [MINOR] Increase heap space for surefire

2020-05-13 Thread GitBox


xushiyan commented on pull request #1623:
URL: https://github.com/apache/incubator-hudi/pull/1623#issuecomment-628303823


   @bvaradar Give me some time to get back to you...need to dig into some docs 
for this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1607: [HUDI-811] Restructure test packages

2020-05-13 Thread GitBox


xushiyan commented on a change in pull request #1607:
URL: https://github.com/apache/incubator-hudi/pull/1607#discussion_r424790740



##
File path: 
hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/TestUtil.java
##
@@ -16,7 +16,7 @@
  * limitations under the License.
  */
 
-package org.apache.hudi.hive;
+package org.apache.hudi.hive.testutils;

Review comment:
   Yes `HiveTestUtil` is better... I'll change it in the next PR where I 
change the remaining modules.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1584: fix schema provider issue

2020-05-13 Thread GitBox


xushiyan commented on pull request #1584:
URL: https://github.com/apache/incubator-hudi/pull/1584#issuecomment-628299745


   @pratyakshsharma I had a new proposed change reflected in the last commit. 
   
   The idea is to not throw exception when no data is fetched, so this is to 
loosen a bit on throwing exception and asking user to set the class. If any 
data is fetched, then it is still the old requirement on setting schema 
provider.
   
   This should work for ROW source case where users can totally forget about 
schema provider setting.
   
   cc @vinothchandar @afilipchik 
   
   Please kindly verify the changes and see if the proposal works or if i 
overlooked any side effect. Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] bvaradar commented on pull request #1619: [MINOR] Parallelize CI tests by modules

2020-05-13 Thread GitBox


bvaradar commented on pull request #1619:
URL: https://github.com/apache/incubator-hudi/pull/1619#issuecomment-628293196


Optimizing/redesigning the tests should definitely be done from a modularity, 
test-quality and maintenance perspective, and the running-time reduction would 
be a good side effect. But that alone may not guarantee that we stay under the 
time limit. 
   
   I agree with @xushiyan that this looks like a race condition with code 
coverage report generation, which needs to be resolved. 




This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1502: [WIP] [HUDI-778] Adding codecov bagde to readme file

2020-05-13 Thread GitBox


vinothchandar commented on pull request #1502:
URL: https://github.com/apache/incubator-hudi/pull/1502#issuecomment-628282660


   marker for later. closing to reduce clutter of WIP PRs 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar closed pull request #1502: [WIP] [HUDI-778] Adding codecov bagde to readme file

2020-05-13 Thread GitBox


vinothchandar closed pull request #1502:
URL: https://github.com/apache/incubator-hudi/pull/1502


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar commented on pull request #1557: [WIP] [HUDI-834] Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-05-13 Thread GitBox


vinothchandar commented on pull request #1557:
URL: https://github.com/apache/incubator-hudi/pull/1557#issuecomment-628282107


   Closing with a WIP tag for future follow up. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar closed pull request #1557: [WIP] [HUDI-834] Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-05-13 Thread GitBox


vinothchandar closed pull request #1557:
URL: https://github.com/apache/incubator-hudi/pull/1557


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[incubator-hudi] branch master updated: [HUDI-811] Restructure test packages (#1607)

2020-05-13 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 0d4848b  [HUDI-811] Restructure test packages (#1607)
0d4848b is described below

commit 0d4848b68b625a17d05b38864a84a6cc71189bfa
Author: Raymond Xu <2701446+xushi...@users.noreply.github.com>
AuthorDate: Wed May 13 15:37:03 2020 -0700

[HUDI-811] Restructure test packages (#1607)

* restructure hudi-spark tests
* restructure hudi-timeline-service tests
* restructure hudi-hadoop-mr hudi-utilities tests
* restructure hudi-hive-sync tests
---
 .../hudi/hadoop/TestHoodieParquetInputFormat.java  |  1 +
 ...hHandlerTest.java => TestInputPathHandler.java} |  2 +-
 .../TestHoodieCombineHiveInputFormat.java  |  5 ++--
 .../realtime/TestHoodieRealtimeRecordReader.java   |  2 +-
 .../{ => testutils}/InputFormatTestUtil.java   |  3 +-
 .../org/apache/hudi/hive/TestHiveSyncTool.java |  1 +
 .../hive/{util => testutils}/HiveTestService.java  |  2 +-
 .../apache/hudi/hive/{ => testutils}/TestUtil.java | 22 +++---
 hudi-spark/src/test/java/HoodieJavaApp.java|  1 +
 .../src/test/java/HoodieJavaStreamingApp.java  |  1 +
 .../apache/hudi/TestDataSourceUtils.java}  |  7 +++--
 .../hudi/testutils}/DataSourceTestUtils.java   |  3 ++
 .../apache/hudi}/TestDataSourceDefaults.scala  |  3 +-
 .../HoodieSparkSqlWriterSuite.scala|  3 +-
 .../apache/hudi/functional}/TestDataSource.scala   |  3 ++
 .../TestRemoteHoodieTableFileSystemView.java   |  2 +-
 .../hudi/utilities/HoodieSnapshotExporter.java | 14 -
 .../TestSchedulerConfGenerator.java|  4 +--
 .../TestAWSDatabaseMigrationServiceSource.java |  3 +-
 .../{ => functional}/TestHDFSParquetImporter.java  |  3 +-
 .../{ => functional}/TestHoodieDeltaStreamer.java  |  9 +++---
 .../TestHoodieMultiTableDeltaStreamer.java |  2 +-
 .../{ => functional}/TestHoodieSnapshotCopier.java |  3 +-
 .../TestHoodieSnapshotExporter.java|  3 +-
 .../TestJdbcbasedSchemaProvider.java   |  4 ++-
 .../TestTimestampBasedKeyGenerator.java|  3 +-
 .../hudi/utilities/sources/TestCsvDFSSource.java   |  7 +++--
 .../hudi/utilities/sources/TestDataSource.java |  1 +
 .../hudi/utilities/sources/TestJsonDFSSource.java  |  7 +++--
 .../hudi/utilities/sources/TestKafkaSource.java|  2 +-
 .../utilities/sources/TestParquetDFSSource.java|  5 ++--
 .../{ => testutils}/UtilitiesTestBase.java |  7 +++--
 .../sources/AbstractBaseTestSource.java|  7 +++--
 .../sources/AbstractDFSSourceTestBase.java | 34 +++---
 .../sources/DistributedTestDataSource.java |  5 ++--
 .../sources/config/TestSourceConfig.java   |  2 +-
 36 files changed, 106 insertions(+), 80 deletions(-)

diff --git 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
index 8a6aee4..9ff6241 100644
--- 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
+++ 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestHoodieParquetInputFormat.java
@@ -28,6 +28,7 @@ import 
org.apache.hudi.common.table.timeline.HoodieActiveTimeline;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
+import org.apache.hudi.hadoop.testutils.InputFormatTestUtil;
 
 import org.apache.avro.Schema;
 import org.apache.hadoop.fs.FileStatus;
diff --git 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/InputPathHandlerTest.java 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestInputPathHandler.java
similarity index 99%
rename from 
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/InputPathHandlerTest.java
rename to 
hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestInputPathHandler.java
index 5a364f7..d5acf68 100644
--- 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/InputPathHandlerTest.java
+++ 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/TestInputPathHandler.java
@@ -42,7 +42,7 @@ import java.util.stream.Collectors;
 
 import static org.junit.jupiter.api.Assertions.assertTrue;
 
-public class InputPathHandlerTest {
+public class TestInputPathHandler {
 
   // Incremental Table
   public static final String RAW_TRIPS_TEST_NAME = "raw_trips";
diff --git 
a/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieCombineHiveInputFormat.java
 
b/hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/hive/TestHoodieCombineHiveInputFormat.java
similarity index 97%
rename from 

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1607: [HUDI-811] Restructure test packages

2020-05-13 Thread GitBox


vinothchandar commented on a change in pull request #1607:
URL: https://github.com/apache/incubator-hudi/pull/1607#discussion_r424769094



##
File path: 
hudi-hive-sync/src/test/java/org/apache/hudi/hive/testutils/TestUtil.java
##
@@ -16,7 +16,7 @@
  * limitations under the License.
  */
 
-package org.apache.hudi.hive;
+package org.apache.hudi.hive.testutils;

Review comment:
   should we also rename this class `HiveTestUtil` .. optional, feel free 
to skip 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] vinothchandar merged pull request #1607: [HUDI-811] Restructure test packages

2020-05-13 Thread GitBox


vinothchandar merged pull request #1607:
URL: https://github.com/apache/incubator-hudi/pull/1607


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] nandini57 opened a new issue #1629: [SUPPORT]

2020-05-13 Thread GitBox


nandini57 opened a new issue #1629:
URL: https://github.com/apache/incubator-hudi/issues/1629


   In Hudi 0.5.1's HiveSyncTool.java, why do MERGE_ON_READ tables mandate a _rt 
suffix?
   
case MERGE_ON_READ:
   this.snapshotTableName = cfg.tableName + SUFFIX_SNAPSHOT_TABLE;
   
   Also, when I run my local Hive server and then run HiveSyncTool, I keep 
getting the below exception while creating tables:
   
   Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:java.lang.IllegalArgumentException: Can not create a Path 
from an empty string)
   
   However, if I create the table with updateHiveSQL it works fine. Any 
pointers on the mismatch?
   
   if (!hiveClient.doesTableExist(tableName)) {
       hiveClient.updateHiveSQL(createTableMOR(tableName, "default"));
   }
   {
       String sql = "CREATE EXTERNAL TABLE IF NOT EXISTS 
`REPLACE_DB`.`REPLACE_TBL`( `_hoodie_commit_time` string, 
`_hoodie_commit_seqno` string, `_hoodie_record_key` string, 
`_hoodie_partition_path` string, `_hoodie_file_name` string, `id` bigint, 
`name` string, `team` string, `ts` bigint) PARTITIONED BY (`recordKey` string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'";
       sql = sql.replace("REPLACE_TBL", tableName).replace("REPLACE_DB", database);
       return sql;
   }
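The "Can not create a Path from an empty string" error comes from Hadoop's `Path` constructor, which rejects null or empty path strings before any parsing happens. A plain-Java sketch mirroring that validation (no Hadoop dependency; the check and message match `org.apache.hadoop.fs.Path`, the helper name is invented for illustration):

```java
public class PathCheckDemo {

    // Plain-Java mirror of the argument check in org.apache.hadoop.fs.Path:
    // an empty location string is rejected up front with exactly this message.
    static String tryCreatePath(String pathString) {
        try {
            if (pathString == null || pathString.isEmpty()) {
                throw new IllegalArgumentException(
                    "Can not create a Path from an empty string");
            }
            return "ok: " + pathString;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreatePath(""));
        System.out.println(tryCreatePath("/tmp/hudi/raw_trips"));
    }
}
```

One plausible thing to check, then, is whether the metastore-backed create path ends up with an empty table location, while the handwritten DDL above leaves the location to Hive's defaults.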



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] xushiyan commented on pull request #1619: [MINOR] Parallelize CI tests by modules

2020-05-13 Thread GitBox


xushiyan commented on pull request #1619:
URL: https://github.com/apache/incubator-hudi/pull/1619#issuecomment-628260195


   @bvaradar @ramachandranms thanks for chiming in.. From the info 
@ramachandranms provided, the hudi-spark-bundle report will be overwritten by 
the last submission, and that submission should cover all modules. So 
previously, when we had only 2 jobs (unit tests and integration tests), the 
report was accurate because the unit tests took longer and covered all 
modules. If the integration test job takes longer and finishes later, the 
coverage report will also be affected. We need to avoid this kind of dependency.
   
   When we modularize projects, by design we should be able to, and are 
encouraged to, run unit tests for each module independently. The integration 
tests then serve the purpose of checking cross-module functionality. As for the 
overall coverage report, I think it should be a pure appending of all test 
reports, without chronological overwriting. Thus, I think the setup can perhaps 
be improved in such a way that all jobs are free to submit their reports 
whenever they finish, and codecov sums up all submissions.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] ramachandranms commented on pull request #1619: [MINOR] Parallelize CI tests by modules

2020-05-13 Thread GitBox


ramachandranms commented on pull request #1619:
URL: https://github.com/apache/incubator-hudi/pull/1619#issuecomment-628259820


   @bvaradar i think a better idea would be to see how we can optimize the 
tests rather than breaking them up into multiple Jenkins jobs.
   
   regarding missing coverage: the problem is more about missing coverage due 
to splitting up than about multiple uploads. since we have cross-module tests, 
not running some of the modules for the uploading job will miss those tests 
and coverage will drop.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (HUDI-894) Allow ability to use hive metastore thrift connection to register tables

2020-05-13 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal closed HUDI-894.

Resolution: Duplicate

> Allow ability to use hive metastore thrift connection to register tables
> 
>
> Key: HUDI-894
> URL: https://issues.apache.org/jira/browse/HUDI-894
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.3
>
>
> At the moment, we have 2 ways to register the table with HMS 
> 1) Thrift based HMS
> 2) JDBC through hive server
> For secure clusters, the thrift based HMS works out of the box as long as the 
> correct namespace and connection string is provided, for JDBC, that does not 
> work out of the box. For users who want to register in secure clusters, we 
> want to allow ability to toggle between these 2 approaches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-894) Allow ability to use hive metastore thrift connection to register tables

2020-05-13 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-894:
-
Status: Open  (was: New)

> Allow ability to use hive metastore thrift connection to register tables
> 
>
> Key: HUDI-894
> URL: https://issues.apache.org/jira/browse/HUDI-894
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.3
>
>
> At the moment, we have 2 ways to register the table with HMS 
> 1) Thrift based HMS
> 2) JDBC through hive server
> For secure clusters, the thrift based HMS works out of the box as long as the 
> correct namespace and connection string is provided, for JDBC, that does not 
> work out of the box. For users who want to register in secure clusters, we 
> want to allow ability to toggle between these 2 approaches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-894) Allow ability to use hive metastore thrift connection to register tables

2020-05-13 Thread Nishith Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishith Agarwal updated HUDI-894:
-
Status: In Progress  (was: Open)

> Allow ability to use hive metastore thrift connection to register tables
> 
>
> Key: HUDI-894
> URL: https://issues.apache.org/jira/browse/HUDI-894
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.3
>
>
> At the moment, we have 2 ways to register the table with HMS 
> 1) Thrift based HMS
> 2) JDBC through hive server
> For secure clusters, the thrift based HMS works out of the box as long as the 
> correct namespace and connection string is provided, for JDBC, that does not 
> work out of the box. For users who want to register in secure clusters, we 
> want to allow ability to toggle between these 2 approaches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-895) Reduce listing .hoodie folder when using timeline server

2020-05-13 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-895:
---

 Summary: Reduce listing .hoodie folder when using timeline server
 Key: HUDI-895
 URL: https://issues.apache.org/jira/browse/HUDI-895
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Writer Core
Reporter: Balaji Varadarajan
 Fix For: 0.6.0, 0.5.3


Currently, we are unnecessarily listing .hoodie folder when sending queries to 
timeline-server. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar commented on pull request #1619: [MINOR] Parallelize CI tests by modules

2020-05-13 Thread GitBox


bvaradar commented on pull request #1619:
URL: https://github.com/apache/incubator-hudi/pull/1619#issuecomment-628250030


   @ramachandranms : Jobs on travis-ci.org have a hard timeout of 50 mins. 
Splitting the jobs is the only way to stay under this time limit. From 
@xushiyan's comment, I see that both jobs upload a report from 
hudi-spark-bundle. If one of the jobs does not upload from hudi-spark-bundle, 
will it work?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-894) Allow ability to use hive metastore thrift connection to register tables

2020-05-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-894:

Labels: pull-request-available  (was: )

> Allow ability to use hive metastore thrift connection to register tables
> 
>
> Key: HUDI-894
> URL: https://issues.apache.org/jira/browse/HUDI-894
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.3
>
>
> At the moment, we have 2 ways to register the table with HMS 
> 1) Thrift based HMS
> 2) JDBC through hive server
> For secure clusters, the thrift based HMS works out of the box as long as the 
> correct namespace and connection string is provided, for JDBC, that does not 
> work out of the box. For users who want to register in secure clusters, we 
> want to allow ability to toggle between these 2 approaches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] n3nash commented on pull request #1605: [HUDI-894] : Adding ability to use hive metastore client for hive table registration instead of jdbc

2020-05-13 Thread GitBox


n3nash commented on pull request #1605:
URL: https://github.com/apache/incubator-hudi/pull/1605#issuecomment-628245562


   This has been addressed here -> 
https://github.com/apache/incubator-hudi/pull/1627



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1605: Adding ability to use hive metastore client for hive table registration instead of jdbc

2020-05-13 Thread GitBox


n3nash commented on a change in pull request #1605:
URL: https://github.com/apache/incubator-hudi/pull/1605#discussion_r424727083



##
File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
##
@@ -264,6 +264,7 @@ object DataSourceWriteOptions {
   val HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY = 
"hoodie.datasource.hive_sync.partition_extractor_class"
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = 
"hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = 
"hoodie.datasource.hive_sync.use_pre_apache_input_format"
+  val HIVE_JDBC_ENABLE_OPT_KEY = "hoodie.datasource.hive_sync.usejdbc"

Review comment:
   sounds good, will do
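How the writer side might consume the new option key can be sketched with plain `java.util.Properties`. The key name is taken from the diff above; the class name, helper, and the default of `true` (JDBC on, preserving existing behavior) are assumptions for illustration, not the PR's actual wiring:

```java
import java.util.Properties;

public class HiveSyncOptionDemo {

    // Key name from the diff above; everything else here is illustrative.
    static final String HIVE_JDBC_ENABLE_OPT_KEY =
        "hoodie.datasource.hive_sync.usejdbc";

    // Defaulting to true keeps the existing JDBC-based sync behavior when the
    // option is absent (assumed default, not taken from the PR).
    static boolean useJdbc(Properties props) {
        return Boolean.parseBoolean(
            props.getProperty(HIVE_JDBC_ENABLE_OPT_KEY, "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(useJdbc(props));  // true: default JDBC path
        props.setProperty(HIVE_JDBC_ENABLE_OPT_KEY, "false");
        System.out.println(useJdbc(props));  // false: use metastore client
    }
}
```

The point of the toggle, per HUDI-894 below, is that the thrift-based metastore client works out of the box on secure clusters where JDBC through HiveServer does not.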





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-13 Thread Bhavani Sudha (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106561#comment-17106561
 ] 

Bhavani Sudha commented on HUDI-890:


[~garyli1019] thanks Gary. I'll take a look soon.

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>
>
> The following commits are included in this release.
>  * #1372 [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to 
> break the inheritance chain
>  * #1388 [HUDI-681] Remove embeddedTimelineService from HoodieReadClient
>  * #1350 [HUDI-629]: Replace Guava's Hashing with an equivalent in 
> NumericUtils.java
>  * #1505 [HUDI - 738] Add validation to DeltaStreamer to fail fast when 
> filterDupes is enabled on UPSERT mode.
>  * #1517 [HUDI-799] Use appropriate FS when loading configs
>  * #1406 [HUDI-713] Fix conversion of Spark array of struct type to Avro 
> schema
>  * #1394 [HUDI-656][Performance] Return a dummy Spark relation after writing 
> the DataFrame
>  * #1576 [HUDI-850] Avoid unnecessary listings in incremental cleaning mode
>  * #1421 [HUDI-724] Parallelize getSmallFiles for partitions
>  * #1330 [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned 
> by Date type columns
>  * #1413 Add constructor to HoodieROTablePathFilter
>  * #1415 [HUDI-539] Make ROPathFilter conf member serializable
>  * #1578 Add changes for presto mor queries
>  * #1506 [HUDI-782] Add support of Aliyun object storage service.
>  * #1432 [HUDI-716] Exception: Not an Avro data file when running 
> HoodieCleanClient.runClean
>  * #1422 [HUDI-400] Check upgrade from old plan to new plan for compaction
>  * #1448 [MINOR] Update DOAP with 0.5.2 Release
>  * #1466 [HUDI-742] Fix Java Math Exception
>  * #1416 [HUDI-717] Fixed usage of HiveDriver for DDL statements.
>  * #1427 [HUDI-727]: Copy default values of fields if not present when 
> rewriting incoming record with new schema
>  * #1515 [HUDI-795] Handle auto-deleted empty aux folder
>  * #1547 [MINOR]: Fix cli docs for DeltaStreamer
>  * #1580 [HUDI-852] adding check for table name for Append Save mode
>  * #1537 [MINOR] fixed building IndexFileFilter with a wrong condition in 
> HoodieGlobalBloomIndex class
>  * #1434 [HUDI-616] Fixed parquet files getting created on local FS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-13 Thread Bhavani Sudha (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106542#comment-17106542
 ] 

Bhavani Sudha commented on HUDI-890:


Sounds good. I'll tag them additionally with 0.5.3 sometime today.

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>
>
> The following commits are included in this release.
>  * #1372 [HUDI-652] Decouple HoodieReadClient and AbstractHoodieClient to 
> break the inheritance chain
>  * #1388 [HUDI-681] Remove embeddedTimelineService from HoodieReadClient
>  * #1350 [HUDI-629]: Replace Guava's Hashing with an equivalent in 
> NumericUtils.java
>  * #1505 [HUDI - 738] Add validation to DeltaStreamer to fail fast when 
> filterDupes is enabled on UPSERT mode.
>  * #1517 [HUDI-799] Use appropriate FS when loading configs
>  * #1406 [HUDI-713] Fix conversion of Spark array of struct type to Avro 
> schema
>  * #1394 [HUDI-656][Performance] Return a dummy Spark relation after writing 
> the DataFrame
>  * #1576 [HUDI-850] Avoid unnecessary listings in incremental cleaning mode
>  * #1421 [HUDI-724] Parallelize getSmallFiles for partitions
>  * #1330 [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned 
> by Date type columns
>  * #1413 Add constructor to HoodieROTablePathFilter
>  * #1415 [HUDI-539] Make ROPathFilter conf member serializable
>  * #1578 Add changes for presto mor queries
>  * #1506 [HUDI-782] Add support of Aliyun object storage service.
>  * #1432 [HUDI-716] Exception: Not an Avro data file when running 
> HoodieCleanClient.runClean
>  * #1422 [HUDI-400] Check upgrade from old plan to new plan for compaction
>  * #1448 [MINOR] Update DOAP with 0.5.2 Release
>  * #1466 [HUDI-742] Fix Java Math Exception
>  * #1416 [HUDI-717] Fixed usage of HiveDriver for DDL statements.
>  * #1427 [HUDI-727]: Copy default values of fields if not present when 
> rewriting incoming record with new schema
>  * #1515 [HUDI-795] Handle auto-deleted empty aux folder
>  * #1547 [MINOR]: Fix cli docs for DeltaStreamer
>  * #1580 [HUDI-852] adding check for table name for Append Save mode
>  * #1537 [MINOR] fixed building IndexFileFilter with a wrong condition in 
> HoodieGlobalBloomIndex class
>  * #1434 [HUDI-616] Fixed parquet files getting created on local FS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-13 Thread Bhavani Sudha (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106542#comment-17106542
 ] 

Bhavani Sudha edited comment on HUDI-890 at 5/13/20, 6:09 PM:
--

[~vinoth] Thanks. Sounds good. I'll tag them additionally with 0.5.3 sometime today.


was (Author: bhavanisudha):
Sounds good. I'll tag them additionally with 0.5.3 sometime today.

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>





[jira] [Updated] (HUDI-652) Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance chain

2020-05-13 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-652:

Fix Version/s: 0.5.3
   0.6.0

> Decouple HoodieReadClient and AbstractHoodieClient to break the inheritance 
> chain
> -
>
> Key: HUDI-652
> URL: https://issues.apache.org/jira/browse/HUDI-652
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have decided to restructure the {{hudi-client}} module so that the 
> write-specific classes can be moved to {{hudi-write-common}}. Currently, 
> {{HoodieReadClient}} and {{HoodieWriteClient}} share the same superclass, 
> {{AbstractHoodieClient}}. To do that, we should decouple 
> {{HoodieReadClient}} from {{AbstractHoodieClient}}. From the source code, 
> {{HoodieReadClient}} does not appear to depend deeply on {{AbstractHoodieClient}}.
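
The composition-over-inheritance idea can be sketched with simplified stand-in classes (all names below are illustrative, not the actual Hudi signatures): instead of extending the shared base, the read client holds just the members it actually uses, freeing {{AbstractHoodieClient}} to move into a write-only module.

```java
// Before (simplified): both clients extend a shared base class.
// abstract class AbstractHoodieClient { ... config, fs, embedded timeline service ... }
// class HoodieWriteClient extends AbstractHoodieClient { ... }
// class HoodieReadClient  extends AbstractHoodieClient { ... }

// After (simplified): the read client composes only what it needs,
// breaking the inheritance chain to the write-side base.
class ReadConfigSketch {
    final String basePath;
    ReadConfigSketch(String basePath) { this.basePath = basePath; }
}

class HoodieReadClientSketch {          // no longer extends a write-side base
    private final ReadConfigSketch config;
    HoodieReadClientSketch(ReadConfigSketch config) { this.config = config; }
    String basePath() { return config.basePath; }
}
```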





[jira] [Commented] (HUDI-890) Prepare for 0.5.3 patch release

2020-05-13 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106533#comment-17106533
 ] 

Vinoth Chandar commented on HUDI-890:
-

[~bhavanisudha] JIRA lets you tag an issue with multiple fix versions, so we can tag these additionally against 0.5.3 as well.

> Prepare for 0.5.3 patch release
> ---
>
> Key: HUDI-890
> URL: https://issues.apache.org/jira/browse/HUDI-890
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.5.3
>





[jira] [Resolved] (HUDI-629) Replace Guava's Hashing with an equivalent in NumericUtils.java

2020-05-13 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar resolved HUDI-629.
-
Fix Version/s: 0.5.3
   Resolution: Fixed

> Replace Guava's Hashing with an equivalent in NumericUtils.java
> ---
>
> Key: HUDI-629
> URL: https://issues.apache.org/jira/browse/HUDI-629
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0, 0.5.3
>
>   Original Estimate: 4h
>  Time Spent: 10m
>  Remaining Estimate: 3h 50m
>






[jira] [Reopened] (HUDI-629) Replace Guava's Hashing with an equivalent in NumericUtils.java

2020-05-13 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-629:
-

> Replace Guava's Hashing with an equivalent in NumericUtils.java
> ---
>
> Key: HUDI-629
> URL: https://issues.apache.org/jira/browse/HUDI-629
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>   Original Estimate: 4h
>  Time Spent: 10m
>  Remaining Estimate: 3h 50m
>






[jira] [Resolved] (HUDI-891) Improve websites for graduation required content

2020-05-13 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken resolved HUDI-891.
-
Resolution: Fixed

> Improve websites for graduation required content
> 
>
> Key: HUDI-891
> URL: https://issues.apache.org/jira/browse/HUDI-891
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> check site
> [https://whimsy.apache.org/pods/project/hudi]





[jira] [Created] (HUDI-893) Add spark datasource V2 reader support for Hudi tables

2020-05-13 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-893:


 Summary: Add spark datasource V2 reader support for Hudi tables
 Key: HUDI-893
 URL: https://issues.apache.org/jira/browse/HUDI-893
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Spark Integration
Reporter: Nishith Agarwal
Assignee: Nishith Agarwal








[incubator-hudi] branch master updated: [HUDI-889] Writer supports useJdbc configuration when hive synchronization is enabled (#1627)

2020-05-13 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 32bada2  [HUDI-889] Writer supports useJdbc configuration when hive 
synchronization is enabled (#1627)
32bada2 is described below

commit 32bada29dc95f1d5910713ae6b4f4a4ef39677c9
Author: cxzl25 
AuthorDate: Thu May 14 00:20:13 2020 +0800

[HUDI-889] Writer supports useJdbc configuration when hive synchronization 
is enabled (#1627)
---
 hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java| 2 ++
 hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala| 2 ++
 hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala | 4 +++-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java 
b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
index 34f2ef2..9983c19 100644
--- a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
+++ b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
@@ -284,6 +284,8 @@ public class DataSourceUtils {
 hiveSyncConfig.partitionValueExtractorClass =
 
props.getString(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY(),
 SlashEncodedDayPartitionValueExtractor.class.getName());
+hiveSyncConfig.useJdbc = 
Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_USE_JDBC_OPT_KEY(),
+DataSourceWriteOptions.DEFAULT_HIVE_USE_JDBC_OPT_VAL()));
 return hiveSyncConfig;
   }
 }
diff --git a/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala 
b/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
index 9d7d6cc..3d1172f 100644
--- a/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
+++ b/hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala
@@ -264,6 +264,7 @@ object DataSourceWriteOptions {
   val HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY = 
"hoodie.datasource.hive_sync.partition_extractor_class"
   val HIVE_ASSUME_DATE_PARTITION_OPT_KEY = 
"hoodie.datasource.hive_sync.assume_date_partitioning"
   val HIVE_USE_PRE_APACHE_INPUT_FORMAT_OPT_KEY = 
"hoodie.datasource.hive_sync.use_pre_apache_input_format"
+  val HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc"
 
   // DEFAULT FOR HIVE SPECIFIC CONFIGS
   val DEFAULT_HIVE_SYNC_ENABLED_OPT_VAL = "false"
@@ -276,4 +277,5 @@ object DataSourceWriteOptions {
   val DEFAULT_HIVE_PARTITION_EXTRACTOR_CLASS_OPT_VAL = 
classOf[SlashEncodedDayPartitionValueExtractor].getCanonicalName
   val DEFAULT_HIVE_ASSUME_DATE_PARTITION_OPT_VAL = "false"
   val DEFAULT_USE_PRE_APACHE_INPUT_FORMAT_OPT_VAL = "false"
+  val DEFAULT_HIVE_USE_JDBC_OPT_VAL = "true"
 }
diff --git 
a/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala 
b/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
index 5456782..efcf5e1 100644
--- a/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
+++ b/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
@@ -217,7 +217,8 @@ private[hudi] object HoodieSparkSqlWriter {
   HIVE_URL_OPT_KEY -> DEFAULT_HIVE_URL_OPT_VAL,
   HIVE_PARTITION_FIELDS_OPT_KEY -> DEFAULT_HIVE_PARTITION_FIELDS_OPT_VAL,
   HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> 
DEFAULT_HIVE_PARTITION_EXTRACTOR_CLASS_OPT_VAL,
-  HIVE_STYLE_PARTITIONING_OPT_KEY -> 
DEFAULT_HIVE_STYLE_PARTITIONING_OPT_VAL
+  HIVE_STYLE_PARTITIONING_OPT_KEY -> 
DEFAULT_HIVE_STYLE_PARTITIONING_OPT_VAL,
+  HIVE_USE_JDBC_OPT_KEY -> DEFAULT_HIVE_USE_JDBC_OPT_VAL
 ) ++ translateStorageTypeToTableType(parameters)
   }
 
@@ -248,6 +249,7 @@ private[hudi] object HoodieSparkSqlWriter {
 hiveSyncConfig.partitionFields =
   
ListBuffer(parameters(HIVE_PARTITION_FIELDS_OPT_KEY).split(",").map(_.trim).filter(!_.isEmpty).toList:
 _*)
 hiveSyncConfig.partitionValueExtractorClass = 
parameters(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY)
+hiveSyncConfig.useJdbc = parameters(HIVE_USE_JDBC_OPT_KEY).toBoolean
 hiveSyncConfig
   }
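
The diff above wires the new `hoodie.datasource.hive_sync.use_jdbc` key through the same default-fallback pattern the other hive-sync options use: read the user-supplied string, fall back to the default, then parse it to a boolean. A minimal, self-contained Java sketch of that resolution logic (the class and method names here are illustrative, not Hudi's API):

```java
import java.util.HashMap;
import java.util.Map;

class HiveSyncOptionSketch {
    static final String HIVE_USE_JDBC_OPT_KEY = "hoodie.datasource.hive_sync.use_jdbc";
    static final String DEFAULT_HIVE_USE_JDBC_OPT_VAL = "true";

    // Mirrors the props.getString(key, default) + Boolean.valueOf(...) pattern in the patch.
    static boolean resolveUseJdbc(Map<String, String> writeOptions) {
        return Boolean.valueOf(
            writeOptions.getOrDefault(HIVE_USE_JDBC_OPT_KEY, DEFAULT_HIVE_USE_JDBC_OPT_VAL));
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        System.out.println(resolveUseJdbc(opts));   // no value set: the default applies

        opts.put(HIVE_USE_JDBC_OPT_KEY, "false");
        System.out.println(resolveUseJdbc(opts));   // explicit user value wins
    }
}
```

Defaulting to `true` keeps existing writers on the JDBC path unless they opt out, which matches the backward-compatible intent of the change.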
 



[jira] [Created] (HUDI-892) RealtimeParquetInputFormat should skip adding projection columns if there are no log files

2020-05-13 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-892:
---

 Summary: RealtimeParquetInputFormat should skip adding projection 
columns if there are no log files
 Key: HUDI-892
 URL: https://issues.apache.org/jira/browse/HUDI-892
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Hive Integration, Performance
Reporter: Vinoth Chandar








[incubator-hudi] branch asf-site updated: [HUDI-891] Improve websites for graduation required content (#1628)

2020-05-13 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 20a0ee2  [HUDI-891] Improve websites for graduation required content 
(#1628)
20a0ee2 is described below

commit 20a0ee2a4d3d8251da3dac97d657db8163450a29
Author: lamber-ken 
AuthorDate: Wed May 13 22:56:37 2020 +0800

[HUDI-891] Improve websites for graduation required content (#1628)
---
 docs/_includes/footer.html | 22 ++
 docs/_sass/hudi_style/skins/_hudi.scss |  3 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/docs/_includes/footer.html b/docs/_includes/footer.html
index 71c88f0..e70b77c 100644
--- a/docs/_includes/footer.html
+++ b/docs/_includes/footer.html
@@ -2,12 +2,26 @@
 
   
 
-  https://apache.org;>
-
-  
+  
+
+  
+https://apache.org;>
+  
+
+  
+  
+https://www.apache.org/events/current-event.html;>
+  https://www.apache.org/events/current-event-234x60.png; />
+
+  
+
+  
 
 
-  Copyright  2019 https://apache.org;>The Apache Software Foundation, Licensed under 
the Apache License, Version 2.0.
+  https://www.apache.org/licenses/;>License | https://www.apache.org/security/;>Security | https://www.apache.org/foundation/thanks.html;>Thanks | https://www.apache.org/foundation/sponsorship.html;>Sponsorship
+
+
+  Copyright  2019 https://apache.org;>The Apache Software Foundation, Licensed under 
the https://www.apache.org/licenses/LICENSE-2.0;> Apache License, 
Version 2.0.
   Hudi, Apache and the Apache feather logo are trademarks of The Apache 
Software Foundation. Privacy Policy
   
   Apache Hudi is an effort undergoing incubation at The Apache Software 
Foundation (ASF), sponsored by the http://incubator.apache.org/;>Apache Incubator.
diff --git a/docs/_sass/hudi_style/skins/_hudi.scss 
b/docs/_sass/hudi_style/skins/_hudi.scss
index 96c8d5f..898046b 100644
--- a/docs/_sass/hudi_style/skins/_hudi.scss
+++ b/docs/_sass/hudi_style/skins/_hudi.scss
@@ -117,6 +117,9 @@ table {
   background-position: 1.2em center;
 }
 
+.table-apache-info td {
+  border: 0px;
+}
 
 
 



[jira] [Closed] (HUDI-869) Add support for alluxio

2020-05-13 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-869.
-
Resolution: Implemented

Implemented via master branch: 32ea4c70ff259798dee571670dbcc501cc477cb0

> Add support for alluxio
> 
>
> Key: HUDI-869
> URL: https://issues.apache.org/jira/browse/HUDI-869
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Alluxio can be used to accelerate Hudi queries, so we can integrate it.





[incubator-hudi] branch master updated (404c7e8 -> 32ea4c7)

2020-05-13 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 404c7e8  [HUDI-884] Shade avro and parquet-avro in 
hudi-hive-sync-bundle (#1618)
 add 32ea4c7  [HUDI-869] Add support for alluxio (#1608)

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hudi/common/fs/StorageSchemes.java   | 4 +++-
 .../test/java/org/apache/hudi/common/storage/TestStorageSchemes.java  | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)
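
The change registers the `alluxio` scheme in `StorageSchemes`. A minimal Java sketch of such a scheme registry (the constants and append flags below are illustrative assumptions, not the exact Hudi entries):

```java
import java.util.Arrays;

// Illustrative scheme registry in the spirit of StorageSchemes:
// each entry pairs a URI scheme with whether the backing store supports append.
enum StorageSchemeSketch {
    FILE("file", true),
    HDFS("hdfs", true),
    ALLUXIO("alluxio", false);   // entry added by HUDI-869 (append support assumed false here)

    private final String scheme;
    private final boolean supportsAppend;

    StorageSchemeSketch(String scheme, boolean supportsAppend) {
        this.scheme = scheme;
        this.supportsAppend = supportsAppend;
    }

    // A scheme is supported if any registered entry matches it.
    static boolean isSchemeSupported(String scheme) {
        return Arrays.stream(values()).anyMatch(s -> s.scheme.equals(scheme));
    }

    static boolean isAppendSupported(String scheme) {
        return Arrays.stream(values())
            .anyMatch(s -> s.scheme.equals(scheme) && s.supportsAppend);
    }
}
```

Writers can then fail fast on an unregistered filesystem scheme instead of surfacing an obscure I/O error later.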



[jira] [Assigned] (HUDI-891) Improve websites for graduation required content

2020-05-13 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken reassigned HUDI-891:
---

Assignee: lamber-ken

> Improve websites for graduation required content
> 
>
> Key: HUDI-891
> URL: https://issues.apache.org/jira/browse/HUDI-891
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> check site
> [https://whimsy.apache.org/pods/project/hudi]





[jira] [Updated] (HUDI-891) Improve websites for graduation required content

2020-05-13 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-891:

Status: Open  (was: New)

> Improve websites for graduation required content
> 
>
> Key: HUDI-891
> URL: https://issues.apache.org/jira/browse/HUDI-891
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Priority: Major
>
> check site
> [https://whimsy.apache.org/pods/project/hudi]





[jira] [Created] (HUDI-891) Improve websites for graduation required content

2020-05-13 Thread lamber-ken (Jira)
lamber-ken created HUDI-891:
---

 Summary: Improve websites for graduation required content
 Key: HUDI-891
 URL: https://issues.apache.org/jira/browse/HUDI-891
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: lamber-ken


check site

[https://whimsy.apache.org/pods/project/hudi]


