[GitHub] [incubator-hudi] liujianhuiouc commented on issue #1216: [HUDI-525] lack of insert info in delta_commit inflight

2020-01-14 Thread GitBox
liujianhuiouc commented on issue #1216: [HUDI-525] lack of insert info in 
delta_commit inflight
URL: https://github.com/apache/incubator-hudi/pull/1216#issuecomment-574522300
 
 
   Sometimes we want to know the number of records in a bulk insert when the 
inflight state has not successfully transitioned to complete.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-01-14 Thread GitBox
pratyakshsharma commented on issue #1150: [HUDI-288]: Add support for ingesting 
multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#issuecomment-574505738
 
 
   @vinothchandar @bvaradar Sorry, I was busy with some other things. Today I 
will try to address most of the comments here. 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1150: [HUDI-288]: Add support for ingesting multiple kafka streams in a single DeltaStreamer deployment

2020-01-14 Thread GitBox
vinothchandar commented on issue #1150: [HUDI-288]: Add support for ingesting 
multiple kafka streams in a single DeltaStreamer deployment
URL: https://github.com/apache/incubator-hudi/pull/1150#issuecomment-574501169
 
 
   @pratyakshsharma  any updates? 




[jira] [Updated] (HUDI-536) Update release notes to include KeyGenerator package changes

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-536:

Fix Version/s: 0.5.1

> Update release notes to include KeyGenerator package changes
> 
>
> Key: HUDI-536
> URL: https://issues.apache.org/jira/browse/HUDI-536
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Brandon Scheller
>Priority: Major
> Fix For: 0.5.1
>
>
> The change introduced here:
>  [https://github.com/apache/incubator-hudi/pull/1194]
> Refactors hudi keygenerators into their own package.
> We need to make this a backwards compatible change or update the release 
> notes to address this.
> Specifically:
> org.apache.hudi.ComplexKeyGenerator -> 
> org.apache.hudi.keygen.ComplexKeyGenerator
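
For anyone hitting this, the writer-side change amounts to pointing the key
generator option at the relocated class. A minimal sketch (hedged: the dataset,
table name, and path below are hypothetical; the option keys are standard Hudi
DataSource write options):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

class KeyGenUpgradeSketch {
  // Hedged sketch: after the package refactor, configs must reference the
  // relocated key generator class. Table name and path are hypothetical.
  static void write(Dataset<Row> df) {
    df.write()
        .format("org.apache.hudi")
        // previously "org.apache.hudi.ComplexKeyGenerator"
        .option("hoodie.datasource.write.keygenerator.class",
            "org.apache.hudi.keygen.ComplexKeyGenerator")
        .option("hoodie.datasource.write.recordkey.field", "uuid,ts")
        .option("hoodie.table.name", "example_table")
        .mode(SaveMode.Append)
        .save("/tmp/hudi/example_table");
  }
}
```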



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-295) Do one-time cleanup of Hudi git history

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-295:

Fix Version/s: (was: 0.5.1)
   0.6.0

> Do one-time cleanup of Hudi git history
> ---
>
> Key: HUDI-295
> URL: https://issues.apache.org/jira/browse/HUDI-295
> Project: Apache Hudi (incubating)
>  Issue Type: Task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.6.0
>
>
> https://lists.apache.org/thread.html/dc6eb516e248088dac1a2b5c9690383dfe2eb3912f76bbe9dd763c2b@





[jira] [Updated] (HUDI-537) Introduce administration CLI to update hoodie properties

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-537:

Priority: Blocker  (was: Major)

> Introduce administration CLI to update hoodie properties 
> -
>
> Key: HUDI-537
> URL: https://issues.apache.org/jira/browse/HUDI-537
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.1
>
>
> This tool can be used to change or add any table properties stored in the 
> hoodie.properties file. It is expected to be used in maintenance mode, when 
> ingestion or queries are not running.
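
A rough illustration of the edit such a tool would perform (this is not the
proposed CLI itself; the path and key below are hypothetical, and a real tool
would go through Hadoop FileSystem APIs rather than java.nio):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class HoodiePropertiesEditSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical table location; quiesce ingestion and queries first.
    Path props = Paths.get("/data/example_table/.hoodie/hoodie.properties");
    Properties p = new Properties();
    try (InputStream in = Files.newInputStream(props)) {
      p.load(in);
    }
    // Example edit: change or add a table property.
    p.setProperty("hoodie.table.name", "example_table_renamed");
    try (OutputStream out = Files.newOutputStream(props)) {
      p.store(out, "updated via admin tool (sketch)");
    }
  }
}
```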





[jira] [Updated] (HUDI-509) Rename "views" into "query types" according to cWiki

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-509:

Fix Version/s: 0.5.1

> Rename "views" into "query types" according to cWiki
> 
>
> Key: HUDI-509
> URL: https://issues.apache.org/jira/browse/HUDI-509
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-510) Update site documentation in sync with cWiki

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-510:

Fix Version/s: 0.5.1

> Update site documentation in sync with cWiki
> 
>
> Key: HUDI-510
> URL: https://issues.apache.org/jira/browse/HUDI-510
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>






[jira] [Updated] (HUDI-509) Rename "views" into "query types" according to cWiki

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-509:

Priority: Blocker  (was: Major)

> Rename "views" into "query types" according to cWiki
> 
>
> Key: HUDI-509
> URL: https://issues.apache.org/jira/browse/HUDI-509
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-403) Publish a deployment guide talking about deployment options, upgrading etc

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-403:

Priority: Blocker  (was: Major)

> Publish a deployment guide talking about deployment options, upgrading etc
> --
>
> Key: HUDI-403
> URL: https://issues.apache.org/jira/browse/HUDI-403
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Blocker
> Fix For: 0.5.1
>
>
> Things to cover 
>  # Upgrade readers first, Upgrade writers next, Principles of compatibility 
> followed
>  # DeltaStreamer Deployment models
>  # Scheduling Compactions.





[jira] [Assigned] (HUDI-403) Publish a deployment guide talking about deployment options, upgrading etc

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-403:
---

Assignee: Balaji Varadarajan  (was: Vinoth Chandar)

> Publish a deployment guide talking about deployment options, upgrading etc
> --
>
> Key: HUDI-403
> URL: https://issues.apache.org/jira/browse/HUDI-403
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Docs
>Reporter: Vinoth Chandar
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
>
> Things to cover 
>  # Upgrade readers first, Upgrade writers next, Principles of compatibility 
> followed
>  # DeltaStreamer Deployment models
>  # Scheduling Compactions.





[jira] [Updated] (HUDI-83) Support for timestamp datatype in Hudi

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-83:
---
Fix Version/s: (was: 0.5.1)
   0.6.0

> Support for timestamp datatype in Hudi
> --
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.6.0
>
>
> [https://github.com/apache/incubator-hudi/issues/543] ; related issues 





[jira] [Updated] (HUDI-238) Make separate release for hudi spark/scala based packages for scala 2.12

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-238:

Priority: Blocker  (was: Major)

> Make separate release for hudi spark/scala based packages for scala 2.12 
> -
>
> Key: HUDI-238
> URL: https://issues.apache.org/jira/browse/HUDI-238
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative, Usability
>Reporter: Balaji Varadarajan
>Assignee: Tadas Sugintas
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/apache/incubator-hudi/issues/881#issuecomment-528700749]
> Suspects: 
> h3. Hudi utilities package 
> bringing in spark-streaming-kafka-0.8* 
> {code:java}
> [INFO] Scanning for projects...
> [INFO] 
> [INFO] ---< org.apache.hudi:hudi-utilities 
> >---
> [INFO] Building hudi-utilities 0.5.0-SNAPSHOT
> [INFO] [ jar 
> ]-
> [INFO] 
> [INFO] --- maven-dependency-plugin:3.1.1:tree (default-cli) @ hudi-utilities 
> ---
> [INFO] org.apache.hudi:hudi-utilities:jar:0.5.0-SNAPSHOT
> [INFO] ...
> [INFO] +- org.apache.hudi:hudi-client:jar:0.5.0-SNAPSHOT:compile
>...
> [INFO] 
> [INFO] +- org.apache.hudi:hudi-spark:jar:0.5.0-SNAPSHOT:compile
> [INFO] |  \- org.scala-lang:scala-library:jar:2.11.8:compile
> [INFO] +- log4j:log4j:jar:1.2.17:compile
>...
> [INFO] +- org.apache.spark:spark-core_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.7:provided
> [INFO] |  |  +- org.apache.avro:avro-ipc:jar:1.7.7:provided
> [INFO] |  |  \- org.apache.avro:avro-ipc:jar:tests:1.7.7:provided
> [INFO] |  +- com.twitter:chill_2.11:jar:0.8.0:provided
> [INFO] |  +- com.twitter:chill-java:jar:0.8.0:provided
> [INFO] |  +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:provided
> [INFO] |  +- org.apache.spark:spark-launcher_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-network-common_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-network-shuffle_2.11:jar:2.1.0:provided
> [INFO] |  +- org.apache.spark:spark-unsafe_2.11:jar:2.1.0:provided
> [INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:provided
> [INFO] |  +- org.apache.curator:curator-recipes:jar:2.4.0:provided
> [INFO] |  +- org.apache.commons:commons-lang3:jar:3.5:provided
> [INFO] |  +- org.apache.commons:commons-math3:jar:3.4.1:provided
> [INFO] |  +- com.google.code.findbugs:jsr305:jar:1.3.9:provided
> [INFO] |  +- org.slf4j:slf4j-api:jar:1.7.16:compile
> [INFO] |  +- org.slf4j:jul-to-slf4j:jar:1.7.16:provided
> [INFO] |  +- org.slf4j:jcl-over-slf4j:jar:1.7.16:provided
> [INFO] |  +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
> [INFO] |  +- com.ning:compress-lzf:jar:1.0.3:provided
> [INFO] |  +- org.xerial.snappy:snappy-java:jar:1.1.2.6:compile
> [INFO] |  +- net.jpountz.lz4:lz4:jar:1.3.0:compile
> [INFO] |  +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:provided
> [INFO] |  +- commons-net:commons-net:jar:2.2:provided
>
> [INFO] +- org.apache.spark:spark-sql_2.11:jar:2.1.0:provided
> [INFO] |  +- com.univocity:univocity-parsers:jar:2.2.1:provided
> [INFO] |  +- org.apache.spark:spark-sketch_2.11:jar:2.1.0:provided
> [INFO] |  \- org.apache.spark:spark-catalyst_2.11:jar:2.1.0:provided
> [INFO] | +- org.codehaus.janino:janino:jar:3.0.0:provided
> [INFO] | +- org.codehaus.janino:commons-compiler:jar:3.0.0:provided
> [INFO] | \- org.antlr:antlr4-runtime:jar:4.5.3:provided
> [INFO] +- com.databricks:spark-avro_2.11:jar:4.0.0:provided
> [INFO] +- org.apache.spark:spark-streaming_2.11:jar:2.1.0:compile
> [INFO] +- org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.0:compile
> [INFO] |  \- org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile
> [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile
> [INFO] | +- 
> org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile
> [INFO] | \- org.apache.kafka:kafka-clients:jar:0.8.2.1:compile
> [INFO] +- io.dropwizard.metrics:metrics-core:jar:4.0.2:compile
> [INFO] +- org.antlr:stringtemplate:jar:4.0.2:compile
> [INFO] |  \- org.antlr:antlr-runtime:jar:3.3:compile
> [INFO] +- com.beust:jcommander:jar:1.72:compile
> [INFO] +- com.twitter:bijection-avro_2.11:jar:0.9.2:compile
> [INFO] |  \- com.twitter:bijection-core_2.11:jar:0.9.2:compile
> [INFO] +- io.confluent:kafka-avro-serializer:jar:3.0.0:compile
> [INFO] +- io.confluent:common-config:jar:3.0.0:compile
> [INFO] +- io.confluent:common-utils:jar:3.0.0:compile
> [INFO] |  \- com.101tec:zkclient:jar:0.5:compile
> [INFO] +- io.confluent:kafka-schema-registry-client:jar:3.0.0:compile
> [INFO] 

[jira] [Updated] (HUDI-83) Support for timestamp datatype in Hudi

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-83:
---
Status: Closed  (was: Patch Available)

> Support for timestamp datatype in Hudi
> --
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.5.1
>
>
> [https://github.com/apache/incubator-hudi/issues/543] ; related issues 





[jira] [Reopened] (HUDI-83) Support for timestamp datatype in Hudi

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reopened HUDI-83:


> Support for timestamp datatype in Hudi
> --
>
> Key: HUDI-83
> URL: https://issues.apache.org/jira/browse/HUDI-83
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Major
> Fix For: 0.5.1
>
>
> [https://github.com/apache/incubator-hudi/issues/543] ; related issues 





[jira] [Updated] (HUDI-507) Support \t-split HDFS source

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-507:

Fix Version/s: (was: 0.5.1)
   0.6.0

> Support \t-split HDFS source
> -
>
> Key: HUDI-507
> URL: https://issues.apache.org/jira/browse/HUDI-507
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Minor
> Fix For: 0.6.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Hi Hudi,
>  
> The current Hudi data source does not support splitting HDFS file data with a 
> \t separator.
>  I want to implement this and contribute it to the community.
>  The main change is the addition of a TextDFSSource class to provide this 
> support.
>  The new logic is: split the HDFS data according to the delimiter, and then 
> map it onto the source.avsc schema.
>  
> Other separator formats could be supported as an extension.
> Thanks,
> liujh
>  
> [~vinoth] Please help with suggestions.
>  
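
A hedged sketch of the proposed splitting step (TextDFSSource does not exist
yet; the method and path handling are illustrative), before the fields are
mapped onto the source.avsc schema:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

class TabSplitSketch {
  // Illustrative core of the proposed TextDFSSource: split each line of the
  // HDFS text file on the configured delimiter (here \t); mapping the
  // resulting fields onto the source.avsc schema would follow.
  static JavaRDD<String[]> readTabSeparated(JavaSparkContext jsc, String path) {
    return jsc.textFile(path)
        .map(line -> line.split("\t", -1)); // -1 keeps trailing empty fields
  }
}
```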





[jira] [Updated] (HUDI-535) Compatibility Issue : Hudi Readers using newer layout version can see intermittent errors due to race conditions with concurrent compactions in MOR tables

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-535:

Status: Open  (was: New)

> Compatibility Issue : Hudi Readers using newer layout version can see 
> intermittent errors due to race conditions with concurrent compactions in MOR 
> tables 
> ---
>
> Key: HUDI-535
> URL: https://issues.apache.org/jira/browse/HUDI-535
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
> Fix For: 0.5.1
>
>
> As part of recent changes in 0.5.1 to avoid renames in metadata, compaction 
> plans are no longer written to the .aux folder. Instead, the upgraded reader 
> reads them from the timeline (.hoodie). This works as long as the writer is 
> also upgraded, but if the reader alone is upgraded, there can be race 
> conditions where the Hudi reader tries to read a compaction-requested file 
> that has been renamed by a concurrent compaction.





[jira] [Updated] (HUDI-537) Introduce administration CLI to update hoodie properties

2020-01-14 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-537:

Status: Open  (was: New)

> Introduce administration CLI to update hoodie properties 
> -
>
> Key: HUDI-537
> URL: https://issues.apache.org/jira/browse/HUDI-537
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>
> This tool can be used to change or add any table properties stored in the 
> hoodie.properties file. It is expected to be used in maintenance mode, when 
> ingestion or queries are not running.





[jira] [Assigned] (HUDI-537) Introduce administration CLI to update hoodie properties

2020-01-14 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-537:
---

Assignee: Vinoth Chandar

> Introduce administration CLI to update hoodie properties 
> -
>
> Key: HUDI-537
> URL: https://issues.apache.org/jira/browse/HUDI-537
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: CLI
>Reporter: Balaji Varadarajan
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>
> This tool can be used to change or add any table properties stored in the 
> hoodie.properties file. It is expected to be used in maintenance mode, when 
> ingestion or queries are not running.





[jira] [Assigned] (HUDI-535) Compatibility Issue : Hudi Readers using newer layout version can see intermittent errors due to race conditions with concurrent compactions in MOR tables

2020-01-14 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reassigned HUDI-535:
---

Assignee: Balaji Varadarajan

> Compatibility Issue : Hudi Readers using newer layout version can see 
> intermittent errors due to race conditions with concurrent compactions in MOR 
> tables 
> ---
>
> Key: HUDI-535
> URL: https://issues.apache.org/jira/browse/HUDI-535
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Blocker
> Fix For: 0.5.1
>
>
> As part of recent changes in 0.5.1 to avoid renames in metadata, the 
> compaction plans are no longer getting written to .aux folder. Instead the 
> upgraded reader reads from timeline (.hoodie). This would work as long as 
> writer is also upgraded but if reader alone is upgraded, there would be 
> conditions when hudi reader tried to read from compaction requested file 
> which was renamed by concurrent compactions.





[jira] [Created] (HUDI-537) Introduce administration CLI to update hoodie properties

2020-01-14 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-537:
---

 Summary: Introduce administration CLI to update hoodie properties 
 Key: HUDI-537
 URL: https://issues.apache.org/jira/browse/HUDI-537
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: CLI
Reporter: Balaji Varadarajan
 Fix For: 0.5.1


This tool can be used to change or add any table properties stored in the 
hoodie.properties file. It is expected to be used in maintenance mode, when 
ingestion or queries are not running.





[GitHub] [incubator-hudi] nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in adding deletion support to Quick Start

2020-01-14 Thread GitBox
nsivabalan commented on issue #1225: [MINOR] Adding util methods to assist in 
adding deletion support to Quick Start
URL: https://github.com/apache/incubator-hudi/pull/1225#issuecomment-574492697
 
 
   @bhasudha : I have changed the way we want to generate deletes. Basically, I 
pass in the insert records for which delete records will be generated. With the 
previous approach of generating random deletes, I couldn't verify that the 
deletes actually deleted any records. So I have taken this approach.
   
   The steps I plan to add to Quick Start are as follows:
   
   - Generate a new batch of inserts.
   - Fetch all records from this new batch (fix the rider value below, since 
each batch will have a unique rider value):
   val ds = spark.sql("select uuid, partitionPath from hudi_ro_table where 
rider = 'rider-213'")
   - Generate delete records:
   val deletes = dataGen.generateDeletes(ds.collectAsList())
   - Issue deletes:
   val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2));
   df.write.format("org.apache.hudi").
   options(getQuickstartWriteConfigs).
   option(OPERATION_OPT_KEY, "delete").
   option(PRECOMBINE_FIELD_OPT_KEY, "ts").
   option(RECORDKEY_FIELD_OPT_KEY, "uuid").
   option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
   option(TABLE_NAME, tableName).
   mode(Append).
   save(basePath);
   - The same select query above should now fetch 0 records, since all records 
have been deleted:
   spark.sql("select uuid, partitionPath from hudi_ro_table where rider = 
'rider-213'").count()
   
   
   
   




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #159

2020-01-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.20 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle

[incubator-hudi] branch master updated (2bb0c21 -> 9b2944a)

2020-01-14 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 2bb0c21  Fix conversion of Spark struct type to Avro schema
 add 9b2944a  [MINOR] Refactor unnecessary boxing inside TypedProperties 
code (#1227)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/util/TypedProperties.java | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)



[GitHub] [incubator-hudi] vinothchandar merged pull request #1227: [MINOR] Refactor unnecessary boxing inside TypedProperties code

2020-01-14 Thread GitBox
vinothchandar merged pull request #1227: [MINOR] Refactor unnecessary boxing 
inside TypedProperties code
URL: https://github.com/apache/incubator-hudi/pull/1227
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-14 Thread GitBox
bvaradar commented on issue #1157: [HUDI-332]Add operation type 
(insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#issuecomment-574440583
 
 
   @hddong : Did you forget to push the diff? Still seeing conflicts. 




[GitHub] [incubator-hudi] ssomuah opened a new issue #1228: No FileSystem for scheme: abfss

2020-01-14 Thread GitBox
ssomuah opened a new issue #1228: No FileSystem for scheme: abfss
URL: https://github.com/apache/incubator-hudi/issues/1228
 
 
   Hi,
   I'm trying to use Hudi to write to one of the Azure storage container file 
systems, ADLS Gen 2 (abfs://). The issue I'm facing is that 
`HoodieROTablePathFilter` tries to get a FileSystem for the path by passing in 
a blank Hadoop configuration. This manifests as `java.io.IOException: No 
FileSystem for scheme: abfss` because the configuration carries none of the 
cluster's settings. Is there any way to work around this? How does it work for 
file systems besides "hdfs://"?
   
   Environment Description
   
   Hudi version : master
   
   Spark version : 2.4.4
   
   Hadoop version : 2.7.3
   
   Storage (HDFS/S3/GCS..) : ABFSS://
   
   Running on Docker? (yes/no) : no
   
   Additional context
   
   The problematic line is 
   
https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96
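   
   The failure is reproducible outside Hudi, since `Path.getFileSystem` 
resolves the scheme through whatever `Configuration` it is handed (sketch 
below; the account and container names are placeholders):
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   
   class AbfssSchemeSketch {
     public static void main(String[] args) throws Exception {
       // Placeholder URI; account/container names are made up.
       Path p = new Path("abfss://container@account.dfs.core.windows.net/table");
       // Mirrors HoodieROTablePathFilter: a freshly constructed Configuration
       // carries none of the session's fs.abfss.* settings, so scheme
       // resolution fails with "No FileSystem for scheme: abfss".
       p.getFileSystem(new Configuration()).exists(p);
     }
   }
   ```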
   
   
   Stacktrace
   java.io.IOException: No FileSystem for scheme: abfss
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)
    at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
    at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:349)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$bulkListLeafFiles$2.apply(InMemoryFileIndex.scala:261)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$bulkListLeafFiles$2.apply(InMemoryFileIndex.scala:260)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.bulkListLeafFiles(InMemoryFileIndex.scala:260)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:344)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$bulkListLeafFiles$2.apply(InMemoryFileIndex.scala:261)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$bulkListLeafFiles$2.apply(InMemoryFileIndex.scala:260)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.bulkListLeafFiles(InMemoryFileIndex.scala:260)
    at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:344)
    at

[GitHub] [incubator-hudi] hddong commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-14 Thread GitBox
hddong commented on issue #1157: [HUDI-332]Add operation type 
(insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#issuecomment-574427217
 
 
   @bvaradar @vinothchandar Thanks. Rebased this and there are no conflicts now.




[jira] [Updated] (HUDI-536) Update release notes to include KeyGenerator package changes

2020-01-14 Thread Brandon Scheller (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Scheller updated HUDI-536:
--
Description: 
The change introduced here:
 [https://github.com/apache/incubator-hudi/pull/1194]

Refactors hudi keygenerators into their own package.

We need to make this a backwards compatible change or update the release notes 
to address this.

Specifically:

org.apache.hudi.ComplexKeyGenerator -> 
org.apache.hudi.keygen.ComplexKeyGenerator

  was:
The change introduced here:
[https://github.com/apache/incubator-hudi/pull/1194]

Refactors hudi keygenerators into their own package.

We need to make this a backwards compatible change or update the release notes 
to address this.


> Update release notes to include KeyGenerator package changes
> 
>
> Key: HUDI-536
> URL: https://issues.apache.org/jira/browse/HUDI-536
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Brandon Scheller
>Priority: Major
>
> The change introduced here:
>  [https://github.com/apache/incubator-hudi/pull/1194]
> Refactors hudi keygenerators into their own package.
> We need to make this a backwards compatible change or update the release 
> notes to address this.
> Specifically:
> org.apache.hudi.ComplexKeyGenerator -> 
> org.apache.hudi.keygen.ComplexKeyGenerator





[GitHub] [incubator-hudi] bschell commented on issue #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-14 Thread GitBox
bschell commented on issue #1194: [HUDI-326] Add support to delete records with 
only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#issuecomment-574421754
 
 
   @n3nash Here is the JIRA: https://issues.apache.org/jira/browse/HUDI-536 




[jira] [Created] (HUDI-536) Update release notes to include KeyGenerator package changes

2020-01-14 Thread Brandon Scheller (Jira)
Brandon Scheller created HUDI-536:
-

 Summary: Update release notes to include KeyGenerator package 
changes
 Key: HUDI-536
 URL: https://issues.apache.org/jira/browse/HUDI-536
 Project: Apache Hudi (incubating)
  Issue Type: Bug
Reporter: Brandon Scheller


The change introduced here:
[https://github.com/apache/incubator-hudi/pull/1194]

Refactors hudi keygenerators into their own package.

We need to make this a backwards compatible change or update the release notes 
to address this.





[GitHub] [incubator-hudi] vinothchandar commented on issue #1212: [HUDI-509] Renaming code in sync with cWiki restructuring

2020-01-14 Thread GitBox
vinothchandar commented on issue #1212: [HUDI-509] Renaming code in sync with 
cWiki restructuring
URL: https://github.com/apache/incubator-hudi/pull/1212#issuecomment-574421454
 
 
   @n3nash are you able to review this?  or any other committer really.. (wish 
there was a gh team we could mention.. ) 




[jira] [Updated] (HUDI-93) Enforce semantics on HoodieRecordPayload to allow for a consistent instantiation of custom payloads via reflection

2020-01-14 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-93:
-
Status: In Progress  (was: Open)

> Enforce semantics on HoodieRecordPayload to allow for a consistent 
> instantiation of custom payloads via reflection
> --
>
> Key: HUDI-93
> URL: https://issues.apache.org/jira/browse/HUDI-93
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Nishith Agarwal
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> At the moment, the expectation is that any implementation of 
> HoodieRecordPayload needs to have a constructor taking an Optional argument. 
> But this is not enforced in the HoodieRecordPayload interface. We need a way 
> to enforce this semantic so that it works consistently.
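
For context, a minimal sketch of the reflective instantiation pattern in
question (the constructor signature assumed here is illustrative; pinning down
the exact expected signature is what this issue asks for):

```java
import java.util.Optional;

class PayloadReflectionSketch {
  // Hedged sketch: instantiate a payload class by name through an assumed
  // single-Optional constructor. Nothing in the interface guarantees this
  // constructor exists, which is the gap described above.
  static Object instantiate(String payloadClassName, Optional<Object> record) throws Exception {
    return Class.forName(payloadClassName)
        .getConstructor(Optional.class)
        .newInstance(record);
  }
}
```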





[jira] [Updated] (HUDI-31) MOR - Allow partitioner to pick more than one small file for inserting new data #494

2020-01-14 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-31:
-
Status: In Progress  (was: Open)

> MOR - Allow partitioner to pick more than one small file for inserting new 
> data #494
> 
>
> Key: HUDI-31
> URL: https://issues.apache.org/jira/browse/HUDI-31
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Vinoth Chandar
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> https://github.com/uber/hudi/issues/494





[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12

2020-01-14 Thread GitBox
zhedoubushishi commented on issue #1109: [HUDI-238] - Migrating to Scala 2.12
URL: https://github.com/apache/incubator-hudi/pull/1109#issuecomment-574321102
 
 
   > > Sure. I will send another PR. Currently our work only supports 2.12, but 
I can try to see if it is possible to support both 2.11 and 2.12.
   > 
   > @zhedoubushishi : Is your change different from what is being done as part 
of this PR ? Anyways, it would help if you can open a WIP PR and we can cross 
check with this PR to see if we are missing anything here.
   > 
   > Also @zhedoubushishi @ezhux : I see this info in stack-overflow to build 
both 2.11 and 2.12 versions of packages. https://stackoverflow.com/a/46785150. 
Can you check if this model would work for hudi ? We would need to change pom 
for hudi-spark and its dependents : hudi-spark-bundle and hudi-utilities-bundle
   
   I created a PR here: https://github.com/apache/incubator-hudi/pull/1226.
   This PR is compatible with Scala 2.12 so you can build it with following 
command:
   ```
   mvn clean install -Dscala.version=2.12.10 -Dscala.binary.version=2.12 
-DskipTests
   ```
   
   I am not sure whether this approach (https://stackoverflow.com/a/46785150) 
could work for Hudi. 
   My understanding is that Hudi has many Scala dependencies. Say 
scala.version=2.11 by default; then hudi-spark will depend on many xxx_2.11 
packages. Does it then make sense to compile the Hudi code with both Scala 2.12 
and Scala 2.11?




[GitHub] [incubator-hudi] bvaradar commented on issue #1208: [HUDI-304] Bring back spotless plugin

2020-01-14 Thread GitBox
bvaradar commented on issue #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574314755
 
 
   """The changes in this PR has nothing to do with 120 character limit. And I 
randomly checked some modified files, they are modified after first introduced 
spotless plugin at 2019/10/10. The master won't broken as this PR is rebased to 
master branch.""
   
   @leesf : Doesn't this imply spotless and checkstyle constraints are 
different ? It could be that eclipse style specification is more stronger than 
checkstyle ? I just wanted to make sure that we are not running in circles here 
:) 




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1208: [HUDI-304] Bring back spotless plugin

2020-01-14 Thread GitBox
bvaradar commented on a change in pull request #1208: [HUDI-304] Bring back 
spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#discussion_r366507169
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/HoodieCLI.java
 ##
 @@ -64,8 +64,8 @@ private static void setBasePath(String basePath) {
   }
 
   private static void setLayoutVersion(Integer layoutVersion) {
-HoodieCLI.layoutVersion = new TimelineLayoutVersion(
-(layoutVersion == null) ? TimelineLayoutVersion.CURR_VERSION : 
layoutVersion);
+HoodieCLI.layoutVersion =
 
 Review comment:
   For example, I am quoting this line. Do you know which Eclipse rule created 
this change? Since checkstyle never enforced it, can we remove this rule? 
Please take a look at the other changes to see if there is any other pattern 
like this. 




[GitHub] [incubator-hudi] n3nash commented on issue #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-14 Thread GitBox
n3nash commented on issue #1194: [HUDI-326] Add support to delete records with 
only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#issuecomment-574314023
 
 
   @bschell Once you open up the JIRA for the backwards compatible package 
refactor, we can merge this.




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-14 Thread GitBox
n3nash commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r366506053
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/GlobalDeleteKeyGenerator.java
 ##
 @@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.keygen;
+
+import java.util.Arrays;
+import java.util.List;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieKeyException;
+
+/**
+ * Key generator for deletes using global indices. Global index deletes do not 
require partition value
+ * so this key generator avoids using partition value for generating HoodieKey.
+ */
+public class GlobalDeleteKeyGenerator extends KeyGenerator {
+
+  private static final String EMPTY_PARTITION = "";
+  private static final String NULL_RECORDKEY_PLACEHOLDER = "__null__";
+  private static final String EMPTY_RECORDKEY_PLACEHOLDER = "__empty__";
+
+  protected final List<String> recordKeyFields;
+
+  public GlobalDeleteKeyGenerator(TypedProperties config) {
+super(config);
+this.recordKeyFields = 
Arrays.asList(config.getString(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY()).split(","));
+  }
+
+  @Override
+  public HoodieKey getKey(GenericRecord record) {
 
 Review comment:
   I think the point was to make the GlobalDeleteKeyGenerator work for any 
case, complex or simple. 
   @bvaradar 




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1194: [HUDI-326] Add support to delete records with only record_key

2020-01-14 Thread GitBox
n3nash commented on a change in pull request #1194: [HUDI-326] Add support to 
delete records with only record_key
URL: https://github.com/apache/incubator-hudi/pull/1194#discussion_r366505359
 
 

 ##
 File path: 
hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
 ##
 @@ -16,8 +16,10 @@
  * limitations under the License.
  */
 
-package org.apache.hudi;
+package org.apache.hudi.keygen;
 
 Review comment:
   +1 @bschell 




[GitHub] [incubator-hudi] bvaradar commented on issue #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-14 Thread GitBox
bvaradar commented on issue #1157: [HUDI-332]Add operation type 
(insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#issuecomment-574309467
 
 
   @hddong : lgtm. Once you rebase this PR and resolve conflicts, we can merge




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1157: [HUDI-332]Add operation type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata

2020-01-14 Thread GitBox
bvaradar commented on a change in pull request #1157: [HUDI-332]Add operation 
type (insert/upsert/bulkinsert/delete) to HoodieCommitMetadata
URL: https://github.com/apache/incubator-hudi/pull/1157#discussion_r366499639
 
 

 ##
 File path: hudi-common/src/main/avro/HoodieCommitMetadata.avsc
 ##
 @@ -129,6 +129,11 @@
  }],
  "default": null
   },
+  {
 
 Review comment:
   Agree with Vinoth, Avro schema evolution expects new fields to be appended. 
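   
   (A hedged illustration of the point: Avro readers resolve record fields by 
name and fill absent ones from their defaults, so new fields are appended with 
a default rather than inserted mid-record. Field names below are illustrative, 
not the actual HoodieCommitMetadata schema.)
   
   ```java
   import org.apache.avro.Schema;
   
   class AvroEvolutionSketch {
     // Appended field with a default: records written without "operationType"
     // still deserialize under this schema, with the field read as null.
     static final Schema EVOLVED = new Schema.Parser().parse(
         "{\"type\":\"record\",\"name\":\"CommitMetadataExample\",\"fields\":["
             + "{\"name\":\"existingField\",\"type\":\"string\"},"
             + "{\"name\":\"operationType\",\"type\":[\"null\",\"string\"],\"default\":null}"
             + "]}");
   }
   ```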




[jira] [Created] (HUDI-535) Compatibility Issue : Hudi Readers using newer layout version can see intermittent errors due to race conditions with concurrent compactions in MOR tables

2020-01-14 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-535:
---

 Summary: Compatibility Issue : Hudi Readers using newer layout 
version can see intermittent errors due to race conditions with concurrent 
compactions in MOR tables 
 Key: HUDI-535
 URL: https://issues.apache.org/jira/browse/HUDI-535
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Common Core
Reporter: Balaji Varadarajan
 Fix For: 0.5.1


As part of recent changes in 0.5.1 to avoid renames in metadata, compaction 
plans are no longer written to the .aux folder. Instead, the upgraded reader 
reads them from the timeline (.hoodie). This works as long as the writer is 
also upgraded, but if the reader alone is upgraded, there can be race 
conditions where the Hudi reader tries to read a compaction-requested file that 
has been renamed by a concurrent compaction.





[GitHub] [incubator-hudi] RonBarabash commented on issue #1202: [SUPPORT] com.uber.hoodie.exception.HoodieIOException: IOException when reading log file + Corrupted Log File

2020-01-14 Thread GitBox
RonBarabash commented on issue #1202: [SUPPORT] 
com.uber.hoodie.exception.HoodieIOException: IOException when reading log file 
+ Corrupted Log File
URL: https://github.com/apache/incubator-hudi/issues/1202#issuecomment-574294028
 
 
   Hey, I'm thinking this might be an S3 eventual-consistency issue. We turned 
on `hoodie.consistency.check.enabled` and it seems to have solved it; if it 
does not reproduce in the next couple of days I will close the issue.
   Thanks
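   
   (For reference, a hedged sketch of turning that option on at write time; the 
option key is Hudi's, while the table name and path are placeholders.)
   
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SaveMode;
   
   class ConsistencyCheckSketch {
     // Hedged sketch: enable Hudi's consistency check so writes verify that
     // S3 listings reflect newly written files before committing.
     static void write(Dataset<Row> df) {
       df.write()
           .format("org.apache.hudi")
           .option("hoodie.consistency.check.enabled", "true")
           .option("hoodie.table.name", "order_lines")
           .mode(SaveMode.Append)
           .save("s3://example-bucket/order_lines"); // placeholder target
     }
   }
   ```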
   




[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1227: [MINOR] Refactor unnecessary boxing inside TypedProperties code

2020-01-14 Thread GitBox
lamber-ken opened a new pull request #1227: [MINOR] Refactor unnecessary boxing 
inside TypedProperties code
URL: https://github.com/apache/incubator-hudi/pull/1227
 
 
   ## What is the purpose of the pull request
   
   Refactor unnecessary boxing inside TypedProperties.
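   
   The kind of cleanup involved, sketched (illustrative, not the actual diff):
   
   ```java
   class BoxingSketch {
     // Illustrative boxing cleanup: parse straight to the primitive instead
     // of constructing a wrapper object and unboxing it again.
     static int before(String value) {
       return Integer.valueOf(value).intValue(); // boxes, then unboxes
     }
   
     static int after(String value) {
       return Integer.parseInt(value); // no intermediate Integer
     }
   }
   ```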
   
   ## Brief change log
   
 - *Refactor unnecessary boxing*
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] RonBarabash commented on issue #1202: [SUPPORT] com.uber.hoodie.exception.HoodieIOException: IOException when reading log file + Corrupted Log File

2020-01-14 Thread GitBox
RonBarabash commented on issue #1202: [SUPPORT] 
com.uber.hoodie.exception.HoodieIOException: IOException when reading log file 
+ Corrupted Log File
URL: https://github.com/apache/incubator-hudi/issues/1202#issuecomment-574166866
 
 
   This is the first error we are getting:
   ```
   
   com.uber.hoodie.exception.CorruptedLogFileException: HoodieLogFile{pathStr='s3://yotpo-data-lake/table-views/orders/order_lines_stream/table_view/0/.d11ada9b-5e05-4649-9bbb-1b7f85f647fc-0_20200114075517.log.1_26-21-15497', fileLen=0} could not be read. Did not find the magic bytes at the start of the block
    at com.uber.hoodie.common.table.log.HoodieLogFileReader.readMagic(HoodieLogFileReader.java:298)
    at com.uber.hoodie.common.table.log.HoodieLogFileReader.hasNext(HoodieLogFileReader.java:280)
    at com.uber.hoodie.common.table.log.HoodieLogFormatReader.hasNext(HoodieLogFormatReader.java:84)
    at com.uber.hoodie.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:128)
    at com.uber.hoodie.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:76)
    at com.uber.hoodie.io.compact.HoodieRealtimeTableCompactor.compact(HoodieRealtimeTableCompactor.java:124)
    at com.uber.hoodie.io.compact.HoodieRealtimeTableCompactor.lambda$compact$44896304$1(HoodieRealtimeTableCompactor.java:95)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
   
   
   ```
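   For context, `readMagic` failing like this means the reader did not see Hudi's log-block magic header at the current offset, which is expected for a zero-length file (`fileLen=0` above). A minimal sketch of that kind of check, assuming the header is the literal bytes `#HUDI#` (an assumption about the format, not code from the library):
   ```
   import java.io.DataInputStream;
   import java.io.EOFException;
   import java.io.IOException;
   import java.util.Arrays;

   class LogMagicCheck {
     // Assumption: Hudi log blocks start with the literal header bytes "#HUDI#".
     private static final byte[] MAGIC = {'#', 'H', 'U', 'D', 'I', '#'};

     // Returns true only when the next bytes match the header; an empty or
     // truncated file hits EOF before a full header can be read.
     static boolean hasMagic(DataInputStream in) throws IOException {
       byte[] buf = new byte[MAGIC.length];
       try {
         in.readFully(buf);
       } catch (EOFException e) {
         return false;
       }
       return Arrays.equals(buf, MAGIC);
     }
   }
   ```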


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] RonBarabash commented on issue #1202: [SUPPORT] com.uber.hoodie.exception.HoodieIOException: IOException when reading log file + Corrupted Log File

2020-01-14 Thread GitBox
RonBarabash commented on issue #1202: [SUPPORT] 
com.uber.hoodie.exception.HoodieIOException: IOException when reading log file 
+ Corrupted Log File
URL: https://github.com/apache/incubator-hudi/issues/1202#issuecomment-574166330
 
 
   Input -> Kafka; CDC event logs generated by Debezium
   Processing -> Spark Structured Streaming; we run some Spark SQL on the events
   Output -> writing to S3 with Hudi MergeOnRead via Spark
   These are the Hudi configs:
   ```
   "options": {
     "hoodie.compaction.strategy": "com.uber.hoodie.io.compact.strategy.UnBoundedCompactionStrategy",
     "hoodie.fail.on.timeline.archiving": "false",
     "hoodie.cleaner.commits.retained": "1",
     "hoodie.datasource.hive_sync.enable": "false",
     "hoodie.copyonwrite.record.size.estimate": "60",
     "hoodie.copyonwrite.insert.auto.split": "true",
     "hoodie.parquet.compression.codec": "snappy",
     "hoodie.index.bloom.num_entries": "100",
     "hoodie.compact.inline.max.delta.commits": "1"
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-14 Thread GitBox
OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-574151367
 
 
   @vinothchandar Please check again


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-14 Thread GitBox
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366309913
 
 

 ##
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(java.util.Map<K, V> javaMap) {
 
 Review comment:
   Because the underlying Spark function only accepts parameters of type scala.collection.immutable.Map, I provide a private helper that converts a Java Map to a Scala immutable Map.
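   For illustration, a minimal sketch of one way to do that conversion from plain Java, assuming only scala-library on the classpath (not necessarily the PR's exact implementation):
   ```
   import java.util.Map;

   import scala.Tuple2;

   public class ScalaMapConversionSketch {
     // Folds the Java entries into Scala's empty immutable map; $plus is how
     // Scala's `+` operator on immutable.Map appears from Java, and each call
     // returns a new map with the entry added.
     public static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(Map<K, V> javaMap) {
       scala.collection.immutable.Map<K, V> result = scala.collection.immutable.Map$.MODULE$.empty();
       for (Map.Entry<K, V> entry : javaMap.entrySet()) {
         result = result.$plus(new Tuple2<>(entry.getKey(), entry.getValue()));
       }
       return result;
     }
   }
   ```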


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-14 Thread GitBox
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema 
provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366309238
 
 

 ##
 File path: 
hudi-utilities/src/test/resources/delta-streamer-config/source-jdbc.avsc
 ##
 @@ -0,0 +1,59 @@
+/*
 
 Review comment:
   Because the data structure of source.avsc is too complex, the schema of a database table cannot achieve a similar effect.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf edited a comment on issue #1208: [HUDI-304] Bring back spotless plugin

2020-01-14 Thread GitBox
leesf edited a comment on issue #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574102314
 
 
   > > ocument that developers could use checkstyle.xml file in style folder in 
checkstyle plugin and things will go well
   > 
   > I was able to use checkstyle to format in IntelliJ. This is fine.. but we 
should clearly document this. maybe file a JIRA?
   > 
   > On import order, we can take a second stab may be down the line? again 
filing a JIRA would be great for tracking..
   > 
   > On this PR, my concern was we are reformatting again due to the 120 
character limit? I was trying to see if we can avoid it. @leesf could you 
explain why 100+ files are being touched in this PR? If these were all 
checkstyle failures, then master would be broken right? I am just trying to 
understand what code really changed here, given we are close to a release..
   
   Created https://issues.apache.org/jira/projects/HUDI/issues/HUDI-533 and https://issues.apache.org/jira/projects/HUDI/issues/HUDI-534 to track these.
   The changes in this PR have nothing to do with the 120-character limit. I randomly checked some of the modified files; they were modified after the spotless plugin was first introduced on 2019/10/10. Master won't break, as this PR is rebased onto the master branch.
   PS: My other question is whether we need to modify `checkstyle-suppressions.xml` to make both Reformat in IntelliJ and checkstyle.xml pass. The spotless plugin would fail because the indentation is 4 after Reformat, while checkstyle.xml would still pass because of the `` entry in checkstyle-suppressions.xml; so spotless fails and I could not find a way to make all three pass (the JIRA tickets above track documenting this clearly). But after running `mvn spotless:apply`, we can make the build pass. cc @bvaradar @vinothchandar 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf edited a comment on issue #1208: [HUDI-304] Bring back spotless plugin

2020-01-14 Thread GitBox
leesf edited a comment on issue #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574102314
 
 
   > > ocument that developers could use checkstyle.xml file in style folder in 
checkstyle plugin and things will go well
   > 
   > I was able to use checkstyle to format in IntelliJ. This is fine.. but we 
should clearly document this. maybe file a JIRA?
   > 
   > On import order, we can take a second stab may be down the line? again 
filing a JIRA would be great for tracking..
   > 
   > On this PR, my concern was we are reformatting again due to the 120 
character limit? I was trying to see if we can avoid it. @leesf could you 
explain why 100+ files are being touched in this PR? If these were all 
checkstyle failures, then master would be broken right? I am just trying to 
understand what code really changed here, given we are close to a release..
   
   Created https://issues.apache.org/jira/projects/HUDI/issues/HUDI-533 and https://issues.apache.org/jira/projects/HUDI/issues/HUDI-534 to track these.
   The changes in this PR have nothing to do with the 120-character limit. I randomly checked some of the modified files; they were modified after the spotless plugin was first introduced on 2019/10/10. Master won't break, as this PR is rebased onto the master branch.
   PS: My other question is whether we need to modify `checkstyle-suppressions.xml` to make both Reformat in IntelliJ and checkstyle.xml pass. cc @bvaradar @vinothchandar 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-14 Thread GitBox
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

2020-01-14 Thread GitBox
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get 
metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from MySQL and, at the same time, fetch the schema from the database, so I submitted this PR. It adds a schema provider that obtains metadata through JDBC, designed to call Spark's JDBC-related methods. This also keeps the schema uniform between, for example, reading historical data via Spark JDBC and using the delta streamer to synchronize data.
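   For illustration, a minimal sketch of how such a provider could plug into the DeltaStreamer `SchemaProvider` contract; the class name, the property keys, and the delegation to `UtilHelpers.getSchema` are assumptions based on this thread, not the PR's exact code:
   ```
   import java.util.HashMap;
   import java.util.Map;

   import org.apache.avro.Schema;
   import org.apache.hudi.common.util.TypedProperties;
   import org.apache.hudi.exception.HoodieException;
   import org.apache.hudi.utilities.UtilHelpers;
   import org.apache.hudi.utilities.schema.SchemaProvider;
   import org.apache.spark.api.java.JavaSparkContext;

   public class JdbcBasedSchemaProvider extends SchemaProvider {

     public JdbcBasedSchemaProvider(TypedProperties props, JavaSparkContext jssc) {
       super(props, jssc);
     }

     @Override
     public Schema getSourceSchema() {
       // The property keys below are illustrative placeholders, not the PR's names.
       Map<String, String> options = new HashMap<>();
       options.put("url", config.getString("hoodie.deltastreamer.jdbc.url"));
       options.put("dbtable", config.getString("hoodie.deltastreamer.jdbc.table"));
       options.put("user", config.getString("hoodie.deltastreamer.jdbc.user"));
       options.put("password", config.getString("hoodie.deltastreamer.jdbc.password"));
       options.put("timeout", "60");
       options.put("nullable", "true");
       try {
         // Delegates to the Spark JDBC helper discussed in the review comments above.
         return UtilHelpers.getSchema(options);
       } catch (Exception e) {
         throw new HoodieException("Failed to fetch schema over JDBC", e);
       }
     }
   }
   ```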
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] leesf commented on issue #1208: [HUDI-304] Bring back spotless plugin

2020-01-14 Thread GitBox
leesf commented on issue #1208: [HUDI-304] Bring back spotless plugin
URL: https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574102314
 
 
   > > ocument that developers could use checkstyle.xml file in style folder in 
checkstyle plugin and things will go well
   > 
   > I was able to use checkstyle to format in IntelliJ. This is fine.. but we 
should clearly document this. maybe file a JIRA?
   > 
   > On import order, we can take a second stab may be down the line? again 
filing a JIRA would be great for tracking..
   > 
   > On this PR, my concern was we are reformatting again due to the 120 
character limit? I was trying to see if we can avoid it. @leesf could you 
explain why 100+ files are being touched in this PR? If these were all 
checkstyle failures, then master would be broken right? I am just trying to 
understand what code really changed here, given we are close to a release..
   
   Created https://issues.apache.org/jira/projects/HUDI/issues/HUDI-533 and https://issues.apache.org/jira/projects/HUDI/issues/HUDI-534 to track these.
   The changes in this PR have nothing to do with the 120-character limit. I randomly checked some of the modified files; they were modified after the spotless plugin was first introduced on 2019/10/10. Master won't break, as this PR is rebased onto the master branch.
   PS: My other question is whether we need to modify `checkstyle-suppressions.xml` to make Reformat in IntelliJ, checkstyle.xml, and eclipse-java-google-style.xml all pass. cc @bvaradar @vinothchandar 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-534) Explore a new way to fix import order

2020-01-14 Thread leesf (Jira)
leesf created HUDI-534:
--

 Summary: Explore a new way to fix import order
 Key: HUDI-534
 URL: https://issues.apache.org/jira/browse/HUDI-534
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: newbie
Reporter: leesf


more context at 
https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574024869.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-533) Update Setup docs with latest checkstyle

2020-01-14 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-533:
---
Priority: Minor  (was: Major)

> Update Setup docs with latest checkstyle
> -
>
> Key: HUDI-533
> URL: https://issues.apache.org/jira/browse/HUDI-533
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: leesf
>Priority: Minor
>
> more context at 
> https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574024869
> docs here https://hudi.apache.org/contributing.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-533) Update Setup docs with latest checkstyle

2020-01-14 Thread leesf (Jira)
leesf created HUDI-533:
--

 Summary: Update Setup docs with latest checkstyle
 Key: HUDI-533
 URL: https://issues.apache.org/jira/browse/HUDI-533
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: leesf


more context at 
https://github.com/apache/incubator-hudi/pull/1208#issuecomment-574024869
docs here https://hudi.apache.org/contributing.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1212: [WIP] [HUDI-509] Renaming code in sync with cWiki restructuring

2020-01-14 Thread GitBox
vinothchandar commented on a change in pull request #1212: [WIP] [HUDI-509] 
Renaming code in sync with cWiki restructuring
URL: https://github.com/apache/incubator-hudi/pull/1212#discussion_r366207936
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##
 @@ -160,9 +160,9 @@ public Operation convert(String value) throws 
ParameterException {
 @Parameter(names = {"--target-table"}, description = "name of the target 
table in Hive", required = true)
 public String targetTableName;
 
-@Parameter(names = {"--storage-type"}, description = "Type of Storage. 
COPY_ON_WRITE (or) MERGE_ON_READ",
+@Parameter(names = {"--table-type"}, description = "Type of table. 
COPY_ON_WRITE (or) MERGE_ON_READ",
 
 Review comment:
   I tried retaining `--storage-type` as well, but had to choose between it and the new `--table-type` to mark as required. Whichever we pick, it won't be ideal, so I chose to simply rename, requiring users to make a small change to the command when upgrading.
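   For reference, JCommander does accept multiple names on one `@Parameter`, so a deprecated alias could in principle be kept; a hypothetical sketch of that alternative (not what the PR does):
   ```
   import com.beust.jcommander.Parameter;

   public class Config {
     // Hypothetical: accept the old flag as an alias of the new one, so a single
     // required parameter covers both spellings. The PR instead renames the flag
     // outright, asking users to update their command when upgrading.
     @Parameter(names = {"--table-type", "--storage-type"},
         description = "Type of table. COPY_ON_WRITE (or) MERGE_ON_READ", required = true)
     public String tableType;
   }
   ```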


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1212: [WIP] [HUDI-509] Renaming code in sync with cWiki restructuring

2020-01-14 Thread GitBox
vinothchandar commented on a change in pull request #1212: [WIP] [HUDI-509] 
Renaming code in sync with cWiki restructuring
URL: https://github.com/apache/incubator-hudi/pull/1212#discussion_r366214234
 
 

 ##
 File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
 ##
 @@ -50,20 +50,14 @@ class DefaultSource extends RelationProvider
   optParams: Map[String, String],
   schema: StructType): BaseRelation = {
 // Add default options for unspecified read options keys.
-val parameters = Map(VIEW_TYPE_OPT_KEY -> DEFAULT_VIEW_TYPE_OPT_VAL) ++ 
optParams
+val parameters = Map(QUERY_TYPE_OPT_KEY -> DEFAULT_QUERY_TYPE_OPT_VAL) ++ 
optParams
 
 val path = parameters.get("path")
 if (path.isEmpty) {
   throw new HoodieException("'path' must be specified.")
 }
 
-if (parameters(VIEW_TYPE_OPT_KEY).equals(VIEW_TYPE_REALTIME_OPT_VAL)) {
-  throw new HoodieException("Realtime view not supported yet via data 
source. Please use HiveContext route.")
-}
-
-if (parameters(VIEW_TYPE_OPT_KEY).equals(VIEW_TYPE_INCREMENTAL_OPT_VAL)) {
-  new IncrementalRelation(sqlContext, path.get, optParams, schema)
-} else {
+if (parameters(QUERY_TYPE_OPT_KEY).equals(QUERY_TYPE_SNAPSHOT_OPT_VAL)) {
 
 Review comment:
   This is a behavior change... previously, the MOR realtime view errored out here; now we just return the same RO query results with a warning.
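   For context, a hypothetical snapshot read using the renamed option; the key/value strings here are assumed from this PR's naming, not verified against the final code:
   ```
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   public class SnapshotQueryExample {
     // "snapshot" now covers what the realtime view used to be; per this change,
     // an MOR table read through this path returns read-optimized results with
     // a warning instead of throwing.
     public static Dataset<Row> readSnapshot(SparkSession spark, String basePath) {
       return spark.read().format("org.apache.hudi")
           .option("hoodie.datasource.query.type", "snapshot") // assumed key/value
           .load(basePath + "/*/*/*/*");
     }
   }
   ```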


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha merged pull request #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-14 Thread GitBox
bhasudha merged pull request #1223: [HUDI-530] Fix conversion of Spark struct 
type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch master updated (fd8f1c7 -> 2bb0c21)

2020-01-14 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from fd8f1c7  [MINOR] Reuse random object (#1222)
 add 2bb0c21  Fix conversion of Spark struct type to Avro schema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/hudi/common/HoodieTestDataGenerator.java   | 13 ++---
 .../main/scala/org/apache/hudi/AvroConversionHelper.scala |  2 +-
 hudi-spark/src/test/java/DataSourceTestUtils.java |  4 +++-
 hudi-spark/src/test/java/HoodieJavaApp.java   |  4 ++--
 hudi-spark/src/test/java/HoodieJavaStreamingApp.java  |  4 ++--
 .../src/test/resources/delta-streamer-config/source.avsc  | 15 ++-
 .../src/test/resources/delta-streamer-config/target.avsc  | 15 ++-
 7 files changed, 46 insertions(+), 11 deletions(-)



[GitHub] [incubator-hudi] bhasudha commented on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-14 Thread GitBox
bhasudha commented on issue #1223: [HUDI-530] Fix conversion of Spark struct 
type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-574060570
 
 
   Looks good to me. I was also able to quickly verify this in my local setup. Thanks @umehrot2. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bhasudha edited a comment on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-14 Thread GitBox
bhasudha edited a comment on issue #1223: [HUDI-530] Fix conversion of Spark 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-574060570
 
 
   Looks good to me. I was also able to quickly verify this in my local setup. 
Thanks @umehrot2. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[incubator-hudi] branch hudi_test_suite_refactor updated (3dc85eb -> 0456214)

2020-01-14 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


 discard 3dc85eb  [HUDI-442] Fix 
TestComplexKeyGenerator#testSingleValueKeyGenerator and 
testMultipleValueKeyGenerator NPE
 add 0456214  [HUDI-442] Fix 
TestComplexKeyGenerator#testSingleValueKeyGenerator and 
testMultipleValueKeyGenerator NPE

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (3dc85eb)
\
 N -- N -- N   refs/heads/hudi_test_suite_refactor (0456214)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/hudi/testsuite/helpers/HiveServiceProvider.java| 1 -
 .../main/java/org/apache/hudi/testsuite/job/HoodieTestSuiteJob.java| 3 ++-
 .../java/org/apache/hudi/testsuite/job/TestHoodieTestSuiteJob.java | 1 -
 3 files changed, 2 insertions(+), 3 deletions(-)



[GitHub] [incubator-hudi] vinothchandar edited a comment on issue #1223: [HUDI-530] Fix conversion of Spark struct type to Avro schema

2020-01-14 Thread GitBox
vinothchandar edited a comment on issue #1223: [HUDI-530] Fix conversion of 
Spark struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1223#issuecomment-574015931
 
 
   @umehrot2 is this in any way related to the quickstart breakage that @nsivabalan reported? 
   EDIT: seems unrelated


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services