[jira] [Updated] (HUDI-388) Support DDL / DML SparkSQL statements which are useful for admins

2020-03-16 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-388:

Status: Open  (was: New)

> Support DDL / DML SparkSQL statements which are useful for admins
> -----------------------------------------------------------------
>
> Key: HUDI-388
> URL: https://issues.apache.org/jira/browse/HUDI-388
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> *Purpose*
> Currently, Hudi offers several tools to operate an ecosystem of Hudi 
> datasets, including hudi-cli, metrics, and the Spark UI [1]. It would be 
> easier for admins to manage Hudi datasets through customized DDL SQL 
> statements instead of going through hudi-cli.
>  
> After SPARK-18127, we can customize the Spark session with our own optimizer, 
> parser, analyzer, and physical plan strategy rules. The steps to extend the 
> Spark session are:
> 1. A tool to parse the SparkSQL statements, such as ANTLR or regular 
> expressions.
> 2. A class that configures org.apache.spark.sql.SparkSessionExtensions to 
> inject the parser.
> 3. A command that runs the customized statements by extending 
> org.apache.spark.sql.execution.command.RunnableCommand.
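Step 1 can be illustrated without any Spark dependency. The statement syntax `hoodie stat '<path>'` and the class below are hypothetical examples for this proposal, not existing Hudi code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of step 1: recognize a custom admin statement with a regular
// expression; anything that does not match would be delegated to the
// stock Spark parser. "hoodie stat '<path>'" is an invented syntax.
public class HoodieStatParser {

  // Case-insensitive match for: hoodie stat '<path>'
  private static final Pattern STAT_STMT =
      Pattern.compile("(?i)^\\s*hoodie\\s+stat\\s+'([^']+)'\\s*$");

  /** Returns the table path if the statement matches, otherwise null. */
  public static String parseStatPath(String sql) {
    Matcher m = STAT_STMT.matcher(sql);
    return m.matches() ? m.group(1) : null;
  }
}
```

A matching statement yields its path (`parseStatPath("hoodie stat '/tmp/t1'")` returns `/tmp/t1`), while ordinary SQL returns null and would fall through to the default parser.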
>  
> *Demo*
> 1. Extend SparkSessionExtensions
> {code:java}
> class HudiSparkSessionExtension extends (SparkSessionExtensions => Unit) {
>   override def apply(extensions: SparkSessionExtensions): Unit = {
>     extensions.injectParser { (session, parser) =>
>       new HudiDDLParser(parser)
>     }
>   }
> }
> {code}
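An extension class like the one above is usually wired in through the `spark.sql.extensions` configuration that SPARK-18127 introduced. The package and jar names below are placeholders, not actual Hudi artifacts:

```shell
# Placeholder class/jar names -- substitute the real ones for your build.
spark-shell \
  --jars hudi-spark-extension.jar \
  --conf spark.sql.extensions=org.apache.hudi.HudiSparkSessionExtension
```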
>  
> 2. Extend RunnableCommand
> {code:java}
> case class HudiStatCommand(path: String) extends RunnableCommand {
>
>   override val output: Seq[Attribute] = {
>     Seq(
>       AttributeReference("CommitTime", StringType, nullable = false)(),
>       AttributeReference("Total Upserted", IntegerType, nullable = false)(),
>       AttributeReference("Total Written", IntegerType, nullable = false)(),
>       AttributeReference("Write Amplification Factor", DoubleType, nullable = false)()
>     )
>   }
>
>   override def run(sparkSession: SparkSession): Seq[Row] = {
>     Seq(
>       Row("20191207003131", 0, 10, 0.1),
>       Row("20191207003200", 4, 10, 2.50),
>       Row("Total", 4, 20, 5.00)
>     )
>   }
> }
> {code}
>  
> 3. Demo result (mock data)
> {code:java}
> +--------------+--------------+-------------+--------------------------+
> |    CommitTime|Total Upserted|Total Written|Write Amplification Factor|
> +--------------+--------------+-------------+--------------------------+
> |20191207003131|             0|           10|                       0.1|
> |20191207003200|             4|           10|                       2.5|
> |         Total|             4|           20|                       5.0|
> +--------------+--------------+-------------+--------------------------+
> {code}
>  
> [https://github.com/lamber-ken/hudi-work]
> [http://hudi.apache.org/admin_guide.html]
> https://issues.apache.org/jira/browse/SPARK-18127
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] leesf commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-16 Thread GitBox
leesf commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-599870507
 
 
   > I don't see an option to merge the PR. Is it that @leesf is yet to 
approve? or do I need to request permission or something ?
   
   @nsivabalan Please refer to this wiki to get github write access to the 
repository. 
https://cwiki.apache.org/confluence/display/HUDI/Committer+On-boarding+Guide


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-716) Exception: Not an Avro data file when running HoodieCleanClient.runClean

2020-03-16 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060624#comment-17060624
 ] 

Balaji Varadarajan commented on HUDI-716:
-----------------------------------------

[~afilipchik]: Can you attach a listing of the .hoodie folder and also the 
contents of hoodie.properties?

Also, any chance you know what version of Hudi was running 2 months back?

> Exception: Not an Avro data file when running HoodieCleanClient.runClean
> ------------------------------------------------------------------------
>
> Key: HUDI-716
> URL: https://issues.apache.org/jira/browse/HUDI-716
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> Just upgraded to upstream master from 0.5 and seeing an issue at the end of 
> the delta sync run: 
> 20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
> org.apache.hudi.exception.HoodieIOException: Not an Avro data file
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144)
>   at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>   at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86)
>   at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843)
>   at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Not an Avro data file
>   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
>   at org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
>   at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87)
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141)
>   ... 24 more
>  
> It is attempting to read an old cleanup file (2 months old) and crashing.
>  





[GitHub] [incubator-hudi] nsivabalan commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-16 Thread GitBox
nsivabalan commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi 
Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-599869794
 
 
   I don't see an option to merge the PR. Is it that @leesf is yet to approve? 
or do I need to request permission or something ?




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
lamber-ken edited a comment on issue #1412: [HUDI-504] Restructuring and 
auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412#issuecomment-599862437
 
 
   here is the build log of the current PR
   https://travis-ci.org/github/apache/incubator-hudi/builds/663339323




[GitHub] [incubator-hudi] lamber-ken commented on issue #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
lamber-ken commented on issue #1412: [HUDI-504] Restructuring and 
auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412#issuecomment-599862437
 
 
   here is the build log of this PR
   https://travis-ci.org/github/apache/incubator-hudi/builds/663339323




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #219

2020-03-16 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.35 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
lamber-ken commented on a change in pull request #1412: [HUDI-504] 
Restructuring and auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412#discussion_r393427543
 
 

 ##
 File path: .travis.yml
 ##
 @@ -0,0 +1,42 @@
+language: ruby
+rvm:
+  - 2.6.3
+
+git:
+  clone: false
+
+env:
+  global:
+    - GIT_USER="CI BOT"
+    - GIT_EMAIL="ci...@hudi.apache.org"
+    - GIT_REPO="apache"
+    - GIT_PROJECT="incubator-hudi"
+    - GIT_BRANCH="asf-site"
+    - DOCS_ROOT="`pwd`/${GIT_PROJECT}/docs"
+
+before_install:
+  - git config --global user.name ${GIT_USER}
+  - git config --global user.email ${GIT_EMAIL}
+  - git clone https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git
+  - cd ${GIT_PROJECT} && git checkout ${GIT_BRANCH}
+  - gem install bundler:2.0.2
+
+script:
+  - pushd ${DOCS_ROOT}
+  - bundle install
+  - bundle update --bundler
+  - bundle exec jekyll build --config _config.yml --source . --destination _site
+  - popd
+
+after_success:
 
 Review comment:
   hi @vinothchandar 
   
   > 1 How does this file interplay with .travis.yml on master.. 
   
   each branch uses its own `.travis.yml`
   
   > i.e the change has been landed on asf-site already and this job is 
triggered after that?
   
   right, the job will be triggered once it is merged.




[jira] [Commented] (HUDI-480) Support a method for querying deleted data in incremental view

2020-03-16 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060605#comment-17060605
 ] 

vinoyang commented on HUDI-480:
---

[~vinoth] Yes, I am absolutely interested in this topic. IMO, before/after is a 
good direction. Let's support a binlog-like feature on Hadoop.

> Support a method for querying deleted data in incremental view
> --------------------------------------------------------------
>
> Key: HUDI-480
> URL: https://issues.apache.org/jira/browse/HUDI-480
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Incremental Pull
>Reporter: cdmikechen
>Priority: Minor
>
> As we know, Hudi supports many methods to query data in Spark, Hive, and 
> Presto. It also provides a very good timeline mechanism to trace changes in 
> data, which can be used to query incremental data in the incremental view.
> Previously, we only had insert and update functions to upsert data, and we 
> have now added new functions to delete existing data:
> *[HUDI-328] Adding delete api to HoodieWriteClient* 
> https://github.com/apache/incubator-hudi/pull/1004
> *[HUDI-377] Adding Delete() support to DeltaStreamer* 
> https://github.com/apache/incubator-hudi/pull/1073
> So, now that we have a delete API, should we add another method to get 
> deleted data in the incremental view?
> I've looked at the methods for generating new parquet files. The main idea 
> is to combine the old and new data and then filter out the data that needs 
> to be deleted, so that the deleted data does not exist in the new dataset. 
> However, this means the deleted data is not retained in the new dataset, so 
> only inserted or modified data can be found from the existing timestamp 
> field when tracing data in the incremental view.
> If we do this, I feel there are two ideas to consider:
> 1. Trace the dataset in the same file at different time checkpoints 
> according to the timeline, compare the two datasets by key, and filter out 
> the deleted data. This method adds no extra cost when writing, but it must 
> run the comparison for each query, which is expensive.
> 2. When writing data, record any deleted data in a separate file, named 
> something like *.delete_filename_version_timestamp*, so that we can answer 
> immediately based on the time. But this adds extra processing at write time.
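For reference, the core of idea 1 — comparing two snapshots of the same file group by record key — can be sketched with plain collections (a toy model with invented names, not Hudi APIs):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of idea 1: given the record keys present in a file group at an
// older and a newer timeline instant, the deleted keys are exactly those
// present in the older snapshot but missing from the newer one.
public class SnapshotDiff {

  public static Set<String> deletedKeys(Set<String> olderKeys, Set<String> newerKeys) {
    Set<String> deleted = new HashSet<>(olderKeys);
    deleted.removeAll(newerKeys);  // whatever survives was removed by the write
    return deleted;
  }
}
```

This pushes all the cost to query time, which is exactly the trade-off idea 1 describes.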
>  





[jira] [Assigned] (HUDI-716) Exception: Not an Avro data file when running HoodieCleanClient.runClean

2020-03-16 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-716:
---

Assignee: lamber-ken

> Exception: Not an Avro data file when running HoodieCleanClient.runClean
> ------------------------------------------------------------------------
>
> Key: HUDI-716
> URL: https://issues.apache.org/jira/browse/HUDI-716
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Assignee: lamber-ken
>Priority: Major
> Fix For: 0.6.0
>
>
> Just upgraded to upstream master from 0.5 and seeing an issue at the end of 
> the delta sync run: 
> 20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
> org.apache.hudi.exception.HoodieIOException: Not an Avro data file
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144)
>   at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>   at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86)
>   at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843)
>   at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Not an Avro data file
>   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
>   at org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
>   at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87)
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141)
>   ... 24 more
>  
> It is attempting to read an old cleanup file (2 months old) and crashing.
>  





[jira] [Commented] (HUDI-716) Exception: Not an Avro data file when running HoodieCleanClient.runClean

2020-03-16 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060595#comment-17060595
 ] 

Vinoth Chandar commented on HUDI-716:
-------------------------------------

[~vbalaji] [~nagarwal] one more report of the same thing.. 

 

Alex mentioned that this is a zero-size file.

 

cc @lamber-ken  are you up for taking a crack at this? (since you love these 
challenging ones :)) 
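Since the failing file is reportedly zero bytes, one plausible guard (an illustration of the failure mode, not the actual Hudi fix) is to check the file length before handing it to the Avro reader:

```java
import java.io.File;
import java.io.IOException;

// Sketch of a defensive check for the reported crash: a zero-length
// clean-plan file makes Avro's DataFileReader throw "Not an Avro data
// file". Checking the length first lets the caller skip such files.
public class AvroFileGuard {

  /** Returns true when the file is non-empty and worth passing to the Avro reader. */
  public static boolean looksReadable(File f) throws IOException {
    if (!f.exists()) {
      throw new IOException("missing file: " + f);
    }
    // A valid Avro container file is never empty: it begins with the
    // 4-byte magic ("Obj" plus a version byte), so anything shorter
    // cannot possibly parse.
    return f.length() >= 4;
  }
}
```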

> Exception: Not an Avro data file when running HoodieCleanClient.runClean
> ------------------------------------------------------------------------
>
> Key: HUDI-716
> URL: https://issues.apache.org/jira/browse/HUDI-716
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.6.0
>
>
> Just upgraded to upstream master from 0.5 and seeing an issue at the end of 
> the delta sync run: 
> 20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
> org.apache.hudi.exception.HoodieIOException: Not an Avro data file
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144)
>   at org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>   at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86)
>   at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843)
>   at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: Not an Avro data file
>   at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50)
>   at org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
>   at org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87)
>   at org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141)
>   ... 24 more
>  
> It is attempting to read an old cleanup file (2 months old) and crashing.
>  





[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
yanghua commented on a change in pull request #1411: [HUDI-695]Add unit test 
for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#discussion_r393417290
 
 

 ##
 File path: 
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestTableCommand.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.AbstractShellIntegrationTest;
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.ConsistencyGuardConfig;
+import org.junit.Test;
+import org.springframework.shell.core.CommandResult;
+
+import java.io.File;
+
+import static org.apache.hudi.common.table.HoodieTableMetaClient.METAFOLDER_NAME;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Test Cases for {@link TableCommand}.
+ */
+public class TestTableCommand extends AbstractShellIntegrationTest {
+
+  /**
+   * Test Cases for create, desc and connect table.
+   */
+  @Test
+  public void testCreateAndConnectTable() {
+    // Prepare
+    String tableName = "test_table";
+    HoodieCLI.conf = jsc.hadoopConfiguration();
+    String tablePath = basePath + File.separator + tableName;
+    String metaPath = tablePath + File.separator + METAFOLDER_NAME;
+
+    // Test create default
+    CommandResult cr = getShell().executeCommand(
+        "create --path " + tablePath + " --tableName " + tableName);
+    assertEquals("Metadata for table " + tableName + " loaded", cr.getResult().toString());
+    HoodieTableMetaClient client = HoodieCLI.getTableMetaClient();
+    assertEquals(metaPath, client.getArchivePath());
+    assertEquals(tablePath, client.getBasePath());
+    assertEquals(metaPath, client.getMetaPath());
+    assertEquals(HoodieTableType.COPY_ON_WRITE, client.getTableType());
+    assertEquals(new Integer(1), client.getTimelineLayoutVersion().getVersion());
+
+    // Test desc
+    cr = getShell().executeCommand("desc");
+    assertTrue(cr.isSuccess());
+    // Check the table's basePath, metaPath and type
+    assertTrue(cr.getResult().toString().contains(tablePath));
+    assertTrue(cr.getResult().toString().contains(metaPath));
+    assertTrue(cr.getResult().toString().contains("COPY_ON_WRITE"));
+
+    // Test connect with specified values
+    // Check specified values
+    cr = getShell().executeCommand(
+        "connect --path " + tablePath + " --initialCheckIntervalMs 3000 "
+            + "--maxWaitIntervalMs 4 --maxCheckIntervalMs 8");
+    assertTrue(cr.isSuccess());
+    ConsistencyGuardConfig conf = HoodieCLI.consistencyGuardConfig;
+    assertEquals(3000, conf.getInitialConsistencyCheckIntervalMs());
+    assertEquals(4, conf.getMaxConsistencyCheckIntervalMs());
+    assertEquals(8, conf.getMaxConsistencyChecks());
+    // Check default values
+    assertTrue(!conf.isConsistencyCheckEnabled());
 
 Review comment:
   tip: we can also use `assertFalse` here.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
yanghua commented on a change in pull request #1411: [HUDI-695]Add unit test 
for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#discussion_r393417023
 
 

 ##
 File path: 
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestTableCommand.java
 ##
 @@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.AbstractShellIntegrationTest;
+import org.apache.hudi.cli.HoodieCLI;
+import org.apache.hudi.common.model.HoodieTableType;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.ConsistencyGuardConfig;
+import org.junit.Test;
+import org.springframework.shell.core.CommandResult;
+
+import java.io.File;
+
+import static 
org.apache.hudi.common.table.HoodieTableMetaClient.METAFOLDER_NAME;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Test Cases for {@link TableCommand}.
+ */
+public class TestTableCommand extends AbstractShellIntegrationTest {
+
+  /**
+   * Test Cases for create, desc and connect table.
+   */
+  @Test
+  public void testCreateAndConnectTable() {
 
 Review comment:
   Can we split this method into three smaller methods so that we can test the 
`create`, `desc` and `connect` commands individually?
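A minimal sketch of what the suggested split could look like; the method bodies and the `runCommand` helper are placeholders, not the actual `TestTableCommand` code:

```java
// Sketch of splitting one large test into three focused ones, each
// exercising a single CLI command. runCommand is a placeholder for
// executing a shell command and reporting success.
public class TableCommandTestSketch {

    // stands in for running a CLI command and returning success/failure
    static boolean runCommand(String cmd) { return !cmd.isEmpty(); }

    static void testCreateTable()  { check(runCommand("create --path ...")); }
    static void testDescTable()    { check(runCommand("desc")); }
    static void testConnectTable() { check(runCommand("connect --path ...")); }

    static void check(boolean ok) { if (!ok) throw new AssertionError(); }

    public static void main(String[] args) {
        testCreateTable();
        testDescTable();
        testConnectTable();
        System.out.println("all three pass");
    }
}
```

Each method can then fail independently, which makes it immediately clear which command regressed.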




[jira] [Updated] (HUDI-716) Exception: Not an Avro data file when running HoodieCleanClient.runClean

2020-03-16 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-716:

Fix Version/s: 0.6.0

> Exception: Not an Avro data file when running HoodieCleanClient.runClean
> 
>
> Key: HUDI-716
> URL: https://issues.apache.org/jira/browse/HUDI-716
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Alexander Filipchik
>Priority: Major
> Fix For: 0.6.0
>
>
> Just upgraded to upstream master from 0.5 and seeing an issue at the end of 
> the delta sync run: 
> 20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync 
> once. Shutting down
> org.apache.hudi.exception.HoodieIOException: Not an Avro data file at 
> org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144) 
> at 
> org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
>  at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>  at 
> java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) 
> at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86) 
> at org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843) 
> at 
> org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
>  at 
> org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
>  at 
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
>  at 
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) 
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at 
> org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at 
> org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) 
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at 
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
> java.io.IOException: Not an Avro data file at 
> org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50) at 
> org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
>  at 
> org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87) 
> at 
> org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141) 
> ... 24 more
>  
> It is attempting to read an old cleanup file (2 months old) and crashing.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-716) Exception: Not an Avro data file when running HoodieCleanClient.runClean

2020-03-16 Thread Alexander Filipchik (Jira)
Alexander Filipchik created HUDI-716:


 Summary: Exception: Not an Avro data file when running 
HoodieCleanClient.runClean
 Key: HUDI-716
 URL: https://issues.apache.org/jira/browse/HUDI-716
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: DeltaStreamer
Reporter: Alexander Filipchik


Just upgraded to upstream master from 0.5 and seeing an issue at the end of the 
delta sync run: 

20/03/17 02:13:49 ERROR HoodieDeltaStreamer: Got error running delta sync once. 
Shutting down
org.apache.hudi.exception.HoodieIOException: Not an Avro data file at 
org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:144) 
at 
org.apache.hudi.client.HoodieCleanClient.lambda$clean$0(HoodieCleanClient.java:88)
 at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) 
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) 
at org.apache.hudi.client.HoodieCleanClient.clean(HoodieCleanClient.java:86) at 
org.apache.hudi.client.HoodieWriteClient.clean(HoodieWriteClient.java:843) at 
org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:520) 
at 
org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:168)
 at 
org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:111)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:395)
 at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:237) 
at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
 at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:294)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)Caused by: 
java.io.IOException: Not an Avro data file at 
org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:50) at 
org.apache.hudi.common.util.AvroUtils.deserializeAvroMetadata(AvroUtils.java:147)
 at 
org.apache.hudi.common.util.CleanerUtils.getCleanerPlan(CleanerUtils.java:87) 
at 
org.apache.hudi.client.HoodieCleanClient.runClean(HoodieCleanClient.java:141) 
... 24 more

 

It is attempting to read an old cleanup file (2 months old) and crashing.

 





[GitHub] [incubator-hudi] EdwinGuo commented on issue #143: Tracking ticket for folks to be added to slack group

2020-03-16 Thread GitBox
EdwinGuo commented on issue #143: Tracking ticket for folks to be added to 
slack group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-599840432
 
 
   Please add: alter...@yahoo.com  Thanks!




[GitHub] [incubator-hudi] hddong commented on issue #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
hddong commented on issue #1411: [HUDI-695]Add unit test for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#issuecomment-599832627
 
 
   @yanghua After reset, it's green now.




[GitHub] [incubator-hudi] codecov-io commented on issue #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
codecov-io commented on issue #1411: [HUDI-695]Add unit test for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#issuecomment-599832351
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=h1) 
Report
   > Merging 
[#1411](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/55e6d348155f63eb128cd208687d02206bad66a5?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1411/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #1411     +/-   ##
   =========================================
   + Coverage   67.69%    67.7%    +<.01%
   + Complexity    243      241        -2
   =========================================
     Files         338      338
     Lines       16371    16374        +3
     Branches     1672     1672
   =========================================
   + Hits        11083    11086        +3
     Misses       4548     4548
     Partials      740      740
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1411/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=)
 | `64.17% <0%> (+1.67%)` | `0% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=footer).
 Last update 
[55e6d34...2186afe](https://codecov.io/gh/apache/incubator-hudi/pull/1411?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
codecov-io edited a comment on issue #1411: [HUDI-695]Add unit test for 
TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#issuecomment-599832351
 
 




[GitHub] [incubator-hudi] bschell commented on issue #1413: [HUDI-539] Add "Config" constructor to HoodieROTablePathFilter

2020-03-16 Thread GitBox
bschell commented on issue #1413: [HUDI-539] Add "Config" constructor to 
HoodieROTablePathFilter
URL: https://github.com/apache/incubator-hudi/pull/1413#issuecomment-599798159
 
 
   I would need to make the corresponding change in spark to use this new 
constructor. I haven't seen any issue with Spark's use of the pathfilter so far 
but I haven't looked closely. I can test this with spark-sql. 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1413: [HUDI-539] Add "Config" constructor to HoodieROTablePathFilter

2020-03-16 Thread GitBox
vinothchandar commented on issue #1413: [HUDI-539] Add "Config" constructor to 
HoodieROTablePathFilter
URL: https://github.com/apache/incubator-hudi/pull/1413#issuecomment-599784095
 
 
   @bschell HUDI-539 was for fixing this for spark sql.. did you test that as 
well? does it actually fix anything for spark queries? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1413: [HUDI-539] Add "Config" constructor to HoodieROTablePathFilter

2020-03-16 Thread GitBox
vinothchandar commented on issue #1413: [HUDI-539] Add "Config" constructor to 
HoodieROTablePathFilter
URL: https://github.com/apache/incubator-hudi/pull/1413#issuecomment-599783606
 
 
   ah... the commit message did not have the HUDI-... prefix.. my bad. clicked 
too soon 




[incubator-hudi] branch master updated: Add constructor to HoodieROTablePathFilter (#1413)

2020-03-16 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 418f9bb  Add constructor to HoodieROTablePathFilter (#1413)
418f9bb is described below

commit 418f9bb2e91ed6c02077d36e49a47f0c8d08303a
Author: bschell 
AuthorDate: Mon Mar 16 15:19:16 2020 -0700

Add constructor to HoodieROTablePathFilter (#1413)

Allows HoodieROTablePathFilter to accept a configuration for
initializing the filesystem. This fixes a bug with Presto's use of this
pathfilter.

Co-authored-by: Brandon Scheller 
---
 .../org/apache/hudi/hadoop/HoodieROTablePathFilter.java   | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java
 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java
index 66ec864..d879a2f 100644
--- 
a/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java
+++ 
b/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java
@@ -63,12 +63,21 @@ public class HoodieROTablePathFilter implements PathFilter, 
Serializable {
*/
   private HashSet nonHoodiePathCache;
 
+  /**
+   * Hadoop configurations for the FileSystem.
+   */
+  private Configuration conf;
 
   private transient FileSystem fs;
 
   public HoodieROTablePathFilter() {
-    hoodiePathCache = new HashMap<>();
-    nonHoodiePathCache = new HashSet<>();
+    this(new Configuration());
+  }
+
+  public HoodieROTablePathFilter(Configuration conf) {
+    this.hoodiePathCache = new HashMap<>();
+    this.nonHoodiePathCache = new HashSet<>();
+    this.conf = conf;
   }
 
   /**
@@ -93,7 +102,7 @@ public class HoodieROTablePathFilter implements PathFilter, 
Serializable {
 Path folder = null;
 try {
   if (fs == null) {
-        fs = path.getFileSystem(new Configuration());
+        fs = path.getFileSystem(conf);
   }
 
   // Assumes path is a file
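The pattern the commit introduces can be sketched in isolation: a no-arg constructor that delegates to a config-accepting one, so callers can inject their own settings. `SimpleConf` below is a stand-in for Hadoop's `Configuration`, and the property name is only illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of constructor-based configuration injection, as in the commit above.
public class PathFilterSketch {

    // stand-in for org.apache.hadoop.conf.Configuration
    static class SimpleConf {
        final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    private final SimpleConf conf;

    public PathFilterSketch() {
        this(new SimpleConf()); // default: allocate a fresh configuration
    }

    public PathFilterSketch(SimpleConf conf) {
        this.conf = conf;       // injected: caller controls filesystem settings
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("fs.defaultFS", "file:///"); // illustrative property only
        PathFilterSketch filter = new PathFilterSketch(conf);
        System.out.println(filter.conf.get("fs.defaultFS")); // file:///
    }
}
```

Delegating from the no-arg constructor keeps existing callers working while letting Presto (or any engine) pass its own configuration.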



[GitHub] [incubator-hudi] vinothchandar merged pull request #1413: [HUDI-539] Add "Config" constructor to HoodieROTablePathFilter

2020-03-16 Thread GitBox
vinothchandar merged pull request #1413: [HUDI-539] Add "Config" constructor to 
HoodieROTablePathFilter
URL: https://github.com/apache/incubator-hudi/pull/1413
 
 
   




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393337986
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/inline/fs/TestHFileReadWriteFlow.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.io.hfile.CacheConfig;
+import org.apache.hadoop.hbase.io.hfile.HFile;
+import org.apache.hadoop.hbase.io.hfile.HFileContext;
+import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
+import org.apache.hadoop.hbase.io.hfile.HFileScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.UUID;
+
+import static 
org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.FILE_SCHEME;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.RANDOM;
+import static 
org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getPhantomFile;
+import static 
org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getRandomOuterInMemPath;
+
+/**
+ * Tests {@link InlineFileSystem} using HFile writer and reader.
+ */
+public class TestHFileReadWriteFlow {
 
 Review comment:
   this file has a fair amount of test code.. any way to simplify/reuse? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393290627
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format:
+ * "inlinefs://<path_to_outer_file>/<outer_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
+   * Output: "inlinefs:/file1/file/inline_file/?start_offset=20&length=40"
+   *
+   * @param outerPath
+   * @param origScheme
+   * @param inLineStartOffset
+   * @param inLineLength
+   * @return
+   */
+  public static Path getEmbeddedInLineFilePath(Path outerPath, String 
origScheme, long inLineStartOffset, long inLineLength) {
+String subPath = 
outerPath.toString().substring(outerPath.toString().indexOf(":") + 1);
+return new Path(
+InlineFileSystem.SCHEME + "://" + subPath + "/" + origScheme + "/" + 
INLINE_FILE_STR + "/"
++ "?" + START_OFFSET_STR + EQUALS_STR + inLineStartOffset + "&" + 
LENGTH_STR + EQUALS_STR + inLineLength
+);
+  }
+
+  /**
+   * Eg input : "inlinefs:/file1/file/inline_file/?start_offset=20&length=40".
+   * Output : "file:/file1"
+   *
+   * @param inlinePath
+   * @param outerScheme
+   * @return
+   */
+  public static Path getOuterfilePathFromInlinePath(Path inlinePath, String 
outerScheme) {
+String scheme = inlinePath.getParent().getParent().getName();
+Path basePath = inlinePath.getParent().getParent().getParent();
+return new Path(basePath.toString().replaceFirst(outerScheme, scheme));
+  }
+
+  /**
+   * Eg input : "inlinefs:/file1/file/inline_file/?start_offset=20&length=40".
+   * output: 20
+   *
+   * @param inlinePath
+   * @return
+   */
+  public static int startOffset(Path inlinePath) {
+String pathName = inlinePath.getName();
+return Integer.parseInt(pathName.substring(pathName.indexOf('=') + 1, 
pathName.indexOf('&')));
 
 Review comment:
   can we implement this using `split()` and then picking the ith index.. 
instead of relying on first and last? 
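One way the suggestion could look, assuming a query string of the form `?start_offset=<n>&length=<n>`; the class and method names are illustrative, not from the Hudi codebase:

```java
// Sketch of the reviewer's suggestion: parse the query string with split()
// instead of relying on the positions of the first '=' and the '&'.
public class InlinePathParse {

    // e.g. queryParam("?start_offset=20&length=40", "start_offset") -> 20
    static long queryParam(String pathName, String key) {
        String query = pathName.substring(pathName.indexOf('?') + 1);
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=");   // kv[0] = key, kv[1] = value
            if (kv[0].equals(key)) {
                return Long.parseLong(kv[1]);
            }
        }
        throw new IllegalArgumentException("missing param: " + key);
    }

    public static void main(String[] args) {
        String name = "?start_offset=20&length=40";
        System.out.println(queryParam(name, "start_offset")); // 20
        System.out.println(queryParam(name, "length"));       // 40
    }
}
```

Splitting on `&` and `=` makes the parser indifferent to parameter order, whereas indexing into the first `=` and the `&` silently breaks if the parameters are ever reordered.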




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393338790
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/inline/fs/TestParquetReadWriteFlow.java
 ##
 @@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hudi.common.HoodieTestDataGenerator;
+import org.apache.hudi.common.model.HoodieRecord;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.avro.AvroParquetWriter;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.hadoop.ParquetWriter;
+import org.apache.parquet.hadoop.metadata.CompressionCodecName;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.UUID;
+
+import static 
org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.FILE_SCHEME;
+import static 
org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getPhantomFile;
+import static 
org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getRandomOuterInMemPath;
+
+/**
+ * Tests {@link InlineFileSystem} with Parquet writer and reader.
+ */
+public class TestParquetReadWriteFlow {
 
 Review comment:
   rename `TestParquetInlining`




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393337020
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/inline/fs/TestHFileReadWriteFlow.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.io.hfile.CacheConfig;
+import org.apache.hadoop.hbase.io.hfile.HFile;
+import org.apache.hadoop.hbase.io.hfile.HFileContext;
+import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
+import org.apache.hadoop.hbase.io.hfile.HFileScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.UUID;
+
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.FILE_SCHEME;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.RANDOM;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getPhantomFile;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getRandomOuterInMemPath;
+
+/**
+ * Tests {@link InlineFileSystem} using HFile writer and reader.
+ */
+public class TestHFileReadWriteFlow {
+
+  private final Configuration inMemoryConf;
+  private final Configuration inlineConf;
+  private final int minBlockSize = 1024;
+  private static final String LOCAL_FORMATTER = "%010d";
+  private int maxRows = 100 + RANDOM.nextInt(1000);
+  private Path generatedPath;
+
+  public TestHFileReadWriteFlow() {
+    inMemoryConf = new Configuration();
+    inMemoryConf.set("fs." + InMemoryFileSystem.SCHEME + ".impl", InMemoryFileSystem.class.getName());
+    inlineConf = new Configuration();
+    inlineConf.set("fs." + InlineFileSystem.SCHEME + ".impl", InlineFileSystem.class.getName());
+  }
+
+  @After
+  public void teardown() throws IOException {
+    if (generatedPath != null) {
+      File filePath = new File(generatedPath.toString().substring(generatedPath.toString().indexOf(':') + 1));
+      if (filePath.exists()) {
+        FileSystemTestUtils.deleteFile(filePath);
+      }
+    }
+  }
+
+  @Test
+  public void testSimpleInlineFileSystem() throws IOException {
+    Path outerInMemFSPath = getRandomOuterInMemPath();
+    Path outerPath = new Path(FILE_SCHEME + outerInMemFSPath.toString().substring(outerInMemFSPath.toString().indexOf(':')));
+    generatedPath = outerPath;
+    CacheConfig cacheConf = new CacheConfig(inMemoryConf);
+    FSDataOutputStream fout = createFSOutput(outerInMemFSPath, inMemoryConf);
+    HFileContext meta = new HFileContextBuilder()
+        .withBlockSize(minBlockSize)
+        .build();
+    HFile.Writer writer = HFile.getWriterFactory(inMemoryConf, cacheConf)
+        .withOutputStream(fout)
+        .withFileContext(meta)
+        .withComparator(new KeyValue.KVComparator())
+        .create();
+
+    writeRecords(writer);
+    fout.close();
+
+    byte[] inlineBytes = getBytesToInline(outerInMemFSPath);
+    long startOffset = generateOuterFile(outerPath, inlineBytes);
+
+    long inlineLength = inlineBytes.length;
+
+    // Generate phantom inline file
+    Path inlinePath = getPhantomFile(outerPath, startOffset, inlineLength);
+
+    InlineFileSystem inlineFileSystem = (InlineFileSystem) inlinePath.getFileSystem(inlineConf);
+    FSDataInputStream fin = inlineFileSystem.open(inlinePath);
+
+    HFile.Reader reader = HFile.createReader(inlineFileSystem, inlinePath, cacheConf, inlineConf);
+    // Load up the index.
+    reader.loadFileInfo();
+    // Get a scanner that caches and that does not use pread.
+    HFileScanner scanner =

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393338152
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/inline/fs/TestHFileReadWriteFlow.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.io.hfile.CacheConfig;
+import org.apache.hadoop.hbase.io.hfile.HFile;
+import org.apache.hadoop.hbase.io.hfile.HFileContext;
+import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
+import org.apache.hadoop.hbase.io.hfile.HFileScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.UUID;
+
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.FILE_SCHEME;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.RANDOM;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getPhantomFile;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getRandomOuterInMemPath;
+
+/**
+ * Tests {@link InlineFileSystem} using HFile writer and reader.
+ */
+public class TestHFileReadWriteFlow {
 
 Review comment:
   rename to `TestHFileInlining`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393338934
 
 

 ##
 File path: pom.xml
 ##
 @@ -684,6 +684,11 @@
         <artifactId>hbase-client</artifactId>
         <version>${hbase.version}</version>
       </dependency>
+      <dependency>
+        <groupId>org.apache.hbase</groupId>
+        <artifactId>hbase-server</artifactId>
+        <version>${hbase.version}</version>
+      </dependency>
 
 Review comment:
   scope to be test? 
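
If the reviewer's suggestion is taken, the dependency block would gain a test scope. A sketch of what the pom fragment might become (illustrative only, not the committed version):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-server</artifactId>
  <version>${hbase.version}</version>
  <!-- limit to test compilation/runtime; not bundled into the artifact -->
  <scope>test</scope>
</dependency>
```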




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393336968
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/inline/fs/TestHFileReadWriteFlow.java
 ##
 @@ -0,0 +1,243 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.KeyValue;
+import org.apache.hadoop.hbase.io.hfile.CacheConfig;
+import org.apache.hadoop.hbase.io.hfile.HFile;
+import org.apache.hadoop.hbase.io.hfile.HFileContext;
+import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
+import org.apache.hadoop.hbase.io.hfile.HFileScanner;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.UUID;
+
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.FILE_SCHEME;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.RANDOM;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getPhantomFile;
+import static org.apache.hudi.utilities.inline.fs.FileSystemTestUtils.getRandomOuterInMemPath;
+
+/**
+ * Tests {@link InlineFileSystem} using HFile writer and reader.
+ */
+public class TestHFileReadWriteFlow {
+
+  private final Configuration inMemoryConf;
+  private final Configuration inlineConf;
+  private final int minBlockSize = 1024;
+  private static final String LOCAL_FORMATTER = "%010d";
+  private int maxRows = 100 + RANDOM.nextInt(1000);
+  private Path generatedPath;
+
+  public TestHFileReadWriteFlow() {
+    inMemoryConf = new Configuration();
+    inMemoryConf.set("fs." + InMemoryFileSystem.SCHEME + ".impl", InMemoryFileSystem.class.getName());
+    inlineConf = new Configuration();
+    inlineConf.set("fs." + InlineFileSystem.SCHEME + ".impl", InlineFileSystem.class.getName());
+  }
+
+  @After
+  public void teardown() throws IOException {
+    if (generatedPath != null) {
+      File filePath = new File(generatedPath.toString().substring(generatedPath.toString().indexOf(':') + 1));
+      if (filePath.exists()) {
+        FileSystemTestUtils.deleteFile(filePath);
+      }
+    }
+  }
+
+  @Test
+  public void testSimpleInlineFileSystem() throws IOException {
+    Path outerInMemFSPath = getRandomOuterInMemPath();
+    Path outerPath = new Path(FILE_SCHEME + outerInMemFSPath.toString().substring(outerInMemFSPath.toString().indexOf(':')));
+    generatedPath = outerPath;
+    CacheConfig cacheConf = new CacheConfig(inMemoryConf);
+    FSDataOutputStream fout = createFSOutput(outerInMemFSPath, inMemoryConf);
+    HFileContext meta = new HFileContextBuilder()
+        .withBlockSize(minBlockSize)
+        .build();
+    HFile.Writer writer = HFile.getWriterFactory(inMemoryConf, cacheConf)
+        .withOutputStream(fout)
+        .withFileContext(meta)
+        .withComparator(new KeyValue.KVComparator())
+        .create();
+
+    writeRecords(writer);
+    fout.close();
+
+    byte[] inlineBytes = getBytesToInline(outerInMemFSPath);
+    long startOffset = generateOuterFile(outerPath, inlineBytes);
+
+    long inlineLength = inlineBytes.length;
+
+    // Generate phantom inline file
+    Path inlinePath = getPhantomFile(outerPath, startOffset, inlineLength);
+
+    InlineFileSystem inlineFileSystem = (InlineFileSystem) inlinePath.getFileSystem(inlineConf);
+    FSDataInputStream fin = inlineFileSystem.open(inlinePath);
+
+    HFile.Reader reader = HFile.createReader(inlineFileSystem, inlinePath, cacheConf, inlineConf);
+    // Load up the index.
+    reader.loadFileInfo();
+    // Get a scanner that caches and that does not use pread.
+    HFileScanner scanner =

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393291524
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InlineFileSystem.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.Progressable;
+
+import java.io.IOException;
+import java.net.URI;
+
+/**
+ * Enables reading any inline file at a given offset and length. This {@link FileSystem} is used only in the read path and does not support
+ * any write apis.
+ *
+ * - Reading an inlined file at a given offset and length reads it out as if it were an independent file of that length
+ * - Inlined path is of the form "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ *
+ * TODO: The reader/writer may try to use relative paths based on the inline path and it may not work. Need to handle
+ * this gracefully, e.g. the parquet summary metadata reading. TODO: If this shows promise, also support directly writing
 
 Review comment:
   In any case, please file JIRA for these gaps after we land this 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393291311
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InlineFileSystem.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.Progressable;
+
+import java.io.IOException;
+import java.net.URI;
+
+/**
+ * Enables reading any inline file at a given offset and length. This {@link FileSystem} is used only in the read path and does not support
+ * any write apis.
+ *
+ * - Reading an inlined file at a given offset and length reads it out as if it were an independent file of that length
+ * - Inlined path is of the form "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ *
+ * TODO: The reader/writer may try to use relative paths based on the inline path and it may not work. Need to handle
+ * this gracefully, e.g. the parquet summary metadata reading. TODO: If this shows promise, also support directly writing
 
 Review comment:
   can you expand on this? supporting parquet summary metadata is a critical thing for the PR IMO




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393289965
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format:
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
+   * Output: "inlinefs:/file1/file/inline_file/?start_offset=20&length=40"
+   *
+   * @param outerPath
+   * @param origScheme
+   * @param inLineStartOffset
+   * @param inLineLength
+   * @return
+   */
+  public static Path getEmbeddedInLineFilePath(Path outerPath, String origScheme, long inLineStartOffset, long inLineLength) {
+    String subPath = outerPath.toString().substring(outerPath.toString().indexOf(":") + 1);
+    return new Path(
+        InlineFileSystem.SCHEME + "://" + subPath + "/" + origScheme + "/" + INLINE_FILE_STR + "/"
+            + "?" + START_OFFSET_STR + EQUALS_STR + inLineStartOffset + "&" + LENGTH_STR + EQUALS_STR + inLineLength
+    );
+  }
+
+  /**
+   * Eg input : "inlinefs:/file1/file/inline_file/?start_offset=20&length=40".
+   * Output : "file:/file1"
+   *
+   * @param inlinePath
+   * @param outerScheme
+   * @return
+   */
+  public static Path getOuterfilePathFromInlinePath(Path inlinePath, String outerScheme) {
 
 Review comment:
   above example does not make it easy to understand what `outerScheme` is. isn't `outerScheme` always `inlinefs` if this is an inlinePath? Should we need to pass it in?
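
The question above can be made concrete with a plain-Java round-trip of the path convention. This is an illustrative sketch only, assuming the `inlinefs:<outer_path>/<orig_scheme>/inline_file/?start_offset=X&length=Y` layout from the quoted javadoc; the class and method names are hypothetical, not Hudi's:

```java
import java.net.URI;

// Illustrative sketch (not Hudi code): round-trips the inline path layout so the
// role of the original ("outer") scheme segment is visible.
public class InlinePathSketch {

  // file:/tmp/f1 + (20, 40) -> inlinefs:/tmp/f1/file/inline_file/?start_offset=20&length=40
  public static String toInlinePath(String outerPath, long startOffset, long length) {
    URI outer = URI.create(outerPath);
    return "inlinefs:" + outer.getSchemeSpecificPart() + "/" + outer.getScheme()
        + "/inline_file/?start_offset=" + startOffset + "&length=" + length;
  }

  // inlinefs:/tmp/f1/file/inline_file/?... -> file:/tmp/f1
  // The original scheme is recovered from the path segment itself, which suggests
  // callers would not need to pass it in separately.
  public static String toOuterPath(String inlinePath) {
    URI inline = URI.create(inlinePath);
    String path = inline.getPath();                                  // e.g. /tmp/f1/file/inline_file/
    String trimmed = path.substring(0, path.lastIndexOf("/inline_file"));
    String origScheme = trimmed.substring(trimmed.lastIndexOf('/') + 1);
    return origScheme + ":" + trimmed.substring(0, trimmed.lastIndexOf('/'));
  }

  public static void main(String[] args) {
    String inline = toInlinePath("file:/tmp/f1", 20, 40);
    System.out.println(inline);              // inlinefs:/tmp/f1/file/inline_file/?start_offset=20&length=40
    System.out.println(toOuterPath(inline)); // file:/tmp/f1
  }
}
```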




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393336202
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InlineFileSystem.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.Progressable;
+
+import java.io.IOException;
+import java.net.URI;
+
+/**
+ * Enables reading any inline file at a given offset and length. This {@link FileSystem} is used only in the read path and does not support
+ * any write apis.
+ *
+ * - Reading an inlined file at a given offset and length reads it out as if it were an independent file of that length
+ * - Inlined path is of the form "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ *
+ * TODO: The reader/writer may try to use relative paths based on the inline path and it may not work. Need to handle
+ * this gracefully, e.g. the parquet summary metadata reading. TODO: If this shows promise, also support directly writing
+ * the inlined file to the underneath file without buffer
+ */
+public class InlineFileSystem extends FileSystem {
+
+  static final String SCHEME = "inlinefs";
+  private Configuration conf = null;
+
+  @Override
+  public void initialize(URI name, Configuration conf) throws IOException {
+    super.initialize(name, conf);
+    this.conf = conf;
+  }
+
+  @Override
+  public URI getUri() {
+    return URI.create(getScheme());
+  }
+
+  public String getScheme() {
+    return SCHEME;
+  }
+
+  @Override
+  public FSDataInputStream open(Path inlinePath, int bufferSize) throws IOException {
+    Path outerPath = InLineFSUtils.getOuterfilePathFromInlinePath(inlinePath, getScheme());
+    FileSystem outerFs = outerPath.getFileSystem(conf);
+    FSDataInputStream outerStream = outerFs.open(outerPath, bufferSize);
+    return new InlineFsDataInputStream(InLineFSUtils.startOffset(inlinePath), outerStream, InLineFSUtils.length(inlinePath));
+  }
+
+  @Override
+  public boolean exists(Path f) {
+    try {
+      return getFileStatus(f) != null;
+    } catch (Exception e) {
+      return false;
+    }
+  }
+
+  @Override
+  public FileStatus getFileStatus(Path inlinePath) throws IOException {
+    Path outerPath = InLineFSUtils.getOuterfilePathFromInlinePath(inlinePath, getScheme());
+    FileSystem outerFs = outerPath.getFileSystem(conf);
+    FileStatus status = outerFs.getFileStatus(outerPath);
+    FileStatus toReturn = new FileStatus(InLineFSUtils.length(inlinePath), status.isDirectory(), status.getReplication(), status.getBlockSize(),
 
 Review comment:
   and all of this code will read nicely.. as just getters on that pojo 
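
The quoted `open` implementation wraps the outer stream in an `InlineFsDataInputStream` bounded by the start offset and length. The core idea can be sketched without Hadoop types; the class below is illustrative, not Hudi's actual implementation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch (not Hudi's InlineFsDataInputStream): expose the byte range
// [startOffset, startOffset + length) of an outer stream as if it were a whole file.
public class BoundedStreamSketch extends InputStream {
  private final InputStream outer;
  private long remaining;

  public BoundedStreamSketch(InputStream outer, long startOffset, long length) throws IOException {
    this.outer = outer;
    // Position the outer stream at the start of the inlined region.
    if (outer.skip(startOffset) != startOffset) {
      throw new IOException("could not seek to inline start offset");
    }
    this.remaining = length;
  }

  @Override
  public int read() throws IOException {
    if (remaining <= 0) {
      return -1; // EOF at the end of the inlined region, not the outer file.
    }
    int b = outer.read();
    if (b >= 0) {
      remaining--;
    }
    return b;
  }

  public static void main(String[] args) throws IOException {
    byte[] outerFile = "headerINLINEfooter".getBytes(StandardCharsets.UTF_8);
    InputStream in = new BoundedStreamSketch(new ByteArrayInputStream(outerFile), 6, 6);
    StringBuilder sb = new StringBuilder();
    for (int b = in.read(); b != -1; b = in.read()) {
      sb.append((char) b);
    }
    System.out.println(sb); // prints "INLINE"
  }
}
```

The reader sees exactly the embedded bytes, which is why an HFile reader can be pointed at a phantom inline path and behave as if it were reading a standalone file.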




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393286954
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format:
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
+   * Output: "inlinefs:/file1/file/inline_file/?start_offset=20&length=40"
+   *
+   * @param outerPath
+   * @param origScheme
+   * @param inLineStartOffset
+   * @param inLineLength
+   * @return
+   */
+  public static Path getEmbeddedInLineFilePath(Path outerPath, String origScheme, long inLineStartOffset, long inLineLength) {
+    String subPath = outerPath.toString().substring(outerPath.toString().indexOf(":") + 1);
 
 Review comment:
   ensure that `toString()` will always yield something with the scheme? maybe there is a more direct method for this?




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393288343
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format:
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
 
 Review comment:
   whats the purpose of this string? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393292323
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InlineFileSystem.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.Progressable;
+
+import java.io.IOException;
+import java.net.URI;
+
+/**
+ * Enables reading any inline file at a given offset and length. This {@link 
FileSystem} is used only in read path and does not support
+ * any write apis.
+ * 
+ * - Reading an inlined file at a given offset, length, read it out as if it 
were an independent file of that length
+ * - Inlined path is of the form 
"inlinefs://path/to/outer/file/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ * 
+ * TODO: The reader/writer may try to use relative paths based on the 
inlinepath and it may not work. Need to handle
+ * this gracefully eg. the parquet summary metadata reading. TODO: If this 
shows promise, also support directly writing
+ * the inlined file to the underneath file without buffer
+ */
+public class InlineFileSystem extends FileSystem {
+
+  static final String SCHEME = "inlinefs";
 
 Review comment:
   I think the InlineFSUtils can assume this scheme and can become a lot simpler 
to read, with fewer args in methods. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393288676
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format: 
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
+   * Output: "inlinefs:/file1/file/inline_file/?start_offset=20&length=40"
+   *
+   * @param outerPath
+   * @param origScheme
+   * @param inLineStartOffset
+   * @param inLineLength
+   * @return
+   */
+  public static Path getEmbeddedInLineFilePath(Path outerPath, String 
origScheme, long inLineStartOffset, long inLineLength) {
+String subPath = 
outerPath.toString().substring(outerPath.toString().indexOf(":") + 1);
+return new Path(
+InlineFileSystem.SCHEME + "://" + subPath + "/" + origScheme + "/" + 
INLINE_FILE_STR + "/"
++ "?" + START_OFFSET_STR + EQUALS_STR + inLineStartOffset + "&" + 
LENGTH_STR + EQUALS_STR + inLineLength
 
 Review comment:
   break to a new line at `"&"` for readability? 
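   Applied to the concatenation in `getEmbeddedInLineFilePath`, the suggested split might look like the sketch below. This is illustrative only: `InlinePathBuilder` is a hypothetical name, and a plain `String` stands in for Hadoop's `Path`.

```java
// Illustrative sketch of the review suggestion: break the query string at "&"
// so each parameter sits on its own line. Not Hudi's actual class.
public class InlinePathBuilder {

    static final String SCHEME = "inlinefs";

    static String buildInlinePath(String subPath, String origScheme,
                                  long startOffset, long length) {
        return SCHEME + "://" + subPath + "/" + origScheme + "/inline_file/"
            + "?start_offset=" + startOffset
            + "&length=" + length;   // line break before "&" for readability
    }

    public static void main(String[] args) {
        // prints inlinefs:///file1/file/inline_file/?start_offset=20&length=40
        System.out.println(buildInlinePath("/file1", "file", 20, 40));
    }
}
```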




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393336002
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InlineFileSystem.java
 ##
 @@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.Progressable;
+
+import java.io.IOException;
+import java.net.URI;
+
+/**
+ * Enables reading any inline file at a given offset and length. This {@link 
FileSystem} is used only in read path and does not support
+ * any write apis.
+ * 
+ * - Reading an inlined file at a given offset, length, read it out as if it 
were an independent file of that length
+ * - Inlined path is of the form 
"inlinefs://path/to/outer/file/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ * 
+ * TODO: The reader/writer may try to use relative paths based on the 
inlinepath and it may not work. Need to handle
+ * this gracefully eg. the parquet summary metadata reading. TODO: If this 
shows promise, also support directly writing
+ * the inlined file to the underneath file without buffer
+ */
+public class InlineFileSystem extends FileSystem {
+
+  static final String SCHEME = "inlinefs";
+  private Configuration conf = null;
+
+  @Override
+  public void initialize(URI name, Configuration conf) throws IOException {
+super.initialize(name, conf);
+this.conf = conf;
+  }
+
+  @Override
+  public URI getUri() {
+return URI.create(getScheme());
+  }
+
+  public String getScheme() {
+return SCHEME;
+  }
+
+  @Override
+  public FSDataInputStream open(Path inlinePath, int bufferSize) throws 
IOException {
+Path outerPath = InLineFSUtils.getOuterfilePathFromInlinePath(inlinePath, 
getScheme());
 
 Review comment:
   can we introduce a pojo for this? Something that can keep an outerPath, 
scheme, startOffset and length together.. we can have this utils method return 
that.. 
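   One possible shape for such a POJO is sketched below; `InlinePathInfo` and its accessor names are hypothetical, not from the PR.

```java
// Hypothetical sketch of the POJO suggested above: an immutable holder for the
// outer path, original scheme, start offset and length parsed from an inline
// path. Names are illustrative, not Hudi's.
public class InlinePathInfo {

    private final String outerPath;
    private final String scheme;
    private final long startOffset;
    private final long length;

    public InlinePathInfo(String outerPath, String scheme, long startOffset, long length) {
        this.outerPath = outerPath;
        this.scheme = scheme;
        this.startOffset = startOffset;
        this.length = length;
    }

    public String getOuterPath() { return outerPath; }
    public String getScheme() { return scheme; }
    public long getStartOffset() { return startOffset; }
    public long getLength() { return length; }

    public static void main(String[] args) {
        InlinePathInfo info = new InlinePathInfo("file:/file1", "file", 20, 40);
        System.out.println(info.getOuterPath() + " @ " + info.getStartOffset());
    }
}
```

   A utils method could then return a single `InlinePathInfo` instead of threading four separate values through `open()` and friends.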




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393286265
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format: 
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
+   * Output: "inlinefs:/file1/file/inline_file/?start_offset=20&length=40"
+   *
+   * @param outerPath
+   * @param origScheme
+   * @param inLineStartOffset
+   * @param inLineLength
+   * @return
+   */
+  public static Path getEmbeddedInLineFilePath(Path outerPath, String 
origScheme, long inLineStartOffset, long inLineLength) {
 
 Review comment:
   rename: getInlineFilePath()? (Prefer Inline to InLine everywhere for 
camelcasing it) 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393285575
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format: 
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
 
 Review comment:
   may be a better example on the second line? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393289484
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format: 
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
+   * Output: "inlinefs:/file1/file/inline_file/?start_offset=20&length=40"
+   *
+   * @param outerPath
+   * @param origScheme
+   * @param inLineStartOffset
+   * @param inLineLength
+   * @return
+   */
+  public static Path getEmbeddedInLineFilePath(Path outerPath, String 
origScheme, long inLineStartOffset, long inLineLength) {
+String subPath = 
outerPath.toString().substring(outerPath.toString().indexOf(":") + 1);
+return new Path(
+InlineFileSystem.SCHEME + "://" + subPath + "/" + origScheme + "/" + 
INLINE_FILE_STR + "/"
++ "?" + START_OFFSET_STR + EQUALS_STR + inLineStartOffset + "&" + 
LENGTH_STR + EQUALS_STR + inLineLength
+);
+  }
+
+  /**
+   * Eg input : "inlinefs:/file1/file/inline_file/?start_offset=20&length=40".
+   * Output : "file:/file1"
+   *
+   * @param inlinePath
+   * @param outerScheme
+   * @return
+   */
+  public static Path getOuterfilePathFromInlinePath(Path inlinePath, String 
outerScheme) {
+String scheme = inlinePath.getParent().getParent().getName();
+Path basePath = inlinePath.getParent().getParent().getParent();
+return new Path(basePath.toString().replaceFirst(outerScheme, scheme));
 
 Review comment:
   Similarly above, could we have just replaced `file:` with `inlinefs`, 
instead of indexOf(). 
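   The two approaches can be contrasted on a plain string; the path below is made up for illustration.

```java
// Contrast of the indexOf()/substring approach used in the PR with the
// replaceFirst alternative suggested above. The example path is illustrative.
public class SchemeSwap {
    public static void main(String[] args) {
        String outerPath = "file:/tmp/data/file1";

        // indexOf/substring: strip everything up to and including ":"
        String subPath = outerPath.substring(outerPath.indexOf(":") + 1);

        // replaceFirst: swap the scheme prefix in one call
        String swapped = outerPath.replaceFirst("file:", "inlinefs:");

        System.out.println(subPath);  // /tmp/data/file1
        System.out.println(swapped);  // inlinefs:/tmp/data/file1
    }
}
```

   Note that `String.replaceFirst` interprets its first argument as a regex, so a scheme containing regex metacharacters would need `Pattern.quote`.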




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r393287633
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InLineFSUtils.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.fs.Path;
+
+/**
+ * Utils to parse InlineFileSystem paths.
+ * Inline FS format: 
+ * "inlinefs://<path_to_outer_file>/<outer_file_scheme>/inline_file/?start_offset=<start_offset>&length=<length>"
+ */
+public class InLineFSUtils {
+
+  private static final String INLINE_FILE_STR = "inline_file";
+  private static final String START_OFFSET_STR = "start_offset";
+  private static final String LENGTH_STR = "length";
+  private static final String EQUALS_STR = "=";
+
+  /**
+   * Fetch embedded inline file path from outer path.
+   * Eg
+   * Input:
+   * Path = file:/file1, origScheme: file, startOffset = 20, length = 40
 
 Review comment:
   use a different scheme like hdfs: or s3a for illustration? :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] garyli1019 commented on issue #1362: [WIP]HUDI-644 Implement checkpoint generator helper tool

2020-03-16 Thread GitBox
garyli1019 commented on issue #1362: [WIP]HUDI-644 Implement checkpoint 
generator helper tool
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-599779655
 
 
   > So, my guess is, we will explore a way to generate checkpoints from 
other mechanisms like connect-hdfs?
   
   @vinothchandar right. Step 1: implement the tool. Step 2: find a way to 
integrate it with the initial bulk insert or the HDFS importer. In this way we 
can provide a delta streamer migration guide to users. 
   




[GitHub] [incubator-hudi] umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-16 Thread GitBox
umehrot2 commented on issue #1406: [HUDI-713] Fix conversion of Spark array of 
struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-599776958
 
 
   > @umehrot2 are you interested in reviewing this? :)
   
   For sure. Either way, I have to review it internally as well :)




[GitHub] [incubator-hudi] codecov-io commented on issue #1413: [HUDI-539] Add "Config" constructor to HoodieROTablePathFilter

2020-03-16 Thread GitBox
codecov-io commented on issue #1413: [HUDI-539] Add "Config" constructor to 
HoodieROTablePathFilter
URL: https://github.com/apache/incubator-hudi/pull/1413#issuecomment-599763248
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=h1) 
Report
   > Merging 
[#1413](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/3ef9e885cacc064fc316c61c7c826f3a1cb96da0?src=pr&el=desc)
 will **increase** coverage by `0.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1413/graphs/tree.svg?width=650&token=VTTXabwbs2&height=150&src=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #1413      +/-   ##
   ==========================================
   + Coverage   67.69%   67.70%    +0.01%
     Complexity    243      243
   ==========================================
     Files         338      338
     Lines       16371    16374        +3
     Branches     1672     1672
   ==========================================
   + Hits        11082    11086        +4
     Misses       4548     4548
   + Partials      741      740        -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...rg/apache/hudi/hadoop/HoodieROTablePathFilter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1413/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVJPVGFibGVQYXRoRmlsdGVyLmphdmE=)
 | `64.17% <100%> (+1.67%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1413/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `76% <0%> (+4%)` | `0% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=footer).
 Last update 
[3ef9e88...142c972](https://codecov.io/gh/apache/incubator-hudi/pull/1413?src=pr&el=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Updated] (HUDI-539) RO Path filter does not pick up hadoop configs from the spark context

2020-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-539:

Labels: pull-request-available  (was: )

> RO Path filter does not pick up hadoop configs from the spark context
> -
>
> Key: HUDI-539
> URL: https://issues.apache.org/jira/browse/HUDI-539
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.5.1
> Environment: Spark version : 2.4.4
> Hadoop version : 2.7.3
> Databricks Runtime: 6.1
>Reporter: Sam Somuah
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> Hi,
>  I'm trying to use hudi to write to one of the Azure storage container file 
> systems, ADLS Gen 2 (abfs://). ABFS:// is one of the whitelisted file 
> schemes. The issue I'm facing is that in {{HoodieROTablePathFilter}} it tries 
> to get a file path passing in a blank hadoop configuration. This manifests as 
> {{java.io.IOException: No FileSystem for scheme: abfss}} because it doesn't 
> have any of the configuration in the environment.
> The problematic line is
> [https://github.com/apache/incubator-hudi/blob/2bb0c21a3dd29687e49d362ed34f050380ff47ae/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L96]
>  
> Stacktrace
> java.io.IOException: No FileSystem for scheme: abfss
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
> at 
> org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:96)
> at 
> org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:349)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bschell opened a new pull request #1413: [HUDI-539] Add "Config" constructor to HoodieROTablePathFilter

2020-03-16 Thread GitBox
bschell opened a new pull request #1413: [HUDI-539] Add "Config" constructor to 
HoodieROTablePathFilter
URL: https://github.com/apache/incubator-hudi/pull/1413
 
 
   Allows HoodieROTablePathFilter to accept a configuration for
   initializing the filesystem. This fixes a bug with Presto's use of this
   pathfilter. (when combined with the corresponding presto patch)
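   The described fix follows a common pattern: carry a caller-supplied configuration instead of constructing filesystems against an empty default one. The sketch below only illustrates that pattern; `ConfiguredPathFilter` is a hypothetical name (not Hudi's class), and a `java.util.Map` stands in for Hadoop's `Configuration`.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the "accept a configuration, default to empty"
// constructor pattern this PR describes. Names here are hypothetical.
public class ConfiguredPathFilter {

    private final Map<String, String> conf;

    // Previous behavior: an empty configuration, which cannot resolve
    // filesystem schemes such as abfss.
    public ConfiguredPathFilter() {
        this(new HashMap<>());
    }

    // New behavior: callers pass in the environment's configuration.
    public ConfiguredPathFilter(Map<String, String> conf) {
        this.conf = conf;
    }

    // Look up the FileSystem implementation registered for a scheme.
    public String implFor(String scheme) {
        return conf.get("fs." + scheme + ".impl");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem");
        ConfiguredPathFilter filter = new ConfiguredPathFilter(conf);
        System.out.println(filter.implFor("abfss") != null);              // true
        System.out.println(new ConfiguredPathFilter().implFor("abfss") != null); // false
    }
}
```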
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] satishkotha commented on issue #1368: [HUDI-650] Modify handleUpdate path to validate partitionPath

2020-03-16 Thread GitBox
satishkotha commented on issue #1368: [HUDI-650] Modify handleUpdate path to 
validate partitionPath
URL: https://github.com/apache/incubator-hudi/pull/1368#issuecomment-599730581
 
 
   @bvaradar addressed comments, please take another look




[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1368: [HUDI-650] Modify handleUpdate path to validate partitionPath

2020-03-16 Thread GitBox
satishkotha commented on a change in pull request #1368: [HUDI-650] Modify 
handleUpdate path to validate partitionPath
URL: https://github.com/apache/incubator-hudi/pull/1368#discussion_r393275553
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java
 ##
 @@ -1208,6 +1207,83 @@ public void testRollingStatsWithSmallFileHandling() 
throws Exception {
 }
   }
 
+  @Test
+  public void testHandleUpdateWithMultiplePartitions() throws Exception {
 
 Review comment:
   done




[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1368: [HUDI-650] Modify handleUpdate path to validate partitionPath

2020-03-16 Thread GitBox
satishkotha commented on a change in pull request #1368: [HUDI-650] Modify 
handleUpdate path to validate partitionPath
URL: https://github.com/apache/incubator-hudi/pull/1368#discussion_r393275515
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/table/TestMergeOnReadTable.java
 ##
 @@ -1208,6 +1207,83 @@ public void testRollingStatsWithSmallFileHandling() 
throws Exception {
 }
   }
 
+  @Test
+  public void testHandleUpdateWithMultiplePartitions() throws Exception {
+HoodieWriteConfig cfg = getConfig(true);
+try (HoodieWriteClient client = getWriteClient(cfg);) {
+
+  /**
+   * Write 1 (only inserts, written as parquet file)
+   */
+  String newCommitTime = "001";
+  client.startCommitWithTime(newCommitTime);
+
+  List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, 20);
+  JavaRDD<HoodieRecord> writeRecords = jsc.parallelize(records, 1);
+
+  List<WriteStatus> statuses = client.upsert(writeRecords, 
newCommitTime).collect();
+  assertNoWriteErrors(statuses);
+
+  HoodieTableMetaClient metaClient = new 
HoodieTableMetaClient(jsc.hadoopConfiguration(), cfg.getBasePath());
+  HoodieMergeOnReadTable hoodieTable = (HoodieMergeOnReadTable) 
HoodieTable.getHoodieTable(metaClient, cfg, jsc);
+
+  Option<HoodieInstant> deltaCommit = 
metaClient.getActiveTimeline().getDeltaCommitTimeline().firstInstant();
+  assertTrue(deltaCommit.isPresent());
+  assertEquals("Delta commit should be 001", "001", 
deltaCommit.get().getTimestamp());
+
+  Option<HoodieInstant> commit = 
metaClient.getActiveTimeline().getCommitTimeline().firstInstant();
+  assertFalse(commit.isPresent());
+
+  FileStatus[] allFiles = 
HoodieTestUtils.listAllDataFilesInPath(metaClient.getFs(), cfg.getBasePath());
+  BaseFileOnlyView roView =
+  new HoodieTableFileSystemView(metaClient, 
metaClient.getCommitTimeline().filterCompletedInstants(), allFiles);
+  Stream<HoodieBaseFile> dataFilesToRead = roView.getLatestBaseFiles();
+  assertFalse(dataFilesToRead.findAny().isPresent());
+
+  roView = new HoodieTableFileSystemView(metaClient, 
hoodieTable.getCompletedCommitsTimeline(), allFiles);
+  dataFilesToRead = roView.getLatestBaseFiles();
+  assertTrue("should list the parquet files we wrote in the delta commit",
+  dataFilesToRead.findAny().isPresent());
+
+  /**
+   * Write 2 (only updates, written to .log file)
+   */
+  newCommitTime = "002";
+  client.startCommitWithTime(newCommitTime);
+
+  records = dataGen.generateUpdates(newCommitTime, records);
+  writeRecords = jsc.parallelize(records, 1);
+  statuses = client.upsert(writeRecords, newCommitTime).collect();
+  assertNoWriteErrors(statuses);
+
+  /**
+   * Write 3 (only deletes, written to .log file)
+   */
+  final String newDeleteTime = "004";
+  final String partitionPath = records.get(0).getPartitionPath();
+  final String fileId = statuses.get(0).getFileId();
+  client.startCommitWithTime(newDeleteTime);
+
+  List<HoodieRecord> fewRecordsForDelete = 
dataGen.generateDeletesFromExistingRecords(records);
+  JavaRDD<HoodieRecord> deleteRDD = jsc.parallelize(fewRecordsForDelete, 
1);
+
+  // initialize partitioner
+  hoodieTable.getUpsertPartitioner(new WorkloadProfile(deleteRDD));
+  final List> deleteStatus = 
jsc.parallelize(Arrays.asList(1)).map(x -> {
+return hoodieTable.handleUpdate(newDeleteTime, partitionPath, fileId, 
fewRecordsForDelete.iterator());
+  }).map(x -> (List) 
HoodieClientTestUtils.collectStatuses(x)).collect();
+
+  // Verify there are  errors
+  WriteStatus status = deleteStatus.get(0).get(0);
+  assertTrue(status.hasErrors());
 
 Review comment:
   done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1368: [HUDI-650] Modify handleUpdate path to validate partitionPath

2020-03-16 Thread GitBox
satishkotha commented on a change in pull request #1368: [HUDI-650] Modify 
handleUpdate path to validate partitionPath
URL: https://github.com/apache/incubator-hudi/pull/1368#discussion_r393275453
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
 ##
 @@ -61,6 +61,7 @@
   private Map<String, HoodieRecord<T>> keyToNewRecords;
   private Set<String> writtenRecordKeys;
   private HoodieStorageWriter<IndexedRecord> storageWriter;
+  private String partitionPath;
 
 Review comment:
   Fixed. I tried to see if there is a way to have this partition validation 
check in one place, but couldn't find a common entry point. Please let me know 
if you have suggestions here. Happy to refactor more if needed.




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1412: [HUDI-504] 
Restructuring and auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412#discussion_r393269575
 
 

 ##
 File path: .travis.yml
 ##
 @@ -0,0 +1,42 @@
+language: ruby
+rvm:
+  - 2.6.3
+
+git:
+  clone: false
+
+env:
+  global:
+    - GIT_USER="CI BOT"
+    - GIT_EMAIL="ci...@hudi.apache.org"
+    - GIT_REPO="apache"
+    - GIT_PROJECT="incubator-hudi"
+    - GIT_BRANCH="asf-site"
+    - DOCS_ROOT="`pwd`/${GIT_PROJECT}/docs"
+
+before_install:
+  - git config --global user.name ${GIT_USER}
+  - git config --global user.email ${GIT_EMAIL}
+  - git clone https://${GIT_TOKEN}@github.com/${GIT_REPO}/${GIT_PROJECT}.git
+  - cd ${GIT_PROJECT} && git checkout ${GIT_BRANCH}
+  - gem install bundler:2.0.2
+
+script:
+  - pushd ${DOCS_ROOT}
+  - bundle install
+  - bundle update --bundler
+  - bundle exec jekyll build --config _config.yml --source . --destination _site
+  - popd
+
+after_success:
 
 Review comment:
   >> I found a better way to control whether to push the build result or not, 
by using the $TRAVIS_PULL_REQUEST env variable.
   
   Copying the conversation from the RFC.. I was still trying to understand the 
https://docs.travis-ci.com/user/web-ui/#build-pushed-branches "Build pushed 
pull requests" option that we need to enable in Travis CI. This is already ON, 
and after each pull request is merged to `master`, a job runs. 
   
   I understand that's what we want to do here as well, i.e. once the docs PR is 
approved and merged, this job should run and auto-generate the site.. 
   
   A few questions.. 
   
   1. How does this file interplay with `.travis.yml` on master? Travis will 
only look at this file, since the PR's base is asf-site?
   2. TRAVIS_PULL_REQUEST is set to the pull request number if the current job 
is a pull request build, or false if it's not (from the Travis docs). Below you 
are exiting if it's a PR and proceeding to build if it's a pushed pull request, 
i.e. the change has already landed on asf-site and this job is triggered 
after that? 
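The gating being discussed can be sketched as a small shell check. This is a sketch under the assumption (per the Travis docs linked above) that `TRAVIS_PULL_REQUEST` holds the PR number on pull-request builds and the literal string `false` on branch (post-merge) builds; `should_push` is a hypothetical helper name.

```shell
# Sketch: decide whether to push the generated site.
# Assumption: TRAVIS_PULL_REQUEST is the PR number on pull-request builds
# and the literal string "false" on branch (post-merge) builds.
should_push() {
  # $1 is the value of TRAVIS_PULL_REQUEST
  [ "${1:-false}" = "false" ]
}

if should_push "${TRAVIS_PULL_REQUEST:-false}"; then
  echo "branch build: would push generated site to asf-site"
else
  echo "pull-request build: skipping push"
fi
```

With this shape, the PR build still compiles the site (catching errors early) and only the post-merge branch build publishes it.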
   
   
   




[GitHub] [incubator-hudi] vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-03-16 Thread GitBox
vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for 
Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-599720266
 
 
   @prashantwason still driving this? Can I help get this moving along? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1362: [WIP]HUDI-644 Implement checkpoint generator helper tool

2020-03-16 Thread GitBox
vinothchandar commented on issue #1362: [WIP]HUDI-644 Implement checkpoint 
generator helper tool
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-599719902
 
 
   @garyli1019 I understand what you are getting at.. We had a similar issue 
cutting over pipelines, and we handled it by having the ability to force a 
checkpoint for a single run of the delta streamer.. 
   
   So, my guess is, we will explore a way to generate checkpoints from 
other mechanisms like connect-hdfs? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-16 Thread GitBox
vinothchandar commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-599717588
 
 
   I am a bit confused at this point about this PR.. Can you summarize where we 
are at? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1408: [HUDI-437] Support user-defined index

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1408: [HUDI-437] Support 
user-defined index
URL: https://github.com/apache/incubator-hudi/pull/1408#discussion_r393257875
 
 

 ##
 File path: 
hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
 ##
 @@ -37,6 +37,8 @@
   public static final String INDEX_TYPE_PROP = "hoodie.index.type";
   public static final String DEFAULT_INDEX_TYPE = 
HoodieIndex.IndexType.BLOOM.name();
 
+  public static final String INDEX_CLASS_PROP = "hoodie.index.class";
 
 Review comment:
   for consistency, let's have a default value which is `""` 




[GitHub] [incubator-hudi] lamber-ken commented on issue #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
lamber-ken commented on issue #1412: [HUDI-504] Restructuring and 
auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412#issuecomment-599703073
 
 
   hi @vinothchandar, here is the doc about the `$TRAVIS_PULL_REQUEST` variable:
   
https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions




[GitHub] [incubator-hudi] lamber-ken commented on issue #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
lamber-ken commented on issue #1412: [HUDI-504] Restructuring and 
auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412#issuecomment-599700989
 
 
   Missing `GIT_TOKEN` env variable.
   
![image](https://user-images.githubusercontent.com/20113411/76790051-0cf32380-67f9-11ea-9d24-9f790186a292.png)
   




[jira] [Updated] (HUDI-504) Restructuring and auto-generation of docs

2020-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-504:

Labels: pull-request-available  (was: )

> Restructuring and auto-generation of docs
> -
>
> Key: HUDI-504
> URL: https://issues.apache.org/jira/browse/HUDI-504
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Ethan Guo
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>
> RFC-10: Restructuring and auto-generation of docs
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1412: [HUDI-504] Restructuring and auto-generation of docs

2020-03-16 Thread GitBox
lamber-ken opened a new pull request #1412: [HUDI-504] Restructuring and 
auto-generation of docs
URL: https://github.com/apache/incubator-hudi/pull/1412
 
 
   ## What is the purpose of the pull request
   
   Go ahead with RFC-10: 
[RFC-10](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs)
   
   Generate docs using Travis and push the built result to the asf-site branch.
   
   ## Brief change log
   
 - Add .travis.yml file
   
   ## Verify this pull request
   
   **Main repo**
   https://github.com/lamber-ken/hdocs/tree/asf-site
   
   **Forked repo**
   https://github.com/BigDataArtisans/hdocs/tree/asf-site
   
   **Each submit PR** (ignore push build result)
   https://travis-ci.com/github/lamber-ken/hdocs/builds/153544519
   
   **After merged PR**
   https://www.travis-ci.org/github/lamber-ken/hdocs/builds/663171902
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-16 Thread GitBox
codecov-io edited a comment on issue #1406: [HUDI-713] Fix conversion of Spark 
array of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-599159357
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=h1) 
Report
   > Merging 
[#1406](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e=desc)
 will **increase** coverage by `0.29%`.
   > The diff coverage is `50.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1406/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1406      +/-   ##
   ============================================
   + Coverage     67.47%   67.77%   +0.29%
   - Complexity      230      243      +13
   ============================================
     Files           338      338
     Lines         16365    16369       +4
     Branches       1671     1672       +1
   ============================================
   + Hits          11043    11094      +51
   + Misses         4583     4533      -50
   - Partials        739      742       +3
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...in/scala/org/apache/hudi/AvroConversionUtils.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1406/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25VdGlscy5zY2FsYQ==)
 | `54.16% <ø> (-5.10%)` | `0.00 <0.00> (ø)` | |
   | 
[...n/scala/org/apache/hudi/AvroConversionHelper.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1406/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvQXZyb0NvbnZlcnNpb25IZWxwZXIuc2NhbGE=)
 | `14.28% <50.00%> (+6.45%)` | `0.00 <0.00> (ø)` | |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1406/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `72.00% <0.00%> (-4.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1406/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==)
 | `77.00% <0.00%> (+37.63%)` | `20.00% <0.00%> (+13.00%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=footer).
 Last update 
[99b7e9e...dec5a88](https://codecov.io/gh/apache/incubator-hudi/pull/1406?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1350: [HUDI-629]: Replace Guava's Hashing with an equivalent in NumericUtils.java

2020-03-16 Thread GitBox
vinothchandar commented on a change in pull request #1350: [HUDI-629]: Replace 
Guava's Hashing with an equivalent in NumericUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1350#discussion_r393102633
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/util/NumericUtils.java
 ##
 @@ -31,4 +38,27 @@ public static String humanReadableByteCount(double bytes) {
     String pre = "KMGTPE".charAt(exp - 1) + "";
     return String.format("%.1f %sB", bytes / Math.pow(1024, exp), pre);
   }
+
+  public static long getMessageDigestHash(final String algorithmName, final String string) {
+    MessageDigest md;
+    try {
+      md = MessageDigest.getInstance(algorithmName);
+    } catch (NoSuchAlgorithmException e) {
+      throw new HoodieException(e);
+    }
+    return asLong(Objects.requireNonNull(md).digest(string.getBytes(StandardCharsets.UTF_8)));
+  }
+
+  public static long asLong(byte[] bytes) {
+    ValidationUtils.checkState(bytes.length >= 8, "HashCode#asLong() requires >= 8 bytes.");
+    return padToLong(bytes);
+  }
+
+  public static long padToLong(byte[] bytes) {
+    long retVal = (bytes[0] & 0xFF);
 
 Review comment:
   @s-sanjay do you see a specific problem? may be an example could help? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-16 Thread GitBox
vinothchandar commented on issue #1176: [HUDI-430] Adding InlineFileSystem to 
support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#issuecomment-599593502
 
 
   @nsivabalan it's on my queue.. while you wait, can you please move the 
classes to their final locations.. 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1406: [HUDI-713] Fix conversion of Spark array of struct type to Avro schema

2020-03-16 Thread GitBox
vinothchandar commented on issue #1406: [HUDI-713] Fix conversion of Spark 
array of struct type to Avro schema
URL: https://github.com/apache/incubator-hudi/pull/1406#issuecomment-599591998
 
 
   @umehrot2 are you interested in reviewing this? :) 




[jira] [Commented] (HUDI-480) Support a querying delete data method in incremental view

2020-03-16 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060289#comment-17060289
 ] 

Vinoth Chandar commented on HUDI-480:
-

[~yanghua] This is an interesting topic for sure. Let's discuss.. 

 


Are you interested in getting both the before and after images as above, or do 
you just want a stream of deleted record keys? 

> Support a querying delete data method in incremental view
> --
>
> Key: HUDI-480
> URL: https://issues.apache.org/jira/browse/HUDI-480
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Incremental Pull
>Reporter: cdmikechen
>Priority: Minor
>
> As we know, Hudi supports many methods to query data in Spark, Hive, 
> and Presto. It also provides a very good timeline mechanism to trace changes 
> in data, which can be used to query incremental data in the incremental view.
> Previously, we only had insert and update functions to upsert data, and now 
> we have added new functions to delete some existing data:
> *[HUDI-328] Adding delete api to HoodieWriteClient* 
> https://github.com/apache/incubator-hudi/pull/1004
> *[HUDI-377] Adding Delete() support to DeltaStreamer* 
> https://github.com/apache/incubator-hudi/pull/1073
> So, since we have a delete API, should we add another method to get deleted 
> data in the incremental view?
> I've looked at the methods for generating new parquet files. The main idea is 
> to combine old and new data, and then filter out the data that needs to be 
> deleted, so that the deleted data does not exist in the new dataset. However, 
> this means the deleted data is not retained in the new dataset, so only the 
> inserted or modified data can be found via the existing timestamp field when 
> tracing data in the incremental view.
> If we do this, there are two ideas to consider:
> 1. Trace the dataset in the same file at different time checkpoints along the 
> timeline, compare the two datasets by key, and filter out the deleted data. 
> This method adds no extra cost when writing, but the comparison must run for 
> each query, which is expensive.
> 2. When writing data, record any deleted data in a file named like 
> *.delete_filename_version_timestamp*, so we can answer immediately based on 
> time. But this does additional processing at write time.
>  
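Idea 1 above amounts to a key-set difference between two snapshots of the same file slice. A minimal shell sketch with sorted key lists (the key files here are hypothetical stand-ins for the record keys extracted from the parquet files at each commit):

```shell
# Sketch of idea 1: deleted keys = keys present at the earlier commit but
# absent at the later one. keys_at_001.txt / keys_at_002.txt are hypothetical
# stand-ins for record keys extracted from a file slice at commits 001 and 002.
printf 'k1\nk2\nk3\n' > keys_at_001.txt
printf 'k1\nk3\n'     > keys_at_002.txt
# comm -23 keeps lines unique to the first (sorted) file
deleted=$(comm -23 keys_at_001.txt keys_at_002.txt)
echo "deleted keys: $deleted"   # prints: deleted keys: k2
```

This illustrates the trade-off stated above: nothing extra is written, but both snapshots must be read and compared at query time.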





[GitHub] [incubator-hudi] vinothchandar commented on issue #1159: [HUDI-479] Eliminate or Minimize use of Guava if possible

2020-03-16 Thread GitBox
vinothchandar commented on issue #1159: [HUDI-479] Eliminate or Minimize use of 
Guava if possible
URL: https://github.com/apache/incubator-hudi/pull/1159#issuecomment-599589665
 
 
   @smarthi is this ready for final review? Have we eliminated Guava at this 
point? 




[GitHub] [incubator-hudi] yanghua commented on issue #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
yanghua commented on issue #1411: [HUDI-695]Add unit test for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#issuecomment-599555941
 
 
   @hddong Travis is red. Please check the reason.




[jira] [Assigned] (HUDI-400) Add more checks to TestCompactionUtils#testUpgradeDowngrade

2020-03-16 Thread jerry (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jerry reassigned HUDI-400:
--

Assignee: jerry

> Add more checks to TestCompactionUtils#testUpgradeDowngrade
> ---
>
> Key: HUDI-400
> URL: https://issues.apache.org/jira/browse/HUDI-400
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: newbie, Testing
>Reporter: leesf
>Assignee: jerry
>Priority: Minor
>
> Currently, the TestCompactionUtils#testUpgradeDowngrade does not check 
> upgrade from old plan to new plan, it is proper to add some checks.





[jira] [Updated] (HUDI-504) Restructuring and auto-generation of docs

2020-03-16 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-504:

Status: Open  (was: New)

> Restructuring and auto-generation of docs
> -
>
> Key: HUDI-504
> URL: https://issues.apache.org/jira/browse/HUDI-504
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Ethan Guo
>Assignee: lamber-ken
>Priority: Major
>
> RFC-10: Restructuring and auto-generation of docs
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+10+%3A+Restructuring+and+auto-generation+of+docs]





[GitHub] [incubator-hudi] hddong commented on issue #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
hddong commented on issue #1411: [HUDI-695]Add unit test for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411#issuecomment-599450836
 
 
   @yanghua @vinothchandar please have a review when you are free.




[jira] [Updated] (HUDI-695) Add unit test for TableCommand

2020-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-695:

Labels: pull-request-available  (was: )

> Add unit test for TableCommand
> --
>
> Key: HUDI-695
> URL: https://issues.apache.org/jira/browse/HUDI-695
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> Add unit test for TableCommand in hudi-cli





[GitHub] [incubator-hudi] hddong opened a new pull request #1411: [HUDI-695]Add unit test for TableCommand

2020-03-16 Thread GitBox
hddong opened a new pull request #1411: [HUDI-695]Add unit test for TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1411
 
 
   ## What is the purpose of the pull request
   
   *Add unit test for TableCommand in hudi-cli module*
   
   ## Brief change log
   
 - *Add unit test for TableCommand*
   
   ## Verify this pull request
   
 - *Add unit test for TableCommand*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-715) Fix duplicate name in TableCommand

2020-03-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-715:
--
Status: Open  (was: New)

> Fix duplicate name in TableCommand
> --
>
> Key: HUDI-715
> URL: https://issues.apache.org/jira/browse/HUDI-715
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> connect command has duplicate key name maxCheckIntervalMs, fix it.





[jira] [Closed] (HUDI-715) Fix duplicate name in TableCommand

2020-03-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-715.
-
Fix Version/s: 0.6.0
   Resolution: Fixed

Fixed via master branch: 3ef9e885cacc064fc316c61c7c826f3a1cb96da0

> Fix duplicate name in TableCommand
> --
>
> Key: HUDI-715
> URL: https://issues.apache.org/jira/browse/HUDI-715
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> connect command has duplicate key name maxCheckIntervalMs, fix it.





[incubator-hudi] branch master updated: [HUDI-715] Fix duplicate name in TableCommand (#1410)

2020-03-16 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ef9e88  [HUDI-715] Fix duplicate name in TableCommand (#1410)
3ef9e88 is described below

commit 3ef9e885cacc064fc316c61c7c826f3a1cb96da0
Author: hongdd 
AuthorDate: Mon Mar 16 17:19:57 2020 +0800

[HUDI-715] Fix duplicate name in TableCommand (#1410)
---
 hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java
index 439b9c8..9dcd5c9 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java
@@ -54,7 +54,7 @@ public class TableCommand implements CommandMarker {
   help = "Enable eventual consistency") final boolean 
eventuallyConsistent,
   @CliOption(key = {"initialCheckIntervalMs"}, unspecifiedDefaultValue = 
"2000",
   help = "Initial wait time for eventual consistency") final Integer 
initialConsistencyIntervalMs,
-  @CliOption(key = {"maxCheckIntervalMs"}, unspecifiedDefaultValue = 
"30",
+  @CliOption(key = {"maxWaitIntervalMs"}, unspecifiedDefaultValue = 
"30",
   help = "Max wait time for eventual consistency") final Integer 
maxConsistencyIntervalMs,
   @CliOption(key = {"maxCheckIntervalMs"}, unspecifiedDefaultValue = "7",
   help = "Max checks for eventual consistency") final Integer 
maxConsistencyChecks)



[GitHub] [incubator-hudi] yanghua merged pull request #1410: [HUDI-715]Fix duplicate key name in TableCommand

2020-03-16 Thread GitBox
yanghua merged pull request #1410: [HUDI-715]Fix duplicate key name in 
TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1410
 
 
   




[GitHub] [incubator-hudi] hddong opened a new pull request #1410: [HUDI-715]Fix duplicate key name in TableCommand

2020-03-16 Thread GitBox
hddong opened a new pull request #1410: [HUDI-715]Fix duplicate key name in 
TableCommand
URL: https://github.com/apache/incubator-hudi/pull/1410
 
 
   ## What is the purpose of the pull request
   
   *`connect` command has duplicate key name `maxCheckIntervalMs`, fix it.*
   
   ## Brief change log
   
 - *Fix duplicate key name in TableCommand*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-715) Fix duplicate name in TableCommand

2020-03-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-715:

Labels: pull-request-available  (was: )

> Fix duplicate name in TableCommand
> --
>
> Key: HUDI-715
> URL: https://issues.apache.org/jira/browse/HUDI-715
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>
> The connect command has a duplicate key name, maxCheckIntervalMs; fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-715) Fix duplicate name in TableCommand

2020-03-16 Thread hong dongdong (Jira)
hong dongdong created HUDI-715:
--

 Summary: Fix duplicate name in TableCommand
 Key: HUDI-715
 URL: https://issues.apache.org/jira/browse/HUDI-715
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: CLI
Reporter: hong dongdong
Assignee: hong dongdong


The connect command has a duplicate key name, maxCheckIntervalMs; fix it.





[jira] [Closed] (HUDI-692) Add delete savepoint for cli

2020-03-16 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong closed HUDI-692.
--
Fix Version/s: 0.6.0
   Resolution: Implemented

> Add delete savepoint for cli
> 
>
> Key: HUDI-692
> URL: https://issues.apache.org/jira/browse/HUDI-692
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now, deleteSavepoint is already provided in HoodieWriteClient, but it is not
> exposed to users; add it to the CLI.
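
A minimal sketch of the pattern this issue describes: exposing an operation the write client already owns through a thin CLI command. The names below mirror the issue (deleteSavepoint on the write client), but WriteClientStub and deleteSavepointCommand are illustrative stubs, not Hudi's actual classes:

```java
import java.util.HashSet;
import java.util.Set;

public class SavepointCliSketch {

  // Stub for the write client that already owns the savepoint operations.
  static class WriteClientStub {
    private final Set<String> savepoints = new HashSet<>();

    void savepoint(String commitTime) {
      savepoints.add(commitTime);
    }

    boolean deleteSavepoint(String commitTime) {
      return savepoints.remove(commitTime);
    }
  }

  // The CLI-facing command only validates input and delegates to the client.
  static String deleteSavepointCommand(WriteClientStub client, String commitTime) {
    if (commitTime == null || commitTime.isEmpty()) {
      return "Commit time is required";
    }
    return client.deleteSavepoint(commitTime)
        ? "Savepoint " + commitTime + " deleted"
        : "No savepoint found for " + commitTime;
  }

  public static void main(String[] args) {
    WriteClientStub client = new WriteClientStub();
    client.savepoint("20200316120000");
    System.out.println(deleteSavepointCommand(client, "20200316120000")); // prints Savepoint 20200316120000 deleted
    System.out.println(deleteSavepointCommand(client, "20200316120000")); // prints No savepoint found for 20200316120000
  }
}
```

Keeping the command this thin means the behavior stays testable through the client, and the CLI layer only contributes argument handling and user-facing messages.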





[jira] [Updated] (HUDI-692) Add delete savepoint for cli

2020-03-16 Thread hong dongdong (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hong dongdong updated HUDI-692:
---
Status: Open  (was: New)

> Add delete savepoint for cli
> 
>
> Key: HUDI-692
> URL: https://issues.apache.org/jira/browse/HUDI-692
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: CLI
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now, deleteSavepoint is already provided in HoodieWriteClient, but it is not
> exposed to users; add it to the CLI.





[jira] [Closed] (HUDI-694) Add unit test for SparkEnvCommand

2020-03-16 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang closed HUDI-694.
-
Fix Version/s: 0.6.0
   Resolution: Implemented

Implemented via master branch: 55e6d348155f63eb128cd208687d02206bad66a5

> Add unit test for SparkEnvCommand
> -
>
> Key: HUDI-694
> URL: https://issues.apache.org/jira/browse/HUDI-694
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: CLI, Testing
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit test for SparkEnvCommand in hudi-cli


