[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597457998
 
 
   > One question about using nested schema. Can you remind me what happens if 
someone passes in a nested schema for CsvDeltaStreamer?
   
   I wrote a test that passes a nested schema to the CSV reader in Spark. It 
throws the exception below, which means that the Spark CSV source does not 
currently support nested schemas.
   
   In most cases, CSV schemas should be flat. Whether a nested schema is 
supported for the CSV source depends on Spark's behavior (nested schemas may be 
supported for CSV in the future), so we don't enforce the check in our Hudi 
code.
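
As a rough illustration of what "flattened" means here, the sketch below detects nested fields and flattens them into top-level columns. It uses plain dictionaries rather than Spark or Avro types, and the names `has_nested_fields` and `flatten_schema`, the `_` separator, and the example field names are all hypothetical, not part of Hudi or Spark.

```python
from typing import Any, Dict


def has_nested_fields(schema: Dict[str, Any]) -> bool:
    """Return True if any field type is itself a record (modeled as a dict)."""
    return any(isinstance(field_type, dict) for field_type in schema.values())


def flatten_schema(schema: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, str]:
    """Flatten nested records into top-level columns named parent_child."""
    flat: Dict[str, str] = {}
    for name, field_type in schema.items():
        full_name = f"{prefix}{sep}{name}" if prefix else name
        if isinstance(field_type, dict):
            # Recurse into the nested record, carrying the parent name along.
            flat.update(flatten_schema(field_type, full_name, sep))
        else:
            flat[full_name] = field_type
    return flat


# Hypothetical nested schema: "fare" is a struct, which a CSV row cannot hold.
nested = {"id": "string", "fare": {"amount": "double", "currency": "string"}}
flat = flatten_schema(nested)
```

A flat schema like `flat` contains only primitive columns, which is the shape a CSV file can represent directly.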
   
   ```
   org.apache.spark.sql.AnalysisException: CSV data source does not support struct data type.;
   at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:69)
   at org.apache.spark.sql.execution.datasources.DataSourceUtils$$anonfun$verifySchema$1.apply(DataSourceUtils.scala:67)
   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
   at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifySchema(DataSourceUtils.scala:67)
   at org.apache.spark.sql.execution.datasources.DataSourceUtils$.verifyReadSchema(DataSourceUtils.scala:41)
   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:400)
   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
   at org.apache.hudi.utilities.sources.CsvDFSSource.fromFiles(CsvDFSSource.java:120)
   at org.apache.hudi.utilities.sources.CsvDFSSource.fetchNextBatch(CsvDFSSource.java:93)
   at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
   at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:73)
   at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:66)
   at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:317)
   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:226)
   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:121)
   at org.apache.hudi.utilities.TestHoodieDeltaStreamer.testCsvDFSSourceWithNestedSchema(TestHoodieDeltaStreamer.java:812)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:498)
   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
   at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
   ```

[GitHub] [incubator-hudi] bvaradar commented on issue #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
bvaradar commented on issue #1392: [HUDI-689] Change CLI command names to not 
have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392#issuecomment-597434910
 
 
   @nsivabalan : Please review this PR


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] bvaradar commented on issue #1395: [HUDI-667] Fixing delete tests for DeltaStreamer

2020-03-10 Thread GitBox
bvaradar commented on issue #1395: [HUDI-667] Fixing delete tests for 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1395#issuecomment-597434678
 
 
   @lamber-ken : Can you review this PR?




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597431353
 
 
   Sorry, I reverted it. Hudi already handles the empty checkpoint via 
`KafkaOffsetGen.KafkaResetOffsetStrategies`. 
   
   Users can choose `EARLIEST` or `LATEST` by setting the `auto.offset.reset` 
property.
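
The behavior described above can be sketched roughly as follows. This is a simplified illustration and not Hudi's `KafkaOffsetGen` code: the function name `resolve_start_offsets`, the `partition:offset` checkpoint string format, and the example offsets are assumptions made for the sketch.

```python
from typing import Dict, Optional


def resolve_start_offsets(
    checkpoint: Optional[str],
    auto_offset_reset: str,
    earliest: Dict[int, int],
    latest: Dict[int, int],
) -> Dict[int, int]:
    """Pick starting offsets per partition.

    A non-empty checkpoint wins; otherwise fall back to the reset strategy.
    The checkpoint format "partition:offset,partition:offset" is hypothetical.
    """
    if checkpoint:
        pairs = (pair.split(":") for pair in checkpoint.split(","))
        return {int(partition): int(offset) for partition, offset in pairs}

    strategy = auto_offset_reset.upper()
    if strategy == "EARLIEST":
        return dict(earliest)
    if strategy == "LATEST":
        return dict(latest)
    raise ValueError(f"Unsupported auto.offset.reset value: {auto_offset_reset}")


# Hypothetical broker state for a two-partition topic.
earliest_offsets = {0: 0, 1: 0}
latest_offsets = {0: 500, 1: 700}

resumed = resolve_start_offsets("0:100,1:200", "latest", earliest_offsets, latest_offsets)
fresh_earliest = resolve_start_offsets(None, "earliest", earliest_offsets, latest_offsets)
fresh_latest = resolve_start_offsets("", "LATEST", earliest_offsets, latest_offsets)
```

The point of the sketch is only that an empty checkpoint never reaches the parsing branch; the reset strategy decides where reading starts.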
   
   
![image](https://user-images.githubusercontent.com/20113411/76381177-b10b4380-638f-11ea-8eb4-34542b6a06f3.png)
   
   
![image](https://user-images.githubusercontent.com/20113411/76380883-faa75e80-638e-11ea-8b6e-6eed6ff5aaa2.png)
   




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken edited a comment on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer 
offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597431353
 
 
   Sorry, I reverted it. Hudi already handles the empty checkpoint via 
`KafkaOffsetGen.KafkaResetOffsetStrategies`.
   
   So, IMO, we 
   
   
![image](https://user-images.githubusercontent.com/20113411/76380883-faa75e80-638e-11ea-8b6e-6eed6ff5aaa2.png)
   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597431353
 
 
   Sorry, I reverted it. The empty checkpoint is already handled by 
`KafkaOffsetGen.KafkaResetOffsetStrategies`.
   
   
![image](https://user-images.githubusercontent.com/20113411/76380883-faa75e80-638e-11ea-8b6e-6eed6ff5aaa2.png)
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
codecov-io commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-597430876
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=h1) 
Report
   > Merging 
[#1396](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806?src=pr=desc)
 will **increase** coverage by `<.01%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1396/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #1396    +/-   ##
   =========================================
   + Coverage    67.4%    67.4%   +<.01%
     Complexity    230      230
   =========================================
     Files         336      336
     Lines       16366    16379     +13
     Branches     1672     1673      +1
   =========================================
   + Hits        11031    11041     +10
   - Misses       4602     4603      +1
   - Partials      733      735      +2
   ```
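
The `+<.01%` figure in the report above can be cross-checked from the raw counts. The sketch below assumes coverage is computed as hits divided by total lines (hits + misses + partials); the helper name `coverage_percent` is made up for this illustration.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, assuming coverage = hits / (hits + misses + partials)."""
    total_lines = hits + misses + partials
    return 100.0 * hits / total_lines


# Counts taken from the coverage diff for #1396.
before = coverage_percent(hits=11031, misses=4602, partials=733)  # master
after = coverage_percent(hits=11041, misses=4603, partials=735)   # with the PR
delta = after - before

print(f"before={before:.2f}% after={after:.2f}% delta={delta:.4f} points")
```

Under that assumption, both totals match the reported line counts (16366 and 16379), and the change is positive but smaller than 0.01 percentage points.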
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...a/org/apache/hudi/common/table/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...oop/realtime/HoodieParquetRealtimeInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVBhcnF1ZXRSZWFsdGltZUlucHV0Rm9ybWF0LmphdmE=)
 | `73.4% <100%> (+0.28%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh)
 | `93.24% <100%> (+1.05%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh)
 | `81.3% <100%> (+1.96%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `25% <0%> (-50%)` | `0% <0%> (ø)` | |
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0%> (-5.56%)` | `0% <0%> (ø)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1396/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=footer).
 Last update 
[77d5b92...a023c31](https://codecov.io/gh/apache/incubator-hudi/pull/1396?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597428982
 
 
   Thanks for reviewing, @garyli1019 @vinothchandar. I have updated the PR to 
double-check for an empty checkpoint in `KafkaOffsetGen#checkupValidOffsets`.
   
   > I'd appreciate it if we took into consideration how checkpoint is handled 
in a general source agnostic way and also fix this issue..
   
   This is a good idea, as bvaradar suggested, but it seems impossible because 
different data streams handle an empty checkpoint in different ways.




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #213

2020-03-10 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.33 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[GitHub] [incubator-hudi] codecov-io commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
codecov-io commented on issue #1377: [HUDI-663] Fix HoodieDeltaStreamer offset 
not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#issuecomment-597426610
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr=h1) 
Report
   > Merging 
[#1377](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/5f85c267040fd51c186794fdae900162ab176b14?src=pr=desc)
 will **decrease** coverage by `66.32%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1377/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr=tree)
   
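   ```diff
   @@             Coverage Diff              @@
   ##           master   #1377      +/-    ##
   ============================================
   - Coverage   66.96%    0.64%   -66.33%
   + Complexity    223        2     -221
   ============================================
     Files         334      289      -45
     Lines       16276    14375    -1901
     Branches     1661     1467     -194
   ============================================
   - Hits        10900       92   -10808
   - Misses       4639    14280    +9641
   + Partials      737        3     -734
   ```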
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/model/HoodieDeltaWriteStat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZURlbHRhV3JpdGVTdGF0LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...org/apache/hudi/common/model/HoodieFileFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVGb3JtYXQuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...g/apache/hudi/execution/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0J1bGtJbnNlcnRNYXBGdW5jdGlvbi5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/IteratorBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvSXRlcmF0b3JCYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...rg/apache/hudi/index/bloom/KeyRangeLookupTree.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vS2V5UmFuZ2VMb29rdXBUcmVlLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...apache/hudi/timeline/service/handlers/Handler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvSGFuZGxlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/FunctionBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvRnVuY3Rpb25CYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/index/bloom/ListBasedIndexFileFilter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vTGlzdEJhc2VkSW5kZXhGaWxlRmlsdGVyLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | ... and [299 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1377/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1377?src=pr=footer).
 

[GitHub] [incubator-hudi] satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
satishkotha commented on issue #1396: [HUDI-687] Stop incremental reader on RO 
table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396#issuecomment-597421941
 
 
   @bvaradar Sorry, I messed up the rebase on 
https://github.com/apache/incubator-hudi/pull/1389. Please take a look at this 
PR instead. As discussed in the other PR, I updated the RO and RT views. Spark 
DataSource does not seem to support MOR tables, so I'm skipping that part for 
now. 




[GitHub] [incubator-hudi] satishkotha opened a new pull request #1396: [HUDI-687] Stop incremental reader on RO table before a pending compaction

2020-03-10 Thread GitBox
satishkotha opened a new pull request #1396: [HUDI-687] Stop incremental reader 
on RO table before a pending compaction
URL: https://github.com/apache/incubator-hudi/pull/1396
 
 
   ## What is the purpose of the pull request
   example timeline:
   
   t0 -> create bucket1.parquet
   t1 -> create and append updates bucket1.log
   t2 -> request compaction
   t3 -> create bucket2.parquet
   
   If the compaction at t2 takes a long time, incremental reads using 
HoodieParquetInputFormat may progress to reading commits at t3 and skip the 
data ingested at t1, leading to 'data loss'. (The data will still be on disk, 
but incremental readers won't see it because it is in a log file and the 
readers have moved on to t3.)
   
   To work around this problem, we want to stop returning data belonging to 
commits > compaction_requested/inprogress_instant. After the compaction is 
complete, the incremental reader will see the updates in t2, t3, and so on. The 
disadvantage is that long-running compactions can make it look like the reader 
is 'stuck', but that is better than skipping updates.
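
The cutoff rule described above can be sketched as follows. This is a simplified model and not the actual `HoodieParquetInputFormat` change: the `Instant` class, the state names, the integer timestamps, and the function `instants_visible_to_incremental_reader` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    timestamp: int  # hypothetical: integers stand in for instant times t0, t1, ...
    action: str     # e.g. "commit", "deltacommit", "compaction"
    state: str      # "COMPLETED", "REQUESTED", or "INFLIGHT"


def instants_visible_to_incremental_reader(timeline: List[Instant]) -> List[Instant]:
    """Return completed instants strictly before the earliest pending compaction."""
    pending = [
        i.timestamp
        for i in timeline
        if i.action == "compaction" and i.state != "COMPLETED"
    ]
    cutoff = min(pending) if pending else None
    return [
        i
        for i in timeline
        if i.state == "COMPLETED" and (cutoff is None or i.timestamp < cutoff)
    ]


# The example timeline from the description, with compaction still pending at t2.
pending_timeline = [
    Instant(0, "commit", "COMPLETED"),
    Instant(1, "deltacommit", "COMPLETED"),
    Instant(2, "compaction", "REQUESTED"),
    Instant(3, "commit", "COMPLETED"),
]
visible_while_pending = [i.timestamp for i in instants_visible_to_incremental_reader(pending_timeline)]

# Once the compaction at t2 completes, the reader can move past it.
done_timeline = [
    Instant(0, "commit", "COMPLETED"),
    Instant(1, "deltacommit", "COMPLETED"),
    Instant(2, "compaction", "COMPLETED"),
    Instant(3, "commit", "COMPLETED"),
]
visible_after_compaction = [i.timestamp for i in instants_visible_to_incremental_reader(done_timeline)]
```

While the compaction is pending, the reader stops before t2 and so never jumps ahead to t3; that is the "stuck" behavior the description accepts as the trade-off.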
   
   ## Brief change log
   
   - Change HoodieParquetInputFormat to read commits prior to compaction instant
   - Added unit tests to validate behavior
   - Fix broken test utils for reading records
   
   ## Verify this pull request
   This change added tests and can be verified as follows:
   mvn test (TestMergeOnReadTable and TestHoodieActiveTimeline)
   
   Some discussion is at https://github.com/apache/incubator-hudi/pull/1389. 
Sorry, I messed up the rebase, so I am resending this as a new PR to avoid 
confusion.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] satishkotha closed pull request #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
satishkotha closed pull request #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389
 
 
   




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
codecov-io edited a comment on issue #1389: [HUDI-687] Stop incremental reader 
when there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-596951199
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=h1) 
Report
   > Merging 
[#1389](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/415882f9023795994e9cc8a8294909bbec7ab191?src=pr=desc)
 will **increase** coverage by `0.21%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1389/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #1389     +/-    ##
   ===========================================
   + Coverage   67.19%    67.4%   +0.21%
   - Complexity    223      230      +7
   ===========================================
     Files         335      336      +1
     Lines       16279    16376     +97
     Branches     1661     1673     +12
   ===========================================
   + Hits        10939    11039    +100
   + Misses       4604     4602      -2
   + Partials      736      735      -1
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...a/org/apache/hudi/common/table/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRpbWVsaW5lLmphdmE=)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[.../java/org/apache/hudi/client/HoodieReadClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0hvb2RpZVJlYWRDbGllbnQuamF2YQ==)
 | `100% <100%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllU3BhcmtTcWxXcml0ZXIuc2NhbGE=)
 | `52.79% <100%> (-0.87%)` | `0 <0> (ø)` | |
   | 
[...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh) | `93.24% <100%> (+1.05%)` | `0 <0> (ø)` | :arrow_down: |
   | [...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==) | `50.56% <100%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `80.99% <100%> (+1.65%)` | `0 <0> (ø)` | :arrow_down: |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `72.41% <100%> (ø)` | `38 <0> (ø)` | :arrow_down: |
   | [...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh) | `88.88% <0%> (-5.56%)` | `0% <0%> (ø)` | |
   | [...a/org/apache/hudi/client/AbstractHoodieClient.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpZW50L0Fic3RyYWN0SG9vZGllQ2xpZW50LmphdmE=) | `76.31% <0%> (-2.64%)` | `0% <0%> (ø)` | |
   | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | `54.38% <0%> (-0.88%)` | `0% <0%> (ø)` | |
   | ... and [4 more](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree-more) | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] codecov-io commented on issue #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

2020-03-10 Thread GitBox
codecov-io commented on issue #1394: [HUDI-656][Performance] Return a dummy 
Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394#issuecomment-597393691
 
 
   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=h1) Report
   > Merging [#1394](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806=desc) will **decrease** coverage by `0.01%`.
   > The diff coverage is `75.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1394/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1394     +/-   ##
   
   - Coverage    67.40%   67.38%   -0.02%
     Complexity     230      230
   
     Files          336      337      +1
     Lines        16366    16369      +3
     Branches      1672     1672
   
   - Hits         11031    11030      -1
   - Misses        4602     4603      +1
   - Partials       733      736      +3
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...main/scala/org/apache/hudi/HudiEmptyRelation.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSHVkaUVtcHR5UmVsYXRpb24uc2NhbGE=) | `66.66% <66.66%> (ø)` | `0.00 <0.00> (?)` | |
   | [...src/main/scala/org/apache/hudi/DefaultSource.scala](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvRGVmYXVsdFNvdXJjZS5zY2FsYQ==) | `70.58% <100.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | [...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh) | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1394/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=footer).
 Last update 
[77d5b92...4e9198c](https://codecov.io/gh/apache/incubator-hudi/pull/1394?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
lamber-ken commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r390695516
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
 
 Review comment:
   Hi @bvaradar, about adding 
`!commitMetadata.getMetadata(CHECKPOINT_KEY).isEmpty()`: if we do that, the 
application will always throw `HoodieDeltaStreamerException`.
   
![image](https://user-images.githubusercontent.com/20113411/76372301-ee63d700-6377-11ea-863a-21a99028dc5d.png)
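
   The guard added in the diff above can be looked at in isolation. This is 
only a minimal sketch: it assumes `java.util.Optional<String>` as a stand-in 
for the type of `lastCheckpointStr` (the real type in `KafkaOffsetGen` is not 
shown in this message), and the class and method names are hypothetical.

```java
import java.util.Optional;

// Hypothetical helper, not part of Hudi: isolates the "present and non-empty" check.
final class CheckpointGuardSketch {

    // True only when a checkpoint string exists AND is not the empty string.
    static boolean hasUsableCheckpoint(Optional<String> lastCheckpointStr) {
        return lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty();
    }

    // Equivalent formulation using Optional.filter, shown for comparison.
    static boolean hasUsableCheckpointViaFilter(Optional<String> lastCheckpointStr) {
        return lastCheckpointStr.filter(s -> !s.isEmpty()).isPresent();
    }
}
```

   With this guard, an empty checkpoint string is treated the same way as a 
missing checkpoint, so the "resume from checkpoint" branch is skipped in both 
cases.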
   
   
   




[GitHub] [incubator-hudi] satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-597385302
 
 
   Looks like I messed up the merge. I'm going to close this one and open a new 
PR. Sorry about the noise.




[GitHub] [incubator-hudi] nsivabalan closed pull request #1393: [WIP] Fixing delta streamer tests.

2020-03-10 Thread GitBox
nsivabalan closed pull request #1393: [WIP] Fixing delta streamer tests. 
URL: https://github.com/apache/incubator-hudi/pull/1393
 
 
   




[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1395: [HUDI-667] Fixing delete tests for DeltaStreamer

2020-03-10 Thread GitBox
nsivabalan opened a new pull request #1395: [HUDI-667] Fixing delete tests for 
DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1395
 
 
   This PR fixes a bug in delete record generation for the HoodieDeltaStreamer 
tests. 
   
   ## Brief change log
   - Fixing a bug in delete record generation for tests in hoodie delta streamer
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests. Most tests in 
TestHoodieDeltaStreamer exercise deletes. Some fixes to the continuous tests 
were also needed as part of the bug fix.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-667) HoodieTestDataGenerator does not delete keys correctly

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-667:

Labels: pull-request-available  (was: )

> HoodieTestDataGenerator does not delete keys correctly
> --
>
> Key: HUDI-667
> URL: https://issues.apache.org/jira/browse/HUDI-667
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Prashant Wason
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HoodieTestDataGenerator is used to generate sample data for unit tests. It 
> allows generating HoodieRecords for insert/update/delete. It maintains the 
> record keys in a HashMap:
> private final Map<Integer, KeyPartition> existingKeys;
> There are two issues in the implementation:
>  # Delete from existingKeys uses KeyPartition rather than Integer keys
>  # Inserting records after deletes is not correctly handled
> The implementation uses the Integer key so that values can be looked up 
> randomly. Assume three values were inserted; the HashMap will then hold:
> 0 -> KeyPartition1
> 1 -> KeyPartition2
> 2 -> KeyPartition3
> Now if we delete KeyPartition2 (generate a random record for deletion), the 
> HashMap will be:
> 0 -> KeyPartition1
> 2 -> KeyPartition3
>  
> Now if we issue an insertBatch(), the insert is 
> existingKeys.put(existingKeys.size(), KeyPartition4). Since size() is now 2, 
> this overwrites the KeyPartition3 already in the map rather than inserting a 
> new entry in the map.
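
The overwrite described in the issue can be reproduced with a plain 
`HashMap`. This is a minimal sketch, not the code from 
HoodieTestDataGenerator: `String` values stand in for `KeyPartition`, all names 
are hypothetical, and `deleteKeepingDense` is just one possible way to avoid 
the problem (an assumption, not necessarily what the linked pull request does).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; String stands in for KeyPartition.
final class ExistingKeysSketch {

    // Builds the three-entry map from the description: 0, 1, 2 -> KeyPartition1..3.
    static Map<Integer, String> threeKeys() {
        Map<Integer, String> keys = new HashMap<>();
        keys.put(0, "KeyPartition1");
        keys.put(1, "KeyPartition2");
        keys.put(2, "KeyPartition3");
        return keys;
    }

    // Insert as described in the issue: size() is used as the next index.
    static void insertUsingSize(Map<Integer, String> keys, String key) {
        keys.put(keys.size(), key);
    }

    // One possible fix: move the last entry into the freed slot so the
    // indexes stay dense (0..size-1) and size() is always a free index.
    // Precondition: the map is dense before the call.
    static void deleteKeepingDense(Map<Integer, String> keys, int index) {
        int last = keys.size() - 1;
        String lastValue = keys.remove(last);
        if (index != last) {
            keys.put(index, lastValue);
        }
    }
}
```

Removing index 1 directly and then calling `insertUsingSize` leaves two 
entries, with index 2 overwritten. Deleting through `deleteKeepingDense` first 
keeps KeyPartition3 (moved to index 1), and the new key lands at index 2.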



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-656) Write Performance - Driver spends too much time creating Parquet DataSource after writes

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-656:

Labels: pull-request-available  (was: )

> Write Performance - Driver spends too much time creating Parquet DataSource 
> after writes
> 
>
> Key: HUDI-656
> URL: https://issues.apache.org/jira/browse/HUDI-656
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Performance, Spark Integration
>Reporter: Udit Mehrotra
>Assignee: Udit Mehrotra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>
> h2. Problem Statement
> We have noticed this performance bottleneck at EMR, and it has been reported 
> here as well [https://github.com/apache/incubator-hudi/issues/1371]
> Hudi for writes through DataSource API uses 
> [this|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L85]
>  to create the Spark relation. Here it uses HoodieSparkSqlWriter to write the 
> DataFrame, and afterwards it tries to 
> [return|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L92]
>  a relation by creating it through parquet data source 
> [here|https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala#L72]
> In the process of creating this parquet data source, Spark creates an 
> *InMemoryFileIndex* 
> [here|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L371]
>  as part of which it performs file listing of the base path. While the 
> listing itself is 
> [parallelized|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L289],
>  the filter that we pass which is *HoodieROTablePathFilter* is applied 
> [sequentially|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L294]
>  on the driver side on all the 1000s of files returned during listing. This 
> part is not parallelized by Spark, and it takes a lot of time, probably 
> because of the filter's logic. This causes the driver to just spend time 
> filtering. We have seen it take 10-12 minutes to do this process for just 50 
> partitions in S3, and this time is spent after the writing has finished.
> Solving this will significantly reduce the writing time across all sorts of 
> writes. This time is essentially wasted, because we do not really 
> have to return a relation after the write. This relation is never really used 
> by Spark either way 
> [here|https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SaveIntoDataSourceCommand.scala#L45]
>  and the writing process returns an empty set of rows.
> h2. Proposed Solution
> The proposal is to return an empty Spark relation after the write, which 
> eliminates the unnecessary time spent creating a Parquet relation that never 
> gets used.
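
The cost difference behind the proposal can be modelled without Spark. This is 
a rough sketch under stated assumptions: `Relation` is a hypothetical stand-in 
for a Spark relation, `listingRelation` models the current path (every listed 
file passes through the path filter one by one on the calling thread), and 
`emptyRelation` models the proposal. None of these names are Spark or Hudi 
APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the post-write step; not Spark or Hudi code.
final class PostWriteRelationSketch {

    // Stand-in for a relation: something that can yield rows (file paths here).
    interface Relation {
        List<String> rows();
    }

    // Models the current behaviour: the filter is applied sequentially to
    // every listed file, so the cost grows with the number of files.
    static Relation listingRelation(List<String> listedFiles, Predicate<String> pathFilter) {
        List<String> accepted = new ArrayList<>();
        for (String file : listedFiles) {
            if (pathFilter.test(file)) {
                accepted.add(file);
            }
        }
        return () -> accepted;
    }

    // Models the proposal: no listing and no filtering, just an empty result.
    static Relation emptyRelation() {
        return () -> Collections.emptyList();
    }
}
```

In this model the filter runs once per listed file for `listingRelation` and 
never for `emptyRelation`, which is the saving the issue is after, given that 
the returned relation is not used after the write.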
>  
>  
>  





[GitHub] [incubator-hudi] umehrot2 opened a new pull request #1394: [HUDI-656][Performance] Return a dummy Spark relation after writing the DataFrame

2020-03-10 Thread GitBox
umehrot2 opened a new pull request #1394: [HUDI-656][Performance] Return a 
dummy Spark relation after writing the DataFrame
URL: https://github.com/apache/incubator-hudi/pull/1394
 
 
   ## What is the purpose of the pull request
   
   This PR fixes the performance issue mentioned in 
https://issues.apache.org/jira/browse/HUDI-656 by returning a dummy Spark 
relation after the write, instead of creating a Parquet data source relation.
   
   ## Brief change log
   
   - Update `DefaultSource.scala` to return a dummy relation after writing the 
data frame
   - Added a dummy relation `HudiEmptyRelation`
   
   ## Verify this pull request
   
   - Manual verification on EMR cluster
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] bhasudha closed issue #910: hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-10 Thread GitBox
bhasudha closed issue #910: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
URL: https://github.com/apache/incubator-hudi/issues/910
 
 
   




[GitHub] [incubator-hudi] bhasudha commented on issue #910: hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-10 Thread GitBox
bhasudha commented on issue #910: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
URL: https://github.com/apache/incubator-hudi/issues/910#issuecomment-597340894
 
 
   > @bhasudha : Can you kindly let me know if this is documented so that we 
can close this ticket.
   
   This is not documented yet. I created a jira issue to track this - 
https://issues.apache.org/jira/browse/HUDI-691. Will close the GH issue to 
track this in Jira further.




[jira] [Created] (HUDI-691) hoodie.*.consume.* should be set whitelist in hive-site.xml

2020-03-10 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-691:
--

 Summary: hoodie.*.consume.* should be set whitelist in 
hive-site.xml
 Key: HUDI-691
 URL: https://issues.apache.org/jira/browse/HUDI-691
 Project: Apache Hudi (incubating)
  Issue Type: Task
  Components: Docs, newbie
Reporter: Bhavani Sudha
 Fix For: 0.6.0


More details in this GH issue - 
https://github.com/apache/incubator-hudi/issues/910





[jira] [Comment Edited] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056437#comment-17056437
 ] 

Jasmine Omeke edited comment on HUDI-690 at 3/10/20, 9:42 PM:
--

pinging to triage 

[~vbalaji]


was (Author: jomeke):
 

[~vbalaji]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Priority: Major
>
> Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
> backup of merge-on-read tables: 
> [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
>  
> The error:
>  
> {code:java}
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from /.hoodie/hoodie.properties
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from 
> 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@77f7352a
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) with ID 2
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 1)
> 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 32831, None)
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) with ID 4
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 2)
> Exception in thread "main" java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='created_at_month=2020-03', fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending compactions. Instants: 
> (20200309011643,{"baseInstantTime": "20200308213934", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.2_3-761601-172985464", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-177872977", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, "TOTAL_IO_WRITE_MB": 512.0, "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
> (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.4_3-727192-165430450", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_20200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, "TOTAL_LOG_FILE_SIZE": 44197.0}})
> at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> 

[jira] [Commented] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056437#comment-17056437
 ] 

Jasmine Omeke commented on HUDI-690:


 

[~vbalaji]

> filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR 
> tables
> 
>
> Key: HUDI-690
> URL: https://issues.apache.org/jira/browse/HUDI-690
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: Jasmine Omeke
>Priority: Major
>
> Hi. I encountered an error while using the HudiSnapshotCopier class to make a 
> backup of merge-on-read tables: 
> [https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]
>  
> The error:
>  
> {code:java}
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from /.hoodie/hoodie.properties
> 20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
> 20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from 
> 20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@77f7352a
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) with ID 2
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 1)
> 20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 32831, None)
> 20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) with ID 4
> 20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 2)
> Exception in thread "main" java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='created_at_month=2020-03', fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending compactions. Instants: 
> (20200309011643,{"baseInstantTime": "20200308213934", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.2_3-761601-172985464", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-177872977", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, "TOTAL_IO_WRITE_MB": 512.0, "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), 
> (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.4_3-727192-165430450", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_20200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, "TOTAL_LOG_FILE_SIZE": 44197.0}})
> at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at 
> 

[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390527605
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
 
 Review comment:
   We can move this up




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390471157
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
 
 Review comment:
   Link seems to point to 0.5.1




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390505002
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
 
 Review comment:
   Move this down? Also, maybe add a line with an example of how to use it?




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390622214
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
+ * Client allows to overwrite the payload implementation in 
`hoodie.properties`. Previously, once the payload class is set once in 
`hoodie.properties`, it cannot be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
 
 Review comment:
   nit: change "Client allows to overwrite the payload implementation in 
`hoodie.properties`" to "Support for overwriting payload implementation in 
`hoodie.properties`".
   
   Also, please specify how to do this.




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390622859
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
+ * Client allows to overwrite the payload implementation in 
`hoodie.properties`. Previously, once the payload class is set once in 
`hoodie.properties`, it cannot be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
+ * With 0.5.2, the community has supported to published the coverage to 
codecov.io on every build. With this feature, the community will know the 
change of test coverage more clearly.
 
 Review comment:
   Wondering if this should be part of the release notes? It is not 
user-facing, right? It is only interesting for PR submitters.




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390621530
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
 
 Review comment:
   +1




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390528487
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
 
 Review comment:
   You can group all the CLI-related changes together and add sub-bullet points.




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390527272
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
 
 Review comment:
   nit: load -> loading 




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390527950
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
 
 Review comment:
   Link to any config that needs to be set up?




[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390623457
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp view. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired row.
+ * `TimestampBasedKeyGenerator` supports for data types convertible to String. 
Previously `TimestampBasedKeyGenerator` only supports `Double`, `Long`, `Float` 
and `String` 4 data types for the partition key. Now, users can convert date 
type to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For some use 
case that users only need to pull the incremental part of certain partitions, 
it can run faster by only load relevant parquet files.
+ * CLI allows users to specify option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, hudi allows partition path to be updated with `GLOBAL_BLOOM` 
index.
+ * Client allows to overwrite the payload implementation in 
`hoodie.properties`. Previously, once the payload class is set once in 
`hoodie.properties`, it cannot be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
+ * With 0.5.2, the community has supported to published the coverage to 
codecov.io on every build. With this feature, the community will know the 
change of test coverage more clearly.
+ * A `JdbcbasedSchemaProvider` schema provider has been provided to get 
metadata through JDBC. For the use case that users want to synchronize data 
from MySQL, and at the same time, want to get the schema from the database, 
it's very helpful.
+ * Simplify `HoodieBloomIndex` without the need for 2GB limit handling. Prior 
to spark 2.4.0, each spark partition has a limit of 2GB. In Hudi 0.5.1, after 
we upgraded to spark 2.4.4, we don't have the limitation anymore. Hence 
removing the safe parallelism constraint we had in` HoodieBloomIndex`.
+ * Write Client restructuring has moved classes around 
([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554))
 
 Review comment:
   I think we can skip the refactoring part unless it is user-facing. 




[GitHub] [incubator-hudi] yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-03-10 Thread GitBox
yihua commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta 
Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-597292426
 
 
   Sorry for the delay.  I'll get to this PR this week.




[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1393: [WIP] Fixing delta streamer tests.

2020-03-10 Thread GitBox
nsivabalan opened a new pull request #1393: [WIP] Fixing delta streamer tests. 
URL: https://github.com/apache/incubator-hudi/pull/1393
 
 
   WIP. Draft PR.
   




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
prashantwason commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390562273
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -394,8 +394,8 @@ private void testUpsertsContinuousMode(HoodieTableType 
tableType, String tempDir
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, datasetBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
-  TestHelpers.assertDistanceCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
+  TestHelpers.assertRecordCount(totalRecords + 200, datasetBasePath + 
"/*/*.parquet", sqlContext);
 
 Review comment:
   Thanks Sivabalan for such a quick reply.
   
   I had filed https://issues.apache.org/jira/browse/HUDI-667 to work on this 
fix. You may wish to use it to submit your PR.




[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390557620
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -394,8 +394,8 @@ private void testUpsertsContinuousMode(HoodieTableType 
tableType, String tempDir
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, datasetBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
-  TestHelpers.assertDistanceCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
+  TestHelpers.assertRecordCount(totalRecords + 200, datasetBasePath + 
"/*/*.parquet", sqlContext);
 
 Review comment:
   You might be right. As mentioned in the other thread, I am working on the 
fix. For some reason, my continuous tests time out w/ hitting the expected 
number of commits.




[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
nsivabalan commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390557169
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -435,11 +439,46 @@ public HoodieRecord generateUpdateRecord(HoodieKey key, 
String commitTime) throw
 index = (index + 1) % numExistingKeys;
 kp = existingKeys.get(index);
   }
+  existingKeys.remove(kp);
 
 Review comment:
   Yes, you are right. I figured this out recently and am working on the fix.




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
prashantwason commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390555211
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##
 @@ -394,8 +394,8 @@ private void testUpsertsContinuousMode(HoodieTableType 
tableType, String tempDir
   } else {
 TestHelpers.assertAtleastNCompactionCommits(5, datasetBasePath, dfs);
   }
-  TestHelpers.assertRecordCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
-  TestHelpers.assertDistanceCount(totalRecords, datasetBasePath + 
"/*/*.parquet", sqlContext);
+  TestHelpers.assertRecordCount(totalRecords + 200, datasetBasePath + 
"/*/*.parquet", sqlContext);
 
 Review comment:
   I could not figure out why there is a +200 here. The inserts/deletes/updates 
are calculated to keep the number of records equal to totalRecords.




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1073: [HUDI-377] Adding Delete() support to DeltaStreamer

2020-03-10 Thread GitBox
prashantwason commented on a change in pull request #1073: [HUDI-377] Adding 
Delete() support to DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1073#discussion_r390553417
 
 

 ##
 File path: 
hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java
 ##
 @@ -435,11 +439,46 @@ public HoodieRecord generateUpdateRecord(HoodieKey key, 
String commitTime) throw
 index = (index + 1) % numExistingKeys;
 kp = existingKeys.get(index);
   }
+  existingKeys.remove(kp);
 
 Review comment:
   Shouldn't the remove be with the key rather than the value?
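
For illustration, here is a minimal sketch of the concern raised above. It assumes `existingKeys` is a `Map<Integer, KeyPartition>` keyed by index, which the `existingKeys.get(index)` call in the diff suggests but does not confirm. The `MapRemoveSketch` class, its `KeyPartition` stand-in, and the helper methods are hypothetical and are not Hudi code. Under that assumption, `Map.remove(Object)` treats its argument as a key, so passing the value compiles but removes nothing:

```java
import java.util.HashMap;
import java.util.Map;

public class MapRemoveSketch {
    // Hypothetical stand-in for the test generator's value type.
    static final class KeyPartition {
        final String recordKey;

        KeyPartition(String recordKey) {
            this.recordKey = recordKey;
        }
    }

    // Builds a small index -> KeyPartition map (assumed shape of existingKeys).
    static Map<Integer, KeyPartition> sampleKeys() {
        Map<Integer, KeyPartition> existingKeys = new HashMap<>();
        existingKeys.put(0, new KeyPartition("key-0"));
        existingKeys.put(1, new KeyPartition("key-1"));
        return existingKeys;
    }

    // Passing the value: Map.remove(Object) looks it up as a key, finds no
    // match among the Integer keys, and leaves the map unchanged.
    static int sizeAfterRemoveByValue() {
        Map<Integer, KeyPartition> existingKeys = sampleKeys();
        KeyPartition kp = existingKeys.get(1);
        existingKeys.remove(kp);
        return existingKeys.size();
    }

    // Passing the key: the entry is actually removed.
    static int sizeAfterRemoveByKey() {
        Map<Integer, KeyPartition> existingKeys = sampleKeys();
        Integer index = 1;
        existingKeys.remove(index);
        return existingKeys.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRemoveByValue()); // prints 2
        System.out.println(sizeAfterRemoveByKey());   // prints 1
    }
}
```

Because the value-based call fails silently, a test that relies on the entry being gone may only notice through a mismatched record count later on.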




[jira] [Created] (HUDI-690) filtercompletedInstants in HudiSnapshotCopier not working as expected for MOR tables

2020-03-10 Thread Jasmine Omeke (Jira)
Jasmine Omeke created HUDI-690:
--

 Summary: filtercompletedInstants in HudiSnapshotCopier not working 
as expected for MOR tables
 Key: HUDI-690
 URL: https://issues.apache.org/jira/browse/HUDI-690
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: Jasmine Omeke


Hi. I encountered an error while using the HoodieSnapshotCopier class to make a 
backup of merge-on-read tables: 
[https://github.com/apache/incubator-hudi/blob/release-0.5.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java]

 

The error:

 
{code:java}
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableConfig: Loading dataset properties from /.hoodie/hoodie.properties
20/03/09 15:43:19 INFO AmazonHttpClient: Configuring Proxy. Proxy Host: web-proxy.bt.local Proxy Port: 3128
20/03/09 15:43:19 INFO HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ from 
20/03/09 15:43:20 INFO HoodieActiveTimeline: Loaded instants java.util.stream.ReferencePipeline$Head@77f7352a
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40894) with ID 2
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 2 has registered (new total is 1)
20/03/09 15:43:21 INFO BlockManagerMasterEndpoint: Registering block manager ip-10-49-26-74.us-east-2.compute.internal:32831 with 12.4 GB RAM, BlockManagerId(2, ip-10-49-26-74.us-east-2.compute.internal, 32831, None)
20/03/09 15:43:21 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.49.26.74:40902) with ID 4
20/03/09 15:43:21 INFO ExecutorAllocationManager: New executor 4 has registered (new total is 2)
Exception in thread "main" java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='created_at_month=2020-03', fileId='7104bb0b-20f6-4dec-981b-c11bf20ade4a-0'}) has more than 1 pending compactions. Instants: (20200309011643,{"baseInstantTime": "20200308213934", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.1_3-751289-170568496", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.2_3-761601-172985464", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.3_1-772174-175483657", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.4_2-782377-177872977", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308213934.log.5_1-790994-179909226"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-746201-169642460_20200308213934.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 33789.0, "TOTAL_IO_WRITE_MB": 512.0, "TOTAL_IO_MB": 1024.0, "TOTAL_LOG_FILE_SIZE": 33789.0}}), (20200308213934,{"baseInstantTime": "20200308180755", "deltaFilePaths": [".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.1_3-696047-158157865", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.2_2-706457-160605423", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.3_1-716977-163056814", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.4_3-727192-165430450", ".7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_20200308180755.log.5_3-737755-167913339"], "dataFilePath": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0_0-690668-157158597_20200308180755.parquet", "fileId": "7104bb0b-20f6-4dec-981b-c11bf20ade4a-0", "partitionPath": "created_at_month=2020-03", "metrics": {"TOTAL_LOG_FILES": 5.0, "TOTAL_IO_READ_MB": 512.0, "TOTAL_LOG_FILES_SIZE": 44197.0, "TOTAL_IO_WRITE_MB": 511.0, "TOTAL_IO_MB": 1023.0, "TOTAL_LOG_FILE_SIZE": 44197.0}})
at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Write 
release blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390530073
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp views. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired rows.
+ * `TimestampBasedKeyGenerator` supports data types convertible to String. 
Previously, `TimestampBasedKeyGenerator` supported only four data types for the 
partition key: `Double`, `Long`, `Float` and `String`. Now, users can convert 
date types to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For use 
cases where users only need to pull the incremental part of certain partitions, 
it can run faster by loading only the relevant parquet files.
+ * CLI allows users to specify an option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, Hudi allows the partition path to be updated with the 
`GLOBAL_BLOOM` index.
 
 Review comment:
   maybe expand this a bit more? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Write 
release blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390531386
 
 

 ##
 File path: docs/_pages/releases.md
 ##
 @@ -6,6 +6,31 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
 
 Review comment:
   can we call out user-facing changes for upgrading in a `### Migration Guide 
for this release` section right before the highlights? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
vinothchandar commented on a change in pull request #1390: [HUDI-634] Write 
release blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390529875
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,31 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release are available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * CLI supports `temp_query` and `temp_delete` to query and delete temp views. 
This command creates a temp table. Users can write HiveQL queries against the 
table to filter the desired rows.
+ * `TimestampBasedKeyGenerator` supports data types convertible to String. 
Previously, `TimestampBasedKeyGenerator` supported only four data types for the 
partition key: `Double`, `Long`, `Float` and `String`. Now, users can convert 
date types to string in `TimestampBasedKeyGenerator`.
+ * Hudi now supports incremental pulling from defined partitions. For use 
cases where users only need to pull the incremental part of certain partitions, 
it can run faster by loading only the relevant parquet files.
+ * CLI allows users to specify an option to print additional commit metadata, 
e.g. *Total Log Blocks*, *Total Rollback Blocks*, *Total Updated Records 
Compacted* and so on.
+ * With 0.5.2, Hudi allows the partition path to be updated with the 
`GLOBAL_BLOOM` index.
+ * Client allows overwriting the payload implementation in 
`hoodie.properties`. Previously, once the payload class was set in 
`hoodie.properties`, it could not be changed. In some cases, if a code refactor is 
done and the jar updated, one may need to pass the new payload class name.
+ * With 0.5.2, the community publishes test coverage to 
codecov.io on every build. With this feature, the community can track 
changes in test coverage more clearly.
+ * A `JdbcbasedSchemaProvider` schema provider has been added to get 
metadata through JDBC. It is very helpful for users who want to synchronize data 
from MySQL and, at the same time, get the schema from the database.
+ * Simplify `HoodieBloomIndex` without the need for 2GB limit handling. Prior 
to Spark 2.4.0, each Spark partition had a limit of 2GB. In Hudi 0.5.1, after 
we upgraded to Spark 2.4.4, we no longer have this limitation. Hence, we removed 
the safe parallelism constraint we had in `HoodieBloomIndex`.
+ * Write Client restructuring has moved classes around 
([HUDI-554](https://issues.apache.org/jira/browse/HUDI-554))
+   - `client` now has all the various client classes that do the transaction 
management
 
 Review comment:
   can we remove the bullets and summarize it further?




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
codecov-io edited a comment on issue #1392: [HUDI-689] Change CLI command names 
to not have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392#issuecomment-597241101
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=h1) 
Report
   > Merging 
[#1392](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806=desc)
 will **decrease** coverage by `0.02%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1392/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1392     +/-   ##
   
   - Coverage   67.40%   67.37%   -0.03%
     Complexity    230      230
   
     Files         336      336
     Lines       16366    16366
     Branches     1672     1672
   
   - Hits        11031    11027       -4
     Misses       4602     4602
   - Partials      733      737       +4
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh)
 | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `72.00% <0.00%> (-4.00%)` | `0.00% <0.00%> (ø%)` | |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1392/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0.00%> (-1.02%)` | `8.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=footer).
 Last update 
[77d5b92...c3f11b8](https://codecov.io/gh/apache/incubator-hudi/pull/1392?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
satishkotha commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-597231598
 
 
   > @satishkotha : I did a quick look. Will do a more comprehensive review 
later.
   > 
   > One quick comment :
   > The code change in HoodieParquetInputFormat also affects 
HoodieRealtimeParquetInputFormat which should not be the case. 
HoodieRealtimeParquetInputFormat should be allowed to read past earliest 
pending compaction instants.
   > 
   > Balaji.V
   
   Ah, didn't see the inheritance. Thanks for the context. I'll work on fixing this and 
adding unit tests.
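
   The inheritance concern raised above can be illustrated with a minimal sketch. The class and method names below are hypothetical stand-ins, not the actual Hudi input formats or the fix that went into the PR; the sketch only assumes that instant times compare chronologically as strings. It shows one way a parent class could apply the pending-compaction cap while a subclass opts out of it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a read-optimized input format (not Hudi code).
class ReadOptimizedFormatSketch {

    // Hook that subclasses may override to opt out of the cap.
    protected boolean stopAtPendingCompaction() {
        return true;
    }

    // Returns the commits a reader may consume. When the cap applies, commits at or
    // after the earliest pending compaction instant are withheld.
    List<String> commitsToRead(List<String> commits, String earliestPendingCompaction) {
        if (!stopAtPendingCompaction() || earliestPendingCompaction == null) {
            return commits;
        }
        return commits.stream()
            .filter(c -> c.compareTo(earliestPendingCompaction) < 0)
            .collect(Collectors.toList());
    }
}

// Hypothetical stand-in for a realtime input format: it inherits the read path
// but overrides the hook, so it can read past pending compactions.
class RealtimeFormatSketch extends ReadOptimizedFormatSketch {
    @Override
    protected boolean stopAtPendingCompaction() {
        return false;
    }
}
```

   With commits `t0, t1, t3` and a pending compaction at `t2`, the parent sketch returns `t0, t1` while the subclass returns all three, which is the behaviour the review comment asks to preserve.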




[jira] [Updated] (HUDI-687) incremental reads on MOR tables using RO view can lead to missing updates

2020-03-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-687:

Summary: incremental reads on MOR tables using RO view can lead to missing 
updates  (was: incremental reads on MOR RO tables can lead to data loss)

> incremental reads on MOR tables using RO view can lead to missing updates
> -
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If compaction at t2 takes a long time, incremental reads using 
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss' 
> (the data will still be on disk, but incremental readers won't see it because 
> it's in the log file and readers move on to t3).
> To work around this problem, we want to stop returning data belonging to 
> commits > t1. After compaction is complete, the incremental reader would see 
> updates in t2, t3, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
bvaradar commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-597229288
 
 
   @satishkotha : Also, the Spark DataSource for incremental reads needs to employ 
a similar mechanism. 




[jira] [Updated] (HUDI-687) incremental reads on MOR RO tables can lead to data loss

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-687:

Summary: incremental reads on MOR RO tables can lead to data loss  (was: 
incremental reads on MOR tables can lead to data loss)

> incremental reads on MOR RO tables can lead to data loss
> 
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If compaction at t2 takes a long time, incremental reads using 
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss' 
> (the data will still be on disk, but incremental readers won't see it because 
> it's in the log file and readers move on to t3).
> To work around this problem, we want to stop returning data belonging to 
> commits > t1. After compaction is complete, the incremental reader would see 
> updates in t2, t3, and so on.





[jira] [Commented] (HUDI-687) incremental reads on MOR tables can lead to data loss

2020-03-10 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056186#comment-17056186
 ] 

Balaji Varadarajan commented on HUDI-687:
-

cc [~vinothchandar] 

Just to be really clear, the potential race-condition happens only when doing 
incremental read using RO view (not RT) against MOR table.  In this case, 
Incremental Read will not make progress past the earliest pending compaction 
time to avoid any data-loss.
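
The rule described in this comment can be sketched as follows. This is a minimal illustration, not Hudi code: the class and method names are hypothetical, and it assumes instant times are fixed-width timestamps (such as `20200308213934`) so that string order matches chronological order. Given the completed commits and the pending compaction instants, the reader only consumes commits strictly before the earliest pending compaction.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hudi: illustrates the capping rule only.
final class IncrementalReadCap {

    // Returns the completed commits an incremental reader may consume, in order.
    // If any compaction is pending, commits at or after the earliest pending
    // compaction instant are withheld until that compaction completes.
    static List<String> readableCommits(List<String> completedCommits,
                                        List<String> pendingCompactions) {
        Optional<String> earliestPending =
            pendingCompactions.stream().min(Comparator.naturalOrder());
        return completedCommits.stream()
            .filter(c -> earliestPending.map(p -> c.compareTo(p) < 0).orElse(true))
            .sorted()
            .collect(Collectors.toList());
    }
}
```

For the example timeline in the ticket, completed commits `t0, t1, t3` with a pending compaction at `t2` yield `t0, t1`; once the compaction completes and no instants are pending, `t3` becomes readable as well.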

 

> incremental reads on MOR tables can lead to data loss
> -
>
> Key: HUDI-687
> URL: https://issues.apache.org/jira/browse/HUDI-687
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If compaction at t2 takes a long time, incremental reads using 
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss' 
> (the data will still be on disk, but incremental readers won't see it because 
> it's in the log file and readers move on to t3).
> To work around this problem, we want to stop returning data belonging to 
> commits > t1. After compaction is complete, the incremental reader would see 
> updates in t2, t3, and so on.





[GitHub] [incubator-hudi] satishkotha opened a new pull request #1392: [HUDI-689] Change CLI command names to not have overlap

2020-03-10 Thread GitBox
satishkotha opened a new pull request #1392: [HUDI-689] Change CLI command 
names to not have overlap
URL: https://github.com/apache/incubator-hudi/pull/1392
 
 
   ## What is the purpose of the pull request
   I broke the CLI when I added the 'compactions show archived' command.
I still don't understand Spring Shell well enough to explain why the 
existing commands won't work, but the alternative commands I picked seem to 
work. Please let me know if any of you have seen similar issues with Spring 
Shell or know how its command parsing works.
   
   CLI 'commits show archived' fails with
   
   ->commits show archived
   Option '' is not available for this command. Use tab assist or the "help" 
command to see the legal options
   This seems to be caused by 'compactions show archived'. If I remove the 
@CliCommand annotation from compactions, 'commits show archived' works.
   
   ## Brief change log
   
   - Chose different command names to make all commands 'work'
   - change method names to not overlap with each other
   
   
   ## Verify this pull request
   
   Manually verified the change by running CLI
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-689:

Labels: pull-request-available  (was: )

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by 'compactions show archived'. If I remove the @CliCommand 
> annotation from compactions, 'commits show archived' works.  





[jira] [Commented] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056166#comment-17056166
 ] 

Balaji Varadarajan commented on HUDI-689:
-

[~satishkotha] : Can you add more context to this ticket so that everybody can 
understand it?

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by 'compactions show archived'. If I remove the @CliCommand 
> annotation from compactions, 'commits show archived' works.  





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan updated HUDI-689:

Status: Open  (was: New)

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by 'compactions show archived'. If I remove the @CliCommand 
> annotation from compactions, 'commits show archived' works.  





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Description: 
CLI 'commits show archived' fails with
{code}
->commits show archived
Option '' is not available for this command. Use tab assist or the "help" 
command to see the legal options
{code}

This seems to be caused by 'compactions show archived'. If I remove the @CliCommand 
annotation from compactions, 'commits show archived' works.  

  was:




> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by 'compactions show archived'. If I remove the @CliCommand 
> annotation from compactions, 'commits show archived' works.  





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Labels:   (was: pull-request-available)

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>
> CLI 'commits show archived' fails with
> {code}
> ->commits show archived
> Option '' is not available for this command. Use tab assist or the "help" 
> command to see the legal options
> {code}
> This seems to be caused by 'compactions show archived'. If I remove the @CliCommand 
> annotation from compactions, 'commits show archived' works.  





[jira] [Created] (HUDI-689) Fix hudi cli commands with overlap

2020-03-10 Thread satish (Jira)
satish created HUDI-689:
---

 Summary: Fix hudi cli commands with overlap
 Key: HUDI-689
 URL: https://issues.apache.org/jira/browse/HUDI-689
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
Reporter: satish
Assignee: satish


example timeline:

t0 -> create bucket1.parquet
t1 -> create and append updates bucket1.log
t2 -> request compaction 
t3 -> create bucket2.parquet

If compaction at t2 takes a long time, incremental reads using 
HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss' 
(the data will still be on disk, but incremental readers won't see it because 
it's in the log file and readers move on to t3).

To work around this problem, we want to stop returning data belonging to commits 
> t1. After compaction is complete, the incremental reader would see updates in 
t2, t3, and so on.






[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Summary: Fix hudi cli commands with overlapping words  (was: Fix hudi cli 
commands with overlap)

> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>
> example timeline:
> t0 -> create bucket1.parquet
> t1 -> create and append updates bucket1.log
> t2 -> request compaction 
> t3 -> create bucket2.parquet
> If compaction at t2 takes a long time, incremental reads using 
> HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss' 
> (the data will still be on disk, but incremental readers won't see it because 
> it's in the log file and readers move on to t3).
> To work around this problem, we want to stop returning data belonging to 
> commits > t1. After compaction is complete, the incremental reader would see 
> updates in t2, t3, and so on.





[jira] [Updated] (HUDI-689) Fix hudi cli commands with overlapping words

2020-03-10 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-689:

Description: 



  was:
example timeline:

t0 -> create bucket1.parquet
t1 -> create and append updates bucket1.log
t2 -> request compaction 
t3 -> create bucket2.parquet

If compaction at t2 takes a long time, incremental reads using 
HoodieParquetInputFormat can skip data ingested at t1, leading to 'data loss' 
(the data will still be on disk, but incremental readers won't see it because 
it's in the log file and readers move on to t3).

To work around this problem, we want to stop returning data belonging to commits 
> t1. After compaction is complete, the incremental reader would see updates in 
t2, t3, and so on.



> Fix hudi cli commands with overlapping words
> 
>
> Key: HUDI-689
> URL: https://issues.apache.org/jira/browse/HUDI-689
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: satish
>Assignee: satish
>Priority: Critical
>  Labels: pull-request-available
>






[jira] [Resolved] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan resolved HUDI-669.
-
Resolution: Duplicate

> HoodieDeltaStreamer offset not handled correctly when using LATEST offset 
> reset strategy
> 
>
> Key: HUDI-669
> URL: https://issues.apache.org/jira/browse/HUDI-669
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Context : [https://github.com/apache/incubator-hudi/issues/1375]





[jira] [Commented] (HUDI-669) HoodieDeltaStreamer offset not handled correctly when using LATEST offset reset strategy

2020-03-10 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056140#comment-17056140
 ] 

Balaji Varadarajan commented on HUDI-669:
-

Thanks [~lamber-ken]. Closing this as a duplicate.

> HoodieDeltaStreamer offset not handled correctly when using LATEST offset 
> reset strategy
> 
>
> Key: HUDI-669
> URL: https://issues.apache.org/jira/browse/HUDI-669
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.6.0
>
>
> Context : [https://github.com/apache/incubator-hudi/issues/1375]





[GitHub] [incubator-hudi] codecov-io commented on issue #1391: [HUDI-688] Paring down the NOTICE file to minimum required notices

2020-03-10 Thread GitBox
codecov-io commented on issue #1391: [HUDI-688] Paring down the NOTICE file to 
minimum required notices
URL: https://github.com/apache/incubator-hudi/pull/1391#issuecomment-597203280
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=h1) 
Report
   > Merging 
[#1391](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806=desc)
 will **decrease** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1391/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=tree)
   
   ```diff
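   @@            Coverage Diff             @@
   ##           master    #1391      +/-   ##
   ============================================
   - Coverage     67.40%   67.39%   -0.01%
     Complexity      230      230
   ============================================
     Files           336      336
     Lines         16366    16366
     Branches       1672     1672
   ============================================
   - Hits          11031    11030       -1
     Misses         4602     4602
   - Partials        733      734       +1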
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...n/java/org/apache/hudi/common/model/HoodieKey.java](https://codecov.io/gh/apache/incubator-hudi/pull/1391/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUtleS5qYXZh) | `88.88% <0.00%> (-5.56%)` | `0.00% <0.00%> (ø%)` | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=footer).
 Last update 
[77d5b92...87bb2da](https://codecov.io/gh/apache/incubator-hudi/pull/1391?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-10 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-662:

Fix Version/s: (was: 0.5.2)

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is the comprehensive guide 
> here.'
> [http://www.apache.org/legal/src-headers.html] also 
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461





[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056136#comment-17056136
 ] 

Vinoth Chandar commented on HUDI-662:
-

We can revisit this again if needed; untagging the fix version.

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is the comprehensive guide 
> here.'
> [http://www.apache.org/legal/src-headers.html] also 
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461





[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1377: [HUDI-663] Fix HoodieDeltaStreamer offset not handled correctly

2020-03-10 Thread GitBox
bvaradar commented on a change in pull request #1377: [HUDI-663] Fix 
HoodieDeltaStreamer offset not handled correctly
URL: https://github.com/apache/incubator-hudi/pull/1377#discussion_r390466238
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java
 ##
 @@ -180,7 +180,7 @@ public KafkaOffsetGen(TypedProperties props) {
   .map(x -> new TopicPartition(x.topic(), x.partition())).collect(Collectors.toSet());
 
   // Determine the offset ranges to read from
-  if (lastCheckpointStr.isPresent()) {
+  if (lastCheckpointStr.isPresent() && !lastCheckpointStr.get().isEmpty()) {
 
 Review comment:
   @lamber-ken: Also, instead of handling empty checkpoints only for Kafka, 
can we handle them generically in DeltaSync 
(https://github.com/apache/incubator-hudi/blob/77d5b92d88d6583bdfc09e4c10ecfe7ddbb04806/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L262)
 so that we have uniform handling of checkpoints across sources?
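
   As a rough illustration of the generic handling suggested here, the sketch 
below normalizes a checkpoint once, before any source sees it. It is an 
assumption-laden example, not the DeltaSync code: `CheckpointSketch` and 
`normalize` are hypothetical names, `java.util.Optional` stands in for 
whatever option type the real code uses, and `"offset-42"` is a made-up 
checkpoint value.

```java
import java.util.Optional;

// Illustrative sketch only; not the DeltaSync code.
final class CheckpointSketch {

  // Treats an empty checkpoint string the same as a missing one, so every
  // source sees "no checkpoint" in a single, uniform way.
  static Optional<String> normalize(Optional<String> lastCheckpoint) {
    return lastCheckpoint.filter(s -> !s.isEmpty());
  }

  public static void main(String[] args) {
    System.out.println(normalize(Optional.of("")).isPresent());          // prints false
    System.out.println(normalize(Optional.of("offset-42")).isPresent()); // prints true
  }
}
```

   With a helper like this applied in one place, a source such as 
KafkaOffsetGen would only need the plain `isPresent()` check.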




[GitHub] [incubator-hudi] bvaradar closed issue #1359: [SUPPORT] handle partition value containing colon ?

2020-03-10 Thread GitBox
bvaradar closed issue #1359: [SUPPORT] handle partition value containing colon ?
URL: https://github.com/apache/incubator-hudi/issues/1359
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1359: [SUPPORT] handle partition value containing colon ?

2020-03-10 Thread GitBox
bvaradar commented on issue #1359: [SUPPORT] handle partition value containing 
colon ?
URL: https://github.com/apache/incubator-hudi/issues/1359#issuecomment-597184490
 
 
   Closing this ticket due to inactivity.




[jira] [Updated] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-688:

Labels: pull-request-available  (was: )

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>






[GitHub] [incubator-hudi] vinothchandar opened a new pull request #1391: [HUDI-688] Paring down the NOTICE file to minimum required notices

2020-03-10 Thread GitBox
vinothchandar opened a new pull request #1391: [HUDI-688] Paring down the 
NOTICE file to minimum required notices
URL: https://github.com/apache/incubator-hudi/pull/1391
 
 
- Based on analysis, we don't need to call out anything
- We only do source releases at this time
- Fix typo in LICENSE
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-680) Update Jackson databind to 2.6.7.3

2020-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-680:
--
Fix Version/s: (was: 0.5.2)
   0.6.0

> Update Jackson databind to 2.6.7.3
> --
>
> Key: HUDI-680
> URL: https://issues.apache.org/jira/browse/HUDI-680
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Common Core
>Reporter: Aki Tanaka
>Assignee: Aki Tanaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I would like to update Jackson databind to 2.6.7.3, because this version is 
> the latest jackson-databind release in the 2.6.7.x line and it includes all 
> CVE fixes up to 2.9.10.
> https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.6.7.x





[GitHub] [incubator-hudi] smarthi commented on a change in pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
smarthi commented on a change in pull request #1390: [HUDI-634] Write release 
blog and document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#discussion_r390284780
 
 

 ##
 File path: docs/_pages/releases.cn.md
 ##
 @@ -7,6 +7,33 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
+## [Release 
0.5.2-incubating](https://github.com/apache/incubator-hudi/releases/tag/release-0.5.2-incubating)
 ([docs](/docs/0.5.2-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi(incubating) 0.5.2-incubating Source 
Release](https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
 
([asc](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
 
[sha512](https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
+ * Apache Hudi (incubating) jars corresponding to this release is available 
[here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Release Highlights
+ * Dependency Version Upgrades
+   - Upgrade from Jackson-databind 2.6.7.1 to 2.6.7.3
 
 Review comment:
   Did we cherry-pick this for the 0.5.2 release?




[GitHub] [incubator-hudi] yanghua edited a comment on issue #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
yanghua edited a comment on issue #1390: [HUDI-634] Write release blog and 
document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#issuecomment-597037052
 
 
   Please temporarily ignore those links.




[GitHub] [incubator-hudi] yanghua commented on issue #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
yanghua commented on issue #1390: [HUDI-634] Write release blog and document 
breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390#issuecomment-597037052
 
 
   Please temporarily ignore those links.




[jira] [Updated] (HUDI-634) Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-634:

Labels: pull-request-available  (was: )

> Write release blog and document breaking changes for 0.5.2 release
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[GitHub] [incubator-hudi] yanghua opened a new pull request #1390: [HUDI-634] Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread GitBox
yanghua opened a new pull request #1390: [HUDI-634] Write release blog and 
document breaking changes for 0.5.2 release
URL: https://github.com/apache/incubator-hudi/pull/1390
 
 
   
   
   ## What is the purpose of the pull request
   
   *This pull request writes the release blog and documents breaking changes 
for the 0.5.2 release*
   
   ## Brief change log
   
 - *Write release blog and document breaking changes for 0.5.2 release*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Updated] (HUDI-634) Write release blog and document breaking changes for 0.5.2 release

2020-03-10 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated HUDI-634:
--
Summary: Write release blog and document breaking changes for 0.5.2 release 
 (was: Document breaking changes for 0.5.2 release)

> Write release blog and document breaking changes for 0.5.2 release
> --
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread Suneel Marthi (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055810#comment-17055810
 ] 

Suneel Marthi commented on HUDI-688:


We are back to what [~lresende] had suggested the NOTICE file should contain: 
nothing whatsoever. The confusion stemmed from Justin's unclear comments 
during the last release. Let's pare down the NOTICE file to what it should be.

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>






[GitHub] [incubator-hudi] codecov-io commented on issue #1389: [HUDI-687] Stop incremental reader when there is a pending compaction…

2020-03-10 Thread GitBox
codecov-io commented on issue #1389: [HUDI-687] Stop incremental reader when 
there is a pending compaction…
URL: https://github.com/apache/incubator-hudi/pull/1389#issuecomment-596951199
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=h1) 
Report
   > Merging 
[#1389](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/f93e64fee413ed1b774156e688794ee7937cc01a?src=pr=desc)
 will **increase** coverage by `0.22%`.
   > The diff coverage is `100%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1389/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=tree)
   
   ```diff
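   @@            Coverage Diff             @@
   ##           master    #1389      +/-   ##
   ============================================
   + Coverage     67.18%    67.4%   +0.22%
   - Complexity      221      230       +9
   ============================================
     Files           335      336       +1
     Lines         16272    16376     +104
     Branches       1661     1673      +12
   ============================================
   + Hits          10933    11039     +106
   + Misses         4604     4602       -2
     Partials        735      735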
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...a/org/apache/hudi/common/table/HoodieTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRpbWVsaW5lLmphdmE=) | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | [...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh) | `93.24% <100%> (+1.05%)` | `0 <0> (ø)` | :arrow_down: |
   | [...g/apache/hudi/hadoop/HoodieParquetInputFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZVBhcnF1ZXRJbnB1dEZvcm1hdC5qYXZh) | `80.99% <100%> (+1.65%)` | `0 <0> (ø)` | :arrow_down: |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   | [...che/hudi/common/util/BufferedRandomAccessFile.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvQnVmZmVyZWRSYW5kb21BY2Nlc3NGaWxlLmphdmE=) | `54.38% <0%> (-0.88%)` | `0% <0%> (ø)` | |
   | [.../apache/hudi/utilities/HoodieSnapshotExporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0hvb2RpZVNuYXBzaG90RXhwb3J0ZXIuamF2YQ==) | `39.36% <0%> (ø)` | `7% <0%> (?)` | |
   | [...in/java/org/apache/hudi/metrics/HoodieMetrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1389/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Ib29kaWVNZXRyaWNzLmphdmE=) | `87.5% <0%> (+57.69%)` | `0% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute  (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=footer).
 Last update 
[f93e64f...e639491](https://codecov.io/gh/apache/incubator-hudi/pull/1389?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055639#comment-17055639
 ] 

vinoyang commented on HUDI-688:
---

[~vinoth] IMO, we can follow Druid's solution, since it graduated so 
recently. It should have been checked and verified before graduating.

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>






[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055637#comment-17055637
 ] 

Vinoth Chandar commented on HUDI-662:
-

Yes. Spark and Flink both provide binary distros; we only do source releases.

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.2
>
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
>  [http://www.apache.org/dev/licensing-howto.html] is the comprehensive guide 
> here.'
> [http://www.apache.org/legal/src-headers.html] also 
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461





[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055635#comment-17055635
 ] 

Vinoth Chandar commented on HUDI-688:
-

https://github.com/apache/druid/blob/master/NOTICE simply mentions the ALv2 
dependencies, like Suneel did in his PR.

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>






[jira] [Commented] (HUDI-634) Document breaking changes for 0.5.2 release

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055636#comment-17055636
 ] 

Vinoth Chandar commented on HUDI-634:
-

Yes, we will flag it in the release highlights on the release blog. See 
older releases for an example.

> Document breaking changes for 0.5.2 release
> ---
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[GitHub] [incubator-hudi] vinothchandar commented on issue #1354: [HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-03-10 Thread GitBox
vinothchandar commented on issue #1354: [HUDI-581] NOTICE need more work as it 
missing content form included 3rd party ALv2 licensed NOTICE files
URL: https://github.com/apache/incubator-hudi/pull/1354#issuecomment-596925151
 
 
   > Also, there shouldn't be a difference between the source and binary distro 
for the notice.
   
   @lresende The language 
[here](http://www.apache.org/dev/licensing-howto.html#bundled-vs-non-bundled) 
seems to suggest that for the source release we should not include 
dependencies that are not bundled, whereas for a binary distro a lot more 
attribution may be necessary?




[GitHub] [incubator-hudi] vinothchandar commented on issue #1354: [HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-03-10 Thread GitBox
vinothchandar commented on issue #1354: [HUDI-581] NOTICE need more work as it 
missing content form included 3rd party ALv2 licensed NOTICE files
URL: https://github.com/apache/incubator-hudi/pull/1354#issuecomment-596924478
 
 
   @smarthi @yanghua I think we did this due to the following comment during 
last release's vote
   
   > - NOTICE need more work as it missing content form included 3rd party ALv2 
licensed NOTICE files
   
   @smarthi can you please enlighten me on the principle by which we 
decided to add metrics, jackson, kryo, and guava to the NOTICE file? We don't 
ship their binaries. Did their NOTICE files have these things?
   




[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055628#comment-17055628
 ] 

vinoyang commented on HUDI-688:
---

If a recently graduated project can also confirm our thinking, that would be 
very good.

> Ensure NOTICE contains all notices for the dependencies called out in LICENSE
> -
>
> Key: HUDI-688
> URL: https://issues.apache.org/jira/browse/HUDI-688
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Release & Administrative
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Blocker
> Fix For: 0.5.2
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-634) Document breaking changes for 0.5.2 release

2020-03-10 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055623#comment-17055623
 ] 

vinoyang commented on HUDI-634:
---

[~vinoth] I am browsing the commit history and Jira issues involved in 
v0.5.2. Also, where should we document the breaking changes? Is the release 
blog a suitable place? I am also preparing the release blog.

> Document breaking changes for 0.5.2 release
> ---
>
> Key: HUDI-634
> URL: https://issues.apache.org/jira/browse/HUDI-634
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Release & Administrative
> Reporter: Vinoth Chandar
> Assignee: vinoyang
> Priority: Blocker
> Fix For: 0.5.2
>
>
> * Write Client restructuring has moved classes around (HUDI-554) 





[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055621#comment-17055621
 ] 

Vinoth Chandar commented on HUDI-688:
-

[~vinoyang] [~smarthi] based on this, I don't think we need to add anything to 
our NOTICE file.







[jira] [Commented] (HUDI-688) Ensure NOTICE contains all notices for the dependencies called out in LICENSE

2020-03-10 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055620#comment-17055620
 ] 

Vinoth Chandar commented on HUDI-688:
-

* Apache Hive NOTICE is clean: https://github.com/apache/hive/blob/branch-2.3/NOTICE
* Apache Spark NOTICE has two mentions: https://github.com/apache/spark/blob/branch-2.4/NOTICE . Neither is related to the code we reused, so since the bundled bits are not related, skip.
* Apache SystemML is clean: https://github.com/apache/systemml/blob/master/NOTICE
* https://github.com/twitter-archive/commons has no NOTICE.
* https://github.com/big-data-europe/docker-hadoop has no NOTICE.
* https://github.com/apache/hadoop/blob/branch-2.7/NOTICE.txt has a long NOTICE, which covers both binary and source distribution specific items. We use the bloom filter, which is not referenced there.
* https://github.com/apache/cassandra/blob/trunk/NOTICE.txt has a bunch, but again none related to the one class we adapted.









[jira] [Commented] (HUDI-662) Attribute binary dependencies that are included in the bundle jars

2020-03-10 Thread vinoyang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055615#comment-17055615
 ] 

vinoyang commented on HUDI-662:
---

{quote}Apache Singa [https://github.com/apache/singa] follows same model, only 
does source releases
Apache Dubbo [http://dubbo.apache.org/en-us/blog/download.html] also only puts 
out source releases

Their LICENSE and NOTICE follow same principles as ours.
{quote}

In short, did we not reference suitable projects? Should we instead reference 
projects that release only source distributions?

> Attribute binary dependencies that are included in the bundle jars
> --
>
> Key: HUDI-662
> URL: https://issues.apache.org/jira/browse/HUDI-662
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: Release & Administrative
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 0.5.2
>
>
> [https://www.apache.org/legal/resolved.html] is the comprehensive guide here.
> [http://www.apache.org/dev/licensing-howto.html] is the comprehensive guide 
> here.
> [http://www.apache.org/legal/src-headers.html] is also relevant.
>  
> Previously, we asked about some specific dependencies here
>  https://issues.apache.org/jira/browse/LEGAL-461


