[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1075: [HUDI-114]: added option to overwrite payload implementation in hoodie.properties file

2019-12-09 Thread GitBox
pratyakshsharma commented on a change in pull request #1075: [HUDI-114]: added 
option to overwrite payload implementation in hoodie.properties file
URL: https://github.com/apache/incubator-hudi/pull/1075#discussion_r355880698
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java
 ##
 @@ -228,6 +228,10 @@ public Operation convert(String value) throws 
ParameterException {
 + " source-fetch -> Transform -> Hudi Write in loop")
 public Boolean continuousMode = false;
 
+@Parameter(names = {"--update-payload-class"}, description = "Update 
payload class in hoodie.properties file if needed, "
 
 Review comment:
   Done with the changes @n3nash 




[GitHub] [incubator-hudi] Neo2007 edited a comment on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
Neo2007 edited a comment on issue #1083: [SUPPORT] getting error in 
KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563820726
 
 
   Thank you. Actually, I was using an older version of Apache Hudi, which doesn't 
contain this --checkpoint support.
   I will be adding it manually.




[GitHub] [incubator-hudi] lamber-ken commented on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken commented on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563829151
 
 
   You are welcome




[GitHub] [incubator-hudi] Neo2007 commented on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
Neo2007 commented on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563820726
 
 
   Thank you. Actually, I was using an older version of Apache Hudi, which doesn't 
contain this --checkpoint support.
   I will be adding it manually.




[GitHub] [incubator-hudi] Neo2007 closed issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
Neo2007 closed issue #1083: [SUPPORT] getting error in KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083
 
 
   




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #124

2019-12-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.17 KB...]
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.1-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark [jar]
[INFO] hudi-utilities [jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle  [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle  [jar]
[INFO] hudi-timeline-server-bundle[jar]
[INFO] hudi-hadoop-docker 

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in 
KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563279470
 
 
   @Neo2007 , sorry for the delay. You can add the param `--checkpoint 
topic_account_hudi,0:0`; that will get it working. By the way, if topic 
`topic_account_hudi` has more than one partition, the param will look like 
`--checkpoint topic_account_hudi,0:0,1:0...`
   
   ### Root cause
   The reason for the null pointer exception is that `earliestOffsets` doesn't 
contain the key `sbx-frb1-dell-onep-account`, so trying to get the offset from 
`earliestOffsets` causes the NPE.
   ```
   boolean checkpointOffsetReseter = checkpointOffsets.entrySet()
       .stream()
       .anyMatch(offset -> offset.getValue().offset()
           < earliestOffsets.get(offset.getKey()).offset());
   ```
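   
   For illustration only, a minimal self-contained sketch (simplified to plain `Long` offsets rather than Hudi's actual Kafka offset types, and not the project's actual fix) of how the missing-key case could be guarded before comparing offsets:
   ```
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical sketch only: guard against topic-partitions that are missing from
   // earliestOffsets so the comparison cannot throw a NullPointerException.
   public class CheckpointGuardSketch {
   
     static boolean needsReset(Map<String, Long> checkpointOffsets, Map<String, Long> earliestOffsets) {
       return checkpointOffsets.entrySet().stream()
           .filter(e -> earliestOffsets.containsKey(e.getKey()))
           .anyMatch(e -> e.getValue() < earliestOffsets.get(e.getKey()));
     }
   
     public static void main(String[] args) {
       Map<String, Long> checkpoint = new HashMap<>();
       checkpoint.put("topic_account_hudi,0", 5L);
       Map<String, Long> earliest = new HashMap<>(); // key absent: no NPE, just returns false
       System.out.println(needsReset(checkpoint, earliest));
     }
   }
   ```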
   
   
   




[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in 
KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563279470
 
 
   @Neo2007 , sorry for the delay. You can add the param `--checkpoint 
topic_account_hudi,0:0`; that will get it working. By the way, if topic 
`topic_account_hudi` has more than one partition, the param will look like 
`--checkpoint topic_account_hudi,0:0,1:0...`
   
   The reason for the null pointer exception is that `earliestOffsets` doesn't 
contain the key `sbx-frb1-dell-onep-account`, so trying to get the offset from 
`earliestOffsets` causes the NPE.
   ```
   boolean checkpointOffsetReseter = checkpointOffsets.entrySet()
       .stream()
       .anyMatch(offset -> offset.getValue().offset()
           < earliestOffsets.get(offset.getKey()).offset());
   ```
   
   
   




[jira] [Comment Edited] (HUDI-395) hudi does not support scheme s3n when writing to S3

2019-12-09 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992040#comment-16992040
 ] 

leesf edited comment on HUDI-395 at 12/10/19 1:39 AM:
--

Hi, thanks for reporting this. Right now, s3n is not supported yet; s3 and s3a 
are supported. You can check it here: 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java
and maybe you could send a PR to support it.
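
For anyone picking this up, a purely hypothetical sketch of the shape such an addition could take; the constants and constructor in the real StorageSchemes.java may differ, so check the file linked above first:
{code:java}
// Hypothetical sketch only, not the real StorageSchemes enum.
public enum StorageSchemeSketch {
  S3("s3"),
  S3A("s3a"),
  S3N("s3n"); // the scheme this issue reports as unsupported

  private final String scheme;

  StorageSchemeSketch(String scheme) {
    this.scheme = scheme;
  }

  public static boolean isSchemeSupported(String scheme) {
    for (StorageSchemeSketch s : values()) {
      if (s.scheme.equals(scheme)) {
        return true;
      }
    }
    return false;
  }
}
{code}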


was (Author: xleesf):
Hi, thanks for reporting this. Right now, s3n is not supported yet; s3 and s3a 
are supported. You can check it here: 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java

> hudi does not support scheme s3n when writing to S3
> ---
>
> Key: HUDI-395
> URL: https://issues.apache.org/jira/browse/HUDI-395
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark datasource
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: rui feng
>Priority: Major
>
> When I used Hudi to create a Hudi table and then write to S3, I used the below Maven 
> snippet, which is recommended by [https://hudi.apache.org/s3_hoodie.html]
> 
>  org.apache.hudi
>  hudi-spark-bundle
>  0.5.0-incubating
> 
> 
>  org.apache.hadoop
>  hadoop-aws
>  2.7.3
> 
> 
>  com.amazonaws
>  aws-java-sdk
>  1.10.34
> 
> and add the below configuration:
> sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
>  sc.hadoopConfiguration.set("fs.s3.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3n.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "x")
>  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "x")
>  
> my spark version is spark-2.4.4-bin-hadoop2.7 and when I run below
> df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath).
> val hudiOptions = Map[String,String](
>  HoodieWriteConfig.TABLE_NAME -> "hudi12",
>  DataSourceWriteOptions.OPERATION_OPT_KEY -> 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
>  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
>  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
> val hudiTablePath = "s3://niketest1/hudi_test/hudi12"
> the exception occur:
> java.lang.IllegalArgumentException: 
> BlockAlignedAvroParquetWriter does not support scheme s3n
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.(HoodieParquetWriter.java:57)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
>  at org.apache.hudi.io.HoodieCreateHandle.(HoodieCreateHandle.java:70)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> Can anyone tell me what is causing this exception? I tried to use 
> org.apache.hadoop.fs.s3.S3FileSystem in place of 
> org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", 
> but another exception occurred, and it seems org.apache.hadoop.fs.s3.S3FileSystem 
> fits Hadoop 2.6.
>  
> Thanks in advance.





[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1093: [MINOR] replace scala map add operator

2019-12-09 Thread GitBox
lamber-ken opened a new pull request #1093: [MINOR] replace scala map add 
operator
URL: https://github.com/apache/incubator-hudi/pull/1093
 
 
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a 
pull request.*
   
   ## What is the purpose of the pull request
   
   IDEA can't resolve the symbol `++:`, so code completion is affected.
   
   
![image](https://user-images.githubusercontent.com/20113411/70487126-40098500-1b2f-11ea-8e57-0e34010c68b6.png)
   
   ## Brief change log
   
 - replace `++:` with `++`
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] bschell commented on issue #1052: [HUDI-326] Add new index to support global update/delete

2019-12-09 Thread GitBox
bschell commented on issue #1052: [HUDI-326] Add new index to support global 
update/delete
URL: https://github.com/apache/incubator-hudi/pull/1052#issuecomment-563523296
 
 
   @vinothchandar thanks for pointing out that other issue by nsivabalan. I was 
actually running into the problem he described, which is why I took the approach 
I did. I think his solution is really what I was looking for.




[jira] [Commented] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2019-12-09 Thread Brandon Scheller (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992077#comment-16992077
 ] 

Brandon Scheller commented on HUDI-389:
---

[~vinoth] [~shivnarayan] 

I was working on a similar issue and I think this fix is a better long term 
solution to what I was trying to accomplish here: 
[https://github.com/apache/incubator-hudi/pull/1052] . I'd love to see this 
merged as this is needed for creating a proper delete-by-record-key only API 
with global index. I can act as a reviewer for this code.
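
For readers of this ticket, a toy sketch (hypothetical, not Hudi's index implementation) of the routing behavior the issue asks for, where an update whose incoming partition differs still lands in the original partition:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only, not Hudi's index implementation.
public class GlobalIndexRoutingSketch {

  private final Map<String, String> keyToOriginalPartition = new HashMap<>();

  void onInsert(String recordKey, String partition) {
    keyToOriginalPartition.put(recordKey, partition);
  }

  // A global index resolves a record key to the partition it was first written to,
  // so an update carrying a different partition path is routed back to the original.
  String routeUpdate(String recordKey, String incomingPartition) {
    return keyToOriginalPartition.getOrDefault(recordKey, incomingPartition);
  }

  public static void main(String[] args) {
    GlobalIndexRoutingSketch index = new GlobalIndexRoutingSketch();
    index.onInsert("key-1", "2016/04/15");
    System.out.println(index.routeUpdate("key-1", "2016/04/16")); // prints 2016/04/15
  }
}
{code}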

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> Updates sent to a diff partition for a given key with Global Index should 
> succeed by updating the record under the original partition. As of now, it throws 
> an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at 

[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1092: [MINOR] Unify LOG form

2019-12-09 Thread GitBox
lamber-ken opened a new pull request #1092: [MINOR] Unify LOG form
URL: https://github.com/apache/incubator-hudi/pull/1092
 
 
   ## What is the purpose of the pull request
   
   There are many different forms of `Logger` in use; it's worthwhile to unify the LOG form.
   
   ## Brief change log
   
 - Unify LOG form
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Commented] (HUDI-395) hudi does not support scheme s3n when writing to S3

2019-12-09 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992040#comment-16992040
 ] 

leesf commented on HUDI-395:


Hi, thanks for reporting this. Right now, s3n is not supported yet; s3 and s3a 
are supported. You can check it here: 
https://github.com/apache/incubator-hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/storage/StorageSchemes.java

> hudi does not support scheme s3n when writing to S3
> ---
>
> Key: HUDI-395
> URL: https://issues.apache.org/jira/browse/HUDI-395
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Spark datasource
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: rui feng
>Priority: Major
>
> When I used Hudi to create a Hudi table and then write to S3, I used the below Maven 
> snippet, which is recommended by [https://hudi.apache.org/s3_hoodie.html]
> 
>  org.apache.hudi
>  hudi-spark-bundle
>  0.5.0-incubating
> 
> 
>  org.apache.hadoop
>  hadoop-aws
>  2.7.3
> 
> 
>  com.amazonaws
>  aws-java-sdk
>  1.10.34
> 
> and add the below configuration:
> sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
>  sc.hadoopConfiguration.set("fs.s3.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3n.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "x")
>  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "x")
>  
> my spark version is spark-2.4.4-bin-hadoop2.7 and when I run below
> df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath).
> val hudiOptions = Map[String,String](
>  HoodieWriteConfig.TABLE_NAME -> "hudi12",
>  DataSourceWriteOptions.OPERATION_OPT_KEY -> 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
>  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
>  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
> val hudiTablePath = "s3://niketest1/hudi_test/hudi12"
> the exception occur:
> java.lang.IllegalArgumentException: 
> BlockAlignedAvroParquetWriter does not support scheme s3n
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.(HoodieParquetWriter.java:57)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
>  at org.apache.hudi.io.HoodieCreateHandle.(HoodieCreateHandle.java:70)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> Can anyone tell me what is causing this exception? I tried to use 
> org.apache.hadoop.fs.s3.S3FileSystem in place of 
> org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", 
> but another exception occurred, and it seems org.apache.hadoop.fs.s3.S3FileSystem 
> fits Hadoop 2.6.
>  
> Thanks in advance.





[incubator-hudi] branch master updated: [MINOR] Beautify the cli banner (#1089)

2019-12-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 70a1040  [MINOR] Beautify the cli banner (#1089)
70a1040 is described below

commit 70a1040998deb4585f6622e23f887dba21556cbd
Author: lamber-ken 
AuthorDate: Tue Dec 10 05:24:42 2019 +0800

[MINOR] Beautify the cli banner (#1089)

* Add one empty line
* replace Cli to CLI
* replace Hoodie to Apache Hudi
---
 .../org/apache/hudi/cli/HoodieSplashScreen.java| 26 +-
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java
index 53709d6..b7f016b 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/HoodieSplashScreen.java
@@ -35,16 +35,20 @@ public class HoodieSplashScreen extends 
DefaultBannerProvider {
 System.out.println("HoodieSplashScreen loaded");
   }
 
-  private static String screen = 
"" + OsUtils.LINE_SEPARATOR
-  + "*  *" + OsUtils.LINE_SEPARATOR
-  + "* __   _   _   *" + OsUtils.LINE_SEPARATOR
-  + "*| |  | | | | (_)  *" + OsUtils.LINE_SEPARATOR
-  + "*| |__| |   __| |  -   *" + OsUtils.LINE_SEPARATOR
-  + "*|  __  ||   | / _` | ||   *" + OsUtils.LINE_SEPARATOR
-  + "*| |  | ||   || (_| | ||   *" + OsUtils.LINE_SEPARATOR
-  + "*|_|  |_|\\___/ \\/ ||   *" + 
OsUtils.LINE_SEPARATOR
-  + "*  *" + OsUtils.LINE_SEPARATOR
-  + "" + 
OsUtils.LINE_SEPARATOR;
+  private static String screen = 
"===" + 
OsUtils.LINE_SEPARATOR
+  + "* ___  ___*" 
+ OsUtils.LINE_SEPARATOR
+  + "*/\\__\\  ___   /\\  \\   ___ 
*" + OsUtils.LINE_SEPARATOR
+  + "*   / /  / /\\__\\ /  \\  \\ /\\  \\  
  *" + OsUtils.LINE_SEPARATOR
+  + "*  / /__/ / /  // /\\ \\  \\\\ \\  \\ 
  *" + OsUtils.LINE_SEPARATOR
+  + "* /  \\  \\ ___/ /  // /  \\ \\__\\   /  \\__\\   
   *" + OsUtils.LINE_SEPARATOR
+  + "*/ /\\ \\  /\\__\\  / /__/  ___   / /__/ \\ |__| / /\\/__/
  *" + OsUtils.LINE_SEPARATOR
+  + "*\\/  \\ \\/ /  /  \\ \\  \\ /\\__\\  \\ \\  \\ / /  /  /\\/ /  / 
*" + OsUtils.LINE_SEPARATOR
+  + "* \\  /  /\\ \\  / /  /   \\ \\  / /  /   \\  /__/
  *" + OsUtils.LINE_SEPARATOR
+  + "* / /  /  \\ \\/ /  / \\ \\/ /  / \\ \\__\\   
   *" + OsUtils.LINE_SEPARATOR
+  + "*/ /  /\\  /  /   \\  /  /   \\/__/  
*" + OsUtils.LINE_SEPARATOR
+  + "*\\/__/  \\/__/ \\/__/Apache Hudi CLI
*" + OsUtils.LINE_SEPARATOR
+  + "* *" 
+ OsUtils.LINE_SEPARATOR
+  + "===" 
+ OsUtils.LINE_SEPARATOR;
 
   public String getBanner() {
 return screen;
@@ -55,7 +59,7 @@ public class HoodieSplashScreen extends DefaultBannerProvider 
{
   }
 
   public String getWelcomeMessage() {
-return "Welcome to Hoodie CLI. Please type help if you are looking for 
help. ";
+return "Welcome to Apache Hudi CLI. Please type help if you are looking 
for help. ";
   }
 
   @Override



[GitHub] [incubator-hudi] vinothchandar merged pull request #1089: [MINOR] Beautify the cli banner

2019-12-09 Thread GitBox
vinothchandar merged pull request #1089: [MINOR] Beautify the cli banner
URL: https://github.com/apache/incubator-hudi/pull/1089
 
 
   




[GitHub] [incubator-hudi] nisheet195 commented on issue #1084: Hive Sync fails when table name is a keyword

2019-12-09 Thread GitBox
nisheet195 commented on issue #1084: Hive Sync fails when table name is a 
keyword
URL: https://github.com/apache/incubator-hudi/issues/1084#issuecomment-563423925
 
 
   #1090 




[incubator-hudi] branch master updated (63e330b -> e555aa5)

2019-12-09 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository.

vbalaji pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 63e330b  [MINOR] add *.log to .gitignore file (#1086)
 add e555aa5  [HUDI-353] Add hive style partitioning path

No new revisions were added by this update.

Summary of changes:
 .../strategy/DayBasedCompactionStrategy.java   | 12 ++
 .../SlashEncodedDayPartitionValueExtractor.java|  6 +--
 .../java/org/apache/hudi/ComplexKeyGenerator.java  |  9 +++-
 .../java/org/apache/hudi/SimpleKeyGenerator.java   |  7 
 .../scala/org/apache/hudi/DataSourceOptions.scala  |  8 
 .../org/apache/hudi/HoodieSparkSqlWriter.scala |  3 +-
 .../src/test/scala/TestDataSourceDefaults.scala| 48 +-
 .../keygen/TimestampBasedKeyGenerator.java |  5 ++-
 8 files changed, 72 insertions(+), 26 deletions(-)



[GitHub] [incubator-hudi] bvaradar merged pull request #1036: [HUDI-353] Add hive style partitioning path

2019-12-09 Thread GitBox
bvaradar merged pull request #1036: [HUDI-353] Add hive style partitioning path
URL: https://github.com/apache/incubator-hudi/pull/1036
 
 
   




[GitHub] [incubator-hudi] bvaradar commented on issue #1036: [HUDI-353] Add hive style partitioning path

2019-12-09 Thread GitBox
bvaradar commented on issue #1036: [HUDI-353] Add hive style partitioning path
URL: https://github.com/apache/incubator-hudi/pull/1036#issuecomment-563423530
 
 
   Looks good.




[jira] [Created] (HUDI-395) hudi does not support scheme s3n when writing to S3

2019-12-09 Thread rui feng (Jira)
rui feng created HUDI-395:
-

 Summary: hudi does not support scheme s3n when writing to S3
 Key: HUDI-395
 URL: https://issues.apache.org/jira/browse/HUDI-395
 Project: Apache Hudi (incubating)
  Issue Type: Bug
  Components: Spark datasource
 Environment: spark-2.4.4-bin-hadoop2.7

Reporter: rui feng


When I used Hudi to create a Hudi table and then write to S3, I used the below Maven 
snippet, which is recommended by [https://hudi.apache.org/s3_hoodie.html]


 org.apache.hudi
 hudi-spark-bundle
 0.5.0-incubating



 org.apache.hadoop
 hadoop-aws
 2.7.3


 com.amazonaws
 aws-java-sdk
 1.10.34


and add the below configuration:

sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
 sc.hadoopConfiguration.set("fs.s3.impl", 
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
 sc.hadoopConfiguration.set("fs.s3n.impl", 
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
 sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xx")
 sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "x")
 sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xx")
 sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "x")

 

my spark version is spark-2.4.4-bin-hadoop2.7 and when I run below

df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath).

val hudiOptions = Map[String,String](
 HoodieWriteConfig.TABLE_NAME -> "hudi12",
 DataSourceWriteOptions.OPERATION_OPT_KEY -> 
DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
 DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
 DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> 
DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)

val hudiTablePath = "s3://niketest1/hudi_test/hudi12"

the exception occur:

java.lang.IllegalArgumentException: 
BlockAlignedAvroParquetWriter does not support scheme s3n

 at 
org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)

 at 
org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)

 at 
org.apache.hudi.io.storage.HoodieParquetWriter.(HoodieParquetWriter.java:57)

 at 
org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)

 at 
org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)

 at org.apache.hudi.io.HoodieCreateHandle.(HoodieCreateHandle.java:70)

 at 
org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)

 at 
org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)

 at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)

 at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)

 at java.util.concurrent.FutureTask.run(FutureTask.java:266)

 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

 at java.lang.Thread.run(Thread.java:748)

 

 

Can anyone tell me what is causing this exception? I tried to use 
org.apache.hadoop.fs.s3.S3FileSystem in place of 
org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", but 
another exception occurred, and it seems org.apache.hadoop.fs.s3.S3FileSystem fits 
Hadoop 2.6.

 

Thanks in advance.





[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1036: [HUDI-353] Add hive style partitioning path

2019-12-09 Thread GitBox
zhedoubushishi commented on a change in pull request #1036: [HUDI-353] Add hive 
style partitioning path
URL: https://github.com/apache/incubator-hudi/pull/1036#discussion_r355619914
 
 

 ##
 File path: hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
 ##
 @@ -68,6 +68,10 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = "--use-hive-style-partitioning", description = "Use Hive 
style partitioning, the name of "
+  + "partition folders follow = 
format")
+  public Boolean useHiveStylePartitioning = false;
 
 Review comment:
   > @zhedoubushishi : Is this still being used ? If not, Can you remove it
   
   Oh, it's not. All related code is removed now.
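   
   For context, an illustrative sketch (not Hudi's actual key generator code) of what the hive-style option changes about the partition folder name:
   ```
   // Illustrative only, not Hudi's actual key generator code: what the hive-style
   // option changes about the partition folder name.
   public class HiveStylePartitionSketch {
   
     static String partitionFolder(String field, String value, boolean hiveStylePartitioning) {
       return hiveStylePartitioning ? field + "=" + value : value;
     }
   
     public static void main(String[] args) {
       System.out.println(partitionFolder("event_type", "type1", false)); // type1
       System.out.println(partitionFolder("event_type", "type1", true));  // event_type=type1
     }
   }
   ```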






[GitHub] [incubator-hudi] zhedoubushishi commented on a change in pull request #1036: [HUDI-353] Add hive style partitioning path

2019-12-09 Thread GitBox
zhedoubushishi commented on a change in pull request #1036: [HUDI-353] Add hive 
style partitioning path
URL: https://github.com/apache/incubator-hudi/pull/1036#discussion_r355619914
 
 

 ##
 File path: hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
 ##
 @@ -68,6 +68,10 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = "--use-hive-style-partitioning", description = "Use Hive 
style partitioning, the name of "
+  + "partition folders follow = 
format")
+  public Boolean useHiveStylePartitioning = false;
 
 Review comment:
   Oh, it's not. All related code is removed now.




[GitHub] [incubator-hudi] pmalafosse edited a comment on issue #143: Tracking ticket for folks to be added to slack group

2019-12-09 Thread GitBox
pmalafosse edited a comment on issue #143: Tracking ticket for folks to be 
added to slack group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-563336514
 
 
   Please add me too: pierre.malafo...@letgo.com  thanks!




[GitHub] [incubator-hudi] pmalafosse commented on issue #143: Tracking ticket for folks to be added to slack group

2019-12-09 Thread GitBox
pmalafosse commented on issue #143: Tracking ticket for folks to be added to 
slack group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-563336514
 
 
   Please add me too: pierre.malafo...@letgo.com




[jira] [Commented] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2019-12-09 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991749#comment-16991749
 ] 

Raymond Xu commented on HUDI-309:
-

[~vbalaji] Thanks for the clarification. Yes January would be great for me as 
well. (vacation too :))

> General Redesign of Archived Timeline for efficient scan and management
> ---
>
> Key: HUDI-309
> URL: https://issues.apache.org/jira/browse/HUDI-309
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Archive TImeline Notes by Vinoth 1.jpg, Archived 
> Timeline Notes by Vinoth 2.jpg
>
>
> As designed by Vinoth:
> Goals
>  # Archived Metadata should be scannable in the same way as data
>  # Provides more safety by always serving committed data independent of 
> timeframe when the corresponding commit action was tried. Currently, we 
> implicitly assume a data file to be valid if its commit time is older than 
> the earliest time in the active timeline. While this works ok, any inherent 
> bugs in rollback could inadvertently expose a possibly duplicate file when 
> its commit timestamp becomes older than that of any commits in the timeline.
> # We had to deal with a lot of corner cases because of the way we treat a 
> "commit" as special after it gets archived. Examples also include Savepoint 
> handling logic by cleaner.
> # Small Files : For Cloud stores, archiving simply moves files from one 
> directory to another causing the archive folder to grow. We need a way to 
> efficiently compact these files and at the same time be friendly to scans
> Design:
>  The basic file-group abstraction for managing file versions for data files 
> can be extended to managing archived commit metadata. The idea is to use an 
> optimal format (like HFile) for storing a compacted version of <Instant, 
> Metadata> pairs. Every archiving run will read <Instant, Metadata> pairs 
> from the active timeline and append to indexable log files. We will run periodic 
> minor compactions to merge multiple log files to a compacted HFile storing 
> metadata for a time-range. It should also be noted that we will partition by 
> the action types (commit/clean). This design would allow for the archived 
> timeline to be queryable for determining whether a timeline is valid or not.
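
A toy sketch of the flow described above (purely illustrative, with hypothetical types, not the proposed implementation): archiving appends (instant, metadata) entries to a log, and a periodic minor compaction merges them into one sorted, scannable structure standing in for the HFile:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only, not the proposed implementation.
public class ArchivedTimelineSketch {

  private final List<String[]> logEntries = new ArrayList<>();        // appended by each archiving run
  private final TreeMap<String, String> compacted = new TreeMap<>();  // stands in for the compacted HFile

  void archive(String instant, String metadata) {
    logEntries.add(new String[] {instant, metadata});
  }

  // Minor compaction: fold accumulated log entries into the sorted structure so the
  // archived timeline stays scannable by instant time.
  void compact() {
    for (String[] entry : logEntries) {
      compacted.put(entry[0], entry[1]);
    }
    logEntries.clear();
  }

  boolean containsInstant(String instant) {
    return compacted.containsKey(instant);
  }
}
{code}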





[jira] [Commented] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2019-12-09 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991746#comment-16991746
 ] 

Balaji Varadarajan commented on HUDI-309:
-

[~nicholasjiang] : Absolutely. 

> General Redesign of Archived Timeline for efficient scan and management
> ---
>
> Key: HUDI-309
> URL: https://issues.apache.org/jira/browse/HUDI-309
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Archive TImeline Notes by Vinoth 1.jpg, Archived 
> Timeline Notes by Vinoth 2.jpg
>
>
> As designed by Vinoth:
> Goals
>  # Archived Metadata should be scannable in the same way as data
>  # Provides more safety by always serving committed data independent of 
> timeframe when the corresponding commit action was tried. Currently, we 
> implicitly assume a data file to be valid if its commit time is older than 
> the earliest time in the active timeline. While this works ok, any inherent 
> bugs in rollback could inadvertently expose a possibly duplicate file when 
> its commit timestamp becomes older than that of any commits in the timeline.
> # We had to deal with a lot of corner cases because of the way we treat a 
> "commit" as special after it gets archived. Examples also include Savepoint 
> handling logic by cleaner.
> # Small Files : For Cloud stores, archiving simply moves files from one 
> directory to another causing the archive folder to grow. We need a way to 
> efficiently compact these files and at the same time be friendly to scans
> Design:
>  The basic file-group abstraction for managing file versions for data files 
> can be extended to managing archived commit metadata. The idea is to use an 
> optimal format (like HFile) for storing a compacted version of <Instant, 
> Metadata> pairs. Every archiving run will read <Instant, Metadata> pairs 
> from the active timeline and append to indexable log files. We will run periodic 
> minor compactions to merge multiple log files to a compacted HFile storing 
> metadata for a time-range. It should also be noted that we will partition by 
> the action types (commit/clean). This design would allow for the archived 
> timeline to be queryable for determining whether a timeline is valid or not.





[jira] [Closed] (HUDI-325) Unable to query by Hive after updating HDFS Hudi table

2019-12-09 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken closed HUDI-325.
---
Resolution: Fixed

Fixed via master: d6e83e8f49828940159cd34711cc88ee7b42dc1c

> Unable to query by Hive after updating HDFS Hudi table
> --
>
> Key: HUDI-325
> URL: https://issues.apache.org/jira/browse/HUDI-325
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>Reporter: Wenning Ding
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h3. Description
> While doing internal testing in EMR, we found that if the Hudi table path follows 
> this kind of format: hdfs:///user/... or hdfs:/user/... then the Hudi table cannot 
> be queried by Hive after updating.
> h3. Reproduction
> {code:java}
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.spark.sql.SaveMode
> val df = Seq(
>   (100, "event_name_900", "2015-01-01T13:51:39.340396Z", "type1"),
>   (101, "event_name_546", "2015-01-01T12:14:58.597216Z", "type2"),
>   (104, "event_name_123", "2015-01-01T12:15:00.512679Z", "type1"),
>   (105, "event_name_678", "2015-01-01T13:51:42.248818Z", "type2")
>   ).toDF("event_id", "event_name", "event_ts", "event_type")
> var tableName = "hudi_test"
> var tablePath = "hdfs:///user/hadoop/" + tableName
> // write hudi dataset
> df.write.format("org.apache.hudi")
>   .option("hoodie.upsert.shuffle.parallelism", "2")
>   .option(HoodieWriteConfig.TABLE_NAME, tableName)
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
>   .option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.COW_STORAGE_TYPE_OPT_VAL)
>   .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "event_id")
>   .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "event_type")
>   .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "event_ts")
>   .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
>   .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
>   .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "event_type")
>   .option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY, "false")
>   .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor")
>   .mode(SaveMode.Overwrite)
>   .save(tablePath)
> // update hudi dataset
> val df2 = Seq(
>   (100, "event_name_1", "2015-01-01T13:51:39.340396Z", "type1"),
>   (107, "event_name_578", "2015-01-01T13:51:42.248818Z", "type3")
>   ).toDF("event_id", "event_name", "event_ts", "event_type")
> df2.write.format("org.apache.hudi")
>.option("hoodie.upsert.shuffle.parallelism", "2")
>.option(HoodieWriteConfig.TABLE_NAME, tableName)
>.option(DataSourceWriteOptions.OPERATION_OPT_KEY, 
> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
>.option(DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY, 
> DataSourceWriteOptions.COW_STORAGE_TYPE_OPT_VAL)
>.option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "event_id")
>.option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "event_type")
>.option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "event_ts")
>.option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
>.option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, tableName)
>.option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "event_type")
>.option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY, "false")
>.option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, 
> "org.apache.hudi.hive.MultiPartKeysValueExtractor")
>.mode(SaveMode.Append)
>.save(tablePath)
> {code}
> Then do query in Hive:
> {code:java}
> select count(*) from hudi_test;
> {code}
> It returns: 
> {code:java}
> java.io.IOException: cannot find dir = 
> hdfs://ip-172-30-6-236.ec2.internal:8020/user/hadoop/elb_logs_hudi_cow_8/2015-01-01/cb7531ac-dadf-4118-b722-55cb34bc66f2-0_34-7-336_20191104223321.parquet
>  in pathToPartitionInfo: [hdfs:/user/hadoop/elb_logs_hudi_cow_8/2015-01-01]
> at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:394)
> at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:357)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.schemaEvolved(SplitGrouper.java:284)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:184)
> at 
> org.apache.hadoop.hive.ql.exec.tez.SplitGrouper.generateGroupedSplits(SplitGrouper.java:161)
> at 
> 

[jira] [Commented] (HUDI-375) Refactor the configure framework of hudi project

2019-12-09 Thread lamber-ken (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991712#comment-16991712
 ] 

lamber-ken commented on HUDI-375:
-

Hi, [~vinoth], after refactoring this, the code structure is concise and 
clear, each component cares only about its own options, and developers can use 
these options easily. 

Let's imagine a scenario: as the hudi project grows, more and more components will 
be introduced into the project, and their keys and default values are all defined in a fat 
config file. If a parameter is carelessly used in the wrong place, it may have a 
great impact on us.
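
As a purely illustrative sketch of that idea (hypothetical class name and keys, not existing Hudi config):
{code:java}
import java.util.Properties;

// Hypothetical example of "each component owns its options": a component's keys,
// defaults and accessors live together instead of in one fat shared config file.
public final class ExampleComponentConfig {
  public static final String EXAMPLE_PARALLELISM_KEY = "hoodie.example.parallelism";
  public static final String EXAMPLE_PARALLELISM_DEFAULT = "200";

  private final Properties props;

  public ExampleComponentConfig(Properties props) {
    this.props = props;
  }

  public int getExampleParallelism() {
    return Integer.parseInt(props.getProperty(EXAMPLE_PARALLELISM_KEY, EXAMPLE_PARALLELISM_DEFAULT));
  }
}
{code}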

 

> Refactor the configure framework of hudi project
> 
>
> Key: HUDI-375
> URL: https://issues.apache.org/jira/browse/HUDI-375
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>
> Currently, config items and their default values are dispersed across Java 
> class files. It's easy to get confused as more and more config items are defined, 
> so it's necessary to refactor the configuration framework.
> Some things that may need to be considered:
>  # config items and default values may be defined in a class
>  # provide a mechanism which can extract some config items for a specific 
> component.





[jira] [Commented] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2019-12-09 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991709#comment-16991709
 ] 

sivabalan narayanan commented on HUDI-389:
--

[~vinoth]: Whom do you recommend as reviewers for 
[https://github.com/apache/incubator-hudi/pull/1091]? I wanted to see if we can 
assign one known person (balaji/nishith/sudha) and one new contributor, as a way 
of welcoming them to start reviewing code. Just a suggestion. 

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Time Spent: 10m
>  Remaining Estimate: 47h 50m
>
> Updates sent to a diff partition for a given key with Global Index should 
> succeed by updating the record under the original partition. As of now, it throws 
> an exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at 

[jira] [Updated] (HUDI-389) Updates sent to diff partition for a given key with Global Index

2019-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-389:

Labels: pull-request-available  (was: )

> Updates sent to diff partition for a given key with Global Index 
> -
>
> Key: HUDI-389
> URL: https://issues.apache.org/jira/browse/HUDI-389
> Project: Apache Hudi (incubating)
>  Issue Type: Bug
>  Components: Index
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Updates sent to a diff partition for a given key with Global Index should 
> succeed by updating the record under the original partition. As of now, it throws an 
> exception. 
> [https://github.com/apache/incubator-hudi/issues/1021] 
>  
>  
> error log:
> {code:java}
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.timeline.HoodieActiveTimeline - Loaded instants 
> java.util.stream.ReferencePipeline$Head@d02b1c7
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Building file 
> system view for partition (2016/04/15)
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - #files found 
> in partition (2016/04/15) =0, Time taken =0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - 
> addFilesToView: NumFiles=0, FileGroupsCreationTime=0, StoreTimeTaken=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.HoodieTableFileSystemView - Adding 
> file-groups for partition :2016/04/15, #FileGroups=0
>  14738 [Executor task launch worker-0] INFO 
> com.uber.hoodie.common.table.view.AbstractTableFileSystemView - Time to load 
> partition (2016/04/15) =0
>  14754 [Executor task launch worker-0] ERROR 
> com.uber.hoodie.table.HoodieCopyOnWriteTable - Error upserting bucketType 
> UPDATE for partition :0
>  java.util.NoSuchElementException: No value present
>  at com.uber.hoodie.common.util.Option.get(Option.java:112)
>  at com.uber.hoodie.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:71)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.getUpdateHandle(HoodieCopyOnWriteTable.java:226)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:180)
>  at 
> com.uber.hoodie.table.HoodieCopyOnWriteTable.handleUpsertPartition(HoodieCopyOnWriteTable.java:263)
>  at 
> com.uber.hoodie.HoodieWriteClient.lambda$upsertRecordsInternal$7ef77fd$1(HoodieWriteClient.java:442)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:843)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
>  at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:973)
>  at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
>  at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:99)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 

[GitHub] [incubator-hudi] nsivabalan opened a new pull request #1091: [HUDI-389] Fixing Index look up to return partitions for a given key along with fileId with Global Bloom

2019-12-09 Thread GitBox
nsivabalan opened a new pull request #1091: [HUDI-389] Fixing Index look up to 
return partitions for a given key along with fileId with Global Bloom
URL: https://github.com/apache/incubator-hudi/pull/1091
 
 
   ## What is the purpose of the pull request
   
   Fixing Index look up to return partitions for a given key along with fileId 
with Global Bloom
   
   Use-case: 
   If a record is updated with a different partition than where it exists, with 
Global bloom, an exception is thrown as given 
[here](https://issues.apache.org/jira/browse/HUDI-389). This patch fixes the 
same. 
   
   Essentially in HoodieGlobaIndex#tagLocationBacktoRecords, each record is 
tagged with the right fileId and partition and not just the fileId (which was the 
case before this patch). In order to achieve this, I had to change the 
interface for IndexFileFilter so that getMatchingFiles(String partitionPath, 
String recordKey) returns a Set of (partitionPath, fileId) pairs instead of a Set of fileIds. 
   
   ## Brief change log
   
 - Change the interface of IndexFileFilter to return a Set of (partitionPath, fileId) pairs
 - Fix Global Bloom to tag records with the correct partitionPath from the index 
look-up and not from the passed-in records (HoodieKey). 
   
   Tests:
   
   Added test in 
TestHoodieClientOnCopyOnWriteStorage#testUpsertToDiffPartitionGlobaIndex. 
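   
   Purely as an illustration of the interface change described above (the type and
method names here are hypothetical, not the exact code in this patch), the lookup now
needs to hand back the partition together with the file id:
   
   ```java
   import java.util.Set;
   
   // Sketch only: instead of returning just matching file ids, the filter returns
   // (partitionPath, fileId) pairs so the caller can tag a record with the partition
   // it actually lives in.
   interface IndexFileFilterSketch {
   
     // Before (conceptually): a Set of fileId strings for a (partition, key) probe.
     // After (conceptually): a Set of (partitionPath, fileId) pairs.
     Set<PartitionFile> getMatchingFiles(String partitionPath, String recordKey);
   
     final class PartitionFile {
       final String partitionPath;
       final String fileId;
   
       PartitionFile(String partitionPath, String fileId) {
         this.partitionPath = partitionPath;
         this.fileId = fileId;
       }
     }
   }
   ```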


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in 
KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563279470
 
 
   @Neo2007 , sorry for the delay. You can add the param `--checkpoint 
topic_account_hudi,0:0`; it will work then. By the way, if topic 
`topic_account_hudi` has more than one partition, the param will look like 
`--checkpoint topic_account_hudi,0:0,1:0...`
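   
   Just to illustrate the shape of that checkpoint string (topic name first, then
   partition:offset pairs), here is a small stand-alone sketch of how it could be
   parsed — not the actual KafkaOffsetGen code:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical parser for a checkpoint string like "topic_account_hudi,0:0,1:0".
   public class CheckpointStringExample {
     public static Map<Integer, Long> parse(String checkpoint) {
       String[] parts = checkpoint.split(",");
       String topic = parts[0];                        // first token is the topic name
       Map<Integer, Long> partitionToOffset = new HashMap<>();
       for (int i = 1; i < parts.length; i++) {
         String[] po = parts[i].split(":");            // each remaining token is "partition:offset"
         partitionToOffset.put(Integer.parseInt(po[0]), Long.parseLong(po[1]));
       }
       System.out.println("topic=" + topic + ", offsets=" + partitionToOffset);
       return partitionToOffset;
     }
   
     public static void main(String[] args) {
       parse("topic_account_hudi,0:0,1:0");
     }
   }
   ```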


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2019-12-09 Thread Nicholas Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991682#comment-16991682
 ] 

Nicholas Jiang commented on HUDI-309:
-

[~vbalaji] Yeah, I would like to collaborate with you on this redesign. I will 
wait for you so that we can refactor this together. And we could discuss this further 
in the Hudi Slack group, if that works for you.

> General Redesign of Archived Timeline for efficient scan and management
> ---
>
> Key: HUDI-309
> URL: https://issues.apache.org/jira/browse/HUDI-309
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Archive TImeline Notes by Vinoth 1.jpg, Archived 
> Timeline Notes by Vinoth 2.jpg
>
>
> As designed by Vinoth:
> Goals
>  # Archived Metadata should be scannable in the same way as data
>  # Provides more safety by always serving committed data independent of 
> timeframe when the corresponding commit action was tried. Currently, we 
> implicitly assume a data file to be valid if its commit time is older than 
> the earliest time in the active timeline. While this works ok, any inherent 
> bugs in rollback could inadvertently expose a possibly duplicate file when 
> its commit timestamp becomes older than that of any commits in the timeline.
>  # We had to deal with a lot of corner cases because of the way we treat a 
> "commit" as special after it gets archived. Examples also include the Savepoint 
> handling logic in the cleaner.
>  # Small Files : For cloud stores, archiving simply moves files from one 
> directory to another, causing the archive folder to grow. We need a way to 
> efficiently compact these files and at the same time be friendly to scans
> Design:
>  The basic file-group abstraction for managing file versions for data files 
> can be extended to managing archived commit metadata. The idea is to use an 
> optimal format (like HFile) for storing a compacted version of (CommitTime, 
> Metadata) pairs. Every archiving run will read (CommitTime, Metadata) pairs 
> from the active timeline and append them to indexable log files. We will run periodic 
> minor compactions to merge multiple log files into a compacted HFile storing 
> metadata for a time-range. It should also be noted that we will partition by 
> the action types (commit/clean).  This design would allow for the archived 
> timeline to be queryable for determining whether a timeline is valid or not.
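
To make the queryability goal concrete, here is a toy sketch only (all names made up,
an in-memory stand-in for the HFile/log-file layout) of an archive keyed by
(action type, commit time) so that a time-range scan is cheap:

{code:java}
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy sketch: archived commit metadata partitioned by action type and indexed by
// commit time, so "give me all commits between t1 and t2" is a simple range scan.
public class ArchivedTimelineSketch {
  // actionType ("commit", "clean", ...) -> commitTime -> serialized metadata
  private final TreeMap<String, NavigableMap<String, byte[]>> byAction = new TreeMap<>();

  public void append(String actionType, String commitTime, byte[] metadata) {
    byAction.computeIfAbsent(actionType, a -> new TreeMap<>()).put(commitTime, metadata);
  }

  // Range query over one action type, analogous to scanning a compacted HFile
  // whose keys are commit times.
  public SortedMap<String, byte[]> scan(String actionType, String fromTime, String toTime) {
    NavigableMap<String, byte[]> timeline = byAction.getOrDefault(actionType, new TreeMap<>());
    return timeline.subMap(fromTime, true, toTime, true);
  }
}
{code}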



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in 
KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563279470
 
 
   @Neo2007 , sorry for the delay. You can add the param `--checkpoint 
topic_account_hudi,0:0`; it will work then. By the way, if topic 
`topic_account_hudi` has more than one partition, the param will look like 
`--checkpoint topic_account_hudi,0:0,1:0`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken edited a comment on issue #1083: [SUPPORT] getting error in 
KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563279470
 
 
   @Neo2007 , sorry for the delay. You can add the param `--checkpoint 
topic_account_hudi,0:0`; it will work then


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,

2019-12-09 Thread GitBox
lamber-ken commented on issue #1083: [SUPPORT] getting error in KafkaOffsetGen,
URL: https://github.com/apache/incubator-hudi/issues/1083#issuecomment-563279470
 
 
   @Neo2007 , sorry for the delay. You can add the param `--checkpoint 
topic_account_hudi,0:0`; it will work then


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-390) Hive Sync should support keywords as table names

2019-12-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-390:

Labels: pull-request-available  (was: )

> Hive Sync should support keywords as table names
> -
>
> Key: HUDI-390
> URL: https://issues.apache.org/jira/browse/HUDI-390
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.1
>
>
> [https://github.com/apache/incubator-hudi/issues/1084]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] nisheet195 opened a new pull request #1090: [HUDI-390] Add backtick character in hive queries to support hive identifier as tablename

2019-12-09 Thread GitBox
nisheet195 opened a new pull request #1090: [HUDI-390] Add backtick character 
in hive queries to support hive identifier as tablename
URL: https://github.com/apache/incubator-hudi/pull/1090
 
 
   ## What is the purpose of the pull request
   This pull request adds support for keywords as table names in Hive sync 
   
   ## Brief change log
- Add backtick character to escape keywords in Hive queries (illustrated below)
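   
   For illustration only (not the exact queries generated by the Hive sync tool),
   escaping a reserved word such as `order` with backticks looks roughly like this
   — all statement shapes below are hypothetical:
   
   ```java
   // Hypothetical sketch: wrapping database and table names in backticks so Hive
   // accepts reserved words (e.g. "order") as table names.
   public class BacktickExample {
     static String createDatabaseStatement(String db) {
       return "CREATE DATABASE IF NOT EXISTS `" + db + "`";
     }
   
     static String dropPartitionStatement(String db, String table, String partitionClause) {
       return "ALTER TABLE `" + db + "`.`" + table + "` DROP PARTITION (" + partitionClause + ")";
     }
   
     public static void main(String[] args) {
       System.out.println(createDatabaseStatement("default"));
       System.out.println(dropPartitionStatement("default", "order", "datestr='2019-12-09'"));
     }
   }
   ```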
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HUDI-309) General Redesign of Archived Timeline for efficient scan and management

2019-12-09 Thread Balaji Varadarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991654#comment-16991654
 ] 

Balaji Varadarajan commented on HUDI-309:
-

[~nicholasjiang] [~rxu] : Sorry for the late reply. Yes, we would need 
multi-level partitioning (actionType, commitTime) to make the storage structure 
efficient for timeline query use-cases. 

I would be more than happy to collaborate with you in discussing this further and 
coming up with the final complete design. Would it work for you if we revive 
this discussion in January?  I was planning to work on this once I finish up 
the Bootstrap proposal for Hudi and am back from vacation :)

 

 

> General Redesign of Archived Timeline for efficient scan and management
> ---
>
> Key: HUDI-309
> URL: https://issues.apache.org/jira/browse/HUDI-309
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Common Core
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.5.1
>
> Attachments: Archive TImeline Notes by Vinoth 1.jpg, Archived 
> Timeline Notes by Vinoth 2.jpg
>
>
> As designed by Vinoth:
> Goals
>  # Archived Metadata should be scannable in the same way as data
>  # Provides more safety by always serving committed data independent of 
> timeframe when the corresponding commit action was tried. Currently, we 
> implicitly assume a data file to be valid if its commit time is older than 
> the earliest time in the active timeline. While this works ok, any inherent 
> bugs in rollback could inadvertently expose a possibly duplicate file when 
> its commit timestamp becomes older than that of any commits in the timeline.
>  # We had to deal with a lot of corner cases because of the way we treat a 
> "commit" as special after it gets archived. Examples also include the Savepoint 
> handling logic in the cleaner.
>  # Small Files : For cloud stores, archiving simply moves files from one 
> directory to another, causing the archive folder to grow. We need a way to 
> efficiently compact these files and at the same time be friendly to scans
> Design:
>  The basic file-group abstraction for managing file versions for data files 
> can be extended to managing archived commit metadata. The idea is to use an 
> optimal format (like HFile) for storing a compacted version of (CommitTime, 
> Metadata) pairs. Every archiving run will read (CommitTime, Metadata) pairs 
> from the active timeline and append them to indexable log files. We will run periodic 
> minor compactions to merge multiple log files into a compacted HFile storing 
> metadata for a time-range. It should also be noted that we will partition by 
> the action types (commit/clean).  This design would allow for the archived 
> timeline to be queryable for determining whether a timeline is valid or not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1036: [HUDI-353] Add hive style partitioning path

2019-12-09 Thread GitBox
bvaradar commented on a change in pull request #1036: [HUDI-353] Add hive style 
partitioning path
URL: https://github.com/apache/incubator-hudi/pull/1036#discussion_r355473878
 
 

 ##
 File path: hudi-hive/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
 ##
 @@ -68,6 +68,10 @@
   @Parameter(names = {"--use-jdbc"}, description = "Hive jdbc connect url")
   public Boolean useJdbc = true;
 
+  @Parameter(names = "--use-hive-style-partitioning", description = "Use Hive 
style partitioning, the name of "
+  + "partition folders follow = 
format")
+  public Boolean useHiveStylePartitioning = false;
 
 Review comment:
   @zhedoubushishi : Is this still being used? If not, can you remove it?
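   
   For context, hive-style partitioning just means the partition folder name carries
   the column name as well as the value. A tiny illustrative sketch (not Hudi code,
   names made up):
   
   ```java
   // Illustrative only: a hive-style partition path embeds the column name,
   // e.g. "datestr=2019-12-09" instead of the plain "2019-12-09".
   public class HiveStylePartitionExample {
     static String partitionPath(String column, String value, boolean hiveStyle) {
       return hiveStyle ? column + "=" + value : value;
     }
   
     public static void main(String[] args) {
       System.out.println(partitionPath("datestr", "2019-12-09", true));   // datestr=2019-12-09
       System.out.println(partitionPath("datestr", "2019-12-09", false));  // 2019-12-09
     }
   }
   ```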


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] nisheet195 commented on issue #1084: Hive Sync fails when table name is a keyword

2019-12-09 Thread GitBox
nisheet195 commented on issue #1084: Hive Sync fails when table name is a 
keyword
URL: https://github.com/apache/incubator-hudi/issues/1084#issuecomment-563237716
 
 
   Sure. Will take it up


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands

2019-12-09 Thread GitBox
pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: 
Options provided for passing properties to Cleaner, compactor and importer 
commands
URL: https://github.com/apache/incubator-hudi/pull/1080#discussion_r355375523
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java
 ##
 @@ -129,4 +125,31 @@ public String showCleanPartitions(@CliOption(key = 
{"clean"}, help = "clean to s
 return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, 
descending, limit, headerOnly, rows);
 
   }
+
+  @CliCommand(value = "cleans run", help = "run clean")
+  public String runClean(@CliOption(key = "sparkMemory", 
unspecifiedDefaultValue = "4G",
+  help = "Spark executor memory") final String sparkMemory,
+ @CliOption(key = "propsFilePath", help = "path to 
properties file on localfs or dfs with configurations for hoodie client for 
cleaning",
+   unspecifiedDefaultValue = "") final String 
propsFilePath,
+ @CliOption(key = "hoodieConfigs", help = "Any 
configuration that can be set in the properties file can be passed here in the 
form of an array",
+   unspecifiedDefaultValue = "") final String[] 
configs,
+ @CliOption(key = "sparkMaster", 
unspecifiedDefaultValue = "", help = "Spark Master ") String master) throws 
IOException, InterruptedException, URISyntaxException {
+boolean initialized = HoodieCLI.initConf();
+HoodieCLI.initFS(initialized);
+
+String sparkPropertiesPath =
+
Utils.getDefaultPropertiesFile(JavaConverters.mapAsScalaMapConverter(System.getenv()).asScala());
+SparkLauncher sparkLauncher = SparkUtil.initLauncher(sparkPropertiesPath);
+
+String cmd = SparkMain.SparkCommand.CLEAN.toString();
+sparkLauncher.addAppArgs(cmd, HoodieCLI.tableMetadata.getBasePath(), 
master, propsFilePath, sparkMemory);
+Arrays.stream(configs).filter(config -> config.contains("=") && 
config.split("=").length == 2).forEach(sparkLauncher::addAppArgs);
 
 Review comment:
   ditto


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands

2019-12-09 Thread GitBox
pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: 
Options provided for passing properties to Cleaner, compactor and importer 
commands
URL: https://github.com/apache/incubator-hudi/pull/1080#discussion_r355375233
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java
 ##
 @@ -73,18 +79,42 @@ public static void main(String[] args) throws Exception {
 break;
   case IMPORT:
   case UPSERT:
-assert (args.length == 11);
+assert (args.length >= 12);
 
 Review comment:
   Let us discuss this on 
https://github.com/apache/incubator-hudi/pull/1080#discussion_r355368831. :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: Options provided for passing properties to Cleaner, compactor and importer commands

2019-12-09 Thread GitBox
pratyakshsharma commented on a change in pull request #1080: [HUDI-118]: 
Options provided for passing properties to Cleaner, compactor and importer 
commands
URL: https://github.com/apache/incubator-hudi/pull/1080#discussion_r355368831
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java
 ##
 @@ -129,4 +125,31 @@ public String showCleanPartitions(@CliOption(key = 
{"clean"}, help = "clean to s
 return HoodiePrintHelper.print(header, new HashMap<>(), sortByField, 
descending, limit, headerOnly, rows);
 
   }
+
+  @CliCommand(value = "cleans run", help = "run clean")
+  public String runClean(@CliOption(key = "sparkMemory", 
unspecifiedDefaultValue = "4G",
+  help = "Spark executor memory") final String sparkMemory,
+ @CliOption(key = "propsFilePath", help = "path to 
properties file on localfs or dfs with configurations for hoodie client for 
cleaning",
+   unspecifiedDefaultValue = "") final String 
propsFilePath,
+ @CliOption(key = "hoodieConfigs", help = "Any 
configuration that can be set in the properties file can be passed here in the 
form of an array",
 
 Review comment:
   @n3nash Please refer to this comment by @bvaradar - 
https://jira.apache.org/jira/browse/HUDI-118?focusedCommentId=16983739=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16983739.
 
   
   I was trying to implement it as per what is suggested in that comment. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] yanghua edited a comment on issue #1057: Hudi Test Suite

2019-12-09 Thread GitBox
yanghua edited a comment on issue #1057: Hudi Test Suite
URL: https://github.com/apache/incubator-hudi/pull/1057#issuecomment-563148174
 
 
   I have made [HUDI-289](https://issues.apache.org/jira/browse/HUDI-289) an 
umbrella issue and created some subtasks to track the detailed work. @n3nash 
The basic implementation is related to 
[HUDI-394](https://issues.apache.org/jira/browse/HUDI-394). Can you squash the 
previous commits and rename the final commit message?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-394) Provide a basic implementation of test suite

2019-12-09 Thread vinoyang (Jira)
vinoyang created HUDI-394:
-

 Summary: Provide a basic implementation of test suite
 Key: HUDI-394
 URL: https://issues.apache.org/jira/browse/HUDI-394
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Testing
Reporter: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] yanghua commented on issue #1057: Hudi Test Suite

2019-12-09 Thread GitBox
yanghua commented on issue #1057: Hudi Test Suite
URL: https://github.com/apache/incubator-hudi/pull/1057#issuecomment-563148174
 
 
   I have made [HUDI-289](https://issues.apache.org/jira/browse/HUDI-289) an 
umbrella issue and created some subtasks to track the detailed work.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Created] (HUDI-393) Integrate with Azure Pipeline to run the end to end tests

2019-12-09 Thread vinoyang (Jira)
vinoyang created HUDI-393:
-

 Summary: Integrate with Azure Pipeline to run the end to end tests
 Key: HUDI-393
 URL: https://issues.apache.org/jira/browse/HUDI-393
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Testing
Reporter: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-391) Rename module name from hudi-bench to hudi-test-suite

2019-12-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-391:
-

Assignee: vinoyang

> Rename module name from hudi-bench to hudi-test-suite
> -
>
> Key: HUDI-391
> URL: https://issues.apache.org/jira/browse/HUDI-391
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-392) Introduce DistributedTestDataSource to generate test data

2019-12-09 Thread vinoyang (Jira)
vinoyang created HUDI-392:
-

 Summary: Introduce DistributedTestDataSource to generate test data
 Key: HUDI-392
 URL: https://issues.apache.org/jira/browse/HUDI-392
 Project: Apache Hudi (incubating)
  Issue Type: Sub-task
  Components: Testing
Reporter: vinoyang
Assignee: vinoyang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-289) Implement a test suite to support long running test for Hudi writing and querying end-end

2019-12-09 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned HUDI-289:
-

Assignee: (was: vinoyang)

> Implement a test suite to support long running test for Hudi writing and 
> querying end-end
> -
>
> Key: HUDI-289
> URL: https://issues.apache.org/jira/browse/HUDI-289
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>  Components: Usability
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.5.1
>
>
> We would need an equivalent of an end-end test which runs some workload for 
> a few hours at least, triggers various actions like commit, deltacommit, 
> rollback, compaction and ensures correctness of the code before every release
> P.S: Learn from all the CSS issues managing compaction..
> The feature branch is here: 
> [https://github.com/apache/incubator-hudi/tree/hudi_test_suite_refactor]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)