[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1306: [HUDI-598] Update quick start page

2020-02-04 Thread GitBox
lamber-ken edited a comment on issue #1306: [HUDI-598] Update quick start page
URL: https://github.com/apache/incubator-hudi/pull/1306#issuecomment-582246372
 
 
   Hi @bhasudha, I made and verified the change from "org.apache.hudi" to "hudi" in 
https://github.com/apache/incubator-hudi/pull/1054 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on issue #765: [WIP] Fix KafkaAvroSource to use the latest schema

2020-02-04 Thread GitBox
vinothchandar commented on issue #765: [WIP] Fix KafkaAvroSource to use the 
latest schema
URL: https://github.com/apache/incubator-hudi/pull/765#issuecomment-582259704
 
 
   cc @pratyakshsharma this has been around for a long time. Could you take a look to 
understand whether we still need this, and maybe retarget it for the next release? 




[GitHub] [incubator-hudi] vinothchandar commented on issue #1165: [HUDI-76] Add CSV Source support for Hudi Delta Streamer

2020-02-04 Thread GitBox
vinothchandar commented on issue #1165: [HUDI-76] Add CSV Source support for 
Hudi Delta Streamer
URL: https://github.com/apache/incubator-hudi/pull/1165#issuecomment-582259477
 
 
   cc @pratyakshsharma could you help review this? 




[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375075893
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + "/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   The right way is to add an `archiveFolderPattern` option to the `show archived commits` 
command, as `show archived commit stats` does.
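A minimal sketch of how the suggested option could behave, assuming the command accepts an optional `archiveFolderPattern` string and otherwise falls back to a default glob under the archive path. The class and method names are hypothetical and are not the actual `ArchivedCommitsCommand` code; only the `.commits_.archive*` glob is taken from the diff in this email.

```java
/**
 * Hypothetical helper: an optional archiveFolderPattern overrides the
 * default archived-commits glob. Not Hudi's actual CLI code.
 */
final class ArchiveFolderPatternSketch {

  /** Default glob for archived commit files, as used in the diff. */
  static final String DEFAULT_ARCHIVE_GLOB = ".commits_.archive*";

  private ArchiveFolderPatternSketch() {}

  /**
   * Returns the user-supplied pattern (trimmed) when one is given;
   * otherwise builds the default glob under the given archive path.
   */
  static String resolveGlob(String archivePath, String archiveFolderPattern) {
    if (archiveFolderPattern != null && !archiveFolderPattern.trim().isEmpty()) {
      return archiveFolderPattern.trim();
    }
    return archivePath + "/" + DEFAULT_ARCHIVE_GLOB;
  }
}
```

Keeping the pattern as a plain string lets the command hand it directly to a glob call such as Hadoop's `FileSystem.globStatus(new Path(...))`, which is what the existing code already does.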




[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 
'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375075573
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##
 @@ -290,7 +291,7 @@ public long getTotalRollbackBlocks() {
 return totalRollbackBlocks;
   }
 
-  public void setTotalRollbackBlocks(Long totalRollbackBlocks) {
 
 Review comment:
   Discussed offline. Without this we are not able to read certain archived 
commits
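The comment does not say why the field and setter need to accept null. One plausible reason, offered here only as an assumption, is that older archived records do not carry `totalRollbackBlocks`, so the deserialized value is null, and any primitive `long` path then fails on auto-unboxing. The class below is a hypothetical, simplified stand-in, not `HoodieWriteStat` itself; it only demonstrates that Java behavior.

```java
/**
 * Hypothetical stand-in for a POJO such as HoodieWriteStat.
 * Illustrates boxed vs. primitive handling of a possibly-missing field.
 */
final class NullableStatSketch {

  // Boxed type: a record deserialized without this field leaves it null.
  private Long totalRollbackBlocks;

  Long getTotalRollbackBlocks() {
    return totalRollbackBlocks;
  }

  // Accepting Long (not long) means passing null does not fail.
  void setTotalRollbackBlocks(Long totalRollbackBlocks) {
    this.totalRollbackBlocks = totalRollbackBlocks;
  }

  // What a primitive-typed path does: unboxing a null Long throws NullPointerException.
  static long toPrimitive(Long value) {
    return value;
  }

  // A null-safe alternative that treats a missing value as zero.
  static long orZero(Long value) {
    return value == null ? 0L : value;
  }
}
```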




[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
satishkotha commented on a change in pull request #1274: [HUDI-571] Add 
'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375075546
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##
 @@ -135,6 +135,7 @@
   /**
* Total number of rollback blocks seen in a compaction operation.
*/
+  @Nullable
 
 Review comment:
   Discussed offline. Without this we are not able to read certain archived 
commits




[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375075667
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + "/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   @n3nash, they are different:
   one is `/table/.hoodie/archived/.commits_.archive*`,
   the other is `/table/.hoodie/.commits_.archive*`.
   




[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-04 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582248349
 
 
   Thanks for reviewing this PR, @vinothchandar. This PR does not lock the user 
into `spark-avro:2.4.4`. 
   
   If the user wants changes from `spark-avro:3.0-preview2`, the right way 
is to set the Spark version to `3.0-preview2` in the pom.xml file and then build 
Hudi from source.
   
   
https://github.com/apache/incubator-hudi/blob/4de0fcfcb54ac76ed3b6852917588c32fec9bea8/pom.xml#L95
   
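   ```xml
   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-avro_${scala.binary.version}</artifactId>
     <version>${spark.version}</version>
     <scope>provided</scope>
   </dependency>
   ```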




[GitHub] [incubator-hudi] UZi5136225 opened a new pull request #1308: Hudi 561 partition path config

2020-02-04 Thread GitBox
UZi5136225 opened a new pull request #1308: Hudi 561 partition path config
URL: https://github.com/apache/incubator-hudi/pull/1308
 
 
   ## What is the purpose of the pull request
   
   Allow customizing the time format of the partition field, for both the source value and the target partition path.
   
   ## Brief change log
   
   1. Added code to `HoodieCreateHandle` to handle the time format.
   2. Added `HoodieParseException`.
   
   ## Verify this pull request
   
   Manually verified the change by running a job locally.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
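The purpose described in this pull request can be illustrated with a small `java.time` sketch that re-formats a time-based partition value from a source pattern to a target pattern. This is an assumption-based illustration, not the code added to `HoodieCreateHandle`: the class name is hypothetical, it expects values that carry both a date and a time, and it throws `IllegalArgumentException` where the PR introduces `HoodieParseException`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/**
 * Hypothetical sketch: convert a partition field value from a source
 * time format to a target time format. Not the PR's implementation.
 */
final class PartitionPathFormatSketch {

  private PartitionPathFormatSketch() {}

  /**
   * Parses value with sourcePattern (must include date and time fields)
   * and formats it with targetPattern.
   */
  static String reformat(String value, String sourcePattern, String targetPattern) {
    DateTimeFormatter source = DateTimeFormatter.ofPattern(sourcePattern);
    DateTimeFormatter target = DateTimeFormatter.ofPattern(targetPattern);
    try {
      return LocalDateTime.parse(value, source).format(target);
    } catch (DateTimeParseException e) {
      // Stand-in for the PR's HoodieParseException.
      throw new IllegalArgumentException(
          "Cannot parse partition value '" + value + "' with pattern " + sourcePattern, e);
    }
  }
}
```

Both formatters are built from user-supplied patterns, so an invalid pattern also surfaces as an `IllegalArgumentException`, thrown directly by `DateTimeFormatter.ofPattern`.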




[GitHub] [incubator-hudi] lamber-ken commented on issue #1306: [HUDI-598] Update quick start page

2020-02-04 Thread GitBox
lamber-ken commented on issue #1306: [HUDI-598] Update quick start page
URL: https://github.com/apache/incubator-hudi/pull/1306#issuecomment-582246372
 
 
   Hi @bhasudha, I made and verified the change from "org.apache.hudi" to "hudi" in 
https://github.com/apache/incubator-hudi/pull/1054 




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1306: [HUDI-598] Update quick start page

2020-02-04 Thread GitBox
lamber-ken commented on a change in pull request #1306: [HUDI-598] Update quick 
start page
URL: https://github.com/apache/incubator-hudi/pull/1306#discussion_r375064809
 
 

 ##
 File path: docs/_docs/1_1_quick_start_guide.md
 ##
 @@ -176,28 +176,28 @@ Delete records for the HoodieKeys passed in.
 
 ```scala
 // fetch total records count
-spark.sql("select uuid, partitionPath from hudi_ro_table").count()
+spark.sql("select uuid, partitionPath from hudi_trips_snapshot").count()
 // fetch two records to be deleted
-val ds = spark.sql("select uuid, partitionPath from hudi_ro_table").limit(2)
+val ds = spark.sql("select uuid, partitionPath from hudi_trips_snapshot").limit(2)
 
 // issue deletes
 val deletes = dataGen.generateDeletes(ds.collectAsList())
 val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2));
-df.write.format("org.apache.hudi").
-options(getQuickstartWriteConfigs).
-option(OPERATION_OPT_KEY,"delete").
-option(PRECOMBINE_FIELD_OPT_KEY, "ts").
-option(RECORDKEY_FIELD_OPT_KEY, "uuid").
-option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
-option(TABLE_NAME, tableName).
-mode(Append).
-save(basePath);
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(OPERATION_OPT_KEY,"delete").
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
 
 // run the same read query as above.
 val roAfterDeleteViewDF = spark.
-read.
-format("org.apache.hudi").
-load(basePath + "/*/*/*/*")
+  read.
+  format("hudi").
+  load(basePath + "/*/*/*/*")
 roAfterDeleteViewDF.registerTempTable("hudi_ro_table")
 
 Review comment:
   Done.




[GitHub] [incubator-hudi] vinothchandar commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-02-04 Thread GitBox
vinothchandar commented on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-582238771
 
 
   @umehrot2 as well. We actually considered doing this when we moved to `spark-avro`. 
The concern was that the bundle would then depend on the Spark version used by Hudi, 
i.e. we use Spark 2.4.4, so the Spark bundle would contain `spark-avro:2.4.4`. 
   
   I believe this may work with `spark-3.0-preview2` now. But what if the 
user wants some changes in `spark-avro:3.0-preview2` (especially the Avro <=> Row 
datatype conversions that keep coming up)? They would have to make a custom 
build, right?
   
   In short, I don't know if this is a good idea.




[GitHub] [incubator-hudi] vinothchandar commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-02-04 Thread GitBox
vinothchandar commented on issue #1187: [HUDI-499] Allow update partition path 
with GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-582236263
 
 
   @xushiyan I think there is a checkstyle error. Can you please take a look?




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #180

2020-02-04 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.08 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4:
bin
boot
conf
lib
LICENSE
NOTICE
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/bin:
m2.conf
mvn
mvn.cmd
mvnDebug
mvnDebug.cmd
mvnyjp

/home/jenkins/tools/maven/apache-maven-3.5.4/boot:
plexus-classworlds-2.5.2.jar

/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.5.2-SNAPSHOT'
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Hudi   [pom]
[INFO] hudi-common[jar]
[INFO] hudi-timeline-service  [jar]
[INFO] hudi-hadoop-mr [jar]
[INFO] hudi-client[jar]
[INFO] hudi-hive  [jar]
[INFO] hudi-spark_2.11[jar]
[INFO] hudi-utilities_2.11[jar]
[INFO] hudi-cli   [jar]
[INFO] hudi-hadoop-mr-bundle  [jar]
[INFO] hudi-hive-bundle   [jar]
[INFO] hudi-spark-bundle_2.11 [jar]
[INFO] hudi-presto-bundle [jar]
[INFO] hudi-utilities-bundle_2.11

[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375039736
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + "/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   `archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*")` 
is equivalent to `new Path(basePath + "/.hoodie/.commits_.archive*")`. 
`metaClient.getArchivePath()` should return `basePath + "/.hoodie"` for all old tables.
   @hmatu, what concerns do you have?
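The equivalence described here can be sketched as plain string building, assuming (as the comment states) that the archive path falls back to `basePath + "/.hoodie"` for old tables, and that newer tables may use a sub-folder such as `archived` (the layout in hmatu's example). The names are hypothetical; in the actual code the value comes from `metaClient.getArchivePath()`.

```java
/**
 * Hypothetical sketch of how the archived-commits glob is derived.
 * Mirrors the string concatenation in the diff; not Hudi's actual code.
 */
final class ArchiveGlobSketch {

  /** Glob for archived commit files, as in the diff. */
  static final String ARCHIVE_FILE_GLOB = ".commits_.archive*";

  private ArchiveGlobSketch() {}

  /**
   * Stand-in for metaClient.getArchivePath(): with no archive folder
   * configured (old tables), archives sit directly under .hoodie.
   */
  static String archivePath(String basePath, String archiveFolder) {
    String metaPath = basePath + "/.hoodie";
    if (archiveFolder == null || archiveFolder.isEmpty()) {
      return metaPath;
    }
    return metaPath + "/" + archiveFolder;
  }

  /** Glob string that the CLI would pass to FileSystem.globStatus. */
  static String archiveGlob(String basePath, String archiveFolder) {
    return archivePath(basePath, archiveFolder) + "/" + ARCHIVE_FILE_GLOB;
  }
}
```

For an old table, `archiveGlob(basePath, null)` yields exactly the string the removed line built, which is why the change preserves behavior there; only tables with a configured archive folder resolve to a different path.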




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r375039736
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + "/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   `archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*")` 
is equivalent to `new Path(basePath + "/.hoodie/.commits_.archive*")`. 
`metaClient.getArchivePath()` should return `basePath + "/.hoodie"` for all old tables.




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1307: [HUDI-570] - Improve test coverage for FSUtils.java

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1307: [HUDI-570] - Improve test 
coverage for FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307#discussion_r375038980
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestFSUtils.java
 ##
 @@ -43,11 +49,33 @@
 import java.util.stream.Stream;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
  * Tests file system utils.
  */
 public class TestFSUtils extends HoodieCommonTestHarness {
+  private long minRollbackToKeep = 10;
+  private long minCleanToKeep = 10;
+  protected transient FileSystem fs;
+  protected String basePath = null;
+
+  @Before
 
 Review comment:
   You don't need this init: the HoodieTestHarness already creates a DFS-based 
file system, which works for log records and should not run into the exception 
that you may be creating the local file system to avoid.




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1307: [HUDI-570] - Improve test coverage for FSUtils.java

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1307: [HUDI-570] - Improve test 
coverage for FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307#discussion_r375039117
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestFSUtils.java
 ##
 @@ -261,4 +289,87 @@ public void testLogFilesComparison() {
   public static String makeOldLogFileName(String fileId, String logFileExtension, String baseCommitTime, int version) {
 return "." + String.format("%s_%s%s.%d", fileId, baseCommitTime, logFileExtension, version);
   }
+
+  private void cleanupFiles(File[] cleanupFiles) {
 
 Review comment:
   Once you use the basePath from the DFS started by the test harness and create 
files under it, those files are cleaned up automatically when the test finishes. 
You shouldn't need to clean them up here.
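The pattern described here, where the harness owns a base path and removes everything under it when the test finishes, can be sketched with plain `java.nio.file` calls. This is a hypothetical local-filesystem stand-in written for illustration; it is not `HoodieCommonTestHarness`, which per the review comment uses a DFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical harness that owns a temporary base path and deletes
 * everything under it on close, so tests need no manual cleanup.
 */
final class TempBasePathHarness implements AutoCloseable {

  private final Path basePath;

  TempBasePathHarness() throws IOException {
    this.basePath = Files.createTempDirectory("hudi-test-");
  }

  Path basePath() {
    return basePath;
  }

  @Override
  public void close() throws IOException {
    if (!Files.exists(basePath)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(basePath)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

A test creates its files under `basePath()` inside a try-with-resources block, so cleanup runs even when an assertion fails and no per-test `cleanupFiles` helper is needed.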




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1307: [HUDI-570] - Improve test coverage for FSUtils.java

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1307: [HUDI-570] - Improve test 
coverage for FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307#discussion_r375038467
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/util/TestFSUtils.java
 ##
 @@ -43,11 +49,33 @@
 import java.util.stream.Stream;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
  * Tests file system utils.
  */
 public class TestFSUtils extends HoodieCommonTestHarness {
+  private long minRollbackToKeep = 10;
 
 Review comment:
   We shouldn't have counter variables as members of a test class; each test 
might mutate them and eventually break other tests.




[incubator-hudi] branch master updated: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository.

nagarwal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4de0fcf  [HUDI-566] Added new test cases for class HoodieTimeline, 
HoodieDefaultTimeline and HoodieActiveTimeline.
4de0fcf is described below

commit 4de0fcfcb54ac76ed3b6852917588c32fec9bea8
Author: Prashant Wason 
AuthorDate: Mon Jan 27 15:38:33 2020 -0800

[HUDI-566] Added new test cases for class HoodieTimeline, 
HoodieDefaultTimeline and HoodieActiveTimeline.
---
 .../apache/hudi/common/table/HoodieTimeline.java   |   4 +
 .../table/string/TestHoodieActiveTimeline.java | 276 -
 2 files changed, 279 insertions(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
old mode 100644
new mode 100755
index a964411..015a497
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
@@ -56,6 +56,10 @@ public interface HoodieTimeline extends Serializable {
   String REQUESTED_EXTENSION = ".requested";
   String RESTORE_ACTION = "restore";
 
+  String[] VALID_ACTIONS_IN_TIMELINE = {COMMIT_ACTION, DELTA_COMMIT_ACTION,
+  CLEAN_ACTION, SAVEPOINT_ACTION, RESTORE_ACTION, ROLLBACK_ACTION,
+  COMPACTION_ACTION};
+
   String COMMIT_EXTENSION = "." + COMMIT_ACTION;
   String DELTA_COMMIT_EXTENSION = "." + DELTA_COMMIT_ACTION;
   String CLEAN_EXTENSION = "." + CLEAN_ACTION;
diff --git a/hudi-common/src/test/java/org/apache/hudi/common/table/string/TestHoodieActiveTimeline.java b/hudi-common/src/test/java/org/apache/hudi/common/table/string/TestHoodieActiveTimeline.java
old mode 100644
new mode 100755
index 55a91cf..a9f027e
--- a/hudi-common/src/test/java/org/apache/hudi/common/table/string/TestHoodieActiveTimeline.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/table/string/TestHoodieActiveTimeline.java
@@ -18,7 +18,6 @@
 
 package org.apache.hudi.common.table.string;
 
-import org.apache.hadoop.fs.Path;
 import org.apache.hudi.common.HoodieCommonTestHarness;
 import org.apache.hudi.common.model.HoodieTestUtils;
 import org.apache.hudi.common.model.TimelineLayoutVersion;
@@ -29,12 +28,22 @@ import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieInstant.State;
 import org.apache.hudi.common.util.Option;
 
+import com.google.common.collect.Sets;
+import org.apache.hadoop.fs.Path;
 import org.junit.Before;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.rules.ExpectedException;
 
 import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+import java.util.function.BiConsumer;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
 import java.util.stream.Stream;
 
 import static org.apache.hudi.common.model.TimelineLayoutVersion.VERSION_0;
@@ -164,4 +173,269 @@ public class TestHoodieActiveTimeline extends HoodieCommonTestHarness {
 assertFalse("", activeCommitTimeline.isBeforeTimelineStarts("02"));
 assertTrue("", activeCommitTimeline.isBeforeTimelineStarts("00"));
   }
+
+  @Test
+  public void testTimelineGetOperations() {
+    List<HoodieInstant> allInstants = getAllInstants();
+    Supplier<Stream<HoodieInstant>> sup = () -> allInstants.stream();
+    timeline = new HoodieActiveTimeline(metaClient, true);
+    timeline.setInstants(allInstants);
+
+    /**
+     * Helper function to check HoodieTimeline only contains some type of Instant actions.
+     * @param timeline The HoodieTimeline to check
+     * @param actions The actions that should be present in the timeline being checked
+     */
+    BiConsumer<HoodieTimeline, Set<String>> checkTimeline = (HoodieTimeline timeline, Set<String> actions) -> {
+      sup.get().filter(i -> actions.contains(i.getAction())).forEach(i -> assertTrue(timeline.containsInstant(i)));
+      sup.get().filter(i -> !actions.contains(i.getAction())).forEach(i -> assertFalse(timeline.containsInstant(i)));
+    };
+
+    // Test that various types of getXXX operations from HoodieActiveTimeline
+    // return the correct set of Instant
+    checkTimeline.accept(timeline.getCommitsTimeline(),
+        Sets.newHashSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION));
+    checkTimeline.accept(timeline.getCommitsAndCompactionTimeline(),
+        Sets.newHashSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.COMPACTION_ACTION));
+    checkTimeline.accept(timeline.getCommitTimeline(), Sets.newHashSet(HoodieTimeline.COMMIT_ACTION));
+
+    checkTimeline.accept(timeline.getDeltaCommitTimeline(), Sets.newHashSet(HoodieTimeline.DELTA_COMMIT_ACTION));
+

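The `checkTimeline` helper in the test added by this commit asserts that a filtered timeline contains every instant whose action is in a given set and none of the others. A self-contained sketch of that check follows; it uses a hypothetical `SimpleInstant` type and plain lists in place of `HoodieInstant` and `HoodieTimeline`, and the action strings in the usage are illustrative only.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the checkTimeline idea using plain collections.
 * SimpleInstant stands in for HoodieInstant; lists stand in for timelines.
 */
final class TimelineCheckSketch {

  private TimelineCheckSketch() {}

  /** Minimal instant: a timestamp plus an action name. */
  static final class SimpleInstant {
    final String timestamp;
    final String action;

    SimpleInstant(String timestamp, String action) {
      this.timestamp = timestamp;
      this.action = action;
    }
  }

  /** Analogue of a getXXXTimeline() call: keep instants whose action is in the set. */
  static List<SimpleInstant> filterByActions(List<SimpleInstant> all, Set<String> actions) {
    return all.stream()
        .filter(i -> actions.contains(i.action))
        .collect(Collectors.toList());
  }

  /**
   * Analogue of checkTimeline: every instant with a listed action must be in
   * the timeline, and every other instant must be absent. Identity equality
   * is enough here because filtering keeps the same objects.
   */
  static boolean containsExactly(List<SimpleInstant> all, List<SimpleInstant> timeline, Set<String> actions) {
    for (SimpleInstant i : all) {
      if (timeline.contains(i) != actions.contains(i.action)) {
        return false;
      }
    }
    return true;
  }
}
```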
[GitHub] [incubator-hudi] n3nash merged pull request #1287: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread GitBox
n3nash merged pull request #1287: [HUDI-566] Added new test cases for class 
HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.
URL: https://github.com/apache/incubator-hudi/pull/1287
 
 
   




[GitHub] [incubator-hudi] xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with GLOBAL_BLOOM

2020-02-04 Thread GitBox
xushiyan commented on issue #1187: [HUDI-499] Allow update partition path with 
GLOBAL_BLOOM
URL: https://github.com/apache/incubator-hudi/pull/1187#issuecomment-582215148
 
 
   @vinothchandar Squashing done. Thanks.




[GitHub] [incubator-hudi] prashantwason commented on issue #1287: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread GitBox
prashantwason commented on issue #1287: [HUDI-566] Added new test cases for 
class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.
URL: https://github.com/apache/incubator-hudi/pull/1287#issuecomment-582205972
 
 
   Pushed a merge commit. The test that failed on Travis (a timeout) passes on
my local setup. Hopefully it was a transient error and it passes this time.




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits 
show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375016293
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##
 @@ -290,7 +291,7 @@ public long getTotalRollbackBlocks() {
 return totalRollbackBlocks;
   }
 
-  public void setTotalRollbackBlocks(Long totalRollbackBlocks) {
 
 Review comment:
   Same here; we can address this as part of a separate refactoring diff.




[GitHub] [incubator-hudi] n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
n3nash commented on a change in pull request #1274: [HUDI-571] Add 'commits 
show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r375016233
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieWriteStat.java
 ##
 @@ -135,6 +135,7 @@
   /**
* Total number of rollback blocks seen in a compaction operation.
*/
+  @Nullable
 
 Review comment:
   Please avoid this change as part of this diff




[jira] [Updated] (HUDI-570) Improve unit test coverage FSUtils.java

2020-02-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-570:

Labels: pull-request-available  (was: )

> Improve unit test coverage FSUtils.java
> ---
>
> Key: HUDI-570
> URL: https://issues.apache.org/jira/browse/HUDI-570
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Balajee Nagasubramaniam
>Priority: Minor
>  Labels: pull-request-available
>
> Add test cases for 
> - deleteOlderRollbackMetaFiles()
> - deleteOlderCleanMetaFiles()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] nbalajee opened a new pull request #1307: [HUDI-570] - Improve test coverage for FSUtils.java

2020-02-04 Thread GitBox
nbalajee opened a new pull request #1307: [HUDI-570] - Improve test coverage 
for FSUtils.java
URL: https://github.com/apache/incubator-hudi/pull/1307
 
 
   
   ## What is the purpose of the pull request
   Increase the test coverage for FSUtils.java under hudi-common.
   
   ## Brief change log
   Added test cases to increase coverage.
   
   ## Verify this pull request
   This change added tests and can be verified as follows:
   - testDeleteOlderRollbackFiles
   - testDeleteOlderCleanMetaFiles
   - testFileNameRelatedFunctions
   
   ## Committer checklist
   
- [x] Has a corresponding JIRA in PR title & commit

- [x] Commit message is descriptive of the change

- [x] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Resolved] (HUDI-594) Improve unit test coverage for HoodieReadClient

2020-02-04 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish resolved HUDI-594.
-
Resolution: Fixed

[https://github.com/apache/incubator-hudi/pull/1299]

> Improve unit test coverage for HoodieReadClient
> ---
>
> Key: HUDI-594
> URL: https://issues.apache.org/jira/browse/HUDI-594
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-594) Improve unit test coverage for HoodieReadClient

2020-02-04 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-594:

Status: Open  (was: New)

> Improve unit test coverage for HoodieReadClient
> ---
>
> Key: HUDI-594
> URL: https://issues.apache.org/jira/browse/HUDI-594
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-594) Improve unit test coverage for HoodieReadClient

2020-02-04 Thread satish (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

satish updated HUDI-594:

Status: In Progress  (was: Open)

> Improve unit test coverage for HoodieReadClient
> ---
>
> Key: HUDI-594
> URL: https://issues.apache.org/jira/browse/HUDI-594
> Project: Apache Hudi (incubating)
>  Issue Type: Test
>Reporter: satish
>Assignee: satish
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1306: [HUDI-598] Update quick start page

2020-02-04 Thread GitBox
bhasudha commented on a change in pull request #1306: [HUDI-598] Update quick 
start page
URL: https://github.com/apache/incubator-hudi/pull/1306#discussion_r374955592
 
 

 ##
 File path: docs/_docs/1_1_quick_start_guide.md
 ##
 @@ -176,28 +176,28 @@ Delete records for the HoodieKeys passed in.
 
 ```scala
 // fetch total records count
-spark.sql("select uuid, partitionPath from hudi_ro_table").count()
+spark.sql("select uuid, partitionPath from hudi_trips_snapshot").count()
 // fetch two records to be deleted
-val ds = spark.sql("select uuid, partitionPath from hudi_ro_table").limit(2)
+val ds = spark.sql("select uuid, partitionPath from hudi_trips_snapshot").limit(2)
 
 // issue deletes
 val deletes = dataGen.generateDeletes(ds.collectAsList())
 val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2));
-df.write.format("org.apache.hudi").
-options(getQuickstartWriteConfigs).
-option(OPERATION_OPT_KEY,"delete").
-option(PRECOMBINE_FIELD_OPT_KEY, "ts").
-option(RECORDKEY_FIELD_OPT_KEY, "uuid").
-option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
-option(TABLE_NAME, tableName).
-mode(Append).
-save(basePath);
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(OPERATION_OPT_KEY,"delete").
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
 
 // run the same read query as above.
 val roAfterDeleteViewDF = spark.
-read.
-format("org.apache.hudi").
-load(basePath + "/*/*/*/*")
+  read.
+  format("hudi").
+  load(basePath + "/*/*/*/*")
 roAfterDeleteViewDF.registerTempTable("hudi_ro_table")
 // fetch should return (total - 2) records
 spark.sql("select uuid, partitionPath from hudi_ro_table").count()
 
 Review comment:
   hudi_ro_table -> hudi_trips_snapshot here as well.




[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1306: [HUDI-598] Update quick start page

2020-02-04 Thread GitBox
bhasudha commented on a change in pull request #1306: [HUDI-598] Update quick 
start page
URL: https://github.com/apache/incubator-hudi/pull/1306#discussion_r374956098
 
 

 ##
 File path: docs/_docs/1_1_quick_start_guide.md
 ##
 @@ -176,28 +176,28 @@ Delete records for the HoodieKeys passed in.
 
 ```scala
 // fetch total records count
-spark.sql("select uuid, partitionPath from hudi_ro_table").count()
+spark.sql("select uuid, partitionPath from hudi_trips_snapshot").count()
 // fetch two records to be deleted
-val ds = spark.sql("select uuid, partitionPath from hudi_ro_table").limit(2)
+val ds = spark.sql("select uuid, partitionPath from hudi_trips_snapshot").limit(2)
 
 // issue deletes
 val deletes = dataGen.generateDeletes(ds.collectAsList())
 val df = spark.read.json(spark.sparkContext.parallelize(deletes, 2));
-df.write.format("org.apache.hudi").
-options(getQuickstartWriteConfigs).
-option(OPERATION_OPT_KEY,"delete").
-option(PRECOMBINE_FIELD_OPT_KEY, "ts").
-option(RECORDKEY_FIELD_OPT_KEY, "uuid").
-option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
-option(TABLE_NAME, tableName).
-mode(Append).
-save(basePath);
+df.write.format("hudi").
+  options(getQuickstartWriteConfigs).
+  option(OPERATION_OPT_KEY,"delete").
+  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
+  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
+  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
+  option(TABLE_NAME, tableName).
+  mode(Append).
+  save(basePath)
 
 // run the same read query as above.
 val roAfterDeleteViewDF = spark.
-read.
-format("org.apache.hudi").
-load(basePath + "/*/*/*/*")
+  read.
+  format("hudi").
+  load(basePath + "/*/*/*/*")
 roAfterDeleteViewDF.registerTempTable("hudi_ro_table")
 
 Review comment:
   Consider changing hudi_ro_table -> hudi_trips_snapshot here as well, for
consistency?




[GitHub] [incubator-hudi] prashantwason commented on issue #1287: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread GitBox
prashantwason commented on issue #1287: [HUDI-566] Added new test cases for 
class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.
URL: https://github.com/apache/incubator-hudi/pull/1287#issuecomment-582143133
 
 
   I have addressed all the comments.




[jira] [Updated] (HUDI-584) Relocate spark-avro dependency by maven-shade-plugin

2020-02-04 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha updated HUDI-584:
---
Status: Open  (was: New)

> Relocate spark-avro dependency by maven-shade-plugin
> 
>
> Key: HUDI-584
> URL: https://issues.apache.org/jira/browse/HUDI-584
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Relocate the spark-avro dependency using maven-shade-plugin; the spark-avro
> module is not included with spark-shell by default.





[GitHub] [incubator-hudi] zhedoubushishi commented on issue #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-04 Thread GitBox
zhedoubushishi commented on issue #1293: [HUDI-585] Optimize the steps of 
building with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#issuecomment-582124317
 
 
   @lamber-ken Tested. It works for me!




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1287: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread GitBox
prashantwason commented on a change in pull request #1287: [HUDI-566] Added new 
test cases for class HoodieTimeline, HoodieDefaultTimeline and 
HoodieActiveTimeline.
URL: https://github.com/apache/incubator-hudi/pull/1287#discussion_r374916739
 
 

 ##
 File path: 
hudi-common/src/test/java/org/apache/hudi/common/table/string/TestHoodieActiveTimeline.java
 ##
 @@ -164,4 +173,269 @@ public void testTimelineOperations() {
 assertFalse("", activeCommitTimeline.isBeforeTimelineStarts("02"));
 assertTrue("", activeCommitTimeline.isBeforeTimelineStarts("00"));
   }
+
+  @Test
+  public void testTimelineGetOperations() {
+List<HoodieInstant> allInstants = getAllInstants();
+Supplier<Stream<HoodieInstant>> sup = () -> allInstants.stream();
+timeline = new HoodieActiveTimeline(metaClient, true);
+timeline.setInstants(allInstants);
+
+/**
+ * Helper function to check HoodieTimeline only contains some type of Instant actions.
+ * @param timeline The HoodieTimeline to check
+ * @param actions The actions that should be present in the timeline being checked
+ */
+BiConsumer<HoodieTimeline, Set<String>> checkTimeline = (HoodieTimeline timeline, Set<String> actions) -> {
+  sup.get().filter(i -> actions.contains(i.getAction())).forEach(i -> assertTrue(timeline.containsInstant(i)));
+  sup.get().filter(i -> !actions.contains(i.getAction())).forEach(i -> assertFalse(timeline.containsInstant(i)));
+};
+
+// Test that various types of getXXX operations from HoodieActiveTimeline
+// return the correct set of Instant
+checkTimeline.accept(timeline.getCommitsTimeline(),
+Sets.newHashSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION));
+checkTimeline.accept(timeline.getCommitsAndCompactionTimeline(),
+Sets.newHashSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION, HoodieTimeline.COMPACTION_ACTION));
+checkTimeline.accept(timeline.getCommitTimeline(), Sets.newHashSet(HoodieTimeline.COMMIT_ACTION));
+
+checkTimeline.accept(timeline.getDeltaCommitTimeline(), Sets.newHashSet(HoodieTimeline.DELTA_COMMIT_ACTION));
+checkTimeline.accept(timeline.getCleanerTimeline(), Sets.newHashSet(HoodieTimeline.CLEAN_ACTION));
+checkTimeline.accept(timeline.getRollbackTimeline(), Sets.newHashSet(HoodieTimeline.ROLLBACK_ACTION));
+checkTimeline.accept(timeline.getRestoreTimeline(), Sets.newHashSet(HoodieTimeline.RESTORE_ACTION));
+checkTimeline.accept(timeline.getSavePointTimeline(), Sets.newHashSet(HoodieTimeline.SAVEPOINT_ACTION));
+checkTimeline.accept(timeline.getAllCommitsTimeline(),
+Sets.newHashSet(HoodieTimeline.COMMIT_ACTION, HoodieTimeline.DELTA_COMMIT_ACTION,
+HoodieTimeline.CLEAN_ACTION, HoodieTimeline.COMPACTION_ACTION,
+HoodieTimeline.SAVEPOINT_ACTION, HoodieTimeline.ROLLBACK_ACTION));
+
+// Get some random Instants
+Random rand = new Random();
+Set<String> randomInstants = sup.get().filter(i -> rand.nextBoolean())
+  .map(i -> i.getAction())
+  .collect(Collectors.toSet());
+checkTimeline.accept(timeline.getTimelineOfActions(randomInstants), randomInstants);
+  }
+
+  @Test
+  public void testTimelineInstantOperations() {
+timeline = new HoodieActiveTimeline(metaClient, true);
+assertEquals("No instant present", timeline.countInstants(), 0);
+
+// revertToInflight
+HoodieInstant commit = new HoodieInstant(State.COMPLETED, HoodieTimeline.COMMIT_ACTION, "1");
+timeline.createNewInstant(commit);
+timeline = timeline.reload();
+assertEquals(timeline.countInstants(), 1);
+assertTrue(timeline.containsInstant(commit));
+HoodieInstant inflight = timeline.revertToInflight(commit);
+// revert creates the .requested file
+timeline = timeline.reload();
+assertEquals(timeline.countInstants(), 1);
+assertTrue(timeline.containsInstant(inflight));
+assertFalse(timeline.containsInstant(commit));
+
+// deleteInflight
+timeline.deleteInflight(inflight);
+timeline = timeline.reload();
+assertEquals(timeline.countInstants(), 1);
+assertFalse(timeline.containsInstant(inflight));
+assertFalse(timeline.containsInstant(commit));
+
+// deletePending
+timeline.createNewInstant(commit);
+timeline.createNewInstant(inflight);
+timeline = timeline.reload();
+assertEquals(timeline.countInstants(), 1);
+timeline.deletePending(inflight);
+timeline = timeline.reload();
+assertEquals(timeline.countInstants(), 1);
+assertFalse(timeline.containsInstant(inflight));
+assertTrue(timeline.containsInstant(commit));
+
+// deleteCompactionRequested
+HoodieInstant compaction = new HoodieInstant(State.REQUESTED, HoodieTimeline.COMPACTION_ACTION, "2");
+timeline.createNewInstant(compaction);
+timeline = timeline.reload();
+assertEquals(timeline.countInstants(), 2);
+

[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1287: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread GitBox
prashantwason commented on a change in pull request #1287: [HUDI-566] Added new 
test cases for class HoodieTimeline, HoodieDefaultTimeline and 
HoodieActiveTimeline.
URL: https://github.com/apache/incubator-hudi/pull/1287#discussion_r374916236
 
 

 ##
 File path: hudi-hadoop-mr/pom.xml
 ##
 @@ -114,6 +114,10 @@
 org.apache.rat
 apache-rat-plugin
   
+  
 
 Review comment:
   Sure




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1287: [HUDI-566] Added new test cases for class HoodieTimeline, HoodieDefaultTimeline and HoodieActiveTimeline.

2020-02-04 Thread GitBox
prashantwason commented on a change in pull request #1287: [HUDI-566] Added new 
test cases for class HoodieTimeline, HoodieDefaultTimeline and 
HoodieActiveTimeline.
URL: https://github.com/apache/incubator-hudi/pull/1287#discussion_r374915701
 
 

 ##
 File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java
 ##
 @@ -111,21 +115,21 @@
* view is constructed with this timeline so that file-slice after pending compaction-requested instant-time is also
* considered valid. A RT file-system view for reading must then merge the file-slices before and after pending
* compaction instant so that all delta-commits are read.
-   * 
+   *
 
 Review comment:
   Sure, will fix my setup.




[GitHub] [incubator-hudi] prashantwason commented on a change in pull request #1298: [HUDI-587] Fixed generation of jacoco coverage reports.

2020-02-04 Thread GitBox
prashantwason commented on a change in pull request #1298: [HUDI-587] Fixed 
generation of jacoco coverage reports.
URL: https://github.com/apache/incubator-hudi/pull/1298#discussion_r374913677
 
 

 ##
 File path: pom.xml
 ##
 @@ -113,6 +113,7 @@
 
 provided
 
+-Xmx1024m -XX:MaxPermSize=256m
 
 Review comment:
   I did not add this line. It was already part of the maven-surefire plugin's
argLine. I just moved the definition to a property so that the jacoco plugin can
add its own java-agent config to it and the unit tests can execute with the
jacoco agent enabled.
   
   This was added in the following pull request about three months ago:
   https://github.com/apache/incubator-hudi/pull/979





[GitHub] [incubator-hudi] satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
satishkotha commented on issue #1274: [HUDI-571] Add 'commits show archived' 
command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-582105870
 
 
   @n3nash Fixed the conflict. Note that the test has been moved from hudi-common
to hudi-client because sequence-file-based writing of the archive log is no
longer supported. Please take a look after the Travis build completes.




[GitHub] [incubator-hudi] prashantwason commented on issue #1232: [HUDI-529] Added cobertura coverage reporting support.

2020-02-04 Thread GitBox
prashantwason commented on issue #1232: [HUDI-529] Added cobertura coverage 
reporting support.
URL: https://github.com/apache/incubator-hudi/pull/1232#issuecomment-582105349
 
 
   We can only use one of the jacoco or cobertura plugins, as they both work by
adding a java-agent to the unit tests. I have found that using the cobertura
plugin doubles the test run time. Also, cobertura seems to be less maintained
(its last release was in 2015). So it's better to use jacoco.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HUDI-598) Update quick start page

2020-02-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-598:

Labels: pull-request-available  (was: )

> Update quick start page 
> 
>
> Key: HUDI-598
> URL: https://issues.apache.org/jira/browse/HUDI-598
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Priority: Major
>  Labels: pull-request-available
>
> 1. code padding
> 2. org.apache.hudi to hudi
> 3. fix wrong table





[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1306: [HUDI-598] Update quick start page

2020-02-04 Thread GitBox
lamber-ken opened a new pull request #1306: [HUDI-598] Update quick start page
URL: https://github.com/apache/incubator-hudi/pull/1306
 
 
   ## What is the purpose of the pull request
   
   1. Unify code indentation (4 -> 2 spaces)
   2. Change `org.apache.hudi` to `hudi`
   3. Fix the reference to the nonexistent `hudi_ro_table` in the Delete section
   
   ## Verify this pull request
   
   **Compare changes**
   - https://hudi.apache.org/docs/quick-start-guide.html
   - https://lamber-ken.github.io/docs/quick-start-guide.html
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] n3nash commented on issue #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
n3nash commented on issue #1274: [HUDI-571] Add 'commits show archived' command 
to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#issuecomment-582041849
 
 
   @satishkotha There is one conflict; please rebase and then I can merge this.




[jira] [Updated] (HUDI-598) Update quick start page

2020-02-04 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-598:

Summary: Update quick start page   (was: Update quick start pages )

> Update quick start page 
> 
>
> Key: HUDI-598
> URL: https://issues.apache.org/jira/browse/HUDI-598
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Priority: Major
>
> 1. code padding
> 2. org.apache.hudi to hudi
> 3. fix wrong table





[jira] [Updated] (HUDI-598) Update quick start pages

2020-02-04 Thread lamber-ken (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lamber-ken updated HUDI-598:

Status: Open  (was: New)

> Update quick start pages 
> -
>
> Key: HUDI-598
> URL: https://issues.apache.org/jira/browse/HUDI-598
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: lamber-ken
>Priority: Major
>
> 1. code padding
> 2. org.apache.hudi to hudi
> 3. fix wrong table





[jira] [Created] (HUDI-598) Update quick start pages

2020-02-04 Thread lamber-ken (Jira)
lamber-ken created HUDI-598:
---

 Summary: Update quick start pages 
 Key: HUDI-598
 URL: https://issues.apache.org/jira/browse/HUDI-598
 Project: Apache Hudi (incubating)
  Issue Type: Improvement
  Components: Docs
Reporter: lamber-ken


1. code padding
2. org.apache.hudi to hudi
3. fix wrong table





[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1305: [MINOR] Remove the declaration of thrown RuntimeException

2020-02-04 Thread GitBox
lamber-ken opened a new pull request #1305: [MINOR] Remove the declaration of 
thrown RuntimeException
URL: https://github.com/apache/incubator-hudi/pull/1305
 
 
   ## What is the purpose of the pull request
   
   Per the JDK documentation, RuntimeException and its subclasses are unchecked exceptions, so they do not need to be declared in a method's `throws` clause.
   
   https://docs.oracle.com/javase/8/docs/api/java/lang/RuntimeException.html
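The rule cited above can be illustrated with a minimal Java sketch; the class and method names are hypothetical and not taken from the Hudi code base. `IllegalArgumentException` extends `RuntimeException`, so the method compiles without a `throws` clause, and callers may still catch it if they choose.

```java
// Hypothetical example; not code from the Hudi code base.
public class UncheckedExceptionDemo {

  // No "throws" clause: IllegalArgumentException extends RuntimeException,
  // so the compiler does not require it to be declared.
  static int parsePositive(String s) {
    // Integer.parseInt may throw NumberFormatException, which is also unchecked.
    int value = Integer.parseInt(s);
    if (value <= 0) {
      throw new IllegalArgumentException("expected a positive number: " + s);
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(parsePositive("42")); // prints 42
    try {
      parsePositive("-1");
    } catch (IllegalArgumentException e) {
      // Callers may still catch it; they are simply not forced to.
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Adding `throws RuntimeException` to such a signature is legal but not enforced by the compiler, which is why removing it is a pure cleanup.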
   
   ## Brief change log
   
 - *Remove the declaration of thrown RuntimeException*
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[GitHub] [incubator-hudi] lamber-ken opened a new pull request #1304: [MINOR] Replace Thread.sleep with TimeUnit*

2020-02-04 Thread GitBox
lamber-ken opened a new pull request #1304: [MINOR] Replace Thread.sleep with 
TimeUnit*
URL: https://github.com/apache/incubator-hudi/pull/1304
 
 
   ## What is the purpose of the pull request
   
   TimeUnit is probably easier to understand for non-obvious durations.
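A small Java sketch of the readability argument, assuming a hypothetical five-minute wait (not an actual call site from the PR): `Thread.sleep` takes a bare millisecond count, while `TimeUnit.MINUTES.sleep(5)` states the unit at the call site and, per its Javadoc, converts the value and delegates to `Thread.sleep`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical example; the five-minute delay is illustrative, not from PR #1304.
public class SleepReadability {

  // Before: a raw millisecond literal; the reader has to work out that it means 5 minutes.
  static void waitBefore() throws InterruptedException {
    Thread.sleep(300000);
  }

  // After: the same delay with the unit spelled out at the call site.
  // TimeUnit.sleep converts the value and delegates to Thread.sleep.
  static void waitAfter() throws InterruptedException {
    TimeUnit.MINUTES.sleep(5);
  }

  // Both forms denote the same number of milliseconds.
  static boolean sameDuration() {
    return TimeUnit.MINUTES.toMillis(5) == 300000L;
  }

  public static void main(String[] args) {
    System.out.println(sameDuration()); // prints true
  }
}
```

Behavior is unchanged by the swap; only the intent of the literal becomes explicit.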
   
   ## Brief change log
   
 - *Replace Thread.sleep with TimeUnit**
   
   ## Verify this pull request
   
   This pull request is code cleanup without any test coverage.
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.




[jira] [Resolved] (HUDI-493) Add docs for delete support in Hudi client apis

2020-02-04 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-493.
--
Fix Version/s: (was: 0.5.2)
   0.5.1
   Resolution: Fixed

> Add docs for delete support in Hudi client apis
> ---
>
> Key: HUDI-493
> URL: https://issues.apache.org/jira/browse/HUDI-493
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.5.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>






[jira] [Resolved] (HUDI-503) Add hudi test suite documentation into the README file of the test suite module

2020-02-04 Thread vinoyang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang resolved HUDI-503.
---
Resolution: Done

Done via hudi_test_suite_refactor branch: 
044759aa5070a0c6def8413ff25ec02370c30947

> Add hudi test suite documentation into the README file of the test suite 
> module
> ---
>
> Key: HUDI-503
> URL: https://issues.apache.org/jira/browse/HUDI-503
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (HUDI-493) Add docs for delete support in Hudi client apis

2020-02-04 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029937#comment-17029937
 ] 

sivabalan narayanan commented on HUDI-493:
--

Nope, it's complete.

> Add docs for delete support in Hudi client apis
> ---
>
> Key: HUDI-493
> URL: https://issues.apache.org/jira/browse/HUDI-493
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>  Components: Docs
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.5.2
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>






[GitHub] [incubator-hudi] yanghua merged pull request #1191: [HUDI-503] Add hudi test suite documentation into the README file of the test suite module

2020-02-04 Thread GitBox
yanghua merged pull request #1191: [HUDI-503] Add hudi test suite documentation 
into the README file of the test suite module
URL: https://github.com/apache/incubator-hudi/pull/1191
 
 
   




[GitHub] [incubator-hudi] leesf merged pull request #1302: [HUDI-595] code cleanup, refactoring code out of PR# 1159

2020-02-04 Thread GitBox
leesf merged pull request #1302: [HUDI-595] code cleanup, refactoring code out 
of PR# 1159
URL: https://github.com/apache/incubator-hudi/pull/1302
 
 
   




[incubator-hudi] branch master updated: [HUDI-595] code cleanup, refactoring code out of PR# 1159 (#1302)

2020-02-04 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 594da28  [HUDI-595] code cleanup, refactoring code out of PR# 1159 (#1302)
594da28 is described below

commit 594da28fbf64fb20432e718a409577fd10516c4a
Author: Suneel Marthi 
AuthorDate: Tue Feb 4 14:52:03 2020 +0100

[HUDI-595] code cleanup, refactoring code out of PR# 1159 (#1302)
---
 .../org/apache/hudi/index/hbase/HBaseIndex.java| 23 ++---
 .../org/apache/hudi/io/HoodieCommitArchiveLog.java |  6 +--
 .../io/compact/strategy/CompactionStrategy.java|  8 +--
 .../main/java/org/apache/hudi/metrics/Metrics.java | 17 +++
 .../src/test/java/org/apache/hudi/TestCleaner.java | 37 +++---
 .../hudi/common/HoodieTestDataGenerator.java   | 14 ++
 .../index/bloom/TestHoodieGlobalBloomIndex.java| 51 ---
 .../org/apache/hudi/common/HoodieJsonPayload.java  |  5 +-
 .../hudi/common/table/log/HoodieLogFileReader.java | 33 ++--
 .../java/org/apache/hudi/common/util/FSUtils.java  |  2 +-
 .../hudi/common/minicluster/HdfsTestService.java   |  2 +-
 .../hudi/common/table/log/TestHoodieLogFormat.java | 26 +++---
 .../table/view/TestHoodieTableFileSystemView.java  | 10 ++--
 .../common/util/collection/TestDiskBasedMap.java   |  6 +--
 .../hadoop/hive/HoodieCombineHiveInputFormat.java  |  9 ++--
 .../realtime/TestHoodieRealtimeRecordReader.java   |  5 +-
 .../java/org/apache/hudi/hive/util/SchemaUtil.java | 31 +---
 .../org/apache/hudi/hive/TestHiveSyncTool.java |  4 +-
 .../org/apache/hudi/hive/util/HiveTestService.java | 11 ++--
 .../org/apache/hudi/integ/ITTestHoodieDemo.java| 58 +++---
 .../apache/hudi/utilities/HDFSParquetImporter.java |  7 ++-
 .../hudi/utilities/perf/TimelineServerPerf.java| 20 +++-
 .../sources/helpers/IncrSourceHelper.java  |  4 +-
 .../utilities/sources/helpers/KafkaOffsetGen.java  |  7 +--
 24 files changed, 172 insertions(+), 224 deletions(-)

diff --git a/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java b/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
index 3f79096..12d352d 100644
--- a/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
+++ b/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
@@ -205,9 +205,7 @@ public class HBaseIndex extends HoodieIndex {
 }
   }
   List> taggedRecords = new ArrayList<>();
-  HTable hTable = null;
-  try {
-hTable = (HTable) hbaseConnection.getTable(TableName.valueOf(tableName));
+  try (HTable hTable = (HTable) hbaseConnection.getTable(TableName.valueOf(tableName))) {
 List statements = new ArrayList<>();
 List currentBatchOfRecords = new LinkedList<>();
 // Do the tagging.
@@ -250,15 +248,6 @@ public class HBaseIndex extends HoodieIndex {
 }
   } catch (IOException e) {
 throw new HoodieIndexException("Failed to Tag indexed locations because of exception with HBase Client", e);
-  } finally {
-if (hTable != null) {
-  try {
-hTable.close();
-  } catch (IOException e) {
-// Ignore
-  }
-}
-
   }
   return taggedRecords.iterator();
 };
@@ -444,16 +433,14 @@ public class HBaseIndex extends HoodieIndex {
  */
 public int getBatchSize(int numRegionServersForTable, int maxQpsPerRegionServer, int numTasksDuringPut,
 int maxExecutors, int sleepTimeMs, float qpsFraction) {
-  int numRSAlive = numRegionServersForTable;
-  int maxReqPerSec = (int) (qpsFraction * numRSAlive * maxQpsPerRegionServer);
-  int numTasks = numTasksDuringPut;
-  int maxParallelPuts = Math.max(1, Math.min(numTasks, maxExecutors));
+  int maxReqPerSec = (int) (qpsFraction * numRegionServersForTable * maxQpsPerRegionServer);
+  int maxParallelPuts = Math.max(1, Math.min(numTasksDuringPut, maxExecutors));
   int maxReqsSentPerTaskPerSec = MILLI_SECONDS_IN_A_SECOND / sleepTimeMs;
   int multiPutBatchSize = Math.max(1, maxReqPerSec / (maxParallelPuts * maxReqsSentPerTaskPerSec));
   LOG.info("HbaseIndexThrottling: qpsFraction :" + qpsFraction);
-  LOG.info("HbaseIndexThrottling: numRSAlive :" + numRSAlive);
+  LOG.info("HbaseIndexThrottling: numRSAlive :" + numRegionServersForTable);
   LOG.info("HbaseIndexThrottling: maxReqPerSec :" + maxReqPerSec);
-  LOG.info("HbaseIndexThrottling: numTasks :" + numTasks);
+  LOG.info("HbaseIndexThrottling: numTasks :" + numTasksDuringPut);
   LOG.info("HbaseIndexThrottling: maxExecutors :" + maxExecutors);
   LOG.info("HbaseIndexThrottling: maxParallelPuts :" + maxParallelPuts);
   LOG.info("HbaseIndexThrottling: maxReqsSentPerTaskPerSec :" + 
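The refactored `getBatchSize` throttling arithmetic can be tried out in isolation. Below is a minimal Java sketch that copies the formula from the diff above, with the logging left out. The value `MILLI_SECONDS_IN_A_SECOND = 1000` is an assumption, because the diff uses the constant without showing its definition. The sample inputs are also made up.

```java
public class BatchSizeSketch {
  // Assumed value: the diff references this constant but does not define it.
  static final int MILLI_SECONDS_IN_A_SECOND = 1000;

  static int getBatchSize(int numRegionServersForTable, int maxQpsPerRegionServer,
      int numTasksDuringPut, int maxExecutors, int sleepTimeMs, float qpsFraction) {
    // Requests/sec the job may send to the region servers hosting the table.
    int maxReqPerSec = (int) (qpsFraction * numRegionServersForTable * maxQpsPerRegionServer);
    // Tasks putting concurrently: capped by the executor count, never below 1.
    int maxParallelPuts = Math.max(1, Math.min(numTasksDuringPut, maxExecutors));
    // Interpretation: each task sends one multi-put per sleep interval.
    int maxReqsSentPerTaskPerSec = MILLI_SECONDS_IN_A_SECOND / sleepTimeMs;
    // Split the budget across parallel tasks and their per-task request rate.
    return Math.max(1, maxReqPerSec / (maxParallelPuts * maxReqsSentPerTaskPerSec));
  }

  public static void main(String[] args) {
    // 0.5 * 10 * 1000 = 5000 req/s; min(100, 50) = 50 tasks; 1000 / 100 = 10 req/s per task.
    System.out.println(getBatchSize(10, 1000, 100, 50, 100, 0.5f)); // 10
    // The budget is too small for this parallelism, so the result is clamped to 1.
    System.out.println(getBatchSize(1, 100, 100, 100, 10, 0.5f));   // 1
  }
}
```

In this sketch, a `sleepTimeMs` above 1000 makes the integer division yield 0. The final division then throws `ArithmeticException`.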

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1302: [HUDI-595] code cleanup, refactoring code out of PR# 1159

2020-02-04 Thread GitBox
leesf commented on a change in pull request #1302: [HUDI-595] code cleanup, 
refactoring code out of PR# 1159
URL: https://github.com/apache/incubator-hudi/pull/1302#discussion_r374680057
 
 

 ##
 File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java
 ##
 @@ -491,7 +491,6 @@ public void testSchemaEvolutionAndRollbackBlockInLastLogFile() throws Exception
 writer = writeRollbackBlockToLogFile(partitionDir, schema, "fileid0", commitTime, newCommitTime, "101", 1);
 logFilePaths.add(writer.getLogFile().getPath().toString());
 writer.close();
-assertTrue("block - size should be > 0", size > 0);
 
 Review comment:
   okay, got it.




[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1302: [HUDI-595] code cleanup, refactoring code out of PR# 1159

2020-02-04 Thread GitBox
yanghua commented on a change in pull request #1302: [HUDI-595] code cleanup, 
refactoring code out of PR# 1159
URL: https://github.com/apache/incubator-hudi/pull/1302#discussion_r374678425
 
 

 ##
 File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java
 ##
 @@ -491,7 +491,6 @@ public void testSchemaEvolutionAndRollbackBlockInLastLogFile() throws Exception
 writer = writeRollbackBlockToLogFile(partitionDir, schema, "fileid0", commitTime, newCommitTime, "101", 1);
 logFilePaths.add(writer.getLogFile().getPath().toString());
 writer.close();
-assertTrue("block - size should be > 0", size > 0);
 
 Review comment:
   IntelliJ IDEA shows that `size` is always larger than 0, so the assertion is always true.




[GitHub] [incubator-hudi] leesf commented on a change in pull request #1302: [HUDI-595] code cleanup, refactoring code out of PR# 1159

2020-02-04 Thread GitBox
leesf commented on a change in pull request #1302: [HUDI-595] code cleanup, 
refactoring code out of PR# 1159
URL: https://github.com/apache/incubator-hudi/pull/1302#discussion_r374677091
 
 

 ##
 File path: hudi-hadoop-mr/src/test/java/org/apache/hudi/hadoop/realtime/TestHoodieRealtimeRecordReader.java
 ##
 @@ -491,7 +491,6 @@ public void testSchemaEvolutionAndRollbackBlockInLastLogFile() throws Exception
 writer = writeRollbackBlockToLogFile(partitionDir, schema, "fileid0", commitTime, newCommitTime, "101", 1);
 logFilePaths.add(writer.getLogFile().getPath().toString());
 writer.close();
-assertTrue("block - size should be > 0", size > 0);
 
 Review comment:
   any reason to remove the assert?




[GitHub] [incubator-hudi] hmatu edited a comment on issue #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu edited a comment on issue #1242: [HUDI-544] Adjust the read and write 
path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#issuecomment-581873000
 
 
   @hddong @n3nash @vinothchandar, `ArchivedCommitsCommand` contains the two commands below.
   If it only uses `metaClient.getArchivePath() + "/.commits_.archive*"`, it will affect old tables.
   
   Compare the two patterns:
   - `metaClient.getArchivePath() + "/.commits_.archive*"` 
   - `basePath + "/.hoodie/.commits_.archive*"` 
   
   These two archive paths are different:
   - `/table/.hoodie/archived/.commits_.archive*`
   - `/table/.hoodie/.commits_.archive*`
   
   So the better way is to add `archiveFolderPattern` to the `show archived commits` command.
   
   ```
   show archived commit stats
   ```
   ```
   show archived commits
   ```
   




[GitHub] [incubator-hudi] hmatu edited a comment on issue #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu edited a comment on issue #1242: [HUDI-544] Adjust the read and write 
path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#issuecomment-581873000
 
 
   @hddong @n3nash, `ArchivedCommitsCommand` contains the two commands below.
   If it only uses `metaClient.getArchivePath() + "/.commits_.archive*"`, it will affect old tables.
   
   Compare the two patterns:
   - `metaClient.getArchivePath() + "/.commits_.archive*"` 
   - `basePath + "/.hoodie/.commits_.archive*"` 
   
   These two archive paths are different:
   - `/table/.hoodie/archived/.commits_.archive*`
   - `/table/.hoodie/.commits_.archive*`
   
   So the better way is to add `archiveFolderPattern` to the `show archived commits` command.
   
   ```
   show archived commit stats
   ```
   ```
   show archived commits
   ```
   




[GitHub] [incubator-hudi] hmatu edited a comment on issue #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu edited a comment on issue #1242: [HUDI-544] Adjust the read and write 
path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#issuecomment-581873000
 
 
   @hddong @n3nash, `ArchivedCommitsCommand` contains the two commands below.
   If it only uses `metaClient.getArchivePath() + "/.commits_.archive*"`, it will affect old tables.
   
   Compare the two patterns:
   - `metaClient.getArchivePath() + "/.commits_.archive*"` 
   - `basePath + "/.hoodie/.commits_.archive*"` 
   
   These two archive paths are different:
   - `/table/.hoodie/.commits_.archive*`
   - `/table/.hoodie/archived/.commits_.archive*`
   
   So the better way is to add `archiveFolderPattern` to the `show archived commits` command.
   
   ```
   show archived commit stats
   ```
   ```
   show archived commits
   ```
   




[GitHub] [incubator-hudi] hmatu commented on issue #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu commented on issue #1242: [HUDI-544] Adjust the read and write path of 
archive
URL: https://github.com/apache/incubator-hudi/pull/1242#issuecomment-581873000
 
 
   @hddong @n3nash, `ArchivedCommitsCommand` contains the two commands below.
   If it only uses `metaClient.getArchivePath() + "/.commits_.archive*"`, it will affect old tables.
   
   Compare the two patterns:
   - `metaClient.getArchivePath() + "/.commits_.archive*"` = '/table/.hoodie/archived/.commits_.archive*'
   - `basePath + "/.hoodie/.commits_.archive*"` = '/table/.hoodie/.commits_.archive*'
   
   So the better way is to add `archiveFolderPattern` to the `show archived commits` command.
   
   ```
   show archived commit stats
   ```
   ```
   show archived commits
   ```
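   The comment above argues that the two globs find different files. Below is a minimal Java sketch of that point. It uses `java.nio` glob matching on a Unix-style default file system as a stand-in for Hadoop's `FileSystem.globStatus`. Several details are assumptions:
   - the `/table` base path and the archive file names are hypothetical examples;
   - the sketch assumes the archive folder resolves to `<basePath>/.hoodie/archived`;
   - `isArchiveFile` is a hypothetical fallback that checks both locations, which is a different approach from the `archiveFolderPattern` option proposed above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class ArchiveGlobSketch {
  // Glob built from the archive folder (metaClient.getArchivePath() in the PR),
  // assuming it resolves to <basePath>/.hoodie/archived.
  static String newPattern(String basePath) {
    return basePath + "/.hoodie/archived/.commits_.archive*";
  }

  // Glob used before the change: archive files directly under .hoodie.
  static String legacyPattern(String basePath) {
    return basePath + "/.hoodie/.commits_.archive*";
  }

  // Glob match using the default file system's path syntax.
  static boolean matches(String glob, String file) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    return matcher.matches(Paths.get(file));
  }

  // Hypothetical fallback: treat a file as an archive if either location matches.
  static boolean isArchiveFile(String basePath, String file) {
    return matches(newPattern(basePath), file) || matches(legacyPattern(basePath), file);
  }

  public static void main(String[] args) {
    String oldFile = "/table/.hoodie/.commits_.archive.1_1-0-1";
    String newFile = "/table/.hoodie/archived/.commits_.archive.1_1-0-1";
    System.out.println(matches(newPattern("/table"), oldFile));    // false
    System.out.println(matches(legacyPattern("/table"), oldFile)); // true
    System.out.println(isArchiveFile("/table", newFile));          // true
  }
}
```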
   




[GitHub] [incubator-hudi] hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read and write path of archive

2020-02-04 Thread GitBox
hmatu commented on a change in pull request #1242: [HUDI-544] Adjust the read 
and write path of archive
URL: https://github.com/apache/incubator-hudi/pull/1242#discussion_r374622575
 
 

 ##
 File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/ArchivedCommitsCommand.java
 ##
 @@ -138,9 +139,11 @@ public String showCommits(
   throws IOException {
 
 System.out.println("===> Showing only " + limit + " archived commits <===");
-String basePath = HoodieCLI.getTableMetaClient().getBasePath();
+HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
+String basePath = metaClient.getBasePath();
+Path archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*");
 FileStatus[] fsStatuses =
-FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(new Path(basePath + "/.hoodie/.commits_.archive*"));
+FSUtils.getFs(basePath, HoodieCLI.conf).globStatus(archivePath);
 
 Review comment:
   This can affect reading the archives of all old tables.




[GitHub] [incubator-hudi] vinothchandar commented on issue #1232: [HUDI-529] Added cobertura coverage reporting support.

2020-02-04 Thread GitBox
vinothchandar commented on issue #1232: [HUDI-529] Added cobertura coverage 
reporting support.
URL: https://github.com/apache/incubator-hudi/pull/1232#issuecomment-581801761
 
 
   @n3nash I just want to avoid bringing in a new build system just for this, when things like this exist: https://www.mojohaus.org/cobertura-maven-plugin/
   
   Slowly, over time, people will add things to Gradle, and I am concerned it will get out of hand.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2020-02-04 Thread GitBox
vinothchandar commented on a change in pull request #1274: [HUDI-571] Add 
'commits show archived' command to CLI
URL: https://github.com/apache/incubator-hudi/pull/1274#discussion_r374532498
 
 

 ##
 File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
 ##
 @@ -143,6 +143,83 @@ public HoodieTimeline filter(Predicate filter) {
 return new HoodieDefaultTimeline(instants.stream().filter(filter), details);
   }
 
+  /**
+   * Get all instants (commits, delta commits) that produce new data, in the active timeline.
+   */
+  public HoodieTimeline getCommitsTimeline() {
 
 Review comment:
   @n3nash I can't think of anything off the top of my head. Should be fine.
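   The Javadoc in the hunk above describes keeping only the instants that produce new data. Below is a minimal Java sketch of that filtering, written in the same `filter(Predicate)` stream style as `HoodieDefaultTimeline`. It rests on two assumptions:
   - `Instant` is a hypothetical stand-in for Hudi's instant type;
   - the sketch treats the actions `commit` and `deltacommit` as the data-producing ones, based on the Javadoc's "(commits, delta commits)".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CommitsTimelineSketch {
  // Hypothetical stand-in for a timeline instant: an action name and a timestamp.
  static final class Instant {
    final String action;
    final String timestamp;

    Instant(String action, String timestamp) {
      this.action = action;
      this.timestamp = timestamp;
    }

    @Override
    public String toString() {
      return timestamp + "." + action;
    }
  }

  // Actions treated as "producing new data" in this sketch (assumed names).
  static final Set<String> DATA_ACTIONS = new HashSet<>(Arrays.asList("commit", "deltacommit"));

  // Keep only instants whose action writes data, preserving timeline order.
  static List<Instant> getCommitsTimeline(List<Instant> instants) {
    return instants.stream()
        .filter(i -> DATA_ACTIONS.contains(i.action))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Instant> timeline = Arrays.asList(
        new Instant("commit", "001"),
        new Instant("clean", "002"),
        new Instant("deltacommit", "003"),
        new Instant("rollback", "004"));
    System.out.println(getCommitsTimeline(timeline)); // [001.commit, 003.deltacommit]
  }
}
```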




[incubator-hudi] branch hudi_test_suite_refactor updated (33246c4 -> eaa4fb0)

2020-02-04 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository.

vinoyang pushed a change to branch hudi_test_suite_refactor
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git.


from 33246c4  Fixed resource leak in HiveTestService about hive meta store
 add eaa4fb0  [HUDI-441] Rename WorkflowDagGenerator and some class names in test package

No new revisions were added by this update.

Summary of changes:
 ...erator.java => SimpleWorkflowDagGenerator.java} |  3 +-
 .../hudi/testsuite/dag/WorkflowDagGenerator.java   | 56 +++---
 ...estComplexDag.java => ComplexDagGenerator.java} |  2 +-
 ...tHiveSyncDag.java => HiveSyncDagGenerator.java} |  2 +-
 ...ertOnlyDag.java => InsertOnlyDagGenerator.java} |  2 +-
 ...psertDag.java => InsertUpsertDagGenerator.java} |  2 +-
 .../apache/hudi/testsuite/dag/TestDagUtils.java|  4 +-
 .../hudi/testsuite/job/TestHoodieTestSuiteJob.java | 12 ++---
 8 files changed, 20 insertions(+), 63 deletions(-)
 copy hudi-test-suite/src/main/java/org/apache/hudi/testsuite/dag/{WorkflowDagGenerator.java => SimpleWorkflowDagGenerator.java} (96%)
 rename hudi-test-suite/src/test/java/org/apache/hudi/testsuite/dag/{TestComplexDag.java => ComplexDagGenerator.java} (97%)
 rename hudi-test-suite/src/test/java/org/apache/hudi/testsuite/dag/{TestHiveSyncDag.java => HiveSyncDagGenerator.java} (96%)
 rename hudi-test-suite/src/test/java/org/apache/hudi/testsuite/dag/{TestInsertOnlyDag.java => InsertOnlyDagGenerator.java} (96%)
 rename hudi-test-suite/src/test/java/org/apache/hudi/testsuite/dag/{TestInsertUpsertDag.java => InsertUpsertDagGenerator.java} (96%)



[GitHub] [incubator-hudi] vinothchandar commented on issue #1293: [HUDI-585] Optimize the steps of building with scala-2.12

2020-02-04 Thread GitBox
vinothchandar commented on issue #1293: [HUDI-585] Optimize the steps of 
building with scala-2.12
URL: https://github.com/apache/incubator-hudi/pull/1293#issuecomment-581788516
 
 
   Looks like you folks already have a handle on this :) @leesf, I will let you take this across the finish line.




[GitHub] [incubator-hudi] vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-02-04 Thread GitBox
vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for 
Spark DAG stages in Hudi.
URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-581787335
 
 
   @prashantwason this is a great contribution for anyone debugging Hudi writes. Can you post some screenshots of how the upsert/bulk_insert DAGs now show up in the UI?
   
   Also, @n3nash, if you want to review this, feel free to grab it from me.

