[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593252380
 
 
   Hi @bhasudha, addressed, and added `--packages` to the README doc. Thanks
   ```
   spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
 --packages org.apache.spark:spark-avro_2.11:2.4.4 \
 --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] codecov-io commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-03-01 Thread GitBox
codecov-io commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as 
it missing content form included 3rd party ALv2 licensed NOTICE files
URL: https://github.com/apache/incubator-hudi/pull/1354#issuecomment-593259405
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1354?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@4e7fcde`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1354/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1354?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##             master    #1354   +/-   ##
    ==========================================
      Coverage          ?    0.64%
      Complexity        ?        2
    ==========================================
      Files             ?      287
      Lines             ?    14319
      Branches          ?     1465
    ==========================================
      Hits              ?       92
      Misses            ?    14224
      Partials          ?        3
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1354?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1354?src=pr=footer).
 Last update 
[4e7fcde...69278ee](https://codecov.io/gh/apache/incubator-hudi/pull/1354?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
codecov-io edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593244136
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@078d482`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1290/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##             master    #1290   +/-   ##
    ==========================================
      Coverage          ?   67.09%
      Complexity        ?      223
    ==========================================
      Files             ?      333
      Lines             ?    16216
      Branches          ?     1659
    ==========================================
      Hits              ?    10880
      Misses            ?     4598
      Partials          ?      738
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=footer).
 Last update 
[078d482...7be999a](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1341: [HUDI-626] Add exportToTable option to CLI

2020-03-01 Thread GitBox
codecov-io edited a comment on issue #1341: [HUDI-626] Add exportToTable option 
to CLI
URL: https://github.com/apache/incubator-hudi/pull/1341#issuecomment-593252610
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=h1) 
Report
   > Merging 
[#1341](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/acf359c834bc1d9b9c4ea64d362ea20c7410c70a?src=pr=desc)
 will **decrease** coverage by `<.01%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1341/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=tree)
   
   ```diff
    @@             Coverage Diff              @@
    ##             master    #1341      +/-   ##
    ============================================
    - Coverage     67.09%   67.09%   -0.01%
      Complexity      223      223
    ============================================
      Files           333      333
      Lines         16207    16216       +9
      Branches       1657     1659       +2
    ============================================
    + Hits          10874    10880       +6
    - Misses         4597     4598       +1
    - Partials        736      738       +2
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/DataSourceUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9EYXRhU291cmNlVXRpbHMuamF2YQ==)
 | `50.56% <0%> (+3.06%)` | `0% <0%> (ø)` | :arrow_down: |
   | 
[...a/org/apache/hudi/common/util/collection/Pair.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9QYWlyLmphdmE=)
 | `76% <0%> (+4%)` | `0% <0%> (ø)` | :arrow_down: |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=footer).
 Last update 
[acf359c...41a1ea8](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1341: [HUDI-626] Add exportToTable option to CLI

2020-03-01 Thread GitBox
codecov-io commented on issue #1341: [HUDI-626] Add exportToTable option to CLI
URL: https://github.com/apache/incubator-hudi/pull/1341#issuecomment-593252610
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=h1) 
Report
   > Merging 
[#1341](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/acf359c834bc1d9b9c4ea64d362ea20c7410c70a?src=pr=desc)
 will **decrease** coverage by `66.45%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1341/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=tree)
   
   ```diff
    @@             Coverage Diff               @@
    ##             master    #1341       +/-   ##
    =============================================
    - Coverage     67.09%    0.64%   -66.46%
    + Complexity      223        2      -221
    =============================================
      Files           333      287       -46
      Lines         16207    14319     -1888
      Branches       1657     1465      -192
    =============================================
    - Hits          10874       92    -10782
    - Misses         4597    14224     +9627
    + Partials        736        3      -733
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...apache/hudi/common/model/HoodieDeltaWriteStat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZURlbHRhV3JpdGVTdGF0LmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...org/apache/hudi/common/model/HoodieFileFormat.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVGb3JtYXQuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...g/apache/hudi/execution/BulkInsertMapFunction.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhlY3V0aW9uL0J1bGtJbnNlcnRNYXBGdW5jdGlvbi5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/IteratorBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvSXRlcmF0b3JCYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...rg/apache/hudi/index/bloom/KeyRangeLookupTree.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vS2V5UmFuZ2VMb29rdXBUcmVlLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...e/hudi/common/table/timeline/dto/FileGroupDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9GaWxlR3JvdXBEVE8uamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...apache/hudi/timeline/service/handlers/Handler.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvSGFuZGxlci5qYXZh)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[.../common/util/queue/FunctionBasedQueueProducer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvRnVuY3Rpb25CYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | 
[...che/hudi/index/bloom/ListBasedIndexFileFilter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW5kZXgvYmxvb20vTGlzdEJhc2VkSW5kZXhGaWxlRmlsdGVyLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | ... and [286 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1341/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1341?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
lamber-ken edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593252380
 
 
   Hi @bhasudha, addressed, and added `--packages` to the README doc
   ```
   spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
 --packages org.apache.spark:spark-avro_2.11:2.4.4 \
 --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```




[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593252380
 
 
   Hi @bhasudha, addressed, and added `--packages` to the README doc
   ```
   export SPARK_HOME=/work/BigData/install/spark/spark-2.4.4-bin-hadoop2.7
   ${SPARK_HOME}/bin/spark-shell \
 --packages org.apache.spark:spark-avro_2.11:2.4.4 \
 --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-03-01 Thread GitBox
vinothchandar commented on a change in pull request #1149: [WIP] [HUDI-472] 
Introduce configurations and new modes of sorting for bulk_insert
URL: https://github.com/apache/incubator-hudi/pull/1149#discussion_r386218656
 
 

 ##
 File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java
 ##
 @@ -381,20 +384,30 @@ public static SparkConf registerClasses(SparkConf conf) {
 }
   }
 
+  private BulkInsertMapFunction getBulkInsertMapFunction(
 
 Review comment:
   Let's do what's easier and gets us moving faster :)




[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1341: [HUDI-626] Add exportToTable option to CLI

2020-03-01 Thread GitBox
satishkotha commented on a change in pull request #1341: [HUDI-626] Add 
exportToTable option to CLI
URL: https://github.com/apache/incubator-hudi/pull/1341#discussion_r386218339
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/utils/TempViewProvider.java
 ##
 @@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.utils;
+
+import java.util.List;
+
+public interface TempViewProvider {
+  void write(String tableName, List<String> headers, List<List<String>> rows);
 
 Review comment:
   This is slightly difficult to do because the schema is required for both create 
and write. I just renamed write to createAndWrite to make this easier to use. Do 
you have a strong preference for having separate calls?
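
   The single-call design mentioned above can be sketched as follows. This is an illustrative, hypothetical in-memory provider, not Hudi's actual CLI code; the point is that the header list doubles as the schema, which is why view creation and population are easiest to fuse into one call.

   ```java
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical sketch of a combined createAndWrite design.
   // Names and behavior are illustrative only, not Hudi's actual implementation.
   public class TempViewSketch {

     interface TempViewProvider {
       // The header list doubles as the schema, so creating the view and
       // populating it can happen in a single call.
       void createAndWrite(String tableName, List<String> headers, List<List<String>> rows);

       List<List<String>> query(String tableName);
     }

     // Minimal in-memory implementation illustrating the trade-off: callers
     // never pass the schema twice, at the cost of not being able to append
     // to an existing view without re-supplying all rows.
     static class InMemoryTempViewProvider implements TempViewProvider {
       private final Map<String, List<List<String>>> views = new HashMap<>();

       @Override
       public void createAndWrite(String tableName, List<String> headers, List<List<String>> rows) {
         for (List<String> row : rows) {
           if (row.size() != headers.size()) {
             throw new IllegalArgumentException("row width does not match header width");
           }
         }
         views.put(tableName, rows);
       }

       @Override
       public List<List<String>> query(String tableName) {
         return views.getOrDefault(tableName, Collections.emptyList());
       }
     }

     public static void main(String[] args) {
       TempViewProvider provider = new InMemoryTempViewProvider();
       provider.createAndWrite("commits",
           Arrays.asList("instant", "action"),
           Arrays.asList(Arrays.asList("001", "commit")));
       System.out.println(provider.query("commits").size()); // prints 1
     }
   }
   ```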




[GitHub] [incubator-hudi] satishkotha commented on a change in pull request #1341: [HUDI-626] Add exportToTable option to CLI

2020-03-01 Thread GitBox
satishkotha commented on a change in pull request #1341: [HUDI-626] Add 
exportToTable option to CLI
URL: https://github.com/apache/incubator-hudi/pull/1341#discussion_r386218364
 
 

 ##
 File path: 
hudi-cli/src/main/java/org/apache/hudi/cli/commands/TempViewCommand.java
 ##
 @@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.cli.commands;
+
+import org.apache.hudi.cli.HoodieCLI;
+
+import org.springframework.shell.core.CommandMarker;
+import org.springframework.shell.core.annotation.CliCommand;
+import org.springframework.shell.core.annotation.CliOption;
+import org.springframework.stereotype.Component;
+
+import java.io.IOException;
+
+/**
+ * CLI command to query/delete temp views.
+ */
+@Component
+public class TempViewCommand implements CommandMarker {
+
+  @CliCommand(value = "temp_query", help = "query against created temp view")
+  public String query(
+  @CliOption(key = {"sql"}, mandatory = true, help = "select query to 
run against view") final String sql)
+  throws IOException {
+
+HoodieCLI.getTempViewProvider().runQuery(sql);
+return "";
 
 Review comment:
   Fixed




[GitHub] [incubator-hudi] satishkotha commented on issue #1341: [HUDI-626] Add exportToTable option to CLI

2020-03-01 Thread GitBox
satishkotha commented on issue #1341: [HUDI-626] Add exportToTable option to CLI
URL: https://github.com/apache/incubator-hudi/pull/1341#issuecomment-593245486
 
 
   @n3nash Please take a look




[GitHub] [incubator-hudi] vinothchandar commented on issue #1354: [WIP][HUDI-581] NOTICE need more work as it missing content form included 3rd party ALv2 licensed NOTICE files

2020-03-01 Thread GitBox
vinothchandar commented on issue #1354: [WIP][HUDI-581] NOTICE need more work 
as it missing content form included 3rd party ALv2 licensed NOTICE files
URL: https://github.com/apache/incubator-hudi/pull/1354#issuecomment-593245510
 
 
   @lresende we will try to incorporate that.
   
   Would you be able to make a pass at the poms/jars to flag any additional issues 
we have? Our understanding is that, as long as we exhaustively list all the 3rd-party 
(non-ASF) software from our fat jars and source dependencies, we should be good?




[GitHub] [incubator-hudi] satishkotha commented on issue #1355: [HUDI-633] limit archive file block size by number of bytes

2020-03-01 Thread GitBox
satishkotha commented on issue #1355: [HUDI-633] limit archive file block size 
by number of bytes
URL: https://github.com/apache/incubator-hudi/pull/1355#issuecomment-593244943
 
 
   @n3nash please take a look and let me know if you think this change is not 
needed




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
codecov-io edited a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593244136
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@078d482`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1290/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##             master    #1290   +/-   ##
    ==========================================
      Coverage          ?   67.09%
      Complexity        ?      223
    ==========================================
      Files             ?      333
      Lines             ?    16216
      Branches          ?     1659
    ==========================================
      Hits              ?    10880
      Misses            ?     4598
      Partials          ?      738
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=footer).
 Last update 
[078d482...ac87b91](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] codecov-io commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
codecov-io commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593244136
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=h1) 
Report
   > :exclamation: No coverage uploaded for pull request base 
(`master@078d482`). [Click here to learn what that 
means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1290/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=tree)
   
   ```diff
    @@            Coverage Diff             @@
    ##             master    #1290   +/-   ##
    ==========================================
      Coverage          ?    0.64%
      Complexity        ?        2
    ==========================================
      Files             ?      287
      Lines             ?    14319
      Branches          ?     1465
    ==========================================
      Hits              ?       92
      Misses            ?    14224
      Partials          ?        3
   ```
   
   
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
    > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=footer).
 Last update 
[078d482...ac87b91](https://codecov.io/gh/apache/incubator-hudi/pull/1290?src=pr=lastupdated).
 Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [incubator-hudi] lamber-ken removed a comment on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
lamber-ken removed a comment on issue #1290: [HUDI-584] Relocate spark-avro 
dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593237472
 
 
   hi @bhasudha, all review comments are addressed and fixed. Thanks
   
   
![image](https://user-images.githubusercontent.com/20113411/75649979-9fd77e00-5c8f-11ea-907d-51774493c050.png)
   




[jira] [Updated] (HUDI-597) Enable incremental pulling from defined partitions

2020-03-01 Thread Yanjia Gary Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-597:

Description: 
For the use case where I only need to pull the incremental part of certain 
partitions, I currently need to do the incremental pull from the entire dataset 
first and then filter in Spark.

If we could use the folder partitions directly as part of the input path, it 
would run faster by loading only the relevant parquet files.

Example:

 
{code:java}
spark.read.format("org.apache.hudi")
.option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
.option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/year=2016/*/*/*")
.load(path)
 
{code}
 

  was:
For the use case that I only need to pull the incremental part of certain 
partitions, I need to do the incremental pulling from the entire dataset first 
then filtering in Spark.

If we can use the folder partitions directly as part of the input path, it 
could run faster by only load relevant parquet files.

Example:

 
{code:java}
spark.read.format("org.apache.hudi")
.option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
.load(path, "year=2020/*/*/*")
 
{code}
 


> Enable incremental pulling from defined partitions
> --
>
> Key: HUDI-597
> URL: https://issues.apache.org/jira/browse/HUDI-597
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the use case where I only need to pull the incremental part of certain 
> partitions, I currently need to do the incremental pull from the entire 
> dataset first and then filter in Spark.
> If we could use the folder partitions directly as part of the input path, it 
> would run faster by loading only the relevant parquet files.
> Example:
>  
> {code:java}
> spark.read.format("org.apache.hudi")
> .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
> .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
> .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY, "/year=2016/*/*/*")
> .load(path)
>  
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-hudi] lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
lamber-ken commented on issue #1290: [HUDI-584] Relocate spark-avro dependency 
by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#issuecomment-593237472
 
 
   hi @bhasudha, all review comments are addressed and fixed. Thanks
   
   
![image](https://user-images.githubusercontent.com/20113411/75649979-9fd77e00-5c8f-11ea-907d-51774493c050.png)
   




[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
lamber-ken commented on a change in pull request #1290: [HUDI-584] Relocate 
spark-avro dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#discussion_r386211419
 
 

 ##
 File path: packaging/hudi-spark-bundle/pom.xml
 ##
 @@ -248,6 +260,13 @@
 
org.apache.hudi.
   
 
+
+  spark-shade-unbundle-avro
+  
+provided
+/
 
 Review comment:
   Good catch 




[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
bhasudha commented on a change in pull request #1290: [HUDI-584] Relocate 
spark-avro dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#discussion_r386190496
 
 

 ##
 File path: packaging/hudi-spark-bundle/pom.xml
 ##
 @@ -248,6 +260,13 @@
 
org.apache.hudi.
   
 
+
+  spark-shade-unbundle-avro
+  
+provided
+/
 
 Review comment:
   @lamber-ken I see a '/' here for the property "spark.bundle.spark.shade.prefix" if the profile is activated. Is this intended? Does this mean shading with an empty prefix (same as not shading)? I don't know much about Maven profiles either, so I wanted to clarify with you.




[GitHub] [incubator-hudi] bhasudha commented on a change in pull request #1290: [HUDI-584] Relocate spark-avro dependency by maven-shade-plugin

2020-03-01 Thread GitBox
bhasudha commented on a change in pull request #1290: [HUDI-584] Relocate 
spark-avro dependency by maven-shade-plugin
URL: https://github.com/apache/incubator-hudi/pull/1290#discussion_r386190041
 
 

 ##
 File path: README.md
 ##
 @@ -71,6 +71,14 @@ The default Scala version supported is 2.11. To build for 
Scala 2.12 version, bu
 mvn clean package -DskipTests -DskipITs -Dscala-2.12
 ```
 
+### Build without spark-avro module
+
+The default hudi-jar bundles spark-avro module. To build without spark-avro 
module, build using `spark-shade-unbundle-avro` profile
 
 Review comment:
   can we also add the `--packages` instruction here to explicitly say how to add spark-avro if building without it.
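   For concreteness, the end-to-end flow being asked for would look roughly like this (the spark-shell invocation is the one from the PR discussion; the `mvn` profile-activation line is an assumption based on the profile name above):
   
   ```sh
   # Build the spark bundle without bundling spark-avro (activates the new profile)
   mvn clean package -DskipTests -DskipITs -Pspark-shade-unbundle-avro
   
   # Then supply spark-avro at runtime via --packages
   spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
     --packages org.apache.spark:spark-avro_2.11:2.4.4 \
     --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
   ```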




Build failed in Jenkins: hudi-snapshot-deployment-0.5 #204

2020-03-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2.40 KB...]
/home/jenkins/tools/maven/apache-maven-3.5.4/conf:
logging
settings.xml
toolchains.xml

/home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging:
simplelogger.properties

/home/jenkins/tools/maven/apache-maven-3.5.4/lib:
aopalliance-1.0.jar
cdi-api-1.0.jar
cdi-api.license
commons-cli-1.4.jar
commons-cli.license
commons-io-2.5.jar
commons-io.license
commons-lang3-3.5.jar
commons-lang3.license
ext
guava-20.0.jar
guice-4.2.0-no_aop.jar
jansi-1.17.1.jar
jansi-native
javax.inject-1.jar
jcl-over-slf4j-1.7.25.jar
jcl-over-slf4j.license
jsr250-api-1.0.jar
jsr250-api.license
maven-artifact-3.5.4.jar
maven-artifact.license
maven-builder-support-3.5.4.jar
maven-builder-support.license
maven-compat-3.5.4.jar
maven-compat.license
maven-core-3.5.4.jar
maven-core.license
maven-embedder-3.5.4.jar
maven-embedder.license
maven-model-3.5.4.jar
maven-model-builder-3.5.4.jar
maven-model-builder.license
maven-model.license
maven-plugin-api-3.5.4.jar
maven-plugin-api.license
maven-repository-metadata-3.5.4.jar
maven-repository-metadata.license
maven-resolver-api-1.1.1.jar
maven-resolver-api.license
maven-resolver-connector-basic-1.1.1.jar
maven-resolver-connector-basic.license
maven-resolver-impl-1.1.1.jar
maven-resolver-impl.license
maven-resolver-provider-3.5.4.jar
maven-resolver-provider.license
maven-resolver-spi-1.1.1.jar
maven-resolver-spi.license
maven-resolver-transport-wagon-1.1.1.jar
maven-resolver-transport-wagon.license
maven-resolver-util-1.1.1.jar
maven-resolver-util.license
maven-settings-3.5.4.jar
maven-settings-builder-3.5.4.jar
maven-settings-builder.license
maven-settings.license
maven-shared-utils-3.2.1.jar
maven-shared-utils.license
maven-slf4j-provider-3.5.4.jar
maven-slf4j-provider.license
org.eclipse.sisu.inject-0.3.3.jar
org.eclipse.sisu.inject.license
org.eclipse.sisu.plexus-0.3.3.jar
org.eclipse.sisu.plexus.license
plexus-cipher-1.7.jar
plexus-cipher.license
plexus-component-annotations-1.7.1.jar
plexus-component-annotations.license
plexus-interpolation-1.24.jar
plexus-interpolation.license
plexus-sec-dispatcher-1.4.jar
plexus-sec-dispatcher.license
plexus-utils-3.1.0.jar
plexus-utils.license
slf4j-api-1.7.25.jar
slf4j-api.license
wagon-file-3.1.0.jar
wagon-file.license
wagon-http-3.1.0-shaded.jar
wagon-http.license
wagon-provider-api-3.1.0.jar
wagon-provider-api.license

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/ext:
README.txt

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native:
freebsd32
freebsd64
linux32
linux64
osx
README.txt
windows32
windows64

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/freebsd64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux32:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/linux64:
libjansi.so

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/osx:
libjansi.jnilib

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows32:
jansi.dll

/home/jenkins/tools/maven/apache-maven-3.5.4/lib/jansi-native/windows64:
jansi.dll
Finished /home/jenkins/tools/maven/apache-maven-3.5.4 Directory Listing :
Detected current version as: 
'HUDI_home=
0.6.0-SNAPSHOT'
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-spark_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-timeline-service:jar:0.6.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but found 
duplicate declaration of plugin org.jacoco:jacoco-maven-plugin @ 
org.apache.hudi:hudi-timeline-service:[unknown-version], 

 line 58, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-utilities_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 
org.apache.hudi:hudi-utilities_${scala.binary.version}:[unknown-version], 

 line 26, column 15
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for 
org.apache.hudi:hudi-spark-bundle_2.11:jar:0.6.0-SNAPSHOT
[WARNING] 'artifactId' contains an expression but should be a constant. @ 

[jira] [Updated] (HUDI-344) Hudi Dataset Snapshot Exporter

2020-03-01 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-344:

Status: In Progress  (was: Open)

> Hudi Dataset Snapshot Exporter
> --
>
> Key: HUDI-344
> URL: https://issues.apache.org/jira/browse/HUDI-344
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>  Components: Utilities
>Reporter: Raymond Xu
>Priority: Major
>  Labels: features, pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A dataset exporter tool for snapshotting. See 
> [RFC-9|https://cwiki.apache.org/confluence/display/HUDI/RFC-9%3A+Hudi+Dataset+Snapshot+Exporter]





[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding InlineFileSystem to support embedding any file format as an InlineFile

2020-03-01 Thread GitBox
vinothchandar commented on a change in pull request #1176: [HUDI-430] Adding 
InlineFileSystem to support embedding any file format as an InlineFile
URL: https://github.com/apache/incubator-hudi/pull/1176#discussion_r386132436
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/inline/fs/InMemoryFileSystem.java
 ##
 @@ -0,0 +1,120 @@
+package org.apache.hudi.utilities.inline.fs;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.util.Progressable;
+
+import java.io.ByteArrayOutputStream;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+
+/**
+ * A FileSystem which stores all content in memory and returns a byte[] when {@link #getFileAsBytes()} is called.
+ * This FileSystem is used only in the write path. It does not support any read APIs except {@link #getFileAsBytes()}.
+ */
+public class InMemoryFileSystem extends FileSystem {
 
 Review comment:
   can we move all of this code to `hudi-common` 
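   The idea behind the class can be sketched without Hadoop at all; the following is a hypothetical stripped-down analogue (the real `InMemoryFileSystem` extends `org.apache.hadoop.fs.FileSystem`; the class and method names below, other than `getFileAsBytes()`, are illustrative):

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.OutputStream;
   import java.nio.charset.StandardCharsets;

   // Hypothetical stripped-down analogue of InMemoryFileSystem: write-only,
   // everything stays in memory, contents retrieved via getFileAsBytes().
   public class InMemorySink {
       private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

       // Analogue of FileSystem.create(): hand out a stream backed by the buffer.
       public OutputStream create() {
           return buffer;
       }

       // Analogue of getFileAsBytes(): the only supported "read" operation.
       public byte[] getFileAsBytes() {
           return buffer.toByteArray();
       }

       public static void main(String[] args) throws IOException {
           InMemorySink sink = new InMemorySink();
           sink.create().write("hello".getBytes(StandardCharsets.UTF_8));
           System.out.println(sink.getFileAsBytes().length);  // 5
       }
   }
   ```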




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-03-01 Thread GitBox
vinothchandar commented on a change in pull request #1151: [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r386132370
 
 

 ##
 File path: hudi-examples/pom.xml
 ##
 @@ -0,0 +1,206 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.5.2-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-examples</artifactId>
+  <packaging>jar</packaging>
+
+  <properties>
+    <main.basedir>${project.parent.basedir}</main.basedir>
+  </properties>
+
+  <build>
+    <resources>
+      <resource>
+        <directory>src/main/resources</directory>
+      </resource>
+    </resources>
+
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
 
 Review comment:
   this is probably the biggest item we need to decide on? 




[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1151: [HUDI-476] Add hudi-examples module

2020-03-01 Thread GitBox
vinothchandar commented on a change in pull request #1151: [HUDI-476] Add 
hudi-examples module
URL: https://github.com/apache/incubator-hudi/pull/1151#discussion_r383479608
 
 

 ##
 File path: hudi-examples/pom.xml
 ##
 @@ -0,0 +1,206 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+
+  <parent>
+    <artifactId>hudi</artifactId>
+    <groupId>org.apache.hudi</groupId>
+    <version>0.5.2-SNAPSHOT</version>
+  </parent>
+  <modelVersion>4.0.0</modelVersion>
+
+  <artifactId>hudi-examples</artifactId>
+  <packaging>jar</packaging>
+
+  <properties>
+    <main.basedir>${project.parent.basedir}</main.basedir>
+  </properties>
+
+  <build>
+    <resources>
+      <resource>
+        <directory>src/main/resources</directory>
+      </resource>
+    </resources>
+
+    <plugins>
+      <plugin>
+        <groupId>org.apache.maven.plugins</groupId>
 
 Review comment:
   This ties back to how we let the users run the examples. Another way is to not have a fat jar here, but just have a `run_hudi_example.sh` script that uses the spark-bundle/utilities-bundle after hudi is built.
   
   This way, we don't have to also maintain this bundle separately. Users will be using the bundles under `packaging` in production anyway. So just reuse them?
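   A sketch of what such a script could look like, assuming the suggestion above (aside from the script name, everything here is hypothetical: the environment variable and jar paths are placeholders, not from the PR):
   
   ```sh
   #!/usr/bin/env bash
   # run_hudi_example.sh (hypothetical): reuse the already-built spark bundle
   # instead of maintaining a separate fat jar for the examples module.
   spark-submit \
     --class "${EXAMPLE_CLASS:?set EXAMPLE_CLASS to the example main class}" \
     --jars `ls packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.jar` \
     hudi-examples/target/hudi-examples-*.jar "$@"
   ```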






[incubator-hudi] branch master updated: [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns (#1330)

2020-03-01 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2d04014  [HUDI-607] Fix to allow creation/syncing of Hive tables 
partitioned by Date type columns (#1330)
2d04014 is described below

commit 2d040145810b8b14c59c5882f9115698351039d1
Author: Udit Mehrotra 
AuthorDate: Sun Mar 1 10:42:58 2020 -0800

[HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date 
type columns (#1330)
---
 .../main/java/org/apache/hudi/DataSourceUtils.java | 40 +-
 hudi-spark/src/test/java/DataSourceUtilsTest.java  | 61 ++
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java 
b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
index 1158fa2..a2dfe02 100644
--- a/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
+++ b/hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
@@ -18,6 +18,8 @@
 
 package org.apache.hudi;
 
+import org.apache.avro.LogicalTypes;
+import org.apache.avro.Schema;
 import org.apache.hudi.client.HoodieReadClient;
 import org.apache.hudi.client.HoodieWriteClient;
 import org.apache.hudi.client.WriteStatus;
@@ -45,6 +47,7 @@ import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 
 import java.io.IOException;
+import java.time.LocalDate;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
@@ -80,7 +83,8 @@ public class DataSourceUtils {
 
   // return, if last part of name
   if (i == parts.length - 1) {
-return val;
+Schema fieldSchema = valueNode.getSchema().getField(part).schema();
+return convertValueForSpecificDataTypes(fieldSchema, val);
   } else {
 // VC: Need a test here
 if (!(val instanceof GenericRecord)) {
@@ -100,6 +104,40 @@ public class DataSourceUtils {
   }
 
   /**
+   * This method converts values for fields with certain Avro/Parquet data 
types that require special handling.
+   *
+   * Logical Date Type is converted to actual Date value instead of Epoch 
Integer which is how it is
+   * represented/stored in parquet.
+   *
+   * @param fieldSchema avro field schema
+   * @param fieldValue avro field value
+   * @return field value either converted (for certain data types) or as it is.
+   */
+  private static Object convertValueForSpecificDataTypes(Schema fieldSchema, 
Object fieldValue) {
+if (fieldSchema == null) {
+  return fieldValue;
+}
+
+if (isLogicalTypeDate(fieldSchema)) {
+  return LocalDate.ofEpochDay(Long.parseLong(fieldValue.toString()));
+}
+return fieldValue;
+  }
+
+  /**
+   * Given an Avro field schema checks whether the field is of Logical Date 
Type or not.
+   *
+   * @param fieldSchema avro field schema
+   * @return boolean indicating whether fieldSchema is of Avro's Date Logical 
Type
+   */
+  private static boolean isLogicalTypeDate(Schema fieldSchema) {
+if (fieldSchema.getType() == Schema.Type.UNION) {
+  return fieldSchema.getTypes().stream().anyMatch(schema -> 
schema.getLogicalType() == LogicalTypes.date());
+}
+return fieldSchema.getLogicalType() == LogicalTypes.date();
+  }
+
+  /**
* Create a key generator class via reflection, passing in any configs 
needed.
* 
* If the class name of key generator is configured through the properties 
file, i.e., {@code props}, use the
diff --git a/hudi-spark/src/test/java/DataSourceUtilsTest.java 
b/hudi-spark/src/test/java/DataSourceUtilsTest.java
new file mode 100644
index 000..4fe7547
--- /dev/null
+++ b/hudi-spark/src/test/java/DataSourceUtilsTest.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hudi.DataSourceUtils;
+import org.junit.Test;
+
+import java.time.LocalDate;
+
+import static org.junit.Assert.assertEquals;
+
+public class DataSourceUtilsTest 
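The conversion logic in the diff above can be exercised in isolation. The sketch below re-implements just the epoch-day handling with `java.time` (a standalone illustration, not the actual `DataSourceUtils` code; the Avro schema check is collapsed into a boolean parameter):

```java
import java.time.LocalDate;

// Minimal sketch of the date handling in HUDI-607 (standalone; not the
// actual DataSourceUtils code): Avro's "date" logical type is stored as
// days since the Unix epoch, so the raw partition value has to be
// converted back to a real date before it is used for Hive partitioning.
public class DateFieldConversionSketch {

    static Object convertIfEpochDays(boolean isLogicalDate, Object fieldValue) {
        if (isLogicalDate) {
            // e.g. 18322 -> 2020-03-01
            return LocalDate.ofEpochDay(Long.parseLong(fieldValue.toString()));
        }
        return fieldValue;  // all other types pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(convertIfEpochDays(true, 18322));   // 2020-03-01
        System.out.println(convertIfEpochDays(false, "foo"));  // foo
    }
}
```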

[GitHub] [incubator-hudi] vinothchandar merged pull request #1330: [HUDI-607] Fix to allow creation/syncing of Hive tables partitioned by Date type columns

2020-03-01 Thread GitBox
vinothchandar merged pull request #1330: [HUDI-607] Fix to allow 
creation/syncing of Hive tables partitioned by Date type columns
URL: https://github.com/apache/incubator-hudi/pull/1330
 
 
   




[jira] [Closed] (HUDI-627) Publish coverage to codecov.io

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-627.
--

Fixed via master: acf359c834bc1d9b9c4ea64d362ea20c7410c70a

> Publish coverage to codecov.io
> --
>
> Key: HUDI-627
> URL: https://issues.apache.org/jira/browse/HUDI-627
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Publish the coverage to codecov.io on every build
>  * Fix code coverage to pickup cross module testing





[jira] [Updated] (HUDI-627) Publish coverage to codecov.io

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-627:
---
Fix Version/s: 0.5.2

> Publish coverage to codecov.io
> --
>
> Key: HUDI-627
> URL: https://issues.apache.org/jira/browse/HUDI-627
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Publish the coverage to codecov.io on every build
>  * Fix code coverage to pickup cross module testing





[jira] [Resolved] (HUDI-554) Restructure code/packages to move more code back into hudi-writer-common

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-554.

Resolution: Fixed

Fixed via master: 71170fafe77e11ea1a458a38e3395a471d94a047

> Restructure code/packages  to move more code back into hudi-writer-common
> -
>
> Key: HUDI-554
> URL: https://issues.apache.org/jira/browse/HUDI-554
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HUDI-618) Improve unit test coverage for org.apache.hudi.common.table.view. PriorityBasedFileSystemView

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-618:
---
Fix Version/s: 0.5.2

> Improve unit test coverage for org.apache.hudi.common.table.view. 
> PriorityBasedFileSystemView
> -
>
> Key: HUDI-618
> URL: https://issues.apache.org/jira/browse/HUDI-618
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit tests for all methods





[jira] [Closed] (HUDI-618) Improve unit test coverage for org.apache.hudi.common.table.view. PriorityBasedFileSystemView

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-618.
--

> Improve unit test coverage for org.apache.hudi.common.table.view. 
> PriorityBasedFileSystemView
> -
>
> Key: HUDI-618
> URL: https://issues.apache.org/jira/browse/HUDI-618
> Project: Apache Hudi (incubating)
>  Issue Type: Sub-task
>Reporter: Ramachandran M S
>Assignee: Ramachandran M S
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add unit tests for all methods





[jira] [Resolved] (HUDI-636) Fix could not get sources warnings while compiling

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf resolved HUDI-636.

Fix Version/s: 0.5.2
   Resolution: Fixed

Fixed via master: cacd9a33222d28c905891362312545230b6d30b9

> Fix could not get sources warnings while compiling 
> ---
>
> Key: HUDI-636
> URL: https://issues.apache.org/jira/browse/HUDI-636
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During the voting process on the rc1 0.5.1-incubating release, Justin pointed out 
> that the mvn log displayed "could not get sources" warnings
>  
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> {code:java}
> [INFO] --- maven-shade-plugin:3.1.1:shade (default) @ hudi-hadoop-mr-bundle 
> ---
> [INFO] Including org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT in the shaded 
> jar.
> Downloading from aliyun: 
> http://maven.aliyun.com/nexus/content/groups/public/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from cloudera: 
> https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from confluent: 
> https://packages.confluent.io/maven/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-milestone: 
> https://repo.spring.io/libs-milestone/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-release: 
> https://repo.spring.io/libs-release/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from apache.snapshots: 
> https://repository.apache.org/snapshots/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> [WARNING] Could not get sources for 
> org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT:compile
> [INFO] Excluding com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7 
> from the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1 from 
> the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.6.7 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:fluent-hc:jar:4.3.2 from the 
> shaded jar.
> [INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded 
> jar.
> [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.3.6 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.3.2 from the shaded 
> jar.
> [INFO] Excluding commons-codec:commons-codec:jar:1.6 from the shaded jar.
> [INFO] Excluding org.rocksdb:rocksdbjni:jar:5.17.2 from the shaded jar.
> [INFO] Including com.esotericsoftware:kryo-shaded:jar:4.0.2 in the shaded jar.
> [INFO] Including com.esotericsoftware:minlog:jar:1.3.0 in the shaded jar.
> [INFO] Including org.objenesis:objenesis:jar:2.5.1 in the shaded jar.
> {code}





[jira] [Closed] (HUDI-636) Fix could not get sources warnings while compiling

2020-03-01 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-636.
--

> Fix could not get sources warnings while compiling 
> ---
>
> Key: HUDI-636
> URL: https://issues.apache.org/jira/browse/HUDI-636
> Project: Apache Hudi (incubating)
>  Issue Type: Improvement
>Reporter: lamber-ken
>Assignee: lamber-ken
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During the voting process on the rc1 0.5.1-incubating release, Justin pointed out 
> that the mvn log displayed "could not get sources" warnings
>  
> [https://lists.apache.org/thread.html/rd3f4a72d82a4a5a81b2c6bd71e1417054daa38637ce8e07901f26f04%40%3Cgeneral.incubator.apache.org%3E]
>  
> {code:java}
> [INFO] --- maven-shade-plugin:3.1.1:shade (default) @ hudi-hadoop-mr-bundle 
> ---
> [INFO] Including org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT in the shaded 
> jar.
> Downloading from aliyun: 
> http://maven.aliyun.com/nexus/content/groups/public/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from cloudera: 
> https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from confluent: 
> https://packages.confluent.io/maven/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-milestone: 
> https://repo.spring.io/libs-milestone/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from libs-release: 
> https://repo.spring.io/libs-release/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> Downloading from apache.snapshots: 
> https://repository.apache.org/snapshots/org/apache/hudi/hudi-common/0.5.2-SNAPSHOT/hudi-common-0.5.2-SNAPSHOT-sources.jar
> [WARNING] Could not get sources for 
> org.apache.hudi:hudi-common:jar:0.5.2-SNAPSHOT:compile
> [INFO] Excluding com.fasterxml.jackson.core:jackson-annotations:jar:2.6.7 
> from the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-databind:jar:2.6.7.1 from 
> the shaded jar.
> [INFO] Excluding com.fasterxml.jackson.core:jackson-core:jar:2.6.7 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:fluent-hc:jar:4.3.2 from the 
> shaded jar.
> [INFO] Excluding commons-logging:commons-logging:jar:1.1.3 from the shaded 
> jar.
> [INFO] Excluding org.apache.httpcomponents:httpclient:jar:4.3.6 from the 
> shaded jar.
> [INFO] Excluding org.apache.httpcomponents:httpcore:jar:4.3.2 from the shaded 
> jar.
> [INFO] Excluding commons-codec:commons-codec:jar:1.6 from the shaded jar.
> [INFO] Excluding org.rocksdb:rocksdbjni:jar:5.17.2 from the shaded jar.
> [INFO] Including com.esotericsoftware:kryo-shaded:jar:4.0.2 in the shaded jar.
> [INFO] Including com.esotericsoftware:minlog:jar:1.3.0 in the shaded jar.
> [INFO] Including org.objenesis:objenesis:jar:2.5.1 in the shaded jar.
> {code}





[jira] [Commented] (HUDI-597) Enable incremental pulling from defined partitions

2020-03-01 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048544#comment-17048544
 ] 

leesf commented on HUDI-597:


[~garyli1019] I think we could update the DOC after cutting the 0.5.1 docs and 
merge it to 0.5.2 docs, FYI: [~bhasudha]

> Enable incremental pulling from defined partitions
> --
>
> Key: HUDI-597
> URL: https://issues.apache.org/jira/browse/HUDI-597
> Project: Apache Hudi (incubating)
>  Issue Type: New Feature
>Reporter: Yanjia Gary Li
>Assignee: Yanjia Gary Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.2
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the use case that I only need to pull the incremental part of certain 
> partitions, I need to do the incremental pulling from the entire dataset 
> first then filtering in Spark.
> If we can use the folder partitions directly as part of the input path, it 
> could run faster by only load relevant parquet files.
> Example:
>  
> {code:java}
> spark.read.format("org.apache.hudi")
> .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY,DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
> .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "000")
> .load(path, "year=2020/*/*/*")
>  
> {code}
>  





[GitHub] [incubator-hudi] XuQianJin-Stars commented on issue #1106: [HUDI-209] Implement JMX metrics reporter

2020-03-01 Thread GitBox
XuQianJin-Stars commented on issue #1106: [HUDI-209] Implement JMX metrics 
reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#issuecomment-593086199
 
 
   hi @vinothchandar @leesf Thanks, I have addressed them.




[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386096648
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.HoodieTestDataGenerator;
+import org.apache.hudi.common.model.HoodieTestUtils;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+public class TestHoodieSnapshotExporter {
+  private static String TEST_WRITE_TOKEN = "1-0-1";
+
+  private SparkSession spark = null;
+  private HoodieTestDataGenerator dataGen = null;
+  private String basePath = null;
+  private String outputPath = null;
+  private String rootPath = null;
+  private FileSystem fs = null;
+  private Map<String, String> commonOpts;
+  private HoodieSnapshotExporter.Config cfg;
+  private JavaSparkContext jsc = null;
+
+  @Before
+  public void initialize() throws IOException {
+spark = SparkSession.builder()
+.appName("Hoodie Datasource test")
+.master("local[2]")
+.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+.getOrCreate();
+jsc = new JavaSparkContext(spark.sparkContext());
+dataGen = new HoodieTestDataGenerator();
+TemporaryFolder folder = new TemporaryFolder();
+folder.create();
+basePath = folder.getRoot().getAbsolutePath();
+fs = FSUtils.getFs(basePath, spark.sparkContext().hadoopConfiguration());
+commonOpts = new HashMap<>();
+
+commonOpts.put("hoodie.insert.shuffle.parallelism", "4");
+commonOpts.put("hoodie.upsert.shuffle.parallelism", "4");
+commonOpts.put(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "_row_key");
+commonOpts.put(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "partition");
+commonOpts.put(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "timestamp");
+commonOpts.put(HoodieWriteConfig.TABLE_NAME, "hoodie_test");
+
+
+cfg = new HoodieSnapshotExporter.Config();
+
+cfg.sourceBasePath = basePath;
+cfg.targetOutputPath = outputPath = basePath + "/target";
+cfg.outputFormat = "json";
+cfg.outputPartitionField = "partition";
+
+  }
+
+  @After
+  public void cleanup() throws Exception {
+if (spark != null) {
+  spark.stop();
+}
+  }
+
+  @Test
+  public void testSnapshotExporter() throws IOException {
+// Insert Operation
+List<String> records = DataSourceTestUtils.convertToStringList(dataGen.generateInserts("000", 100));
+Dataset<Row> inputDF = spark.read().json(new JavaSparkContext(spark.sparkContext()).parallelize(records, 2));
+inputDF.write().format("hudi")
+.options(commonOpts)
+.option(DataSourceWriteOptions.OPERATION_OPT_KEY(), DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL())
+.mode(SaveMode.Overwrite)
+.save(basePath);
+long sourceCount = inputDF.count();
+
+HoodieSnapshotExporter hoodieSnapshotExporter = new HoodieSnapshotExporter();
+hoodieSnapshotExporter.export(spark, cfg);
+
+long targetCount = spark.read().json(outputPath).count();
+
+assertTrue(sourceCount == targetCount);
+
+// Test snapshotPrefix
+long filterCount = inputDF.where("partition == '2015/03/16'").count();
+cfg.snapshotPrefix = 

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386095927
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.DataSourceWriteOptions;
+import org.apache.hudi.common.HoodieTestDataGenerator;
+import org.apache.hudi.common.model.HoodieTestUtils;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+public class TestHoodieSnapshotExporter {
 
 Review comment:
   Any reason for not extending `HoodieCommonTestHarness`?




[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386095272
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+@Parameter(names = {"--source-base-path"}, description = "Base path for 
the source Hudi dataset to be snapshotted", required = true)
+String sourceBasePath = null;
+
+@Parameter(names = {"--target-base-path"}, description = "Base path for 
the target output files (snapshots)", required = true)
+String targetOutputPath = null;
+
+@Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix 
or directory under the target base path in order to segregate different 
snapshots")
+String snapshotPrefix;
+
+@Parameter(names = {"--output-format"}, description = "e.g. Hudi or 
Parquet", required = true)
+String outputFormat;
+
+@Parameter(names = {"--output-partition-field"}, description = "A field to 
be used by Spark repartitioning")
+String outputPartitionField;
+  }
+
+  public void export(SparkSession spark, Config cfg) throws IOException {
+JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+FileSystem fs = FSUtils.getFs(cfg.sourceBasePath, 
jsc.hadoopConfiguration());
+
+final SerializableConfiguration serConf = new 
SerializableConfiguration(jsc.hadoopConfiguration());
+final HoodieTableMetaClient tableMetadata = new 
HoodieTableMetaClient(fs.getConf(), cfg.sourceBasePath);
+final TableFileSystemView.BaseFileOnlyView fsView = new 
HoodieTableFileSystemView(tableMetadata,
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants());
+// Get the latest commit
+Option<HoodieInstant> latestCommit =
+tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+if (!latestCommit.isPresent()) {
+  LOG.warn("No commits present. Nothing to snapshot");
+  return;
+}
+final String latestCommitTimestamp = latestCommit.get().getTimestamp();
+LOG.info(String.format("Starting to snapshot latest version files which are also no later than %s.",
+latestCommitTimestamp));
+
+List<String> partitions = FSUtils.getAllPartitionPaths(fs, cfg.sourceBasePath, false);
+if (partitions.size() > 0) {
+  List dataFiles = new ArrayList<>();
+
+  if 
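The exporter's control flow quoted above — find the latest completed commit, bail out if there is none, then keep only files written no later than that instant — can be sketched in plain Python. This is a hedged sketch only; the `DataFile` shape and the lexicographic ordering of instant timestamps are illustrative assumptions, not the Hudi API:

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    commit_ts: str  # Hudi instant timestamps sort lexicographically

def snapshot(files, completed_commits):
    """Select the file paths that belong to the latest snapshot."""
    if not completed_commits:
        # Mirrors the exporter's "No commits present. Nothing to snapshot" warning.
        return []
    latest = max(completed_commits)  # analogous to lastInstant() on the completed timeline
    return [f.path for f in files if f.commit_ts <= latest]

files = [
    DataFile("2020/01/01/f1.parquet", "001"),
    DataFile("2020/01/01/f2.parquet", "003"),  # written after the latest completed commit
]
print(snapshot(files, ["001", "002"]))  # -> ['2020/01/01/f1.parquet']
```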

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386094957
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+@Parameter(names = {"--source-base-path"}, description = "Base path for 
the source Hudi dataset to be snapshotted", required = true)
+String sourceBasePath = null;
+
+@Parameter(names = {"--target-base-path"}, description = "Base path for 
the target output files (snapshots)", required = true)
+String targetOutputPath = null;
+
+@Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix 
or directory under the target base path in order to segregate different 
snapshots")
+String snapshotPrefix;
+
+@Parameter(names = {"--output-format"}, description = "e.g. Hudi or 
Parquet", required = true)
+String outputFormat;
+
+@Parameter(names = {"--output-partition-field"}, description = "A field to 
be used by Spark repartitioning")
+String outputPartitionField;
+  }
+
+  public void export(SparkSession spark, Config cfg) throws IOException {
+JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+FileSystem fs = FSUtils.getFs(cfg.sourceBasePath, 
jsc.hadoopConfiguration());
+
+final SerializableConfiguration serConf = new 
SerializableConfiguration(jsc.hadoopConfiguration());
+final HoodieTableMetaClient tableMetadata = new 
HoodieTableMetaClient(fs.getConf(), cfg.sourceBasePath);
+final TableFileSystemView.BaseFileOnlyView fsView = new 
HoodieTableFileSystemView(tableMetadata,
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants());
+// Get the latest commit
+Option<HoodieInstant> latestCommit =
+tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+if (!latestCommit.isPresent()) {
+  LOG.warn("No commits present. Nothing to snapshot");
+  return;
+}
+final String latestCommitTimestamp = latestCommit.get().getTimestamp();
+LOG.info(String.format("Starting to snapshot latest version files which are also no later than %s.",
+latestCommitTimestamp));
+
+List<String> partitions = FSUtils.getAllPartitionPaths(fs, cfg.sourceBasePath, false);
+if (partitions.size() > 0) {
+  List dataFiles = new ArrayList<>();
+
+  if 

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386093867
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,213 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.DataFrameWriter;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = 
LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+@Parameter(names = {"--source-base-path"}, description = "Base path for 
the source Hudi dataset to be snapshotted", required = true)
+String sourceBasePath = null;
+
+@Parameter(names = {"--target-base-path"}, description = "Base path for 
the target output files (snapshots)", required = true)
+String targetOutputPath = null;
+
+@Parameter(names = {"--snapshot-prefix"}, description = "Snapshot prefix 
or directory under the target base path in order to segregate different 
snapshots")
+String snapshotPrefix;
+
+@Parameter(names = {"--output-format"}, description = "e.g. Hudi or 
Parquet", required = true)
+String outputFormat;
+
+@Parameter(names = {"--output-partition-field"}, description = "A field to 
be used by Spark repartitioning")
+String outputPartitionField;
+  }
+
+  public void export(SparkSession spark, Config cfg) throws IOException {
+JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+FileSystem fs = FSUtils.getFs(cfg.sourceBasePath, 
jsc.hadoopConfiguration());
+
+final SerializableConfiguration serConf = new 
SerializableConfiguration(jsc.hadoopConfiguration());
+final HoodieTableMetaClient tableMetadata = new 
HoodieTableMetaClient(fs.getConf(), cfg.sourceBasePath);
+final TableFileSystemView.BaseFileOnlyView fsView = new 
HoodieTableFileSystemView(tableMetadata,
+
tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants());
+// Get the latest commit
+Option<HoodieInstant> latestCommit =
+tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+if (!latestCommit.isPresent()) {
+  LOG.warn("No commits present. Nothing to snapshot");
+  return;
+}
+final String latestCommitTimestamp = latestCommit.get().getTimestamp();
+LOG.info(String.format("Starting to snapshot latest version files which are also no later than %s.",
+latestCommitTimestamp));
+
+List<String> partitions = FSUtils.getAllPartitionPaths(fs, cfg.sourceBasePath, false);
+if (partitions.size() > 0) {
+  List dataFiles = new ArrayList<>();
+
+  if 

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386096182
 
 

 ##
 File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/DataSourceTestUtils.java
 ##
 @@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.TestRawTripPayload;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Test utils for data source tests.
+ */
+public class DataSourceTestUtils {
+
+  public static Option<String> convertToString(HoodieRecord record) {
+try {
+  String str = ((TestRawTripPayload) record.getData()).getJsonData();
+  str = "{" + str.substring(str.indexOf("\"timestamp\":"));
+  // Remove the last } bracket
+  str = str.substring(0, str.length() - 1);
 
 Review comment:
   This assumes knowledge of which data columns exist in `TestRawTripPayload`, which may 
easily break. I would suggest 
   1. making this a private test util of the Exporter test class, as it is very 
specific to it (not generic enough to be a standalone test util), and 
   2. trying to make the source data work for the case instead of wrangling with 
the JSON string representation
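The fragility the reviewer points out can be shown with a small hedged sketch. The payload below is invented — it only imitates the shape of `TestRawTripPayload`'s JSON: substring slicing silently depends on field order, whereas parsing does not.

```python
import json

# Invented payload imitating TestRawTripPayload's JSON shape.
raw = '{"_hoodie_is_deleted": false, "timestamp": 123, "rider": "rider-1"}'

# Fragile approach (like the test util): assumes "timestamp" is the
# field that immediately follows the one being dropped.
sliced = "{" + raw[raw.index('"timestamp":'):]

# Robust approach: parse, drop the unwanted field, re-serialize.
obj = json.loads(raw)
obj.pop("_hoodie_is_deleted")
robust = json.dumps(obj)

# Both yield the same JSON here, but only the parsed version survives
# a reordering of fields in the source payload.
assert json.loads(sliced) == json.loads(robust)
```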




[GitHub] [incubator-hudi] meijies commented on issue #143: Tracking ticket for folks to be added to slack group

2020-03-01 Thread GitBox
meijies commented on issue #143: Tracking ticket for folks to be added to slack 
group
URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-593080033
 
 
   Could you please add me to Slack? :) My email is meijie.w...@gmail.com. Thank you




[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1106: [HUDI-209] Implement JMX metrics reporter

2020-03-01 Thread GitBox
codecov-io edited a comment on issue #1106: [HUDI-209] Implement JMX metrics 
reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#issuecomment-593074963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=h1) 
Report
   > Merging 
[#1106](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/0dc8e493aa1658910a3519df3941278d9d072c18?src=pr=desc)
 will **decrease** coverage by `0.3%`.
   > The diff coverage is `5.21%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1106/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1106      +/-   ##
   ============================================
   - Coverage     67.08%   66.77%    -0.31%     
     Complexity      223      223               
   ============================================
     Files           333      334        +1     
     Lines         16207    16291       +84     
     Branches       1657     1662        +5     
   ============================================
   + Hits          10873    10879        +6     
   - Misses         4598     4676       +78     
     Partials        736      736               
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...rg/apache/hudi/metrics/MetricsReporterFactory.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXJGYWN0b3J5LmphdmE=)
 | `46.15% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...va/org/apache/hudi/metrics/MetricsJmxReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzSm14UmVwb3J0ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/hudi/metrics/MetricsGraphiteReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzR3JhcGhpdGVSZXBvcnRlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `83.84% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `100% <100%> (+25%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...va/org/apache/hudi/config/HoodieMetricsConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZU1ldHJpY3NDb25maWcuamF2YQ==)
 | `59.37% <100%> (+4.53%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `70.27% <100%> (+1.69%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   | ... and [5 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1106: [HUDI-209] Implement JMX metrics reporter

2020-03-01 Thread GitBox
codecov-io edited a comment on issue #1106: [HUDI-209] Implement JMX metrics 
reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#issuecomment-593074963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=h1) 
Report
   > Merging 
[#1106](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/0dc8e493aa1658910a3519df3941278d9d072c18?src=pr=desc)
 will **decrease** coverage by `66.44%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1106/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree)
   
   ```diff
   @@             Coverage Diff               @@
   ##             master    #1106        +/-   ##
   =============================================
   - Coverage     67.08%    0.63%    -66.45%     
   + Complexity      223        2       -221     
   =============================================
     Files           333      288        -45     
     Lines         16207    14394      -1813     
     Branches       1657     1468       -189     
   =============================================
   - Hits          10873       92     -10781     
   - Misses         4598    14299      +9701     
   + Partials        736        3       -733     
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `0% <ø> (-100%)` | `0 <0> (ø)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `0% <0%> (-75%)` | `0 <0> (ø)` | |
   | 
[...rg/apache/hudi/metrics/MetricsReporterFactory.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXJGYWN0b3J5LmphdmE=)
 | `0% <0%> (-46.16%)` | `0 <0> (ø)` | |
   | 
[...va/org/apache/hudi/metrics/MetricsJmxReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzSm14UmVwb3J0ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/hudi/metrics/MetricsGraphiteReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzR3JhcGhpdGVSZXBvcnRlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...va/org/apache/hudi/config/HoodieMetricsConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZU1ldHJpY3NDb25maWcuamF2YQ==)
 | `0% <0%> (-54.84%)` | `0 <0> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `0% <0%> (-83.85%)` | `0 <0> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `0% <0%> (-68.58%)` | `0 <0> (ø)` | |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | ... and [297 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > **Legend** - `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=footer).
 Last update 
[0dc8e49...57b4522](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=lastupdated).
 Read the [comment 

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi Dataset Snapshot Exporter

2020-03-01 Thread GitBox
xushiyan commented on a change in pull request #1360: [HUDI-344][RFC-09] Hudi 
Dataset Snapshot Exporter
URL: https://github.com/apache/incubator-hudi/pull/1360#discussion_r386093576
 
 

 ##
 File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
 ##
 @@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import com.beust.jcommander.JCommander;
+import com.beust.jcommander.Parameter;
+
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.fs.Path;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.SerializableConfiguration;
+import org.apache.hudi.common.model.HoodiePartitionMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.table.HoodieTimeline;
+import org.apache.hudi.common.table.TableFileSystemView;
+import org.apache.hudi.common.table.timeline.HoodieInstant;
+import org.apache.hudi.common.table.view.HoodieTableFileSystemView;
+import org.apache.hudi.common.util.FSUtils;
+import org.apache.hudi.common.util.Option;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.Column;
+import org.apache.spark.sql.SaveMode;
+import org.apache.spark.sql.SparkSession;
+
+import scala.Tuple2;
+import scala.collection.JavaConversions;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Export the latest records of Hudi dataset to a set of external files (e.g., 
plain parquet files).
+ */
+
+public class HoodieSnapshotExporter {
+  private static final Logger LOG = LogManager.getLogger(HoodieSnapshotExporter.class);
+
+  public static class Config implements Serializable {
+    @Parameter(names = {"--source-base-path", "-sbp"}, description = "Base path for the source Hudi dataset to be snapshotted", required = true)
+    String basePath = null;
+
+    @Parameter(names = {"--target-base-path", "-tbp"}, description = "Base path for the target output files (snapshots)", required = true)
+    String outputPath = null;
+
+    @Parameter(names = {"--snapshot-prefix", "-sp"}, description = "Snapshot prefix or directory under the target base path in order to segregate different snapshots")
+    String snapshotPrefix;
+
+    @Parameter(names = {"--output-format", "-of"}, description = "e.g. Hudi or Parquet", required = true)
+    String outputFormat;
+
+    @Parameter(names = {"--output-partition-field", "-opf"}, description = "A field to be used by Spark repartitioning")
+    String outputPartitionField;
+  }
+
+  public void export(SparkSession spark, Config cfg) throws IOException {
+    String sourceBasePath = cfg.basePath;
+    String targetBasePath = cfg.outputPath;
+    String snapshotPrefix = cfg.snapshotPrefix;
+    String outputFormat = cfg.outputFormat;
+    String outputPartitionField = cfg.outputPartitionField;
+    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
+    FileSystem fs = FSUtils.getFs(sourceBasePath, jsc.hadoopConfiguration());
+
+    final SerializableConfiguration serConf = new SerializableConfiguration(jsc.hadoopConfiguration());
+    final HoodieTableMetaClient tableMetadata = new HoodieTableMetaClient(fs.getConf(), sourceBasePath);
+    final TableFileSystemView.BaseFileOnlyView fsView = new HoodieTableFileSystemView(tableMetadata,
+        tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants());
+    // Get the latest commit
+    Option<HoodieInstant> latestCommit =
+        tableMetadata.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+    if (!latestCommit.isPresent()) {
+      LOG.warn("No commits present. Nothing to snapshot");
+      return;
+    }
+    final String latestCommitTimestamp = latestCommit.get().getTimestamp();
+    LOG.info(String.format("Starting to snapshot latest version files which are also
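
For context, a sketch of how an exporter with the `Config` parameters above might be launched — the class name and flag names come from the diff itself, while the bundle jar path, base paths, and partition field are placeholder assumptions, not values from this PR:

```shell
# Hypothetical invocation (paths and field names are made up for illustration)
spark-submit \
  --class org.apache.hudi.utilities.HoodieSnapshotExporter \
  packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-*-SNAPSHOT.jar \
  --source-base-path /data/hudi/trips \
  --target-base-path /backups/hudi/trips-snapshot \
  --snapshot-prefix 20200301 \
  --output-format parquet \
  --output-partition-field driver_id
```

Per the parameter descriptions, `--source-base-path`, `--target-base-path`, and `--output-format` are required; `--snapshot-prefix` and `--output-partition-field` are optional.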

[GitHub] [incubator-hudi] codecov-io edited a comment on issue #1106: [HUDI-209] Implement JMX metrics reporter

2020-03-01 Thread GitBox
codecov-io edited a comment on issue #1106: [HUDI-209] Implement JMX metrics 
reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#issuecomment-593074963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=h1) 
Report
   > Merging 
[#1106](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/0dc8e493aa1658910a3519df3941278d9d072c18?src=pr=desc)
 will **decrease** coverage by `0.29%`.
   > The diff coverage is `5.4%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1106/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree)
   
   ```diff
@@             Coverage Diff              @@
##           master    #1106      +/-   ##
============================================
- Coverage     67.08%   66.79%    -0.3%
  Complexity      223      223
============================================
  Files           333      334       +1
  Lines         16207    16287      +80
  Branches       1657     1661       +4
============================================
+ Hits          10873    10879       +6
- Misses         4598     4672      +74
  Partials        736      736
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `100% <ø> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...rg/apache/hudi/metrics/MetricsReporterFactory.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXJGYWN0b3J5LmphdmE=)
 | `46.15% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...va/org/apache/hudi/metrics/MetricsJmxReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzSm14UmVwb3J0ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/hudi/metrics/MetricsGraphiteReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzR3JhcGhpdGVSZXBvcnRlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `83.84% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `100% <100%> (+25%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...va/org/apache/hudi/config/HoodieMetricsConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZU1ldHJpY3NDb25maWcuamF2YQ==)
 | `59.37% <100%> (+4.53%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `70.27% <100%> (+1.69%)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh)
 | `79.79% <0%> (-1.02%)` | `8% <0%> (ø)` | |
   | ... and [5 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > **Legend** - `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=footer).
 Last update 

[GitHub] [incubator-hudi] codecov-io commented on issue #1106: [HUDI-209] Implement JMX metrics reporter

2020-03-01 Thread GitBox
codecov-io commented on issue #1106: [HUDI-209] Implement JMX metrics reporter
URL: https://github.com/apache/incubator-hudi/pull/1106#issuecomment-593074963
 
 
   # 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=h1) 
Report
   > Merging 
[#1106](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=desc) 
into 
[master](https://codecov.io/gh/apache/incubator-hudi/commit/0dc8e493aa1658910a3519df3941278d9d072c18?src=pr=desc)
 will **decrease** coverage by `66.44%`.
   > The diff coverage is `0%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/incubator-hudi/pull/1106/graphs/tree.svg?width=650=VTTXabwbs2=150=pr)](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree)
   
   ```diff
@@             Coverage Diff              @@
##           master    #1106       +/-   ##
============================================
- Coverage     67.08%    0.63%    -66.45%
+ Complexity      223        2       -221
============================================
  Files           333      288        -45
  Lines         16207    14390      -1817
  Branches       1657     1467       -190
============================================
- Hits          10873       92     -10781
- Misses         4598    14295      +9697
+ Partials        736        3       -733
   ```
   
   
   | [Impacted 
Files](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=tree) | 
Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | 
[.../java/org/apache/hudi/metrics/MetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXIuamF2YQ==)
 | `0% <ø> (-100%)` | `0 <0> (ø)` | |
   | 
[...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh)
 | `0% <0%> (-75%)` | `0 <0> (ø)` | |
   | 
[...rg/apache/hudi/metrics/MetricsReporterFactory.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzUmVwb3J0ZXJGYWN0b3J5LmphdmE=)
 | `0% <0%> (-46.16%)` | `0 <0> (ø)` | |
   | 
[...va/org/apache/hudi/metrics/MetricsJmxReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzSm14UmVwb3J0ZXIuamF2YQ==)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...g/apache/hudi/metrics/MetricsGraphiteReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzR3JhcGhpdGVSZXBvcnRlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (ø)` | :arrow_down: |
   | 
[...va/org/apache/hudi/config/HoodieMetricsConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZU1ldHJpY3NDb25maWcuamF2YQ==)
 | `0% <0%> (-54.84%)` | `0 <0> (ø)` | |
   | 
[...java/org/apache/hudi/config/HoodieWriteConfig.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29uZmlnL0hvb2RpZVdyaXRlQ29uZmlnLmphdmE=)
 | `0% <0%> (-83.85%)` | `0 <0> (ø)` | |
   | 
[...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=)
 | `0% <0%> (-68.58%)` | `0 <0> (ø)` | |
   | 
[...ava/org/apache/hudi/metrics/JmxReporterServer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9KbXhSZXBvcnRlclNlcnZlci5qYXZh)
 | `0% <0%> (ø)` | `0 <0> (?)` | |
   | 
[...che/hudi/common/table/timeline/dto/LogFileDTO.java](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL2R0by9Mb2dGaWxlRFRPLmphdmE=)
 | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
   | ... and [297 
more](https://codecov.io/gh/apache/incubator-hudi/pull/1106/diff?src=pr=tree-more)
 | |
   
   --
   
   [Continue to review full report at 
Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=continue).
   > **Legend** - [Click here to learn 
more](https://docs.codecov.io/docs/codecov-delta)
   > **Legend** - `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by 
[Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=footer).
 Last update 
[0dc8e49...318ced8](https://codecov.io/gh/apache/incubator-hudi/pull/1106?src=pr=lastupdated).
 Read the [comment