[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

2021-01-28 Thread GitBox


pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566617430



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##
@@ -285,7 +289,14 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
 
   private def mergeRowWithLog(curRow: InternalRow, curKey: String) = {
     val historyAvroRecord = serializer.serialize(curRow).asInstanceOf[GenericRecord]
-    logRecords.get(curKey).getData.combineAndGetUpdateValue(historyAvroRecord, tableAvroSchema)
+    if (preCombineField != null) {
+      val payloadProps = new Properties()

Review comment:
   Good suggestion! I will refactor this code later.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jiangjiguang commented on a change in pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox


jiangjiguang commented on a change in pull request #2505:
URL: https://github.com/apache/hudi/pull/2505#discussion_r566617008



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestQuickstartUtils.java
##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi;
+
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.junit.jupiter.MockitoExtension;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+@ExtendWith(MockitoExtension.class)
+public class TestQuickstartUtils {
+
+  @Test
+  public void testGenerateUpdates() throws Exception {
+    QuickstartUtils.DataGenerator dataGenerator = new QuickstartUtils.DataGenerator();
+    assertEquals(dataGenerator.generateInserts(10).size(), 10);
+    assertEquals(dataGenerator.generateUpdates(10).size(), 10);

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2021-01-28 Thread GitBox


vinothchandar commented on issue #2149:
URL: https://github.com/apache/hudi/issues/2149#issuecomment-769607086


   @toninis this is kind of weird, given the snippet that has the constructor. The class seems to be there in the build.
   Do you have a branch where you have the code stashed? We can open a new issue or JIRA and work through this. cc @nsivabalan



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-01-28 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-864:

Labels: user-support-issues  (was: )

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Roland Johann
>Priority: Major
>  Labels: user-support-issues
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieException: 
> 

[jira] [Commented] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-01-28 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274169#comment-17274169
 ] 

Vinoth Chandar commented on HUDI-864:
-

[~shivnarayan] there are issues like these that have no component, which we are failing to capture.

We need a full sweep of JIRA; there is no easy way around it.

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Roland Johann
>Priority: Major
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 

[GitHub] [hudi] vinothchandar edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-28 Thread GitBox


vinothchandar edited a comment on issue #2100:
URL: https://github.com/apache/hudi/issues/2100#issuecomment-769600129


   > I feel that the independent timeline service may be helpful in identifying hudi tables in a cluster.
   
   @cdmikechen This is actually very interesting to me too. Can we start a DISCUSS thread around this? We can also embed the Hive MetaStore instance within and make it very simple for everyone to deploy.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] jiangjiguang commented on a change in pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox


jiangjiguang commented on a change in pull request #2505:
URL: https://github.com/apache/hudi/pull/2505#discussion_r566599679



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java
##
@@ -176,14 +175,17 @@ public HoodieRecord generateUpdateRecord(HoodieKey key, String randomString) thr
   * @return list of hoodie record updates
   */
  public List<HoodieRecord> generateUpdates(Integer n) throws IOException {
-   String randomString = generateRandomString();
-   List<HoodieRecord> updates = new ArrayList<>();
-   for (int i = 0; i < n; i++) {
-     HoodieKey key = existingKeys.get(rand.nextInt(numExistingKeys));
-     HoodieRecord record = generateUpdateRecord(key, randomString);
-     updates.add(record);
+   if (numExistingKeys == 0) {
+     throw new IllegalArgumentException("Data must have been written before performing the update operation");

Review comment:
   Sure, I will do it.

##
File path: 
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestQuickstartUtils.java
##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi;
+
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.junit.jupiter.MockitoExtension;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+@ExtendWith(MockitoExtension.class)
+public class TestQuickstartUtils {
+
+  @Test
+  public void testGenerateUpdates() throws Exception {
+    QuickstartUtils.DataGenerator dataGenerator = new QuickstartUtils.DataGenerator();
+    assertEquals(dataGenerator.generateInserts(10).size(), 10);
+    assertEquals(dataGenerator.generateUpdates(10).size(), 10);

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-28 Thread GitBox


vinothchandar commented on issue #2100:
URL: https://github.com/apache/hudi/issues/2100#issuecomment-769600129


   > I feel that the independent timeline service may be helpful in identifying hudi tables in a cluster.
   @cdmikechen This is actually very interesting to me too. Can we start a DISCUSS thread around this? We can also embed the Hive MetaStore instance within and make it very simple for everyone to deploy.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2502: [HUDI-1555] Remove isEmpty to improve clustering execution performance

2021-01-28 Thread GitBox


codecov-io edited a comment on pull request #2502:
URL: https://github.com/apache/hudi/pull/2502#issuecomment-769063418


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2502?src=pr=h1) Report
   > Merging 
[#2502](https://codecov.io/gh/apache/hudi/pull/2502?src=pr=desc) (f903c85) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/c8ee40f8ae34607072a27d4e7ccb21fc4df13ca1?el=desc)
 (c8ee40f) will **increase** coverage by `0.08%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2502/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2502?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2502      +/-   ##
   ============================================
   + Coverage     50.18%   50.27%    +0.08%
   - Complexity     3051     3120       +69
   ============================================
     Files           419      430       +11
     Lines         18931    19565      +634
     Branches       1948     2004       +56
   ============================================
   + Hits           9501     9836      +335
   - Misses         8656     8925      +269
   - Partials        774      804       +30
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.49% <ø> (+<0.01%)` | `0.00 <ø> (ø)` | |
   | hudiflink | `33.03% <0.00%> (+33.03%)` | `0.00 <0.00> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `65.85% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2502?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[.../apache/hudi/operator/InstantGenerateOperator.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9JbnN0YW50R2VuZXJhdGVPcGVyYXRvci5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00 <0.00> (ø)` | |
   | 
[...i/common/model/OverwriteWithLatestAvroPayload.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL092ZXJ3cml0ZVdpdGhMYXRlc3RBdnJvUGF5bG9hZC5qYXZh)
 | `60.00% <0.00%> (-4.71%)` | `10.00% <0.00%> (ø%)` | |
   | 
[...e/hudi/common/table/log/HoodieLogFormatWriter.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9Ib29kaWVMb2dGb3JtYXRXcml0ZXIuamF2YQ==)
 | `78.12% <0.00%> (-1.57%)` | `26.00% <0.00%> (ø%)` | |
   | 
[...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==)
 | `56.09% <0.00%> (ø)` | `38.00% <0.00%> (+1.00%)` | |
   | 
[...main/java/org/apache/hudi/HoodieFlinkStreamer.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9Ib29kaWVGbGlua1N0cmVhbWVyLmphdmE=)
 | | | |
   | 
[...ache/hudi/operator/StreamWriteOperatorFactory.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9TdHJlYW1Xcml0ZU9wZXJhdG9yRmFjdG9yeS5qYXZh)
 | `0.00% <0.00%> (ø)` | `0.00% <0.00%> (?%)` | |
   | 
[...he/hudi/operator/event/BatchWriteSuccessEvent.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9ldmVudC9CYXRjaFdyaXRlU3VjY2Vzc0V2ZW50LmphdmE=)
 | `100.00% <0.00%> (ø)` | `4.00% <0.00%> (?%)` | |
   | 
[.../org/apache/hudi/operator/StreamWriteFunction.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS9vcGVyYXRvci9TdHJlYW1Xcml0ZUZ1bmN0aW9uLmphdmE=)
 | `80.55% <0.00%> (ø)` | `14.00% <0.00%> (?%)` | |
   | 
[.../org/apache/hudi/util/RowDataToAvroConverters.java](https://codecov.io/gh/apache/hudi/pull/2502/diff?src=pr=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1Jvd0RhdGFUb0F2cm9Db252ZXJ0ZXJzLmphdmE=)
 | `42.05% <0.00%> (ø)` | `8.00% <0.00%> (?%)` | |
   | 

[GitHub] [hudi] codecov-io edited a comment on pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-01-28 Thread GitBox


codecov-io edited a comment on pull request #2496:
URL: https://github.com/apache/hudi/pull/2496#issuecomment-768170324







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on pull request #2419: [HUDI-1421] Improvement of failure recovery for HoodieFlinkStreamer.

2021-01-28 Thread GitBox


wangxianghu commented on pull request #2419:
URL: https://github.com/apache/hudi/pull/2419#issuecomment-769568185


   @loukey-lj sorry for the delay, please fix the conflicts first



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] wangxianghu commented on a change in pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox


wangxianghu commented on a change in pull request #2505:
URL: https://github.com/apache/hudi/pull/2505#discussion_r566572544



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java
##
@@ -176,14 +175,17 @@ public HoodieRecord generateUpdateRecord(HoodieKey key, String randomString) thr
   * @return list of hoodie record updates
   */
  public List<HoodieRecord> generateUpdates(Integer n) throws IOException {
-   String randomString = generateRandomString();
-   List<HoodieRecord> updates = new ArrayList<>();
-   for (int i = 0; i < n; i++) {
-     HoodieKey key = existingKeys.get(rand.nextInt(numExistingKeys));
-     HoodieRecord record = generateUpdateRecord(key, randomString);
-     updates.add(record);
+   if (numExistingKeys == 0) {
+     throw new IllegalArgumentException("Data must have been written before performing the update operation");

Review comment:
   Can we throw a `HoodieException` here?
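   
   A minimal sketch of the suggested guard (assuming `org.apache.hudi.exception.HoodieException`, which takes a message string; the wrapper class and method names here are illustrative, not the actual patch):
   
   ```java
   import org.apache.hudi.exception.HoodieException;
   
   // Hypothetical stand-in for the DataGenerator's update path.
   public class GenerateUpdatesGuard {
   
     private final int numExistingKeys;
   
     public GenerateUpdatesGuard(int numExistingKeys) {
       this.numExistingKeys = numExistingKeys;
     }
   
     // Fail fast with a HoodieException (rather than IllegalArgumentException)
     // when no records have been inserted yet.
     public void checkUpdatesPossible() {
       if (numExistingKeys == 0) {
         throw new HoodieException("Data must have been written before performing the update operation");
       }
     }
   }
   ```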

##
File path: 
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestQuickstartUtils.java
##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi;
+
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.junit.jupiter.MockitoExtension;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+@ExtendWith(MockitoExtension.class)
+public class TestQuickstartUtils {
+
+  @Test
+  public void testGenerateUpdates() throws Exception {
+    QuickstartUtils.DataGenerator dataGenerator = new QuickstartUtils.DataGenerator();
+    assertEquals(dataGenerator.generateInserts(10).size(), 10);
+    assertEquals(dataGenerator.generateUpdates(10).size(), 10);

Review comment:
   Is it possible to add a test for the case where the exception is thrown?
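   
   One way such a test might look (a sketch assuming JUnit 5's `assertThrows` and that the guard ends up throwing `HoodieException`; not the actual test from the PR):
   
   ```java
   import org.apache.hudi.QuickstartUtils;
   import org.apache.hudi.exception.HoodieException;
   import org.junit.jupiter.api.Test;
   
   import static org.junit.jupiter.api.Assertions.assertThrows;
   
   public class TestGenerateUpdatesThrows {
   
     // A fresh DataGenerator has no existing keys, so calling generateUpdates
     // before any insert should surface the guard's exception.
     @Test
     public void testGenerateUpdatesBeforeInsertsThrows() {
       QuickstartUtils.DataGenerator dataGenerator = new QuickstartUtils.DataGenerator();
       assertThrows(HoodieException.class, () -> dataGenerator.generateUpdates(10));
     }
   }
   ```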





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-28 Thread GitBox


vinothchandar commented on issue #1829:
URL: https://github.com/apache/hudi/issues/1829#issuecomment-769566404


   0.7.0 is out! 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io edited a comment on pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox


codecov-io edited a comment on pull request #2505:
URL: https://github.com/apache/hudi/pull/2505#issuecomment-769557231


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=h1) Report
   > Merging 
[#2505](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=desc) (a1ab5a8) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/c4afd179c1983a382b8a5197d800b0f5dba254de?el=desc)
 (c4afd17) will **increase** coverage by `19.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2505/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=tree)
   
   ```diff
   @@              Coverage Diff               @@
   ##             master    #2505       +/-   ##
   =============================================
   + Coverage     50.18%   69.43%    +19.24%
   + Complexity     3050      357      -2693
   =============================================
     Files           419       53       -366
     Lines         18931     1930     -17001
     Branches       1948      230      -1718
   =============================================
   - Hits           9500     1340      -8160
   + Misses         8656      456      -8200
   + Partials        775      134       -641
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/engine/TaskContextSupplier.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9UYXNrQ29udGV4dFN1cHBsaWVyLmphdmE=)
 | | | |
   | 
[...sioning/clean/CleanMetadataV1MigrationHandler.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvY2xlYW4vQ2xlYW5NZXRhZGF0YVYxTWlncmF0aW9uSGFuZGxlci5qYXZh)
 | | | |
   | 
[...e/hudi/exception/HoodieDeltaStreamerException.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZURlbHRhU3RyZWFtZXJFeGNlcHRpb24uamF2YQ==)
 | | | |
   | 
[...e/hudi/common/util/collection/RocksDBBasedMap.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9Sb2Nrc0RCQmFzZWRNYXAuamF2YQ==)
 | | | |
   | 
[...apache/hudi/common/util/ParquetReaderIterator.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvUGFycXVldFJlYWRlckl0ZXJhdG9yLmphdmE=)
 | | | |
   | 
[...a/org/apache/hudi/common/bloom/InternalFilter.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0ludGVybmFsRmlsdGVyLmphdmE=)
 | | | |
   | 
[.../org/apache/hudi/io/storage/HoodieHFileReader.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vc3RvcmFnZS9Ib29kaWVIRmlsZVJlYWRlci5qYXZh)
 | | | |
   | 
[...ache/hudi/cli/commands/ArchivedCommitsCommand.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0FyY2hpdmVkQ29tbWl0c0NvbW1hbmQuamF2YQ==)
 | | | |
   | 
[...e/hudi/exception/HoodieRecordMissingException.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZVJlY29yZE1pc3NpbmdFeGNlcHRpb24uamF2YQ==)
 | | | |
   | 
[...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=)
 | | | |
   | ... and [355 
more](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact 

[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-01-28 Thread GitBox


vinothchandar commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r566564182



##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##
@@ -188,41 +196,51 @@ private synchronized void openFileSliceIfNeeded() throws IOException {
 
     // Load the schema
     Schema schema = HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-    logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(), metadataBasePath,
-        logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
+    HoodieMetadataMergedLogRecordScanner logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(),
+        metadataBasePath, logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
         spillableMapDirectory, null);
 
     LOG.info("Opened metadata log files from " + logFilePaths + " at instant " + latestInstantTime
         + "(dataset instant=" + latestInstantTime + ", metadata instant=" + latestMetaInstantTimestamp + ")");
 
     metrics.ifPresent(metrics -> metrics.updateMetrics(HoodieMetadataMetrics.SCAN_STR, timer.endTimer()));
+
+    if (metadataConfig.enableReuse()) {
+      // cache for later reuse
+      cachedBaseFileReader = baseFileReader;
+      cachedLogRecordScanner = logRecordScanner;
+    }
+
+    return Pair.of(baseFileReader, logRecordScanner);
   }
 
-  private void closeIfNeeded() {
+  private void closeIfNeeded(Pair readers) {

Review comment:
   I prefer the previous approach for readability, i.e. have it just close the member variables.
   If you disagree, maybe we can chat about why you think this is better for reading. I did face this issue when I originally read the code here.

##
File path: 
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##
@@ -188,41 +196,51 @@ private synchronized void openFileSliceIfNeeded() throws IOException {
 
     // Load the schema
     Schema schema = HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-    logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(), metadataBasePath,
-        logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
+    HoodieMetadataMergedLogRecordScanner logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(),
+        metadataBasePath, logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
         spillableMapDirectory, null);
 
     LOG.info("Opened metadata log files from " + logFilePaths + " at instant " + latestInstantTime
         + "(dataset instant=" + latestInstantTime + ", metadata instant=" + latestMetaInstantTimestamp + ")");
 
     metrics.ifPresent(metrics -> metrics.updateMetrics(HoodieMetadataMetrics.SCAN_STR, timer.endTimer()));
+
+    if (metadataConfig.enableReuse()) {

Review comment:
   We are sort of creating two code paths again for reuse and non-reuse. Can we please go back to just always initializing the member variables here?
   Then `closeIfNeeded()` can continue to work with the members alone.
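   
   A self-contained illustration of the member-based lifecycle this comment argues for, with stand-in types rather than Hudi's actual reader classes: open always assigns the members, and closing works on the members alone.
   
   ```java
   // Stand-in types: AutoCloseable substitutes for the base file reader and log scanner.
   public class ReaderLifecycle implements AutoCloseable {
   
     private AutoCloseable baseFileReader;
     private AutoCloseable logRecordScanner;
     private final boolean reuse;
   
     public ReaderLifecycle(boolean reuse) {
       this.reuse = reuse;
     }
   
     // Always initialize the members, regardless of the reuse setting.
     public void open(AutoCloseable reader, AutoCloseable scanner) {
       this.baseFileReader = reader;
       this.logRecordScanner = scanner;
     }
   
     // closeIfNeeded() works with the members alone: release them unless reuse is on.
     public void closeIfNeeded() throws Exception {
       if (!reuse) {
         close();
       }
     }
   
     @Override
     public void close() throws Exception {
       if (baseFileReader != null) {
         baseFileReader.close();
         baseFileReader = null;
       }
       if (logRecordScanner != null) {
         logRecordScanner.close();
         logRecordScanner = null;
       }
     }
   }
   ```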





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox


codecov-io commented on pull request #2505:
URL: https://github.com/apache/hudi/pull/2505#issuecomment-769557231


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=h1) Report
   > Merging 
[#2505](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=desc) (a1ab5a8) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/c4afd179c1983a382b8a5197d800b0f5dba254de?el=desc)
 (c4afd17) will **decrease** coverage by `40.49%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2505/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##             master   #2505       +/-   ##
   ===========================================
   - Coverage     50.18%   9.68%    -40.50%
   + Complexity     3050      48      -3002
   ===========================================
     Files           419      53       -366
     Lines         18931    1930     -17001
     Branches       1948     230      -1718
   ===========================================
   - Hits           9500     187      -9313
   + Misses         8656    1730      -6926
   + Partials        775      13       -762
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2505?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2505/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] garyli1019 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

2021-01-28 Thread GitBox


garyli1019 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566552361



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##
@@ -285,7 +289,14 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
 
   private def mergeRowWithLog(curRow: InternalRow, curKey: String) = {
     val historyAvroRecord = serializer.serialize(curRow).asInstanceOf[GenericRecord]
-    logRecords.get(curKey).getData.combineAndGetUpdateValue(historyAvroRecord, tableAvroSchema)
+    if (preCombineField != null) {
+      val payloadProps = new Properties()

Review comment:
   We are creating a new `Properties` object on every call; can we move this outside?
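   
   A sketch of the hoisting being requested, in Java for illustration (the actual file is Scala, and the property key shown is an assumption, not necessarily the one the patch uses): build the `Properties` once and hand the same instance to every merge call.
   
   ```java
   import java.util.Properties;
   
   public class PayloadPropsHolder {
   
     // Built once at construction time instead of once per mergeRowWithLog call.
     private final Properties payloadProps = new Properties();
   
     public PayloadPropsHolder(String preCombineField) {
       if (preCombineField != null) {
         // Hypothetical key for the ordering field; illustrative only.
         payloadProps.setProperty("hoodie.payload.ordering.field", preCombineField);
       }
     }
   
     public Properties getPayloadProps() {
       return payloadProps;
     }
   }
   ```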

##
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java
##
@@ -108,7 +108,7 @@ public String createTable(
 
 final HoodieTableType tableType = HoodieTableType.valueOf(tableTypeStr);
     HoodieTableMetaClient.initTableType(HoodieCLI.conf, path, tableType, name, archiveFolder,
-        payloadClass, layoutVersion);
+        payloadClass, null, layoutVersion);

Review comment:
   Can we add a new `initTableType` method to handle all the nulls being added?
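   
   A self-contained illustration of the overload pattern suggested here; the types and names are stand-ins, not Hudi's actual `initTableType` signature. Existing call sites keep the short signature, and only the overload knows about the new nullable argument.
   
   ```java
   public class InitTableTypeOverload {
   
     // Full signature with the newly added nullable parameter.
     static String initTableType(String path, String payloadClass,
                                 String preCombineField, Integer layoutVersion) {
       return String.join("|", path, payloadClass,
           String.valueOf(preCombineField), String.valueOf(layoutVersion));
     }
   
     // Overload that hides the new parameter from existing call sites.
     static String initTableType(String path, String payloadClass, Integer layoutVersion) {
       return initTableType(path, payloadClass, null, layoutVersion);
     }
   }
   ```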

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##
@@ -50,7 +50,8 @@ case class HoodieMergeOnReadTableState(tableStructSchema: StructType,
                                         requiredStructSchema: StructType,
                                         tableAvroSchema: String,
                                         requiredAvroSchema: String,
-                                        hoodieRealtimeFileSplits: List[HoodieMergeOnReadFileSplit])
+                                        hoodieRealtimeFileSplits: List[HoodieMergeOnReadFileSplit],
+                                        preCombineField: String)

Review comment:
   Can we make this field an `Option` instead of using `null`?

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField
+    if (fieldFromTableConfig != null) {
+      fieldFromTableConfig
+    } else if (optParams.contains(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY)) {

Review comment:
   Can we use the `HoodieTableConfig` instead? Or somehow translate all precombine-field options into one place and deprecate the others. Using the write option while reading sounds a bit odd.

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField

Review comment:
   `preCombineFieldFromTableConfig` sounds better?

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##
@@ -18,6 +18,8 @@
 
 package org.apache.hudi
 
+import java.util.Properties

Review comment:
   nit: this import should be in the next group

##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField
+    if (fieldFromTableConfig != null) {

Review comment:
   If the field does not exist, will this be an empty string or null?
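   
   Both of the review points on this block (modeling the field as an `Option` rather than `null`, and the question of empty string vs. null from the table config) could be handled in one place; a sketch using Java's `Optional` for illustration, since the actual relation code is Scala:
   
   ```java
   import java.util.Optional;
   
   public class PreCombineFieldResolver {
   
     // Normalizes both null and the empty string to "absent", then falls back
     // to the (hypothetical) write-option value if the table config has none.
     static Optional<String> resolve(String fieldFromTableConfig, String fieldFromOptions) {
       return Optional.ofNullable(fieldFromTableConfig)
           .filter(f -> !f.isEmpty())
           .or(() -> Optional.ofNullable(fieldFromOptions).filter(f -> !f.isEmpty()));
     }
   }
   ```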





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-1523) Avoid excessive mkdir calls when creating new files

2021-01-28 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274130#comment-17274130
 ] 

Vinoth Chandar commented on HUDI-1523:
--

I don't think we need a config here. It should be safe to do an if statement.
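
A minimal sketch of the if statement being described, assuming Hadoop's FileSystem API (illustrative, not the actual patch):

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirsIfNeeded {

  // Only issue the (potentially expensive) mkdirs call when the partition
  // directory does not already exist.
  static void makeDirIfNotExists(FileSystem fs, Path partitionPath) throws IOException {
    if (!fs.exists(partitionPath)) {
      fs.mkdirs(partitionPath);
    }
  }
}
{code}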

> Avoid excessive mkdir calls when creating new files
> ---
>
> Key: HUDI-1523
> URL: https://issues.apache.org/jira/browse/HUDI-1523
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Balaji Varadarajan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.8.0
>
>
> https://github.com/apache/hudi/issues/2423



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1111) Highlight Hudi guarantees in documentation section of website

2021-01-28 Thread Vinoth Chandar (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274127#comment-17274127
 ] 

Vinoth Chandar commented on HUDI-:
--

Yes, please.


> Highlight Hudi guarantees in documentation section of website 
> --
>
> Key: HUDI-
> URL: https://issues.apache.org/jira/browse/HUDI-
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> [https://github.com/apache/hudi/issues/1795]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] vinothchandar commented on pull request #1975: [HUDI-1194] Refactor HoodieHiveClient based on the way to call Hive API

2021-01-28 Thread GitBox


vinothchandar commented on pull request #1975:
URL: https://github.com/apache/hudi/pull/1975#issuecomment-769525257


   @lw309637554 could you review this as well? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific partition

2021-01-28 Thread GitBox


vinothchandar commented on pull request #2452:
URL: https://github.com/apache/hudi/pull/2452#issuecomment-769518711


   @n3nash can you also please review this?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2431: [HUDI-1526] Translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-28 Thread GitBox


vinothchandar commented on pull request #2431:
URL: https://github.com/apache/hudi/pull/2431#issuecomment-769518155


   @nsivabalan @zhedoubushishi  also to review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (bc0325f -> 23f2ef3)

2021-01-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from bc0325f  [HUDI-1522] Add a new pipeline for Flink writer (#2430)
 add 23f2ef3  [HUDI-623] Remove UpgradePayloadFromUberToApache (#2455)

No new revisions were added by this update.

Summary of changes:
 .../adhoc/UpgradePayloadFromUberToApache.java  | 118 -
 1 file changed, 118 deletions(-)
 delete mode 100644 
hudi-utilities/src/main/java/org/apache/hudi/utilities/adhoc/UpgradePayloadFromUberToApache.java



[GitHub] [hudi] vinothchandar merged pull request #2455: [HUDI-623] Remove UpgradePayloadFromUberToApache

2021-01-28 Thread GitBox


vinothchandar merged pull request #2455:
URL: https://github.com/apache/hudi/pull/2455


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2455: [HUDI-623] Remove UpgradePayloadFromUberToApache

2021-01-28 Thread GitBox


vinothchandar commented on pull request #2455:
URL: https://github.com/apache/hudi/pull/2455#issuecomment-769517737


   It's been a while, so it's okay to drop this now.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-28 Thread GitBox


vinothchandar commented on pull request #2475:
URL: https://github.com/apache/hudi/pull/2475#issuecomment-769517183


   @zhedoubushishi @umehrot2 could you please take a first pass?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on a change in pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-28 Thread GitBox


vinothchandar commented on a change in pull request #2476:
URL: https://github.com/apache/hudi/pull/2476#discussion_r566524170



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
##
@@ -96,19 +94,21 @@
   private static final Logger LOG = LogManager.getLogger(UtilHelpers.class);
 
   public static Source createSource(String sourceClass, TypedProperties cfg, JavaSparkContext jssc,
-                                    SparkSession sparkSession, SchemaProvider schemaProvider, HoodieDeltaStreamerMetrics metrics) throws IOException {
-
+                                    SparkSession sparkSession, SchemaProvider schemaProvider,
+                                    HoodieDeltaStreamerMetrics metrics) throws IOException {
     try {
-      if (JsonKafkaSource.class.getName().equals(sourceClass)
-          || AvroKafkaSource.class.getName().equals(sourceClass)) {
+      try {

Review comment:
   I am not sure that handling this via exceptions makes this more readable. wdyt?
   E.g., the current impl makes it clear in which case a certain constructor is called.
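   
   For context, a self-contained sketch of the pattern under discussion: probe for a constructor via reflection and fall back on `NoSuchMethodException`, instead of branching on the class name. The signatures are illustrative, not the actual `Source` constructors.
   
   ```java
   import java.lang.reflect.Constructor;
   
   public class ConstructorProbe {
   
     // Try the richer constructor first; if the class does not declare it,
     // fall back to the simpler one. The review asks whether this reads better
     // than an explicit if/else on the class name.
     static Object instantiate(Class<?> clazz, String name, Integer count) throws Exception {
       try {
         Constructor<?> twoArg = clazz.getConstructor(String.class, Integer.class);
         return twoArg.newInstance(name, count);
       } catch (NoSuchMethodException e) {
         Constructor<?> oneArg = clazz.getConstructor(String.class);
         return oneArg.newInstance(name);
       }
     }
   }
   ```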





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

2021-01-28 Thread GitBox


vinothchandar commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-769513904


   cc @nsivabalan to also triage



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] vinothchandar commented on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-01-28 Thread GitBox


vinothchandar commented on pull request #2500:
URL: https://github.com/apache/hudi/pull/2500#issuecomment-769513621


   cc @vburenin could you please review this as well? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-57) [UMBRELLA] Support ORC Storage

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-57:

Labels: pull-request-available  (was: pull-request-available 
user-support-issues)

> [UMBRELLA] Support ORC Storage
> --
>
> Key: HUDI-57
> URL: https://issues.apache.org/jira/browse/HUDI-57
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Mani Jindal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://github.com/uber/hudi/issues/68]
> https://github.com/uber/hudi/issues/155



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-89) Clean up placement, naming, defaults of HoodieWriteConfig

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-89:

Labels:   (was: user-support-issues)

> Clean up placement, naming, defaults of HoodieWriteConfig
> -
>
> Key: HUDI-89
> URL: https://issues.apache.org/jira/browse/HUDI-89
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Code Cleanup, Usability, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>
> # Rename HoodieWriteConfig to HoodieClientConfig 
>  # Move bunch of configs from  CompactionConfig to StorageConfig 
>  # Introduce new HoodieCleanConfig
>  # Should we consider lombok or something to automate the 
> defaults/getters/setters
>  # Consistent name of properties/defaults 
>  # Enforce bounds more strictly 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-274) Consolidate all scripts under top level scripts directory

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-274:
-
Labels: starter  (was: starter user-support-issues)

> Consolidate all scripts under top level scripts directory
> -
>
> Key: HUDI-274
> URL: https://issues.apache.org/jira/browse/HUDI-274
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: starter
>
> Before we do this, let us revisit one more time whether this is ideal. It has 
> pros/cons. Moving everything to one place makes the scripts easy to find, but the 
> scripts would then have to assume the inter-directory structure. Also, each 
> sub-module is no longer entirely self-contained, since its scripts live elsewhere.
> This came up in a code-review discussion : 
> https://github.com/apache/incubator-hudi/pull/918#discussion_r327904862
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-259) Hadoop 3 support for Hudi writing

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-259:
-
Status: Open  (was: New)

> Hadoop 3 support for Hudi writing
> -
>
> Key: HUDI-259
> URL: https://issues.apache.org/jira/browse/HUDI-259
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: Wenning Ding
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Sample issues
>  
> [https://github.com/apache/incubator-hudi/issues/735]
> [https://github.com/apache/incubator-hudi/issues/877#issuecomment-528433568] 
> [https://github.com/apache/incubator-hudi/issues/898]
>  
> https://github.com/apache/hudi/issues/1776 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-318) Update Migration Guide to Include Delta Streamer

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-318:
-
Labels: doc  (was: doc user-support-issues)

> Update Migration Guide to Include Delta Streamer
> 
>
> Key: HUDI-318
> URL: https://issues.apache.org/jira/browse/HUDI-318
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Yanjia Gary Li
>Priority: Minor
>  Labels: doc
>
> [http://hudi.apache.org/migration_guide.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-395) hudi does not support scheme s3n when writing to S3

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-395:
-
Labels:   (was: user-support-issues)

> hudi does not support scheme s3n when writing to S3
> ---
>
> Key: HUDI-395
> URL: https://issues.apache.org/jira/browse/HUDI-395
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: newbie, Spark Integration, Usability
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: rui feng
>Assignee: sivabalan narayanan
>Priority: Major
>
> When I use Hudi to create a Hudi table and then write to S3, I used the below 
> Maven snippet, which is recommended by [https://hudi.apache.org/s3_hoodie.html]
> <dependency>
>  <groupId>org.apache.hudi</groupId>
>  <artifactId>hudi-spark-bundle</artifactId>
>  <version>0.5.0-incubating</version>
> </dependency>
> <dependency>
>  <groupId>org.apache.hadoop</groupId>
>  <artifactId>hadoop-aws</artifactId>
>  <version>2.7.3</version>
> </dependency>
> <dependency>
>  <groupId>com.amazonaws</groupId>
>  <artifactId>aws-java-sdk</artifactId>
>  <version>1.10.34</version>
> </dependency>
> and add the below configuration:
> sc.hadoopConfiguration.set("fs.defaultFS", "s3://niketest1")
>  sc.hadoopConfiguration.set("fs.s3.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3n.impl", 
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>  sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "x")
>  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xx")
>  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "x")
>  
> my spark version is spark-2.4.4-bin-hadoop2.7, and when I run the below:
> df.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Overwrite).save(hudiTablePath)
> val hudiOptions = Map[String,String](
>  HoodieWriteConfig.TABLE_NAME -> "hudi12",
>  DataSourceWriteOptions.OPERATION_OPT_KEY -> 
> DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL,
>  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "rider",
>  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> 
> DataSourceWriteOptions.MOR_STORAGE_TYPE_OPT_VAL)
> val hudiTablePath = "s3://niketest1/hudi_test/hudi12"
> the exception occurs:
> java.lang.IllegalArgumentException: 
> BlockAlignedAvroParquetWriter does not support scheme s3n
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.getHoodieScheme(HoodieWrapperFileSystem.java:109)
>  at 
> org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.convertToHoodiePath(HoodieWrapperFileSystem.java:85)
>  at 
> org.apache.hudi.io.storage.HoodieParquetWriter.<init>(HoodieParquetWriter.java:57)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.newParquetStorageWriter(HoodieStorageWriterFactory.java:60)
>  at 
> org.apache.hudi.io.storage.HoodieStorageWriterFactory.getStorageWriter(HoodieStorageWriterFactory.java:44)
>  at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:70)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:137)
>  at 
> org.apache.hudi.func.CopyOnWriteLazyInsertIterable$CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteLazyInsertIterable.java:125)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:38)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:120)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> Can anyone tell me what causes this exception? I tried to use 
> org.apache.hadoop.fs.s3.S3FileSystem to replace 
> org.apache.hadoop.fs.s3native.NativeS3FileSystem for the conf "fs.s3.impl", 
> but another exception occurred, and it seems org.apache.hadoop.fs.s3.S3FileSystem 
> fits hadoop 2.6.
>  
> Thanks in advance.
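> A possible workaround, not verified on this exact setup: Hudi's supported 
> storage schemes include s3a, so pointing the job at the s3a connector and an 
> s3a:// base path may avoid the s3n check. A minimal sketch, assuming 
> hadoop-aws's S3AFileSystem is on the classpath:
>  
> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
> sc.hadoopConfiguration.set("fs.s3a.access.key", "xx")
> sc.hadoopConfiguration.set("fs.s3a.secret.key", "xx")
> val hudiTablePath = "s3a://niketest1/hudi_test/hudi12" // s3a instead of s3/s3n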



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-849) Turn on incremental Syncing by default for DeltaStreamer and spark streaming cases

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-849:
-
Labels:   (was: user-support-issues)

> Turn on incremental Syncing by default for DeltaStreamer and spark streaming 
> cases
> --
>
> Key: HUDI-849
> URL: https://issues.apache.org/jira/browse/HUDI-849
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-984) Support Hive 1.x out of box

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-984:
-
Labels:   (was: user-support-issues)

> Support Hive 1.x out of box
> ---
>
> Key: HUDI-984
> URL: https://issues.apache.org/jira/browse/HUDI-984
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> With 0.5.0, Hudi uses Hive 2.x as its compile-time dependency and works with 
> Hive 2.x servers out of the box.
> We need similar support for Hive 1.x as it is still being used.
> 1. Hive 1.x servers must be able to run queries against Hudi tables
> 2. Hive sync must happen successfully between Hudi tables and Hive 1.x 
> services
>  
> Important note: Hive 1.x has 2 classes of versions:
>  # pre 1.2.0
>  # 1.2.0 and later
> We had earlier found out that those 2 classes are unfortunately not compatible 
> with each other. The CDH version of Hive used to ship a pre-1.2.0 Hive. We need 
> to look at the feasibility, cost and impact of supporting one or more of these 
> classes.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-893) Add spark datasource V2 reader support for Hudi tables

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-893:
-
Labels:   (was: user-support-issues)

> Add spark datasource V2 reader support for Hudi tables
> --
>
> Key: HUDI-893
> URL: https://issues.apache.org/jira/browse/HUDI-893
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Nishith Agarwal
>Assignee: Nan Zhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1088) hive version 1.1.0 integrated with hudi,select * from hudi_table error in HUE

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1088:
--
Labels:   (was: user-support-issues)

> hive version 1.1.0 integrated with hudi,select * from hudi_table error in HUE
> -
>
> Key: HUDI-1088
> URL: https://issues.apache.org/jira/browse/HUDI-1088
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
> Environment: Hive version 1.1.0、hudi-0.5.3、Cloudera manager 5.14.4
>Reporter: wangmeng
>Priority: Major
>
> * Statement executed in Hue: select * from hudi_table where
>  * inputformat: set 
> hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat;
>  * Exception details:
> Driver stacktrace:
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1457)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1445)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1444)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>  at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1444)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>  at scala.Option.foreach(Option.scala:236)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1668)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1627)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1616)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>  Caused by: java.lang.RuntimeException: Error processing row: 
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>  at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>  at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
>  at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
>  at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>  at org.apache.spark.scheduler.Task.run(Task.scala:89)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.NullPointerException
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:446)
>  at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
>  at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1127) Handling late arriving Deletes

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1127:
--
Labels:   (was: user-support-issues)

> Handling late arriving Deletes
> --
>
> Key: HUDI-1127
> URL: https://issues.apache.org/jira/browse/HUDI-1127
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer, Writer Core
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Major
> Fix For: 0.8.0
>
>
> Recently I was working on a PR (https://github.com/apache/hudi/pull/1704) to 
> enhance the OverwriteWithLatestAvroPayload class to consider records in storage 
> when merging. Briefly, this class will ignore older updates if the record in 
> storage is the latest one (based on the precombine field). 
> Based on this, the expectation is that we handle every write operation the 
> same way - if the incoming records are older, they should be ignored. 
> While at this, I identified that we cannot handle all deletes the same way. 
> This is because we process deletes in two main ways -
>  * by adding and enabling a metadata field `_hoodie_is_deleted` in the 
> original record and sending it as an UPSERT operation.
>  * by using an empty payload via the EmptyHoodieRecordPayload and sending 
> the write as a DELETE operation. 
> While the former has an ordering field and can be processed as expected (older 
> deletes will be ignored), the latter does not have any ordering field to 
> identify whether it is an older delete or not and hence will let the older 
> delete go through.
> Just opening this issue to track this gap. We would need to identify what is 
> the right choice here and fix as needed.
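> For reference, a minimal sketch of the two delete paths described above, as 
> spark-shell code (option name per the DataSourceWriteOptions of this era; 
> df/basePath and the other required write options are elided/illustrative):
>  
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.spark.sql.SaveMode
> import org.apache.spark.sql.functions.lit
>  
> // 1) soft delete: record keeps its ordering field, sent as a normal upsert
> df.withColumn("_hoodie_is_deleted", lit(true))
>   .write.format("hudi")
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert")
>   .mode(SaveMode.Append)
>   .save(basePath)
>  
> // 2) hard delete: empty payload, so no ordering field to compare against
> df.write.format("hudi")
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, "delete")
>   .mode(SaveMode.Append)
>   .save(basePath)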



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1269) Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1269:
--
Labels: pull-request-available user-support-issues  (was: 
pull-request-available)

> Make whether the failure of connect hive affects hudi ingest process 
> configurable
> -
>
> Key: HUDI-1269
> URL: https://issues.apache.org/jira/browse/HUDI-1269
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: wangxianghu
>Assignee: liujinhui
>Priority: Minor
>  Labels: pull-request-available, user-support-issues
>
> Currently, in an ETL pipeline (e.g. kafka -> hudi -> hive), if the hudi-to-hive 
> step fails, the job keeps running.
> I think we can add a switch to control the job behavior (fail or keep running) 
> when kafka-to-hudi succeeds but hudi-to-hive fails, leaving the choice to the 
> user, since ingesting data into hudi and syncing it to hive form one complete 
> task in some scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1269) Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1269:
--
Labels: pull-request-available  (was: pull-request-available 
user-support-issues)

> Make whether the failure of connect hive affects hudi ingest process 
> configurable
> -
>
> Key: HUDI-1269
> URL: https://issues.apache.org/jira/browse/HUDI-1269
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: wangxianghu
>Assignee: liujinhui
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, in an ETL pipeline (e.g. kafka -> hudi -> hive), if the hudi-to-hive 
> step fails, the job keeps running.
> I think we can add a switch to control the job behavior (fail or keep running) 
> when kafka-to-hudi succeeds but hudi-to-hive fails, leaving the choice to the 
> user, since ingesting data into hudi and syncing it to hive form one complete 
> task in some scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1271) Add utility scripts to perform Restores

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1271:
--
Labels:   (was: user-support-issues)

> Add utility scripts to perform Restores
> ---
>
> Key: HUDI-1271
> URL: https://issues.apache.org/jira/browse/HUDI-1271
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Utilities
>Reporter: Balaji Varadarajan
>Assignee: Nishith Agarwal
>Priority: Major
> Fix For: 0.8.0
>
>
> We need to expose commands for performing restores.
> We have similar scripts for cleaner : 
> org.apache.hudi.utilities.HoodieCleaner
> We need to add something similar for restores.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1272) Add utility scripts to manage Savepoints

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1272:
--
Labels:   (was: user-support-issues)

> Add utility scripts to manage Savepoints
> 
>
> Key: HUDI-1272
> URL: https://issues.apache.org/jira/browse/HUDI-1272
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: CLI, Utilities
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> We need to expose commands for managing savepoints.
> We have similar scripts for cleaner : 
> org.apache.hudi.utilities.HoodieCleaner
> We need to add something similar for savepoints.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1292) [Umbrella] RFC-15 : File Listing and Query Planning Optimizations

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1292:
--
Labels: pull-request-available  (was: pull-request-available 
user-support-issues)

> [Umbrella] RFC-15 : File Listing and Query Planning Optimizations 
> --
>
> Key: HUDI-1292
> URL: https://issues.apache.org/jira/browse/HUDI-1292
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration, Writer Core
>Reporter: Vinoth Chandar
>Assignee: Prashant Wason
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This is the umbrella ticket that tracks the overall implementation of RFC-15



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1341) hudi cli commands such as rollback and bootstrap support a spark sql implementation

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1341:
--
Labels:   (was: user-support-issues)

> hudi cli commands such as rollback and bootstrap support a spark sql implementation
> -
>
> Key: HUDI-1341
> URL: https://issues.apache.org/jira/browse/HUDI-1341
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>
> Right now, rollback, bootstrap, ... commands need to use the hudi CLI. Some 
> users would rather use Spark SQL or the Spark code API.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1280) Add tool to capture earliest or latest offsets in kafka topics

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1280:
--
Labels:   (was: user-support-issues)

> Add tool to capture earliest or latest offsets in kafka topics 
> ---
>
> Key: HUDI-1280
> URL: https://issues.apache.org/jira/browse/HUDI-1280
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Trevorzhang
>Priority: Major
> Fix For: 0.8.0
>
>
> For bootstrapping cases using spark.write(), we need to capture the offsets 
> from the kafka topic and use them as the checkpoint for subsequent reads from 
> the Kafka topic.
>  
> [https://github.com/apache/hudi/issues/1985]
> We need to build this integration for a smooth transition to deltastreamer.
>  
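> A rough sketch of such a capture tool using the plain kafka-clients consumer 
> (broker address and topic are illustrative; the topic,partition:offset,... 
> checkpoint layout mirrors deltastreamer's kafka checkpoints and should be 
> verified per release):
>  
> import java.util.Properties
> import scala.collection.JavaConverters._
> import org.apache.kafka.clients.consumer.KafkaConsumer
> import org.apache.kafka.common.TopicPartition
>  
> val props = new Properties()
> props.put("bootstrap.servers", "broker:9092")
> props.put("key.deserializer",
>   "org.apache.kafka.common.serialization.StringDeserializer")
> props.put("value.deserializer",
>   "org.apache.kafka.common.serialization.StringDeserializer")
> val consumer = new KafkaConsumer[String, String](props)
> // list partitions, then ask the broker for the latest (end) offsets
> val tps = consumer.partitionsFor("my_topic").asScala
>   .map(p => new TopicPartition(p.topic, p.partition)).asJava
> val latest = consumer.endOffsets(tps).asScala
> val checkpoint = "my_topic," + latest.toSeq.sortBy(_._1.partition)
>   .map { case (tp, off) => s"${tp.partition}:$off" }.mkString(",")
> consumer.close()
> // checkpoint now looks like: my_topic,0:123,1:456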



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1296) Implement Spark DataSource using range metadata for file/partition pruning

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1296:
--
Labels:   (was: user-support-issues)

> Implement Spark DataSource using range metadata for file/partition pruning
> --
>
> Key: HUDI-1296
> URL: https://issues.apache.org/jira/browse/HUDI-1296
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Priority: Major
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1342) hudi-dla-sync support modify table properties

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1342:
--
Labels:   (was: user-support-issues)

> hudi-dla-sync support modify table properties
> -
>
> Key: HUDI-1342
> URL: https://issues.apache.org/jira/browse/HUDI-1342
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>
> hudi-dla-sync support modify table properties



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1371) Implement Spark datasource by fetching file listing from metadata table

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1371:
--
Labels:   (was: user-support-issues)

> Implement Spark datasource by fetching file listing from metadata table
> ---
>
> Key: HUDI-1371
> URL: https://issues.apache.org/jira/browse/HUDI-1371
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Udit Mehrotra
>Priority: Blocker
> Fix For: 0.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1355) Allowing multipleSourceOrdering fields for doing the preCombine on payload

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1355:
--
Labels: patch starter  (was: patch starter user-support-issues)

> Allowing multipleSourceOrdering fields for doing the preCombine on payload
> --
>
> Key: HUDI-1355
> URL: https://issues.apache.org/jira/browse/HUDI-1355
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Common Core, Utilities
>Reporter: Bala Mahesh Jampani
>Priority: Major
>  Labels: patch, starter
> Fix For: 0.8.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi,
> I have come across a use case where some of the incoming events have the same 
> timestamp for the insert and update events. In this case I want to depend on 
> another field for ordering. In simple terms, if the primary sort ties, I 
> want to do a secondary sort based on another field; if that too ties, go to 
> the next field, and so on. It would be good if hudi had this functionality.
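> Until something like this is supported natively, one workaround (column names 
> here are purely illustrative) is to fold the ordering fields into a single 
> lexicographically sortable column and use that as the precombine field:
>  
> import org.apache.spark.sql.functions.{col, format_string}
>  
> // zero-pad the parts so string comparison matches numeric comparison
> val withOrderingKey = df.withColumn("ordering_key",
>   format_string("%013d-%010d", col("event_ts"), col("seq_no")))
> // then write with:
> //   .option("hoodie.datasource.write.precombine.field", "ordering_key")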



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1413) Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1413:
--
Labels:   (was: user-support-issues)

> Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync
> --
>
> Key: HUDI-1413
> URL: https://issues.apache.org/jira/browse/HUDI-1413
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Usability
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> GH issue : https://github.com/apache/hudi/issues/2270



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-55) Investigate support for bucketed tables ala Hive #74

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-55:

Labels:   (was: user-support-issues)

> Investigate support for bucketed tables ala Hive #74
> 
>
> Key: HUDI-55
> URL: https://issues.apache.org/jira/browse/HUDI-55
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Hive Integration
>Reporter: Vinoth Chandar
>Priority: Major
>
> https://github.com/uber/hudi/issues/74



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-74) Improve compaction support in HoodieDeltaStreamer & CLI

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-74:

Labels:   (was: user-support-issues)

> Improve compaction support in HoodieDeltaStreamer & CLI
> ---
>
> Key: HUDI-74
> URL: https://issues.apache.org/jira/browse/HUDI-74
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: CLI, DeltaStreamer
>Reporter: Balaji Varadarajan
>Priority: Major
>
> Currently, the only way to safely schedule and execute a compaction that 
> preserves checkpoints is inline compaction. But this is disabled 
> by default for HoodieDeltaStreamer. 
>  
> Also, the other option for scheduling a compaction is through the Hoodie CLI. We 
> need to support an option to copy the last delta-instant's extra metadata to make 
> this a viable option for working with DeltaStreamer.
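> For context, the CLI flow being discussed looks roughly like this (session 
> transcript abbreviated; exact flags vary by release):
>  
> hudi->connect --path hdfs:///path/to/table
> hudi:table->compaction schedule
> hudi:table->compactions show all
> hudi:table->compaction run --compactionInstant <instant> --parallelism 100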



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-151) Fix Realtime queries on Hive on Spark engine

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-151:
-
Status: In Progress  (was: Open)

> Fix Realtime queries on Hive on Spark engine
> 
>
> Key: HUDI-151
> URL: https://issues.apache.org/jira/browse/HUDI-151
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Minor
>  Labels: pull-request-available, user-support-issues
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ColumnId projections work differently across HoodieInputFormat and 
> HoodieRealtimeInputFormat
> We track the read column ids and names to be used throughout the execution 
> and lifetime of a mapper task needed for Hive on Spark. Our theory is that 
> due to org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher not 
> handling empty list correctly, the ParquetRecordReaderWrapper ends up adding 
> the same column ids multiple times which ultimately breaks the query. We need 
> to find why RO view works fine but RT doesn't.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-151) Fix Realtime queries on Hive on Spark engine

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-151.
--
Fix Version/s: 0.5.2
   Resolution: Fixed

[~nishith29]: please reopen if the issue still persists

> Fix Realtime queries on Hive on Spark engine
> 
>
> Key: HUDI-151
> URL: https://issues.apache.org/jira/browse/HUDI-151
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Hive Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Minor
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.5.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ColumnId projections work differently across HoodieInputFormat and 
> HoodieRealtimeInputFormat
> We track the read column ids and names to be used throughout the execution 
> and lifetime of a mapper task needed for Hive on Spark. Our theory is that 
> due to org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher not 
> handling empty list correctly, the ParquetRecordReaderWrapper ends up adding 
> the same column ids multiple times which ultimately breaks the query. We need 
> to find why RO view works fine but RT doesn't.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-280) Integrate Hudi to bigtop

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-280:
-
Labels:   (was: user-support-issues)

> Integrate Hudi to bigtop
> 
>
> Key: HUDI-280
> URL: https://issues.apache.org/jira/browse/HUDI-280
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Vinoth Chandar
>Assignee: leesf
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-310:
-
Labels:   (was: user-support-issues)

> DynamoDB/Kinesis Change Capture using Delta Streamer
> 
>
> Key: HUDI-310
> URL: https://issues.apache.org/jira/browse/HUDI-310
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: Vinoth Chandar
>Assignee: Suneel Marthi
>Priority: Major
>
> The goal here is to do CDC from DynamoDB and then have it be ingested into S3 
> as a Hudi dataset 
> Few resources: 
>  # DynamoDB Streams 
> [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html]
>   provides change capture logs in Kinesis. 
>  # Walkthrough 
> [https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.KCLAdapter.Walkthrough.html]
>  Code [https://github.com/awslabs/dynamodb-streams-kinesis-adapter] 
>  # Spark Streaming has support for reading Kinesis streams 
> [https://spark.apache.org/docs/2.4.4/streaming-kinesis-integration.html] one 
> of the many resources showing how to change the Spark Kinesis example code to 
> consume dynamodb stream   
> [https://medium.com/@ravi72munde/using-spark-streaming-with-dynamodb-d325b9a73c79]
>  # In DeltaStreamer, we need to add some form of KinesisSource that returns an 
> RDD with new data every time `fetchNewData` is called (a rough skeleton is 
> sketched below) 
> [https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/Source.java]
>   . DeltaStreamer itself does not use Spark Streaming APIs
>  # Internally, we have Avro, Json, Row sources that extract data in these 
> formats. 
> Open questions : 
>  # Should this just be a KinesisSource inside Hudi, that needs to be 
> configured differently or do we need two sources: DynamoDBKinesisSource (that 
> does some DynamoDB Stream specific setup/assumptions) and a plain 
> KinesisSource. What's more valuable to do , if we have to pick one. 
>  # For Kafka integration, we just reused the KafkaRDD in Spark Streaming 
> easily and avoided writing a lot of code by hand. Could we pull the same 
> thing off for Kinesis? (probably needs digging through Spark code) 
>  # What's the format of the data for DynamoDB streams? 
>  
>  
> We should probably flesh these out before going ahead with implementation? 
>  
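> To make question 1 concrete, a rough skeleton of a KinesisSource following 
> the fetchNewData contract linked above (package paths, constructor arguments 
> and the exact override signature are assumptions to verify per release):
>  
> import org.apache.avro.generic.GenericRecord
> import org.apache.hudi.common.util.{Option => HOption, TypedProperties}
> import org.apache.hudi.utilities.schema.SchemaProvider
> import org.apache.hudi.utilities.sources.{InputBatch, Source}
> import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}
> import org.apache.spark.sql.SparkSession
>  
> class KinesisSource(props: TypedProperties, jsc: JavaSparkContext,
>     spark: SparkSession, schemaProvider: SchemaProvider)
>   extends Source[JavaRDD[GenericRecord]](props, jsc, spark, schemaProvider) {
>  
>   override protected def fetchNewData(lastCkptStr: HOption[String],
>       sourceLimit: Long): InputBatch[JavaRDD[GenericRecord]] = {
>     // 1) resolve shard iterators from lastCkptStr (TRIM_HORIZON on first run)
>     // 2) pull up to sourceLimit records via the Kinesis client / KCL adapter
>     // 3) decode to Avro, parallelize into an RDD, and return it together
>     //    with the new checkpoint string
>     ??? // left unimplemented in this sketch
>   }
> }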



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-396) Provide an documentation to describe how to use test suite

2021-01-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274075#comment-17274075
 ] 

sivabalan narayanan commented on HUDI-396:
--

[~yanghua]: we already have a README. Do you think there are gaps in the 
README? 

> Provide an documentation to describe how to use test suite
> --
>
> Key: HUDI-396
> URL: https://issues.apache.org/jira/browse/HUDI-396
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: vinoyang
>Assignee: wangxianghu
>Priority: Major
>  Labels: user-support-issues
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-360) Add github stale action workflow for issue management

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-360:
-
Labels:   (was: user-support-issues)

> Add github stale action workflow for issue management
> -
>
> Key: HUDI-360
> URL: https://issues.apache.org/jira/browse/HUDI-360
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Gurudatt Kulkarni
>Assignee: Gurudatt Kulkarni
>Priority: Major
>
> Add a GitHub action for closing stale (90 days) issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-619:
-
Labels:   (was: user-support-issues)

> Investigate and implement mechanism to have hive/presto/sparksql queries 
> avoid stitching and return null values for hoodie columns 
> ---
>
> Key: HUDI-619
> URL: https://issues.apache.org/jira/browse/HUDI-619
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration, Presto Integration, Spark Integration
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> This idea is suggested by Vinoth during RFC review. This ticket is to track 
> the feasibility and implementation of it. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-648:
-
Labels:   (was: user-support-issues)

> Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction 
> writes
> 
>
> Key: HUDI-648
> URL: https://issues.apache.org/jira/browse/HUDI-648
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Vinoth Chandar
>Priority: Major
>
> We would like a way to hand the erroring records from writing or compaction 
> back to the users, in a separate table or log. This needs to work generically 
> across all the different writer paths.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-767:
-
Labels:   (was: user-support-issues)

> Support transformation when export to Hudi
> --
>
> Key: HUDI-767
> URL: https://issues.apache.org/jira/browse/HUDI-767
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: Raymond Xu
>Priority: Major
> Fix For: 0.8.0
>
>
> Main logic described in 
> https://github.com/apache/incubator-hudi/issues/1480#issuecomment-608529410
> In HoodieSnapshotExporter, we could extend the feature to include 
> transformation when --output-format hudi, using a custom Transformer



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-824) Register hudi-spark package with spark packages repo for easier usage of Hudi

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-824.
--
Fix Version/s: 0.5.2
   Resolution: Fixed

> Register hudi-spark package with spark packages repo for easier usage of Hudi
> -
>
> Key: HUDI-824
> URL: https://issues.apache.org/jira/browse/HUDI-824
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Nishith Agarwal
>Assignee: Vinoth Govindarajan
>Priority: Minor
>  Labels: user-support-issues
> Fix For: 0.5.2
>
>
> At the moment, to be able to use Hudi with spark, users have to do the 
> following : 
>  
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --jars `ls 
> packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` 
> \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>  
> Ideally, we want to be able to use Hudi as follows:
>  
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --packages org.apache.hudi:hudi-spark-bundle: \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-824) Register hudi-spark package with spark packages repo for easier usage of Hudi

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-824:
-
Status: In Progress  (was: Open)

> Register hudi-spark package with spark packages repo for easier usage of Hudi
> -
>
> Key: HUDI-824
> URL: https://issues.apache.org/jira/browse/HUDI-824
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Nishith Agarwal
>Assignee: Vinoth Govindarajan
>Priority: Minor
>  Labels: user-support-issues
>
> At the moment, to be able to use Hudi with spark, users have to do the 
> following : 
>  
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --jars `ls 
> packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.11-*.*.*-SNAPSHOT.jar` 
> \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
>  
> Ideally, we want to be able to use Hudi as follows:
>  
> spark-2.4.4-bin-hadoop2.7/bin/spark-shell \
>   --packages org.apache.hudi:hudi-spark-bundle: \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels:   (was: user-support-issues)

> Efficiently reading hudi tables through spark-shell
> ---
>
> Key: HUDI-829
> URL: https://issues.apache.org/jira/browse/HUDI-829
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Spark Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>
> [~uditme] Created this ticket to track some discussion on read/query path of 
> spark with Hudi tables. 
> My understanding is that when you read Hudi tables through spark-shell, some 
> of your queries are slower due to some sequential activity performed by spark 
> when interacting with Hudi tables (even with 
> spark.sql.hive.convertMetastoreParquet which can give you the same data 
> reading speed and all the vectorization benefits). Is this slowness observed 
> during spark query planning? Can you please elaborate on this? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-829:
-
Labels: user-support-issues  (was: )

> Efficiently reading hudi tables through spark-shell
> ---
>
> Key: HUDI-829
> URL: https://issues.apache.org/jira/browse/HUDI-829
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Spark Integration
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: user-support-issues
>
> [~uditme] Created this ticket to track some discussion on read/query path of 
> spark with Hudi tables. 
> My understanding is that when you read Hudi tables through spark-shell, some 
> of your queries are slower due to some sequential activity performed by spark 
> when interacting with Hudi tables (even with 
> spark.sql.hive.convertMetastoreParquet which can give you the same data 
> reading speed and all the vectorization benefits). Is this slowness observed 
> during spark query planning? Can you please elaborate on this? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-865) Improve Hive Syncing by directly translating avro schema to Hive types

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-865:
-
Labels: pull-request-available starter  (was: pull-request-available 
starter user-support-issues)

> Improve Hive Syncing by directly translating avro schema to Hive types
> --
>
> Key: HUDI-865
> URL: https://issues.apache.org/jira/browse/HUDI-865
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Hive Integration
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>  Labels: pull-request-available, starter
>
> With the current code in master and the proposed improvements in 
> https://github.com/apache/incubator-hudi/pull/1559,
> Hive sync integration would resort to the following translation for finding 
> the table schema:
>  Avro schema to Parquet schema to Hive schema
> We need to implement logic to skip the extra hop through the parquet schema 
> when generating the hive schema. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-873) kafka connector support hudi sink

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-873:
-
Labels:   (was: user-support-issues)

> kafka  connector support hudi sink
> --
>
> Key: HUDI-873
> URL: https://issues.apache.org/jira/browse/HUDI-873
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Utilities
>Reporter: liwei
>Assignee: liwei
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-914) support different target data clusters

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-914:
-
Labels:   (was: user-support-issues)

> support different target data clusters
> --
>
> Key: HUDI-914
> URL: https://issues.apache.org/jira/browse/HUDI-914
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently hudi-DeltaStreamer does not support writing to a different target 
> cluster. The specific scenario is as follows: Hudi tasks generally run on an 
> independent cluster and rely on core-site.xml and hdfs-site.xml to write data 
> to the target data cluster. Sometimes the data has to be written to a 
> different target cluster, but the cluster running the hudi task does not have 
> the core-site.xml and hdfs-site.xml of that target cluster. Specifying the 
> namenode IP address of the target cluster makes the write possible, but this 
> loses HDFS high availability. So I plan to take the contents of the target 
> cluster's core-site.xml and hdfs-site.xml as configuration items and set them 
> in Hudi's dfs-source.properties or kafka-source.properties file.
> Is there a better way to solve this problem?
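> For what it's worth, the HA-relevant parts of hdfs-site.xml are plain client 
> properties, so a sketch of what such entries could look like in the properties 
> file (nameservice and host names are made up, and whether Hudi forwards them 
> into the job's Hadoop configuration needs verification):
>  
> dfs.nameservices=targetns
> dfs.ha.namenodes.targetns=nn1,nn2
> dfs.namenode.rpc-address.targetns.nn1=nn1.target.example.com:8020
> dfs.namenode.rpc-address.targetns.nn2=nn2.target.example.com:8020
> dfs.client.failover.proxy.provider.targetns=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>  
> The target base path would then be hdfs://targetns/... so no single namenode 
> IP is hard-coded.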



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1024) Document S3 related guide and tips

2021-01-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274071#comment-17274071
 ] 

sivabalan narayanan commented on HUDI-1024:
---

[~uditme]: Can you ask one of the AWS folks to take this up? It would definitely 
be good for the community. 

> Document S3 related guide and tips
> --
>
> Key: HUDI-1024
> URL: https://issues.apache.org/jira/browse/HUDI-1024
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Raymond Xu
>Priority: Minor
>  Labels: documentation, user-support-issues
> Fix For: 0.8.0
>
>
> Create a section in docs website for Hudi on S3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1020) Making timeline server as an external long running service and extending it to be able to plugin business metadata

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1020:
--
Labels:   (was: user-support-issues)

> Making timeline server as an external long running service and extending it 
> to be able to plugin business metadata 
> ---
>
> Key: HUDI-1020
> URL: https://issues.apache.org/jira/browse/HUDI-1020
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Bhavani Sudha
>Priority: Major
>
> Based on the description in the mailing thread - 
> [https://www.mail-archive.com/dev@hudi.apache.org/msg02917.html] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on issue #2367: [SUPPORT] Seek error when querying MOR Tables in GCP

2021-01-28 Thread GitBox


nsivabalan commented on issue #2367:
URL: https://github.com/apache/hudi/issues/2367#issuecomment-769488067


   @stackfun : have you encountered the issue reported here: 
https://issues.apache.org/jira/browse/HUDI-1063
   if not, would you mind responding to it if you know the fix/workaround? 
Thanks for your assistance :) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1066) Provide way to provide all versions of a given set of records in incremental/snapshot queries

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1066:
--
Labels:   (was: user-support-issues)

> Provide way to provide all versions of a given set of records in 
> incremental/snapshot queries
> -
>
> Key: HUDI-1066
> URL: https://issues.apache.org/jira/browse/HUDI-1066
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Priority: Major
>
> Use case described here 
> [https://github.com/apache/hudi/issues/1675]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1081) Document AWS Hudi integration

2021-01-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274069#comment-17274069
 ] 

sivabalan narayanan commented on HUDI-1081:
---

[~uditme]: Do you think you can take this up? 

> Document AWS Hudi integration
> -
>
> Key: HUDI-1081
> URL: https://issues.apache.org/jira/browse/HUDI-1081
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs, Usability
>Reporter: Bhavani Sudha
>Assignee: Bhavani Sudha
>Priority: Minor
>  Labels: documentation, user-support-issues
> Fix For: 0.8.0
>
>
> Oftentimes, AWS Hudi users seek documentation on setting up Hudi and 
> integrating Hive metastore and Glue configurations. This has been one of the 
> most popular threads in Slack. It would serve the community well if documented.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-1111) Highlight Hudi guarantees in documentation section of website

2021-01-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17274068#comment-17274068
 ] 

sivabalan narayanan commented on HUDI-:
---

[~vinoth]: Can I take a stab at this? 

> Highlight Hudi guarantees in documentation section of website 
> --
>
> Key: HUDI-
> URL: https://issues.apache.org/jira/browse/HUDI-
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Balaji Varadarajan
>Assignee: leesf
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> [https://github.com/apache/hudi/issues/1795]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1114) Explore Spark Structure Streaming for Hudi Dataset

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1114:
--
Labels:   (was: user-support-issues)

> Explore Spark Structure Streaming for Hudi Dataset
> --
>
> Key: HUDI-1114
> URL: https://issues.apache.org/jira/browse/HUDI-1114
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: Spark Integration
>Reporter: Yanjia Gary Li
>Priority: Minor
>
> [https://github.com/apache/hudi/issues/1839]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1116) Support time travel using timestamp type

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1116:
--
Labels:   (was: user-support-issues)

> Support time travel using timestamp type
> 
>
> Key: HUDI-1116
> URL: https://issues.apache.org/jira/browse/HUDI-1116
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: liwei
>Priority: Major
>
>  
> Currently, we use the commit time to mimic time-travel queries. We need the 
> ability to handle time travel with a proper timestamp provided.
> For e.g.:
> spark.read.format("hudi").option("timestampAsOf", 
> "2019-01-01").load("/path/to/my/table")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1195) Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and HoodieAvroUtils.bytesToAvro

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1195.
---
Resolution: Duplicate

https://issues.apache.org/jira/browse/HUDI-1128 and 
https://issues.apache.org/jira/browse/HUDI-1129

> Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and 
> HoodieAvroUtils.bytesToAvro
> 
>
> Key: HUDI-1195
> URL: https://issues.apache.org/jira/browse/HUDI-1195
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Writer Core
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> The RDD's schema is used in HoodieAvroUtils.avroToBytes, whereas to convert back 
> from bytes to Avro, we use the schema provided in the property file. If the 
> schema has evolved, there will be a mismatch. 
> More details: 
> [https://github.com/apache/hudi/issues/1971]
>  
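> To illustrate the round trip that has to stay symmetric (package path and 
> signatures recalled from hudi-common and worth verifying per release; record 
> and writerSchema are illustrative):
>  
> import org.apache.avro.generic.GenericRecord
> import org.apache.hudi.common.util.HoodieAvroUtils
>  
> // record was built against writerSchema (the RDD's schema)
> val bytes: Array[Byte] = HoodieAvroUtils.avroToBytes(record)
> // decoding must use a schema compatible with writerSchema; decoding with an 
> // older property-file schema after the data evolved is what breaks here
> val decoded: GenericRecord = HoodieAvroUtils.bytesToAvro(bytes, writerSchema)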



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1195) Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and HoodieAvroUtils.bytesToAvro

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1195:
--
Status: In Progress  (was: Open)

> Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and 
> HoodieAvroUtils.bytesToAvro
> 
>
> Key: HUDI-1195
> URL: https://issues.apache.org/jira/browse/HUDI-1195
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Writer Core
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> The RDD's schema is used in HoodieAvroUtils.avroToBytes, whereas to convert back 
> from bytes to Avro, we use the schema provided in the property file. If the 
> schema has evolved, there will be a mismatch. 
> More details: 
> [https://github.com/apache/hudi/issues/1971]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1195) Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and HoodieAvroUtils.bytesToAvro

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1195:
--
Status: Open  (was: New)

> Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and 
> HoodieAvroUtils.bytesToAvro
> 
>
> Key: HUDI-1195
> URL: https://issues.apache.org/jira/browse/HUDI-1195
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Writer Core
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> The RDD's schema is used in HoodieAvroUtils.avroToBytes, whereas to convert back 
> from bytes to Avro, we use the schema provided in the property file. If the 
> schema has evolved, there will be a mismatch. 
> More details: 
> [https://github.com/apache/hudi/issues/1971]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1201) HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1201:
--
Labels:   (was: user-support-issues)

> HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset 
> when commit files do not have checkpoint
> -
>
> Key: HUDI-1201
> URL: https://issues.apache.org/jira/browse/HUDI-1201
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Trevorzhang
>Priority: Major
> Fix For: 0.8.0
>
>
> [https://github.com/apache/hudi/issues/1985]
>  
> It would be easier for the user to just point deltastreamer at the 
> earliest offset instead of implementing --initial-checkpoint-provider or 
> passing raw kafka checkpoints when the table was initially bootstrapped 
> through spark.write().
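> For reference, the knob being discussed is the standard consumer property, 
> passed through the deltastreamer properties file, e.g. (topic name is 
> illustrative; older kafka clients use smallest/largest instead of 
> earliest/latest, and whether your version honors it when no checkpoint exists 
> should be verified):
>  
> hoodie.deltastreamer.source.kafka.topic=my_topic
> auto.offset.reset=earliest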



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1212) GDPR: Support deletions of records on all versions of Hudi dataset

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1212:
--
Labels:   (was: user-support-issues)

> GDPR: Support deletions of records on  all versions of Hudi dataset
> ---
>
> Key: HUDI-1212
> URL: https://issues.apache.org/jira/browse/HUDI-1212
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Incremental Pull, Writer Core
>Reporter: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> Incremental pull should also stop returning records from the historical 
> dataset when we delete them from the latest snapshot.
>  
> Context from Mailing list email :
>  
> Hello,
> I am Siva's colleague and I am working on the problem below as well.
> I would like to describe what we are trying to achieve with Hudi as well as 
> our current way of working and our GDPR and "Right To Be Forgotten" 
> compliance policies.
> Our requirements :
> - We wish to apply a strict interpretation of the RTBF.  In other words, when 
> we remove a person's data, it should be throughout the historical data and 
> not just the latest snapshot.
> - We wish to use Hudi to reduce our storage requirements using upserts and 
> don't want to have duplicates between commits.
> - We wish to retain history for persons who have not requested to be 
> forgotten and therefore we do not want to delete commit files from the 
> history as some have proposed.
> We have tried a couple of solutions, but so far without success :
> - replay the data omitting the data of the persons who have requested to be 
> forgotten.  We wanted to manipulate the commit times to rebuild the history.
> We found that we couldn't manipulate the commit times and retain the history.
> - replay the data omitting the data of the persons who have requested to be 
> forgotten, but writing to a date-based partition folder using the 
> "partitionpath" parameter.
> We found that commits using upserts between the partitionpath folders do not 
> ignore data that is unchanged between 2 commit dates, as when using the 
> default commit file system, so we will not save on our storage or speed up 
> our processing using this technique.
> So basically we would like to find a way to apply a strict RTBF, GDPR, 
> maintain history and time-travel (large history) and save storage space using 
> Hudi.
> Can anyone see a way to achieve this?
> Kind Regards,
> David Rosalia
>  
>  
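
For context, the piece Hudi supports out of the box today is deleting the 
subject's records from the latest snapshot via the datasource write path. A 
minimal Scala sketch follows (key field, table name and path are 
illustrative); as the thread notes, this leaves historical versions untouched.

```scala
import org.apache.spark.sql.SaveMode

// forgottenDF holds the records (at least the key fields) of the persons
// who requested erasure; all names below are illustrative.
forgottenDF.write
  .format("hudi")
  .option("hoodie.datasource.write.operation", "delete")
  .option("hoodie.datasource.write.recordkey.field", "person_id")
  .option("hoodie.table.name", "persons")
  .mode(SaveMode.Append)
  .save("/data/persons")
```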



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-28 Thread GitBox


codecov-io edited a comment on pull request #2485:
URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=h1) Report
   > Merging 
[#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=desc) (8d2ff66) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **increase** coverage by `19.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2485/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2485       +/-   ##
   =============================================
   + Coverage     50.18%   69.43%   +19.24%
   + Complexity     3050      357     -2693
   =============================================
     Files           419       53      -366
     Lines         18931     1930    -17001
     Branches       1948      230     -1718
   =============================================
   - Hits           9500     1340     -8160
   + Misses         8656      456     -8200
   + Partials        775      134      -641
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...apache/hudi/common/engine/TaskContextSupplier.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9UYXNrQ29udGV4dFN1cHBsaWVyLmphdmE=)
 | | | |
   | 
[...a/org/apache/hudi/cli/commands/CommitsCommand.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0NvbW1pdHNDb21tYW5kLmphdmE=)
 | | | |
   | 
[...pache/hudi/hadoop/HoodieColumnProjectionUtils.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL0hvb2RpZUNvbHVtblByb2plY3Rpb25VdGlscy5qYXZh)
 | | | |
   | 
[.../apache/hudi/common/config/SerializableSchema.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2NvbmZpZy9TZXJpYWxpemFibGVTY2hlbWEuamF2YQ==)
 | | | |
   | 
[...apache/hudi/common/fs/inline/InLineFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9JbkxpbmVGaWxlU3lzdGVtLmphdmE=)
 | | | |
   | 
[...apache/hudi/common/util/collection/RocksDBDAO.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvY29sbGVjdGlvbi9Sb2Nrc0RCREFPLmphdmE=)
 | | | |
   | 
[...common/table/view/AbstractTableFileSystemView.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3ZpZXcvQWJzdHJhY3RUYWJsZUZpbGVTeXN0ZW1WaWV3LmphdmE=)
 | | | |
   | 
[.../apache/hudi/exception/TableNotFoundException.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1RhYmxlTm90Rm91bmRFeGNlcHRpb24uamF2YQ==)
 | | | |
   | 
[.../apache/hudi/common/fs/ConsistencyGuardConfig.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0NvbnNpc3RlbmN5R3VhcmRDb25maWcuamF2YQ==)
 | | | |
   | 
[...va/org/apache/hudi/common/model/CleanFileInfo.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0NsZWFuRmlsZUluZm8uamF2YQ==)
 | | | |
   | ... and [355 
more](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree-more) | |
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1267) Additional Metadata Details for Hudi Transactions

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1267:
--
Labels: features  (was: features user-support-issues)

> Additional Metadata Details for Hudi Transactions
> -
>
> Key: HUDI-1267
> URL: https://issues.apache.org/jira/browse/HUDI-1267
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Usability
>Reporter: Ashish M G
>Priority: Major
>  Labels: features
> Fix For: 0.8.0
>
>
> Whenever the following scenarios happen:
>  # Custom Datasource ( Kafka for instance ) -> Hudi Table
>  # Hudi -> Hudi Table
>  # s3 -> Hudi Table
> The following metadata needs to be captured:
>  # Table Level Metadata
>  ** Operation name (record level) like Upsert, Insert etc. for the last 
> operation performed on the row
>  # Transaction Level Metadata (this will be logged at the Hudi level, not 
> the table level)
>  ** Source (Kafka topic name / S3 url for source data in case of s3, etc.)
>  ** Target Hudi table name
>  ** Last transaction time (last commit time)
> Basically, point (1) collects all details at the table level and point (2) 
> collects all the transactions that happened at the Hudi level.
> Point (1) would be just a column addition for the operation type.
> E.g. for point (2): suppose we had an ingestion from Kafka topic 'A' to Hudi 
> table 'ingest_kafka' and another ingestion from RDBMS table ( 'tableA' ) 
> through Sqoop to Hudi table 'RDBMSingest'; then the metadata captured would 
> be:
>  
> |Source|Timestamp|Transaction Type|Target|
> |Kafka - 'A'|XX|UPSERT|ingest_kafka|
> |RDBMS - 'tableA'|XX|INSERT|RDBMSingest|
>  
> The Transaction Details table in point (2) should be available as a separate 
> common table which can be queried as a Hudi table, or stored as parquet and 
> queried from Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HUDI-1278) Need a generic payload class which can skip late arriving data based on specific fields

2021-01-28 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17270965#comment-17270965
 ] 

sivabalan narayanan edited comment on HUDI-1278 at 1/28/21, 11:59 PM:
--

[~shenhong]: Can you clarify the requirements for this? Is it different from 
what the newly introduced 
[payload|https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java]
 can do?

 


was (Author: shivnarayan):
[~vbalaji]: Can you clarify the requirements for this? Is it different from 
what the newly introduced 
[payload|https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java]
 can do?

 

> Need a generic payload class which can skip late arriving data based on 
> specific fields
> ---
>
> Key: HUDI-1278
> URL: https://issues.apache.org/jira/browse/HUDI-1278
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: DeltaStreamer, Spark Integration
>Reporter: Balaji Varadarajan
>Assignee: shenh062326
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Context : 
> [https://lists.apache.org/thread.html/rd5d805d29c2f704d8ff2729457d27bca42e890bc01fc8e5e1f1943e3%40%3Cdev.hudi.apache.org%3E]
> We need to implement a Payload class (like OverwriteWithLatestAvroPayload) 
> which will skip late arriving data.
> Notes:
>  # combineAndGetUpdateValue() would need work
>  # The ordering needs to be specified based on 1 or more fields and should be 
> configurable.
>  
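
A hedged Scala sketch of the shape such a payload could take, extending 
OverwriteWithLatestAvroPayload. The ordering field is hard-coded to "ts" 
purely for illustration; the ticket asks for it to be configurable and to 
support multiple fields.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericRecord, IndexedRecord}
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
import org.apache.hudi.common.util.{Option => HOption}

class SkipLateArrivingPayload(record: GenericRecord, orderingVal: Comparable[_])
    extends OverwriteWithLatestAvroPayload(record, orderingVal) {

  // Keep the value already on storage when the incoming record is older.
  override def combineAndGetUpdateValue(currentValue: IndexedRecord,
                                        schema: Schema): HOption[IndexedRecord] = {
    val incoming = getInsertValue(schema)
    if (!incoming.isPresent) {
      return HOption.empty() // incoming record carries no insert value (delete)
    }
    val incomingTs = incoming.get.asInstanceOf[GenericRecord].get("ts").asInstanceOf[Long]
    val currentTs  = currentValue.asInstanceOf[GenericRecord].get("ts").asInstanceOf[Long]
    if (incomingTs < currentTs) HOption.of(currentValue) else incoming
  }
}
```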



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1297:
--
Labels:   (was: user-support-issues)

> [Umbrella] Revamp Spark Datasource support using Spark 3 APIs
> -
>
> Key: HUDI-1297
> URL: https://issues.apache.org/jira/browse/HUDI-1297
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
> Fix For: 0.8.0
>
>
> Yet to be fully scoped out. But at a high level, we want to:
>  * Add SQL support for MERGE, DELETE etc.
>  * First class support for streaming reads/writes via structured streaming
>  * Row based reader/writers all the way
>  * Support for File/Partition pruning using Hudi metadata tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1297:
--
Comment: was deleted

(was: oops. noticed this is an umbrella ticket. )

> [Umbrella] Revamp Spark Datasource support using Spark 3 APIs
> -
>
> Key: HUDI-1297
> URL: https://issues.apache.org/jira/browse/HUDI-1297
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: user-support-issues
> Fix For: 0.8.0
>
>
> Yet to be fully scoped out. But at a high level, we want to:
>  * Add SQL support for MERGE, DELETE etc.
>  * First class support for streaming reads/writes via structured streaming
>  * Row based reader/writers all the way
>  * Support for File/Partition pruning using Hudi metadata tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1290) Implement Debezium avro source for Delta Streamer

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1290:
--
Labels:   (was: user-support-issues)

> Implement Debezium avro source for Delta Streamer
> -
>
> Key: HUDI-1290
> URL: https://issues.apache.org/jira/browse/HUDI-1290
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: Balaji Varadarajan
>Assignee: Balaji Varadarajan
>Priority: Major
> Fix For: 0.8.0
>
>
> We need to implement a transformer and payloads for seamlessly pulling the 
> change logs that Debezium emits into Kafka.
>  
>  
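
A hedged Scala sketch of what the transformer half might look like against 
deltastreamer's Transformer interface, unwrapping Debezium's default envelope. 
The "op" and "after" field names follow Debezium's defaults, and the import 
paths assume a recent Hudi build.

```scala
import org.apache.hudi.common.config.TypedProperties
import org.apache.hudi.utilities.transform.Transformer
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.{Dataset, Row, SparkSession}

class DebeziumFlattenTransformer extends Transformer {
  override def apply(jsc: JavaSparkContext,
                     sparkSession: SparkSession,
                     rowDataset: Dataset[Row],
                     properties: TypedProperties): Dataset[Row] = {
    // Drop delete events and keep only the post-image of each change.
    rowDataset.filter("op != 'd'").select("after.*")
  }
}
```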



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1362) Make deltastreamer support full overwrite

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1362:
--
Labels:   (was: user-support-issues)

> Make deltastreamer support full overwrite
> -
>
> Key: HUDI-1362
> URL: https://issues.apache.org/jira/browse/HUDI-1362
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: DeltaStreamer
>Reporter: liujinhui
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1546) Fix hive sync tool path in website documentation

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1546.
---
Resolution: Duplicate

https://issues.apache.org/jira/browse/HUDI-1379

> Fix hive sync tool path in website documentation
> 
>
> Key: HUDI-1546
> URL: https://issues.apache.org/jira/browse/HUDI-1546
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> https://github.com/apache/hudi/issues/2480



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1546) Fix hive sync tool path in website documentation

2021-01-28 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1546:
--
Status: In Progress  (was: Open)

> Fix hive sync tool path in website documentation
> 
>
> Key: HUDI-1546
> URL: https://issues.apache.org/jira/browse/HUDI-1546
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: 0.8.0
>Reporter: sivabalan narayanan
>Priority: Major
>  Labels: user-support-issues
>
> https://github.com/apache/hudi/issues/2480



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-28 Thread GitBox


codecov-io edited a comment on pull request #2485:
URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=h1) Report
   > Merging 
[#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=desc) (8d2ff66) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc)
 (e302c6b) will **decrease** coverage by `40.49%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2485/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2485       +/-   ##
   ============================================
   - Coverage     50.18%   9.68%   -40.50%
   + Complexity     3050      48     -3002
   ============================================
     Files           419      53      -366
     Lines         18931    1930    -17001
     Branches       1948     230     -1718
   ============================================
   - Hits           9500     187     -9313
   + Misses         8656    1730     -6926
   + Partials        775      13      -762
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2485?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2485/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] nsivabalan commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-28 Thread GitBox


nsivabalan commented on issue #2323:
URL: https://github.com/apache/hudi/issues/2323#issuecomment-769433405


   got it. Hudi is looking to add record level indexing in the next release, 
and global lookups should become a lot faster with that. Hopefully that helps 
you. Can we close this ticket if you don't have any more questions? 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-01-28 Thread GitBox


nsivabalan commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-769423274


   Can you try setting the config "hoodie.datasource.write.table.type" to 
MERGE_ON_READ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codecov-io commented on pull request #2506: [HUDI-1557] Make Flink write pipeline write task scalable

2021-01-28 Thread GitBox


codecov-io commented on pull request #2506:
URL: https://github.com/apache/hudi/pull/2506#issuecomment-769390677


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2506?src=pr=h1) Report
   > Merging 
[#2506](https://codecov.io/gh/apache/hudi/pull/2506?src=pr=desc) (ed3f2f8) 
into 
[master](https://codecov.io/gh/apache/hudi/commit/bc0325f6ea0a734f106f21a2fcd4ead413a6cf7b?el=desc)
 (bc0325f) will **decrease** coverage by `40.57%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2506/graphs/tree.svg?width=650=150=pr=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2506?src=pr=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2506       +/-   ##
   ============================================
   - Coverage     50.26%   9.68%   -40.58%
   + Complexity     3119      48     -3071
   ============================================
     Files           430      53      -377
     Lines         19565    1930    -17635
     Branches       2004     230     -1774
   ============================================
   - Hits           9835     187     -9648
   + Misses         8925    1730     -7195
   + Partials        805      13      -792
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2506?src=pr=tree) | Coverage Δ 
| Complexity Δ | |
   |---|---|---|---|
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | 
[...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | 
[...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | 
[...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | 
[...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | 
[...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh)
 | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | 
[...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2506/diff?src=pr=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=)
 | `0.00% <0.00%> 

[GitHub] [hudi] satishkotha commented on a change in pull request #2502: [HUDI-1555] Remove isEmpty to improve clustering execution performance

2021-01-28 Thread GitBox


satishkotha commented on a change in pull request #2502:
URL: https://github.com/apache/hudi/pull/2502#discussion_r566378842



##
File path: 
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestStructuredStreaming.scala
##
@@ -243,17 +243,24 @@ class TestStructuredStreaming extends HoodieClientTestBase {
 val f2 = Future {
   inputDF1.coalesce(1).write.mode(SaveMode.Append).json(sourcePath)
   // wait for spark streaming to process one microbatch
-  val currNumCommits = waitTillAtleastNCommits(fs, destPath, 1, 120, 5)
+  var currNumCommits = waitTillAtleastNCommits(fs, destPath, 1, 120, 5)
   assertTrue(HoodieDataSourceHelpers.hasNewCommits(fs, destPath, "000"))
 
   inputDF2.coalesce(1).write.mode(SaveMode.Append).json(sourcePath)
   // wait for spark streaming to process second microbatch
-  waitTillAtleastNCommits(fs, destPath, currNumCommits + 1, 120, 5)
-  assertEquals(2, HoodieDataSourceHelpers.listCommitsSince(fs, destPath, "000").size())
-
-  // check have more than one file group
-  this.metaClient = new HoodieTableMetaClient(fs.getConf, destPath, true)
-  assertTrue(getLatestFileGroupsFileId(partitionOfRecords).size > 1)
+  currNumCommits = waitTillAtleastNCommits(fs, destPath, currNumCommits + 1, 120, 5)
+  // for inline clustering, clustering may be complete along with 2nd commit
+  if (HoodieDataSourceHelpers.allCompletedCommitsCompactions(fs, destPath).getCompletedReplaceTimeline().countInstants() > 0) {

Review comment:
   @lw309637554  It seems like  this test has a race condition. PTAL and 
confirm my fix is reasonable.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashantwason commented on a change in pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-01-28 Thread GitBox


prashantwason commented on a change in pull request #2496:
URL: https://github.com/apache/hudi/pull/2496#discussion_r566356415



##
File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
##
@@ -415,17 +420,18 @@ public static boolean isLogFile(Path logPath) {
 return matcher.find() && logPath.getName().contains(".log");
   }
 
+  public static boolean isDataFile(Path path) {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] prashantwason commented on pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-01-28 Thread GitBox


prashantwason commented on pull request #2496:
URL: https://github.com/apache/hudi/pull/2496#issuecomment-769315395


   > High level question: should we always buffer, or make this configurable 
for HDFS only?
   
   I don't have much insight into other file systems and their inherent 
buffering. You can decide. I did not see an easy way to restrict this, as 
HoodieWrapperFileSystem currently does not take any properties.  
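   
   The idea itself is just the standard java.io decorator, independent of 
HoodieWrapperFileSystem; a generic Scala sketch (path and buffer size are 
illustrative):
   
   ```scala
   import java.io.BufferedOutputStream
   
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}
   
   val fs = FileSystem.get(new Configuration())
   // Wrap the raw stream so many small writes coalesce into fewer, larger
   // writes against the backing store.
   val out = new BufferedOutputStream(fs.create(new Path("/tmp/demo/part-0")), 64 * 1024)
   try {
     out.write("hello".getBytes("UTF-8"))
   } finally {
     out.close() // flushes the buffer, then closes the underlying stream
   }
   ```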



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch asf-site updated: Travis CI build asf-site

2021-01-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 69456a6  Travis CI build asf-site
69456a6 is described below

commit 69456a6f9607c426f7fbb191e6d9caa51d695939
Author: CI 
AuthorDate: Thu Jan 28 16:39:35 2021 +

Travis CI build asf-site
---
 content/community.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/content/community.html b/content/community.html
index 879fc85..a65407d 100644
--- a/content/community.html
+++ b/content/community.html
@@ -403,6 +403,12 @@ Committers are chosen by a majority vote of the Apache Hudi https://www
   <td>PMC, Committer</td>
   <td>vinoyang</td>
 </tr>
+<tr>
+  <td><img src="https://avatars.githubusercontent.com/lw309637554" alt="liway" style="max-width: 100px;" align="middle" /></td>
+  <td><a href="https://github.com/lw309637554">Wei Li</a></td>
+  <td>Committer</td>
+  <td>liway</td>
+</tr>
   
 
 


